[R] Doubt about Student t distribution simulation

2006-08-04 Thread Jose Claudio Faria
Dear R list,

I would like to illustrate the origin of the Student t distribution using R.

So, if (sample.mean - pop.mean) / standard.error(sample.mean) has a t 
distribution with (sample.size - 1) degrees of freedom, what is wrong with the 
simulation below? I think that the theoretical curve should agree with 
the relative frequencies of the calculated t values:

#== begin options=
# parameters
   mu= 10
   sigma =  5

# size of sample
   n = 3

# repetitions
   nsim = 1

# histogram parameter
   nchist = 150
#== end options===

t   = numeric()
pop = rnorm(1, mean = mu, sd = sigma)

for (i in 1:nsim) {
   amo.i = sample(pop, n, replace = TRUE)
   t[i]  = (mean(amo.i) - mu) / (sigma / sqrt(n))
}

win.graph(w = 5, h = 7)
split.screen(c(2,1))
screen(1)
hist(t,
  main     = 'histogram',
  breaks   = nchist,
  col      = 'lightgray',
  xlab     = '', ylab = 'Fi',
  font.lab = 2, font = 2)

screen(2)
hist(t,
  probability = T,
  main= 'f.d.p and histogram',
  breaks  = nchist,
  col = 'lightgray',
  xlab= 't', ylab = 'f(t)',
  font.lab= 2, font = 2)

x = t
curve(dt(x, df = n-1), add = T, col = 'red', lwd = 2)

Many thanks for any help,
___
Jose Claudio Faria
Brasil/Bahia/Ilheus/UESC/DCET
Estatística Experimental/Prof. Adjunto

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Doubt about Student t distribution simulation

2006-08-04 Thread John Fox
Dear Jose,

The problem is that you're using the population standard deviation (sigma)
rather than the sample SD of each sample [i.e., t[i]  = (mean(amo.i) - mu) /
(sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as
they appear to be.

A couple of smaller points: (1) Even after this correction, you're sampling
from a discrete population (albeit with replacement) and so the values won't
be exactly t-distributed. You could draw the samples directly from N(mu,
sigma) instead. (2) It would be preferable to make a quantile-comparison
plot against the t-distribution, since you'd get a better picture of what's
going on in the tails.
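
To make the distinction concrete, here is a minimal sketch (not Jose's original script) that draws the samples directly from N(mu, sigma) and computes both statistics side by side. The names z, tt, frac_z and frac_t, the seed, and nsim = 10000 are assumptions chosen for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
mu    <- 10
sigma <- 5
n     <- 3
nsim  <- 10000                   # assumed number of repetitions

# one sample per row, drawn directly from N(mu, sigma)
m <- matrix(rnorm(n * nsim, mean = mu, sd = sigma), nrow = nsim, ncol = n)

xbar <- rowMeans(m)
s    <- apply(m, 1, sd)          # sample SD of each row

z  <- (xbar - mu) / (sigma / sqrt(n))   # population SD: standard normal
tt <- (xbar - mu) / (s / sqrt(n))       # sample SD: Student t with n - 1 df

# fraction of values outside the central 95% range of t(n - 1)
crit   <- qt(0.975, df = n - 1)
frac_z <- mean(abs(z)  > crit)   # far below 0.05: the normal has lighter tails
frac_t <- mean(abs(tt) > crit)   # close to 0.05
print(c(z = frac_z, t = frac_t))
```

Only the statistic built from the per-sample SD should put roughly 5% of its values beyond the t critical value; the one built from sigma behaves like a standard normal.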

I hope this helps,
 John 


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 



Re: [R] Doubt about Student t distribution simulation

2006-08-04 Thread Peter Dalgaard
Jose Claudio Faria writes:

 for (i in 1:nsim) {
    amo.i = sample(pop, n, replace = TRUE)
    t[i]  = (mean(amo.i) - mu) / (sigma / sqrt(n))

At the very least, you need a sample-based standard error: sd(amo.i),
not sigma. Also, resampling from pop is not really what the
t-distribution is based on, but I don't think that matters much.
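
One way to check how much the resampling step matters is to compute the statistic both ways and compare a few quantiles. This is a rough sketch: the population size npop = 100000, nsim = 10000, the seed, and the helper name t_stat are illustrative assumptions, not values from the original post.

```r
set.seed(2)                      # arbitrary seed
mu <- 10; sigma <- 5; n <- 3
nsim <- 10000                    # assumed number of repetitions
npop <- 100000                   # assumed size of the finite population

# t statistic based on the sample SD (hypothetical helper)
t_stat <- function(x) (mean(x) - mu) / (sd(x) / sqrt(n))

pop <- rnorm(npop, mean = mu, sd = sigma)

# (a) resample from the finite population, as in the original post
t_resampled <- replicate(nsim, t_stat(sample(pop, n, replace = TRUE)))
# (b) draw each sample directly from N(mu, sigma)
t_direct    <- replicate(nsim, t_stat(rnorm(n, mean = mu, sd = sigma)))

probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
print(round(rbind(resampled = quantile(t_resampled, probs),
                  direct    = quantile(t_direct, probs),
                  theory    = qt(probs, df = n - 1)), 2))
```

If the two empirical rows are close to each other and to the theoretical row, the resampling step is not what drives the mismatch in the histogram.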


 }
 

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Doubt about Student t distribution simulation

2006-08-04 Thread Sundar Dorai-Raj
Hi, Jose/John,

Here's an example that should help Jose and that illustrates John's advice. 
It also calls set.seed, which should be included in all simulations posted 
to R-help.

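set.seed(42)
mu    <- 10
sigma <- 5
n     <- 3
nsim  <- 1
# one simulated sample per row, drawn directly from N(mu, sigma)
m <- matrix(rnorm(n * nsim, mu, sigma), nsim, n)
# t statistic of each row, using the sample SD
t <- apply(m, 1, function(x) (mean(x) - mu)/(sd(x)/sqrt(n)))

library(lattice)
# quantile-comparison plot against t with n - 1 degrees of freedom
qqmath(t, distribution = function(x) qt(x, n - 1),
       panel = function(x, ...) {
         panel.qqmath(x, col = "darkblue", ...)
         panel.qqmathline(x, col = "darkred", ...)
       })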


With n = 3, expect a few outliers.
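
The outliers follow from how heavy the tails of the t distribution with 2 degrees of freedom are. A quick sketch of the arithmetic, using only pt and pnorm from base R; the cut-off of 10 and the count of 10000 simulated values are assumptions picked for illustration.

```r
df        <- 2
threshold <- 10            # illustrative cut-off for "outlier"
n_values  <- 10000         # assumed number of simulated t values

# two-sided tail probabilities beyond the cut-off
p_t    <- 2 * pt(-threshold, df = df)   # Student t, 2 df
p_norm <- 2 * pnorm(-threshold)         # standard normal, for contrast

# closed form for df = 2, from F(t) = 1/2 + t / (2 * sqrt(2 + t^2))
p_closed <- 1 - threshold / sqrt(2 + threshold^2)

# expected number of simulated values with |t| > threshold
expected_count <- n_values * p_t
print(c(p_t = p_t, p_closed = p_closed, p_norm = p_norm,
        expected = expected_count))
```

Under t with 2 degrees of freedom, about 1 value in 100 is expected to exceed 10 in absolute value, whereas under the normal such a value is essentially impossible.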

--sundar




Re: [R] Doubt about Student t distribution simulation

2006-08-04 Thread Jose Claudio Faria
Dear John, Peter and Sundar,

Many thanks for the quick answers!!!

... and sorry for the trouble.

Regards,
___
Jose Claudio Faria
Brasil/Bahia/Ilheus/UESC/DCET
Estatística Experimental/Prof. Adjunto



Re: [R] Doubt about Student t distribution simulation

2006-08-04 Thread John Fox
Dear Sundar,

Try qq.plot(t, dist="t", df=n-1) from the car package, which includes a
95-percent point-wise confidence envelope that helps you judge how extreme
the outliers are relative to expectations.
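
For readers without car installed, a comparable picture can be sketched in base R. The envelope below is not the one car computes; it is a swapped-in construction based on the fact that the i-th order statistic of N uniform variates follows a Beta(i, N - i + 1) distribution. The simulated data, the seed, nsim = 1000, and all variable names are assumptions for illustration.

```r
set.seed(3)                      # arbitrary seed
mu <- 10; sigma <- 5; n <- 3
nsim <- 1000                     # assumed number of repetitions

# simulated t statistics, sorted for the quantile comparison
m  <- matrix(rnorm(n * nsim, mu, sigma), nsim, n)
tt <- sort(apply(m, 1, function(x) (mean(x) - mu) / (sd(x) / sqrt(n))))

i    <- seq_len(nsim)
theo <- qt(ppoints(nsim), df = n - 1)   # theoretical t quantiles

# point-wise 95% envelope for each order statistic under t(n - 1)
lower <- qt(qbeta(0.025, i, nsim - i + 1), df = n - 1)
upper <- qt(qbeta(0.975, i, nsim - i + 1), df = n - 1)

plot(theo, tt, xlab = "t quantiles", ylab = "sorted simulated t values")
abline(0, 1, col = "red")        # line of perfect agreement
lines(theo, lower, lty = 2)
lines(theo, upper, lty = 2)

# share of points that fall inside the point-wise envelope
inside <- mean(tt >= lower & tt <= upper)
print(inside)
```

Because the envelope is point-wise, a few points outside it are not by themselves evidence against the t distribution.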

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 
