[R] Doubt about Student t distribution simulation
Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Doubt about Student t distribution simulation
Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jose Claudio Faria Sent: Friday, August 04, 2006 3:09 PM To: R-help@stat.math.ethz.ch Subject: [R] Doubt about Student t distribution simulation Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Doubt about Student t distribution simulation
Jose Claudio Faria [EMAIL PROTECTED] writes: Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) At the very least, you need a sample-based standard error: sd(amo.i), not sigma. Also, resampling from pop is not really what the t-distribution is based on, but I don't think that matters much. } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Doubt about Student t distribution simulation
Hi, Jose/John, Here's an example to help Jose and highlights John's advice. Also includes set.seed which should be included in all simulations posted to R-help. set.seed(42) mu - 10 sigma - 5 n - 3 nsim - 1 m - matrix(rnorm(n * nsim, mu, sigma), nsim, n) t - apply(m, 1, function(x) (mean(x) - mu)/(sd(x)/sqrt(n))) library(lattice) qqmath(t, distribution = function(x) qt(x, n - 1), panel = function(x, ...) { panel.qqmath(x, col = darkblue, ...) panel.qqmathline(x, col = darkred, ...) }) With n = 3, expect a few outliers. --sundar John Fox wrote: Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jose Claudio Faria Sent: Friday, August 04, 2006 3:09 PM To: R-help@stat.math.ethz.ch Subject: [R] Doubt about Student t distribution simulation Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Doubt about Student t distribution simulation
Dears John, Peter and Sundar, Many thanks for the quick answers!!! .. and sorry for all.. []s ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] John Fox escreveu: Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jose Claudio Faria Sent: Friday, August 04, 2006 3:09 PM To: R-help@stat.math.ethz.ch Subject: [R] Doubt about Student t distribution simulation Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Esta mensagem foi verificada pelo E-mail Protegido Terra. Scan engine: McAfee VirusScan / Atualizado em 04/08/2006 / Versão: 4.4.00/4822 Proteja o seu e-mail Terra: http://mail.terra.com.br/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Doubt about Student t distribution simulation
Dear Sundar, Try qq.plot(t, dist=t, df=n-1) from the car package, which include a 95-percent point-wise confidence envelope that helps you judge how extreme the outliers are relative to expectations. Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: Sundar Dorai-Raj [mailto:[EMAIL PROTECTED] Sent: Friday, August 04, 2006 3:27 PM To: John Fox Cc: [EMAIL PROTECTED]; R-help@stat.math.ethz.ch Subject: Re: [R] Doubt about Student t distribution simulation Hi, Jose/John, Here's an example to help Jose and highlights John's advice. Also includes set.seed which should be included in all simulations posted to R-help. set.seed(42) mu - 10 sigma - 5 n - 3 nsim - 1 m - matrix(rnorm(n * nsim, mu, sigma), nsim, n) t - apply(m, 1, function(x) (mean(x) - mu)/(sd(x)/sqrt(n))) library(lattice) qqmath(t, distribution = function(x) qt(x, n - 1), panel = function(x, ...) { panel.qqmath(x, col = darkblue, ...) panel.qqmathline(x, col = darkred, ...) }) With n = 3, expect a few outliers. --sundar John Fox wrote: Dear Jose, The problem is that you're using the population standard deviation (sigma) rather than the sample SD of each sample [i.e., t[i] = (mean(amo.i) - mu) / (sd(amo.i) / sqrt(n)) ], so your values should be normally distributed, as they appear to be. A couple of smaller points: (1) Even after this correction, you're sampling from a discrete population (albeit with replacement) and so the values won't be exactly t-distributed. You could draw the samples directly from N(mu, sigma) instead. (2) It would be preferable to make a quantile-comparison plot against the t-distribution, since you'd get a better picture of what's going on in the tails. I hope this helps, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jose Claudio Faria Sent: Friday, August 04, 2006 3:09 PM To: R-help@stat.math.ethz.ch Subject: [R] Doubt about Student t distribution simulation Dear R list, I would like to illustrate the origin of the Student t distribution using R. So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t distribution with (sample.size - 1) degree free, what is wrong with the simulation below? I think that the theoretical curve should agree with the relative frequencies of the t values calculated: #== begin options= # parameters mu= 10 sigma = 5 # size of sample n = 3 # repetitions nsim = 1 # histogram parameter nchist = 150 #== end options=== t = numeric() pop = rnorm(1, mean = mu, sd = sigma) for (i in 1:nsim) { amo.i = sample(pop, n, replace = TRUE) t[i] = (mean(amo.i) - mu) / (sigma / sqrt(n)) } win.graph(w = 5, h = 7) split.screen(c(2,1)) screen(1) hist(t, main = histogram, breaks = nchist, col = lightgray, xlab = '', ylab = Fi, font.lab = 2, font = 2) screen(2) hist(t, probability = T, main= 'f.d.p and histogram', breaks = nchist, col = 'lightgray', xlab= 't', ylab = 'f(t)', font.lab= 2, font = 2) x = t curve(dt(x, df = n-1), add = T, col = red, lwd = 2) Many thanks for any help, ___ Jose Claudio Faria Brasil/Bahia/Ilheus/UESC/DCET Estatística Experimental/Prof. Adjunto mails: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.