At 12:49 PM 11/21/01 -0500, Ronny Richardson wrote:
>As I understand it, the Central Limit Theorem (CLT) guarantees that the
>distribution of sample means is normally distributed regardless of the
>distribution of the underlying data as long as the sample size is large
>enough and the population standard deviation is known.

nope ... the clt says nothing of the kind
it says that regardless of the shape of the target population (as long as 
its variance is finite) ... as n increases, the shape of the sampling 
distribution of means is better and better APPROXIMATED by the normal 
distribution

that is, even if the target population is quite different from normal ... 
if we take decent sized samples ... we can say and not be TOO wrong that 
the sampling distribution of means looks something like a normal ...

here is a quick simulation ... 10000 samples, each of n=50, taken from a 
chi square distribution with 1 df ... the dotplot shows the 10000 sample means

                                .
                              ..::..
                            :::::::::.
                          .::::::::::::.
                         .::::::::::::::..
                       .::::::::::::::::::.
                     ..::::::::::::::::::::::..
               .....:::::::::::::::::::::::::::::............ .
          +---------+---------+---------+---------+---------+-------C51
       0.30      0.60      0.90      1.20      1.50      1.80

even though the chi square distribution with 1 df is radically positively 
skewed, the sampling distribution of means looks pretty darn close to a 
normal distribution ... but it never will be exactly one ...
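
for anyone who wants to rerun something like the simulation above, here is 
a minimal sketch in python (standard library only) ... it is NOT the 
original run that produced the dotplot; the seed, the function names 
(sample_means, skewness) and the moment-based skewness formula are my own 
assumptions for illustration ... it relies on the fact that the square of a 
standard normal deviate is chi square with 1 df

```python
import random
import statistics


def sample_means(n=50, reps=10_000, seed=1):
    """Draw `reps` samples of size `n` from chi-square(1 df); return their means."""
    rng = random.Random(seed)  # seed is arbitrary, chosen only for repeatability
    means = []
    for _ in range(reps):
        # the square of a standard normal deviate is chi-square with 1 df
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        means.append(total / n)
    return means


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


means = sample_means()
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("skewness:            ", round(skewness(means), 3))
```

the mean should land near 1 and the sd near 0.2 (that is, sqrt(2)/sqrt(50)) 
... the skewness should come out small but clearly positive, which is the 
"close to normal, but not exactly normal" point in numbers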


the clt does NOT say that the sampling distribution will GET to and BECOME 
a normal distribution

if the population is not normal ... the sampling distribution of means will 
not be exactly normal for any finite n ... but, it could be that your EYES 
could not tell the difference
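
for the chi square (1 df) example this can be made exact ... the sum of n 
independent chi square (1 df) values is chi square with n df, whose 
skewness is sqrt(8/n), and dividing by n to get the mean does not change 
skewness ... a small sketch (the function name skewness_of_mean and the 
particular n values are only illustrative choices):

```python
import math


def skewness_of_mean(n):
    """Skewness of the mean of n independent chi-square(1 df) values.

    The sum of n such values is chi-square with n df, which has skewness
    sqrt(8/n); rescaling the sum to a mean leaves skewness unchanged.
    """
    return math.sqrt(8.0 / n)


# skewness shrinks as n grows but is positive for every finite n
for n in (1, 10, 50, 200, 1000, 100_000):
    print(n, round(skewness_of_mean(n), 4))
```

the skewness keeps shrinking toward 0 (the value for a normal) but never 
gets there for any finite n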


>It seems to me that most statistics books I see overoptimistically invoke
>the CLT not when n is over 30 and the population standard deviation is
>known but anytime n is over 30. This seems inappropriate to me or am I
>overlooking something?

you are mixing two separate issues ...

if we know the sd of the population ... then we know the real sampling 
error ... i.e., the standard error of the mean, sigma/sqrt(n) ... if we do 
NOT know the population sd, and substitute our estimate of it from the 
sample (s/sqrt(n)), then we are only estimating the standard error of the 
mean

thus ... knowing or not knowing the population sd determines whether we 
know or only estimate the real standard error ... but this is unconnected 
with the shape of the sampling distribution
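
a minimal sketch of the known versus estimated standard error, assuming a 
chi square (1 df) population (whose variance is 2, so sigma = sqrt(2)) ... 
the seed, the sample size of 50 and the variable names are only 
illustrative:

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed, only for repeatability
n = 50

# one random sample of size n from chi-square(1 df): squared standard normals
sample = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

# population sd is known here: chi-square with 1 df has variance 2
sigma = math.sqrt(2.0)
true_se = sigma / math.sqrt(n)  # the real standard error of the mean

# population sd unknown: substitute the sample sd, giving only an estimate
est_se = statistics.stdev(sample) / math.sqrt(n)

print("real standard error:     ", round(true_se, 4))
print("estimated standard error:", round(est_se, 4))
```

the first number is fixed by the population and n ... the second changes 
from sample to sample ... neither one says anything about the shape of the 
sampling distribution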

shape of sampling distribution is partly a function of shape of population 
AND random sample size ...


>When the population standard deviation is not known (which is almost all the
>time) it seems to me that the Student t (t) distribution is more
>appropriate. However, t requires that the underlying data be normal, or at
>least not too non-normal. My expectation is that most data sets are not
>nearly "normal enough" to make using t appropriate.
>
>So, if we do not know the population standard deviation and we cannot
>assume a normal population, what should we be doing, as opposed to just
>using the CLT as most business statistics books do?
>
>Ronny Richardson
>
>
>=================================================================
>Instructions for joining and leaving this list and remarks about
>the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
>=================================================================

_________________________________________________________
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm


