Ronny Richardson wrote:

> As I understand it, the Central Limit Theorem (CLT) guarantees that the
> distribution of sample means is normally distributed regardless of the
> distribution of the underlying data as long as the sample size is large
> enough and the population standard deviation is known.

Not quite. The CLT states that the distribution of the sample mean is
_approximately_ normal if the sample size is large enough (and the
population variance is finite). This holds whether or not you know the
population standard deviation.
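
A quick simulation can make the "approximately" concrete. The sketch
below is only an illustration: the exponential population, the number of
replications, and the helper names (skewness, sample_mean_skewness) are
my own assumptions. It measures how skewed the distribution of the sample
mean still is for a few sample sizes. No standard deviation is assumed
known anywhere.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, reps=4000, seed=0):
    """Skewness of `reps` simulated sample means, each from n draws.

    The population is exponential with rate 1 (an assumed, strongly
    right-skewed example), so any symmetry in the means comes from
    averaging, not from the data.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

if __name__ == "__main__":
    for n in (2, 10, 30, 100):
        print(n, round(sample_mean_skewness(n), 3))
```

The skewness shrinks as n grows but does not vanish at n = 30, which is
why "large enough" depends on how non-normal the population is.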

> It seems to me that most statistics books I see over optimistically invoke
> the CLT not when n is over 30 and the population standard deviation is
> known but anytime n is over 30. This seems inappropriate to me or am I
> overlooking something?

It is indeed a common shortcut used in many introductory texts to imply
that magic happens whenever n > 30. Again, knowing the standard deviation
has nothing to do with it.

> When the population standard deviation is not known (which is almost all the
> time) it seems to me that the Student t (t) distribution is more
> appropriate. However, t requires that the underlying data be normal, or at
> least not too non-normal. My expectation is that most data sets are not
> nearly "normal enough" to make using t appropriate.

Not really. I suspect that you are muddling two very different concepts:
applying the Central Limit Theorem, and approximating the t-distribution
by a normal distribution. The latter is defensible whenever n > 30 (or
when you have 30 or more degrees of freedom): the underlying distribution
is usually not exactly normal, so the t-distribution is itself only an
approximation. The former requires separate justification (which is often
not given in introductory texts). It is unfortunate that the same rule of
thumb (n > 30) is used for both concepts, often without explanation.
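
To see how close the t and normal distributions are, you can compare
their two-sided 95% critical values. The sketch below uses only the
Python standard library, which has no t-distribution, so t_pdf, t_cdf and
t_quantile are hypothetical helpers of mine that integrate the t density
numerically. A statistics package would normally do this for you.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Upper quantile (0.5 <= p < 1) found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)
    for df in (5, 10, 30, 100):
        print(df, round(t_quantile(0.975, df), 3), round(z, 3))
```

At 30 degrees of freedom the t critical value is about 2.04 against 1.96
for the normal, which is the sense in which the n > 30 rule is defensible
for this purpose. It says nothing about whether the CLT has kicked in.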

> So, if we do not know the population standard deviation and we cannot
> assume a normal population, what should we be doing, as opposed to just
> using the CLT as most business statistics books do?

That's a difficult question to answer. First, how non-normal is your
underlying population? Standard tests of hypotheses on the mean are quite
robust to moderate departures from normality. On the other hand, is the
sample mean really a useful measure in your context? Nonparametric methods
_may_ be called for, but it will depend on the situation.
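
One way to judge robustness for a particular kind of population is to
simulate it. The sketch below is an illustration under my own
assumptions (n = 30, an exponential population as the non-normal case,
and a hypothetical helper named coverage). It estimates how often the
nominal 95% t-interval for the mean actually covers the true mean.

```python
import math
import random
import statistics

N = 30
T_CRIT_29 = 2.045  # tabulated two-sided 95% t critical value, 29 df

def coverage(draw, true_mean, reps=4000, seed=1):
    """Fraction of nominal 95% t-intervals that contain true_mean.

    `draw` takes a random.Random instance and returns one observation.
    Each replication uses a fresh sample of N observations.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        m = statistics.fmean(sample)
        half = T_CRIT_29 * statistics.stdev(sample) / math.sqrt(N)
        if m - half <= true_mean <= m + half:
            hits += 1
    return hits / reps

# Normal population: the t-interval is exact here, apart from
# simulation noise.
normal_cov = coverage(lambda r: r.gauss(0.0, 1.0), 0.0)
# Exponential population with mean 1: an assumed skewed example.
skewed_cov = coverage(lambda r: r.expovariate(1.0), 1.0)

if __name__ == "__main__":
    print("normal:", normal_cov)
    print("exponential:", skewed_cov)
```

If the estimated coverage for a population resembling yours is close to
95%, the usual procedure is probably adequate. If it is well short, that
is an argument for a larger sample or a different method.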

Hope that helps.



