Title: RE: When Can We Really Use CLT & Student t

It has been a long time, so if I am wrong, please fan the flames gently.

The t distribution is derived as the ratio of a Normal(0,1) variable to the square root of an independent ChiSquare variable divided by its degrees of freedom:

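        t = [(x-bar - mu) / (sigma/sqrt(n))] / sqrt{[(n-1)S-squared / sigma-squared] / (n-1)}

which simplifies to  t = (x-bar - mu) / (S/sqrt(n)), since sigma cancels. When X is Normal, the numerator is exactly Normal(0,1), (n-1)S-squared / sigma-squared is exactly ChiSquare with n-1 degrees of freedom, and the two are independent, so t has exactly a Student t distribution with n-1 degrees of freedom.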
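A quick numerical check of the simplification, as a minimal Python sketch: it builds t from its definition (a standardized mean divided by the square root of a scaled chi-square term) and from the simplified formula, and confirms they agree. The sample, the seed, and the function names t_from_definition and t_simplified are illustrative choices, not part of the original derivation.

```python
import math
import random
import statistics

def t_from_definition(sample, mu, sigma):
    """Build t as Z / sqrt(V / df), following the derivation above."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s2 = statistics.variance(sample)          # S-squared, divisor n - 1
    z = (xbar - mu) / (sigma / math.sqrt(n))  # Normal(0,1) when X is Normal
    v = (n - 1) * s2 / sigma**2               # ChiSquare(n-1) when X is Normal
    return z / math.sqrt(v / (n - 1))

def t_simplified(sample, mu):
    """The simplified form, in which sigma no longer appears."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu) / (s / math.sqrt(n))

rng = random.Random(1)
sample = [rng.gauss(10.0, 3.0) for _ in range(12)]
a = t_from_definition(sample, mu=10.0, sigma=3.0)
b = t_simplified(sample, mu=10.0)
print(math.isclose(a, b))  # True

# Plugging in a wrong sigma gives the same t, because sigma cancels.
print(math.isclose(t_from_definition(sample, mu=10.0, sigma=7.0), b))  # True
```

The second check is the point of the exercise: the value of t never depends on sigma, only the justification for its distribution does.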

The CLT makes the numerator approximately Normal(0,1) for large n regardless of the distribution of X, but it does NOT make (n-1)S-squared / sigma-squared in the denominator approximately ChiSquare, nor does it make the denominator independent of x-bar.  This is the rub in using the t distribution when the original distribution of X is UNknown.
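One way to see the rub is a small simulation, sketched below under assumed settings (n = 10, Exponential(1) as the non-normal example, 10,000 replications; the helper name simulated_var_of_v is hypothetical). It compares the simulated variance of (n-1)S-squared / sigma-squared with 2(n-1), the variance of a ChiSquare with n-1 degrees of freedom. For Normal data the two should roughly agree; for skewed, heavy-tailed data such as the exponential, the simulated variance should come out well above it.

```python
import random
import statistics

def simulated_var_of_v(draw, sigma, n=10, reps=10_000, seed=2):
    """Simulated variance of V = (n-1)S^2 / sigma^2 over `reps` samples of size n.

    `draw(rng)` returns one observation; sigma is the true population SD.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        values.append((n - 1) * statistics.variance(sample) / sigma**2)
    return statistics.variance(values)

N = 10
chi_square_var = 2 * (N - 1)  # variance of ChiSquare(N - 1), i.e. 18

# Normal(0, 1) data: sigma = 1, and V really is ChiSquare(N - 1).
normal_var = simulated_var_of_v(lambda r: r.gauss(0.0, 1.0), sigma=1.0, n=N)
# Exponential(1) data: sigma is also 1, but the population is skewed.
expo_var = simulated_var_of_v(lambda r: r.expovariate(1.0), sigma=1.0, n=N)

print(chi_square_var, round(normal_var, 1), round(expo_var, 1))
```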

What many authors do, I believe, is employ the Law of Large Numbers: for n sufficiently large, the probability that | S - sigma | exceeds any fixed positive amount approaches 0.  That is, sigma and S may be interchanged with only a "minimal" probability of changing anything.  And so the ratio (x-bar - mu) / (S/sqrt(n)) may be interchanged with (x-bar - mu) / (sigma/sqrt(n)) = Z.  Thus, through this dual approximation (CLT plus Law of Large Numbers, made rigorous by Slutsky's theorem), (x-bar - mu) / (S/sqrt(n)) has an approximate Normal(0,1) distribution for large n.
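The large-n argument can be checked the same way. The hypothetical rejection_rate function below is a sketch that assumes Exponential(1) data (so mu = 1) and 4,000 replications per sample size; it counts how often |(x-bar - mu) / (S/sqrt(n))| exceeds the Normal critical value 1.96. If the dual approximation holds, the rate should move toward the nominal 0.05 as n grows, while being noticeably too high for very small n.

```python
import math
import random

def rejection_rate(n, reps=4000, crit=1.96, seed=3):
    """Fraction of samples with |(x-bar - mu) / (S/sqrt(n))| > crit.

    Data are Exponential(1), so mu = 1; this skewed population is an
    illustrative assumption.
    """
    rng = random.Random(seed)
    mu = 1.0
    rejections = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        if abs(xbar - mu) / (s / math.sqrt(n)) > crit:
            rejections += 1
    return rejections / reps

# Compare a tiny, a "rule of thumb", and a large sample size.
rates = {n: rejection_rate(n) for n in (5, 30, 400)}
for n, rate in rates.items():
    print(n, rate)
```

Note that this only says the approximation improves as n grows; it does not say n = 30 is large enough for every population.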

Howard Kaplon


-----Original Message-----
From: Ronny Richardson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 21, 2001 12:50 PM
To: [EMAIL PROTECTED]
Subject: When Can We Really Use CLT & Student t


As I understand it, the Central Limit Theorem (CLT) guarantees that the
distribution of sample means is approximately normal regardless of the
distribution of the underlying data, as long as the sample size is large
enough and the population standard deviation is known.

It seems to me that most statistics books I see overly optimistically invoke
the CLT not when n is over 30 and the population standard deviation is
known, but anytime n is over 30. This seems inappropriate to me. Or am I
overlooking something?

When the population standard deviation is not known (which is almost all the
time), it seems to me that the Student t (t) distribution is more
appropriate. However, t requires that the underlying data be normal, or at
least not too non-normal. My expectation is that most data sets are not
nearly "normal enough" to make using t appropriate.
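A sketch of how "normal enough" can be checked for a specific case: simulate the coverage of the nominal 95% t interval at n = 30. The lognormal population, the seed, the replication count, and the helper name t_interval_coverage are illustrative assumptions; 2.045 is the tabled 97.5th percentile of t with 29 degrees of freedom. With Normal data the coverage should be close to 0.95; with a strongly skewed lognormal it typically falls short.

```python
import math
import random

N = 30
T_CRIT_29 = 2.045  # 97.5th percentile of Student t with N - 1 = 29 df (table value)

def t_interval_coverage(draw, true_mean, reps=4000, seed=4):
    """Fraction of nominal 95% t intervals (n = N) that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(N)]
        xbar = sum(xs) / N
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (N - 1))
        half_width = T_CRIT_29 * s / math.sqrt(N)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

normal_cov = t_interval_coverage(lambda r: r.gauss(0.0, 1.0), true_mean=0.0)
# Lognormal(0, 1) is strongly right-skewed; its mean is exp(1/2).
lognormal_cov = t_interval_coverage(
    lambda r: r.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5)
)
print(normal_cov, lognormal_cov)
```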

So, if we do not know the population standard deviation and we cannot
assume a normal population, what should we be doing, as opposed to just
using the CLT as most business statistics books do?

Ronny Richardson
