Bruce et al.,

I tend to think of this issue within the context of ANOVA, with z and t
being special cases... It appears that the discussion is focused on the
numerator of the test statistic...

The F distribution is defined as the ratio of two independent chi-square
variables, each divided by its degrees of freedom... The CLT takes care of
the numerator... but what about the denominator?  If the populations are
not normal, then the pooled estimate of variance (MSwg) will not be
chi-square distributed... Also, Lindquist (1953) showed that normality of
populations makes the numerator and denominator of the F-statistic
independent, another important condition...
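
The independence point can be checked with a small simulation. What follows
is only a sketch under stated assumptions, not anything from Lindquist: three
groups of 10, all null hypotheses true, and an exponential population chosen
as the skewed example. The helper names (mean_squares, ms_correlation) are
hypothetical. It estimates the correlation between MSbg and MSwg across many
simulated experiments; with normal populations it should come out near zero,
and with the skewed population it should be clearly positive.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def mean_squares(groups):
    """Return (MS_between, MS_within) for equal-sized groups."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_bg = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_wg = sum(sample_variance(g) for g in groups) / k
    return ms_bg, ms_wg


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ms_correlation(draw, k=3, n=10, reps=10000, seed=1):
    """Correlation of MS_between and MS_within over simulated experiments.

    `draw` is a function taking a random.Random and returning one score.
    All k groups share the same population, so H0 is true throughout.
    """
    rng = random.Random(seed)
    bg, wg = [], []
    for _ in range(reps):
        groups = [[draw(rng) for _ in range(n)] for _ in range(k)]
        b, w = mean_squares(groups)
        bg.append(b)
        wg.append(w)
    return pearson_r(bg, wg)


r_normal = ms_correlation(lambda rng: rng.gauss(0.0, 1.0))
r_skewed = ms_correlation(lambda rng: rng.expovariate(1.0))
print("normal populations:      r =", round(r_normal, 3))
print("exponential populations: r =", round(r_skewed, 3))
```

Increasing n shrinks the correlation in the skewed case, which fits the
robustness remark that follows.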

All said and done, ANOVA appears to be robust to violations of normality
when the treatment populations have the same/similar shape...

Bill




On Fri, 19 Jan 2001, Bruce Weaver wrote:

> Dr. Dawson has touched on something here that I've always found a bit 
> puzzling--the oft stated ANOVA assumption that the populations from which 
> you sample must be normal.  I've always had a bit of trouble seeing why 
> that is the case.  I'll try to explain why by approaching it gradually.
> 
> Everyone agrees, I think, that if you have a population of scores that is
> normally distributed, a z-score calculated as (X - Xbar)/SD can be referred
> to a table of the standard normal distribution. 
> 
> If the "population" from which I pull a score is a population of sample
> means (i.e., the sampling distribution of the mean), I simply change the 
> formula for z or t to:
> 
>          X-bar - mu(X-bar)
> z or t = -----------------
>            SE(X-bar)
> 
> This is the z or t-test for a single sample.  Provided that the sampling
> distribution of X-bar is normal (or near enough), I can still refer z to
> the standard normal, or t to the appropriate t-distribution.  Now this is
> where the CLT comes into play.  It gives the conditions under which the
> sampling distribution of X-bar is normal (or near enough): 
> 
> 1) If the population of raw scores is normal, the sampling distribution 
> of X-bar will be normal for any sample size;
> 2) If the population of raw scores is reasonably symmetrical, a sample 
> size of 30-50 will probably ensure that the sampling distribution of 
> X-bar is close enough to normal;
> 3) For a raw score population of just about any shape, the sampling 
> distribution of X-bar converges on the normal as sample size increases.  
> (One source I read suggested that for sample sizes of 300 or greater, the 
> sampling distribution of the mean will be normal for any raw score 
> distribution.  But as Dr. Dawson suggested above, there may be exceptions 
> to this.)
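
Point 3 can be illustrated with a short simulation. This is a sketch only:
the exponential population, the sample sizes (5, 30, 300) and the helper
names are assumptions picked for illustration, not taken from the post. It
draws many samples of each size, computes each sample's mean, and reports
the skewness of the resulting means. For an exponential population the
theoretical skewness of X-bar is 2/sqrt(n), so the values should shrink
toward zero as n grows.

```python
import random


def skewness(xs):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skew_of_sample_means(n, reps=4000, seed=2):
    """Skewness of the simulated sampling distribution of X-bar.

    Raw scores come from an exponential population (strongly right-skewed).
    """
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n
             for _ in range(reps)]
    return skewness(means)


skews = {n: skew_of_sample_means(n) for n in (5, 30, 300)}
for n, s in skews.items():
    print(f"n = {n:3d}: simulated skewness of X-bar = {s:.3f}, "
          f"theory 2/sqrt(n) = {2 / n ** 0.5:.3f}")
```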
> 
> So if I have a combination of sample size and shape of raw score 
> population that results in a close-enough-to-normal sampling distribution 
> of X-bar, I can refer z to the standard normal, or t to the t-distribution.
> 
> When I teach, I always try to emphasize the general format for any z- or
> t-ratio: 
> 
>          statistic - parameter|H_0 
> z or t = -------------------------
>              SE(statistic)
> 
> If the sampling distribution of the statistic is normal, then z can be 
> referred to the standard normal, and t can be referred to the appropriate 
> t-distribution.
> 
> In the simple case described above, statistic = X-bar, parameter = 
> mu(X-bar), and SE(statistic) = SE(X-bar).  In the case of a t-test for 2 
> independent samples, for example:
> 
>       statistic = X-bar1 - X-bar2
>       parameter = mu(X-bar1) - mu(X-bar2)
>       SE(statistic) = SE(X-bar1 - X-bar2)
> 
> The parameter for this test is often = 0, but not always.  (That's why I 
> always include the right hand portion of the numerator when writing the 
> formula.)
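
The general ratio for the two-sample case can be sketched in a few lines.
The function name pooled_t and its null_diff argument are hypothetical, and
the standard error is assumed to be the usual pooled-variance one (equal
population variances). Keeping the null-hypothesis difference as an explicit
argument mirrors the point about always writing the right-hand part of the
numerator.

```python
import math
import statistics


def pooled_t(sample1, sample2, null_diff=0.0):
    """t = (statistic - parameter under H0) / SE(statistic).

    statistic = X-bar1 - X-bar2, parameter = null_diff,
    SE from the pooled variance estimate. Returns (t, df).
    """
    n1, n2 = len(sample1), len(sample2)
    statistic = statistics.fmean(sample1) - statistics.fmean(sample2)
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistic - null_diff) / se, n1 + n2 - 2


a = [5, 6, 7, 8, 9]
b = [1, 2, 3, 4, 5]
t0, df = pooled_t(a, b)                 # H0: mu1 - mu2 = 0
t2, _ = pooled_t(a, b, null_diff=2.0)   # H0: mu1 - mu2 = 2
print(t0, t2, df)
```

With these made-up scores the mean difference is 4 and the standard error
works out to 1, so the two calls differ only through the value subtracted
in the numerator.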
> 
> Now as far as I can see, nothing has changed from the simpler one-sample
> test:  There is no requirement of normality for the two raw score
> populations from which I have sampled.  The only requirements, as I
> understand them, are: 
> 
> 1) the raw score populations should be similar in shape (e.g., both 
> fairly symmetrical; or if skewed, both skewed in the same direction)
> 
> 2) the raw score populations should have equal (within reason) variances 
> so that the pooled variance estimate is a reasonable thing to use
> 
> 3) the sampling distribution of (X-bar1 - X-bar2) must be reasonably 
> close to normal
> 
> If both raw score populations are normal, then condition 3 will be met 
> regardless of sample size.  But if both raw score populations are skewed 
> (in the same direction), condition 3 can still be met given large enough 
> samples.  (Having equal sized samples helps wrt violations of point 
> number 2.)
> 
> So, why is it, according to so many textbook authors, that if I add a 3rd
> population to the mix and do a one-way ANOVA, I suddenly need to have
> raw-score populations that are normal?  This had never made a great deal
> of sense to me.  Does the CLT no longer apply because I've added a 3rd
> population?  I think not.  Given large enough samples (and similarly
> shaped populations with more or less equal variances), the F-statistic I
> calculate can still be referred to the appropriate F-distribution, I should
> think. 
> 
> By the way, other good examples are the large sample z-test versions of 
> various non-parametric tests (e.g., Mann-Whitney U).  The important thing 
> for those tests is that the sampling distribution of the statistic (e.g., 
> the sampling distribution of U) is normal when the numbers are large 
> enough.  I don't recall ever seeing anyone claim that the underlying 
> raw-score populations had to be normal.
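
For the Mann-Whitney case, the large-sample z is the same kind of ratio:
(U - mean of U under H0) / SD of U under H0. A minimal sketch follows,
assuming no tied scores and no continuity correction. The function name is
hypothetical, and the tiny data set only makes the arithmetic easy to
follow; the normal approximation itself is meant for larger samples.

```python
import math


def mann_whitney_z(x, y):
    """Return (U, z) for sample x versus sample y, assuming no ties.

    U counts the (x, y) pairs in which the x score exceeds the y score.
    Under H0, U has mean n1*n2/2 and variance n1*n2*(n1 + n2 + 1)/12.
    """
    n1, n2 = len(x), len(y)
    u = sum(1 for xi in x for yj in y if xi > yj)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u


u, z = mann_whitney_z([1, 2, 3], [4, 5, 6])
print(u, round(z, 3))
```

Nothing in the mean or SD of U refers to the shape of the raw-score
populations, only to the two sample sizes.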
> 
> Oops!  This rant ended up being a bit longer than I anticipated.  Looking 
> forward to the comments of others.
> 
> Cheers,
> -- 
> Bruce Weaver
> New e-mail: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED]) 
> Homepage:   http://www.angelfire.com/wv/bwhomedir/
> 
> 
> 
> 
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================
> 


