Voltolini wrote:
> 
> Hi, I am a biologist preparing a class on experiments in ecology, including
> a short and simple text about how to choose and use the most common
> statistical tests (chi-square, t tests, ANOVA, correlation and regression).
> 
> I am planning to include the idea that testing the assumptions for
> parametric tests (normality and homoscedasticity) is very important
> to decide between a parametric test (e.g., ANOVA) and a nonparametric
> test (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and the Levene
> test for the assumption testing, but...

        It's not that simple.  Some points:

        (1)  Normality is rarely important, provided the sample sizes are
largish. The larger the samples, the less it matters.

        (2)  The Shapiro-Wilk test is far too sensitive with large samples and
not sensitive enough with small samples. This is not the fault of Shapiro
and Wilk; it's a flaw in the idea of testing for normality.  The
question that such a test answers is "is there enough evidence to
conclude that the population is even slightly non-normal?" whereas what we
*ought* to be asking is "do we have reason to believe that the
population is approximately normal?"  Levene's test has the same
problem: fairly severe heteroscedasticity can be worked around with a
conservative choice of degrees of freedom - which costs essentially
nothing if the samples are large.
        In each case, the criterion of "detectability at p=0.05" simply does
not coincide with the criterion "far enough off assumption to matter"
except sometimes by chance.

        (3) Approximate symmetry is usually important to the *relevance* of
mean-based testing, no matter how big the sample size is.  Unless the
sum of the data (or of population elements) is of primary importance, or
unless the distribution is symmetric (so that almost all measures of
location coincide) you should not assume that the mean is a good measure
of location.  The median need not be either! 
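
        A small illustration of how far the mean can sit from the bulk of a
skewed distribution, assuming (purely for the sake of example) a lognormal
population with parameters mu = 0 and sigma = 1. The seed and the sample size
are arbitrary choices.

```python
import math
import random
import statistics

mu, sigma = 0.0, 1.0  # assumed parameters of the underlying normal

# Exact population values for a lognormal distribution.
pop_median = math.exp(mu)
pop_mean = math.exp(mu + sigma ** 2 / 2)

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
# Fraction of observations that fall below the sample mean.
frac_below_mean = sum(v < sample_mean for v in sample) / len(sample)

# On the log scale the distribution is symmetric, so mean and median agree.
logs = [math.log(v) for v in sample]
log_gap = abs(statistics.mean(logs) - statistics.median(logs))

print("population mean, median:", pop_mean, pop_median)
print("sample mean, median:    ", sample_mean, sample_median)
print("fraction below the mean:", frac_below_mean)
print("mean-median gap of logs:", log_gap)
```

Well over half of the observations fall below the mean here, so a statement
about the mean describes the total rather than a "typical" value. After the
log transformation the two measures of location nearly coincide.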

        (4) Most nonparametric tests make assumptions too. The signed-rank test
assumes symmetry; the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests
are usually taken to assume a pure shift alternative (which is actually
rather unlikely for an asymmetric distribution).  In fact symmetry will
do instead; Potthoff has shown that the WMW is a test for the median if
the distributions are symmetric. If there exists a transformation that
renders the populations equally distributed or symmetric (e.g., the lognormal
family) they will work, too.
        In the absence of some such assumption strange things can happen.  I
have shown (preprint available on request) that the WMW test is
intransitive for "most" Behrens-Fisher families (that is, it can
consistently indicate X>Y>Z>X with p -> 1 as n -> infinity), although
the intransitivity is not pronounced for most realistic distributions
and sample sizes.

        Note - a Behrens-Fisher family is one whose members differ in location
and in spread but not in shape.
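
        The kind of intransitivity meant here can be seen in a toy example.
With large samples the WMW test in effect asks whether P(X > Y) differs from
1/2, and that relation need not be transitive. The three discrete
distributions below are the classic "nontransitive dice"; they are NOT a
Behrens-Fisher family and are not the construction from the preprint, only
a minimal sketch of the phenomenon.

```python
from fractions import Fraction
from itertools import product

def prob_greater(x, y):
    """Exact P(X > Y) + P(X = Y)/2 when each listed value is equally likely."""
    half = Fraction(1, 2)
    wins = sum((a > b) + half * (a == b) for a, b in product(x, y))
    return wins / (len(x) * len(y))

# Hypothetical three-point distributions, each value with probability 1/3.
X = [2, 4, 9]
Y = [1, 6, 8]
Z = [3, 5, 7]

p_xy = prob_greater(X, Y)
p_yz = prob_greater(Y, Z)
p_zx = prob_greater(Z, X)

print("P(X > Y) =", p_xy)
print("P(Y > Z) =", p_yz)
print("P(Z > X) =", p_zx)
```

All three probabilities exceed 1/2, so with enough data a WMW test on each
pair would be expected to report X > Y, Y > Z and Z > X.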

        -Robert Dawson

