"Robert J. MacG. Dawson" wrote:
> 
> Voltolini wrote:
> >
> > Hi, I am a biologist preparing a class on experiments in ecology, including
> > a short and simple text about how to choose and use the most common
> > statistical tests (chi-square, t tests, ANOVA, correlation and regression).
> >
> > I am planning to include the idea that testing the assumptions of
> > parametric tests (normality and homoscedasticity) is very important
> > when deciding between a parametric test (e.g., ANOVA) and a nonparametric
> > test (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and Levene
> > tests for the assumption testing, but...
> 
>         It's not that simple.  Some points:
> 
>         (1)  normality is rarely important, provided the sample sizes are
> largish. The larger the less important.

The a.r.e. (asymptotic relative efficiency) won't change with larger
samples, so I disagree here.

>         (2)  The Shapiro-Wilk test is far too sensitive with large samples and
> not sensitive enough for small samples. This is not the fault of Shapiro
> and Wilk, it's a flaw in the idea of testing for normality.  The
> question that such a test answers is "is there enough evidence to
> conclude that the population is even slightly non-normal?" whereas what we
> *ought* to be asking  is "do we have reason to believe that the
> population is approximately normal?"  

Almost. I'd say "Is the deviation from normality so large as to appreciably
affect the inferences we're making?", which largely boils down to things
like:
are our estimates consistent? (the answer will be yes in any reasonable
situation)
are our standard errors approximately correct?
is our significance level something like what we think it is?
are our power properties reasonable?

You want a measure of the degree of deviation from normality. For example,
the Shapiro-Francia test is based on the squared correlation in the normal
scores plot, and as n increases, the test detects smaller deviations from
normality (which isn't what we want). But the squared correlation itself
is a measure of the degree of deviation from normality, and may be a somewhat
helpful guide. As the sample size gets moderate to large, you can more
easily assess the kind of deviation from normality and make a better
assessment of its likely effect.
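
To make the "squared correlation as a measure" idea concrete, here is a rough Python sketch using only the standard library. The function name normal_scores_r2 is made up for illustration, and the normal scores use Blom's approximation, which is my assumption; other plotting positions give very similar values. Treat it as a sketch of the idea rather than the Shapiro-Francia test itself, since no p-value is computed.

```python
import random
from statistics import NormalDist


def normal_scores_r2(data):
    """Squared correlation between sorted data and approximate normal scores."""
    n = len(data)
    xs = sorted(data)
    # Blom's approximation to the expected normal order statistics
    # (an assumption; not necessarily what any given package uses).
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    mx = sum(xs) / n
    ms = sum(scores) / n
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy * sxy / (sxx * sss)


# Simulated data only: normal samples versus (skewed) exponential samples.
rng = random.Random(1)
for n in (20, 200, 2000):
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    skewed = [rng.expovariate(1.0) for _ in range(n)]
    print(n, round(normal_scores_r2(normal), 3),
          round(normal_scores_r2(skewed), 3))
```

The point of reading the statistic rather than the p-value: for the skewed samples the squared correlation stays clearly below 1 whatever n is, while for the normal samples it approaches 1 as n grows, so it describes how far off the data are rather than how detectable the deviation is.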

Generally speaking, things like one-way ANOVA aren't affected much by
moderate skewness or by thin or somewhat thickish tails. With heavy skewness
or extremely heavy tails you'd be better off with a Kruskal-Wallis.
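
For anyone who wants to see what the Kruskal-Wallis statistic actually does with the data, here is a minimal standard-library Python sketch. The helper names are mine, the data are made up, and I've left out the correction for ties, so take it as an illustration of the rank-based calculation and not a replacement for a proper package routine.

```python
def average_ranks(values):
    """Ranks 1..N; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Made-up data; the 950.0 is a deliberately wild value in the third group.
groups = [[1.1, 2.3, 1.9, 2.8], [3.5, 4.1, 3.9, 5.0], [6.2, 7.7, 950.0, 6.9]]
print(kruskal_wallis_h(groups))
```

Because only ranks enter the calculation, replacing 950.0 by 9.5 gives exactly the same H, which is the sense in which the test shrugs off extremely heavy tails.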

> Levene's test has the same
> problem, as fairly severe heteroscedasticity can be worked around with a
> conservative assumption of degrees of freedom - which is essentially
> costless if the samples are large.
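
A small Python sketch of the workaround described just above, for the two-sample case. I'm assuming that "conservative assumption of degrees of freedom" means using the unpooled (unequal-variance) t statistic with min(n1, n2) - 1 degrees of freedom; the Welch-Satterthwaite value is shown next to it for comparison. The function name and the data are made up.

```python
from statistics import mean, variance


def unpooled_t(a, b):
    """Unequal-variance t statistic with Welch and conservative df."""
    n1, n2 = len(a), len(b)
    v1 = variance(a) / n1  # squared standard error of mean(a)
    v2 = variance(b) / n2  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (v1 + v2) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Conservative choice (assumed meaning): never larger than df_welch.
    df_conservative = min(n1, n2) - 1
    return t, df_welch, df_conservative


# Made-up samples with visibly different spreads.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10, 12, 14]
print(unpooled_t(a, b))
```

With large samples the t critical value hardly changes between the two choices of df, which is why the conservative option costs next to nothing there.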



>         In each case, the criterion of "detectability at p=0.05" simply does
> not coincide with the criterion "far enough off assumption to matter".

Correct

> 
>         (3) Approximate symmetry is usually important to the *relevance* of
> mean-based testing, no matter how big the sample size is.  Unless the
> sum of the data (or of population elements) is of primary importance, or
> unless the distribution is symmetric (so that almost all measures of
> location coincide) you should not assume that the mean is a good measure
> of location.  The median need not be either!
> 
>         (4) Most nonparametric tests make assumptions too. The rank-sum test
> assumes symmetry;

You mean the signed rank test. The rank-sum is the W-M-W...

> the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests
> are usually taken to assume a pure shift alternative (which is actually
> rather unlikely for an asymmetric distribution.)  In fact symmetry will
> do instead; Potthoff has shown that the WMW is a test for the median if
> distributions are symmetric. If there exists a transformation that
> renders the populations equally-distributed or symmetric (e.g., lognormal
> family) they will work, too.

E.g., the test will work for scale shift alternatives (since the monotonic
log transform would render that as a location shift alternative, but of
course the monotonic transformation won't affect the rank structure, so it
works with the original data).
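
A quick Python check of that invariance, using made-up positive data and a hand-rolled Mann-Whitney U (the function name is mine). It is only a sketch of the rank argument: the statistic computed on the raw values and on their logs comes out the same.

```python
import math


def rank_sum_u(a, b):
    """Mann-Whitney U for sample a: pairs with a_i > b_j, ties counting 1/2."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Made-up positive data, as if the second sample were a scaled-up version.
a = [1.2, 3.4, 0.7, 8.9, 2.2]
b = [4.1, 15.0, 6.3, 2.9, 30.5]

u_raw = rank_sum_u(a, b)
u_log = rank_sum_u([math.log(x) for x in a], [math.log(x) for x in b])
print(u_raw, u_log)  # identical, since log preserves the ordering
```

So there is no need to actually take logs before running the test; the existence of the transformation is enough.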

Glen

