Re: Are parametric assumptions important?

2001-10-18 Thread Glen Barnett


"Yes" [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...

 Glenn Barnett wrote:

One n in Glen.
 OK, I see what you were getting at - but I still disagree, if it is
 understood that we are talking about large samples.

Your original comment that I was replying to was:
 (1)  normality is rarely important, provided the sample sizes are
 largish. The larger the less important.

And I take some issue with that. I guess it depends on what we mean by large.

 For large effects,
 and large samples, you have far more power than you need; the goal is
 not to get a p-value so small that you need scientific notation to
 express it!

Correct. But the effect is often not so large - many of the people I help deal
with pretty modest effects. Large samples don't always save you - even
for the distribution under the null hypothesis, let alone for power.


 If the effect is small, efficiency matters; but a fairly small
 deviation from normality will not have a large effect on efficiency
 either.

Agreed.

 With an effect small enough to be marginally detectable even
 with a large sample, it is likely that a *large* deviation from
 normality will raise much more important questions about which measure
 of location is appropriate.

Yes.

 For smaller samples, your point holds - with the cynical observation
 that the times when it would most benefit us to assume normality are
 precisely the times when we have not got the information that would
 allow us to do so!  I might however quibble that for smaller samples it
 is risky to assume that asymptotic relative efficiency will be a good
 indication of relative efficiency for small N.

In many cases it is. And if the samples are nice and small, even when
it's difficult to do the computations algebraically, we can simulate
from some plausible distributions to look at the properties.

Or do something nonparametric that has good power properties when
the population distribution happens to be close to normal. Permutation
tests, for example.
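
Here is a minimal sketch of the kind of small-sample simulation described
above (the exponential population and the particular settings are illustrative
assumptions of mine, not anything from the thread): it estimates the empirical
size and power of the two-sample t-test and the Wilcoxon-Mann-Whitney test for
n = 10 per group.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 10, 5000, 0.05

def rejection_rates(shift):
    """Proportion of simulated datasets in which each test rejects at level alpha."""
    rej_t = rej_w = 0
    for _ in range(reps):
        x = rng.exponential(scale=1.0, size=n)
        y = rng.exponential(scale=1.0, size=n) + shift
        rej_t += stats.ttest_ind(x, y).pvalue < alpha
        rej_w += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha
    return rej_t / reps, rej_w / reps

print("size  (no shift):     t, WMW =", rejection_rates(0.0))
print("power (shift of 1.0): t, WMW =", rejection_rates(1.0))

A permutation test of the difference in means could be plugged into the same
loop in place of either test.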

Glen






Re: Are parametric assumptions important?

2001-10-17 Thread Yes


Glenn Barnett wrote:
 
 But the larger the sample size, the nearer the r.e. will be to the a.r.e.,
 right?
 
 That is, the large sample power properties depend on the original
 distribution, and the CLT does *not* save you from a bad a.r.e.

and again:
 
 If you're sampling from a distribution for which the procedure in question
 has a very low a.r.e., you're essentially throwing away all the information
 in your data by using that procedure. The fact that the sampling distribution
 under the null is reasonable is a useless criterion unless you consider power.

OK, I see what you were getting at - but I still disagree, if it is
understood that we are talking about large samples. For large effects,
and large samples, you have far more power than you need; the goal is
not to get a p-value so small that you need scientific notation to
express it!

If the effect is small, efficiency matters; but a fairly small
deviation from normality will not have a large effect on efficiency
either. With an effect small enough to be marginally detectable even
with a large sample, it is likely that a *large* deviation from
normality will raise much more important questions about which measure
of location is appropriate. 

For smaller samples, your point holds - with the cynical observation
that the times when it would most benefit us to assume normality are
precisely the times when we have not got the information that would
allow us to do so!  I might however quibble that for smaller samples it
is risky to assume that asymptotic relative efficiency will be a good
indication of relative efficiency for small N.

-Robert Dawson





Re: Are parametric assumptions important?

2001-10-16 Thread Glen Barnett

Robert J. MacG. Dawson wrote:
 
 Voltolini wrote:
 
  Hi, I am a biologist preparing a class on experiments in ecology including
  a short and simple text about how to use and to choose the most common
  statistical tests (chi-square, t tests, ANOVA, correlation and regression).
 
  I am planning to include the idea that testing the assumptions for
  parametric tests (normality and homoscedasticity) is very important
  to decide between a parametric (e.g., ANOVA) or a nonparametric
  test (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and the Levene
  test for the assumption testing but...
 
 It's not that simple.  Some points:
 
 (1)  normality is rarely important, provided the sample sizes are
 largish. The larger the less important.

The a.r.e won't change with larger samples, so I disagree here.

 (2)  The Shapiro-Wilk test is far too sensitive with large samples and
 not sensitive enough for small samples. This is not the fault of Shapiro
 and Wilk; it's a flaw in the idea of testing for normality.  The
 question that such a test answers is "is there enough evidence to
 conclude that the population is even slightly non-normal?" whereas what we
 *ought* to be asking is "do we have reason to believe that the
 population is approximately normal?"

Almost. I'd say "Is the deviation from normality so large as to appreciably
affect the inferences we're making?", which largely boils down to things like:
- are our estimates consistent? (the answer will be yes in any reasonable
  situation)
- are our standard errors approximately correct?
- is our significance level something like what we think it is?
- are our power properties reasonable?

You want a measure of the degree of deviation from normality. For example,
the Shapiro-Francia test is based on the squared correlation in the normal
scores plot, and as n increases the test detects ever smaller deviations from
normality (which isn't what we want) - but the squared correlation itself
is a measure of the degree of deviation from normality, and may be a somewhat
helpful guide. As the sample size gets moderate to large, you can more
easily assess the kind of deviation from normality and make a better
assessment of its likely effect.
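
As a rough illustration of using that squared correlation descriptively (a
sketch only - scipy's probplot and the lognormal comparison sample are my own
illustrative choices, and the Shapiro-Francia test itself is not computed
here):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = {"normal sample": rng.normal(size=200),
           "lognormal sample": rng.lognormal(sigma=1.0, size=200)}

for name, x in samples.items():
    # probplot returns the normal scores plot coordinates and, with the
    # default fit=True, (slope, intercept, r) for the line through that plot.
    (osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
    print(f"{name}: squared correlation in normal scores plot = {r**2:.4f}")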

Generally speaking, things like one-way ANOVA aren't affected much by 
moderate skewness or thin or somewhat thickish tails. With heavy
skewness 
or extremely heavy tails you'd be better off with a Kruskal-Wallis.

 Levene's test has the same
 problem, as fairly severe heteroscedasticity can be worked around with a
 conservative assumption of degrees of freedom - which is essentially
 costless if the samples are large.



 In each case, the criterion of "detectability at p=0.05" simply does
 not coincide with the criterion "far enough off assumption to matter"

Correct

 
 (3) Approximate symmetry is usually important to the *relevance* of
 mean-based testing, no matter how big the sample size is.  Unless the
 sum of the data (or of population elements) is of primary importance, or
 unless the distribution is symmetric (so that almost all measures of
 location coincide) you should not assume that the mean is a good measure
 of location.  The median need not be either!
 
 (4) Most nonparametric tests make assumptions too. The rank-sum test
 assumes symmetry;

You mean the signed rank test. The rank-sum is the W-M-W...

 the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests
 are usually taken to assume a pure shift alternative (which is actually
 rather unlikely for an asymmetric distribution.)  In fact symmetry will
 do instead; Potthoff has shown that the WMW is a test for the median if
 distributions are symmetric. If there exists a transformation that
 renders the populations equally-distributed or symmetric (eg, lognormal
 family) they will work, too.

e.g., the test will work for scale-shift alternatives (since the - monotonic -
log transform would render that as a location-shift alternative, but of course
a monotonic transformation won't affect the rank structure, so the test works
with the original data).
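
A tiny sketch of that rank-invariance (the lognormal scale-shift setup below
is an illustrative assumption of mine, not something from the thread): the
Wilcoxon-Mann-Whitney results are identical whether we use the raw data or
their logs, because only the ranks enter.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(sigma=1.0, size=30)          # scale 1
y = 2.0 * rng.lognormal(sigma=1.0, size=30)    # same shape, scale multiplied by 2

raw = stats.mannwhitneyu(x, y, alternative="two-sided")
logged = stats.mannwhitneyu(np.log(x), np.log(y), alternative="two-sided")

# The log is monotonic, so the ranks - and hence statistic and p-value - match.
print(raw.statistic, raw.pvalue)
print(logged.statistic, logged.pvalue)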

Glen





Re: Are parametric assumptions important?

2001-10-16 Thread Robert J. MacG. Dawson


 Glenn Barnett wrote:

  (1)  normality is rarely important, provided the sample sizes are
  largish. The larger the less important.
 
 The a.r.e won't change with larger samples, so I disagree here.


I don't follow. Asymptotic relative efficiency is a limit as sample 
sizes go to infinity; so how does it change or not change with sample
size? Or does that acronym have another expansion that I can't think
of?

I hadn't had efficiency in mind so much as the validity of p-values for
the t test. However, the same point holds for efficiency. For large
samples, I would suggest that the efficiency of both tests is usually
adequate; and a small sample does not tell us enough about the
population distribution to tell much about the relative efficiency
anyway.

When you've got lots of data, you also have a choice of lots of
reliable methods of inference; when you haven't got enough, you also
can't trust the methods that look as though they might help. (Sort of a
metaphor for life, he said cynically.)

You are of course right that when I wrote "rank-sum" I meant
"signed-rank".  (Of course, as Potthoff showed, the rank-sum test *is*
valid under the assumption of symmetry, but that is another story.)

-Robert Dawson





Re: Are parametric assumptions important?

2001-10-16 Thread Glen

[EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote in message 
news:[EMAIL PROTECTED]...
 Glenn Barnett wrote:
 
   (1)  normality is rarely important, provided the sample sizes are
   largish. The larger the less important.
  
  The a.r.e won't change with larger samples, so I disagree here.
 
 
   I don't follow. Asymptotic relative efficiency is a limit as sample 
 sizes go to infinity; 

Correct.

 so how does it change or not change with sample
 size? 

Clearly it doesn't change with sample size, since it's asymptotic.

But the larger the sample size, the nearer the r.e. will be to the a.r.e.,
right?

That is, the large sample power properties depend on the original 
distribution, and the CLT does *not* save you from a bad a.r.e.

So, as I said, I disagree with the assertion that:

   (1)  normality is rarely important, provided the sample sizes are
   largish. The larger the less important.

...specifically because the larger the sample, the nearer you get to the a.r.e.

If you're sampling from a distribution for which the procedure in question
has a very low a.r.e., you're essentially throwing away all the information
in your data by using that procedure. The fact that the sampling distribution
under the null is reasonable is a useless criterion unless you consider power.

Glen





Re: Are parametric assumptions important?

2001-10-15 Thread Michael Prager

Voltolini wrote:
  Hi, I am a biologist preparing a class on experiments in ecology including
 a short and simple text about how to use and to choose the most common
 statistical tests (chi-square, t tests, ANOVA, correlation and regression).
 
 I am planning to include the idea that testing the assumptions for
 parametric tests (normality and homoscedasticity) is very important
 to decide between a parametric (e.g., ANOVA) or a nonparametric
 test (e.g., Kruskal-Wallis).

Since this is a class on experiments in ecology, how about
having the students do an experiment?  Would a Monte Carlo
simulation of robustness to certain assumptions be too much to
ask of them?  (If so, is there a way you could do some of it to
make the rest easier for them?)  It need not be publishable --
just enough to give them some feeling for the problems involved,
rather than considering the assumptions unimportant except
academically.
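
One possible shape for such a student exercise - an illustrative sketch of my
own, not anything from Prager's post - is to estimate by Monte Carlo how the
two-sample t-test's Type I error rate behaves when both groups are drawn from
the same skewed population, across a few sample sizes:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reps, alpha = 10000, 0.05

for n in (5, 10, 30, 100):
    rejections = 0
    for _ in range(reps):
        x = rng.exponential(size=n)
        y = rng.exponential(size=n)   # same population, so the null is true
        if stats.ttest_ind(x, y).pvalue < alpha:
            rejections += 1
    print(f"n = {n:3d}: empirical Type I error rate = {rejections / reps:.3f}")

Students can then swap in other populations (lognormal, contaminated normal),
unequal variances, or the Kruskal-Wallis test and compare.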

I'll never forget my first ecology lab, in which we marked beans
with nail polish and recaptured them from a jar.  The variability
in the ensuing population estimates was an eye-opener, and its
impact could not have been achieved by a lecture on the
importance of assumptions.

-- 
Mike Prager
NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.





Re: Are parametric assumptions important?

2001-10-14 Thread Rich Ulrich

On 12 Oct 2001 11:14:54 -0700, [EMAIL PROTECTED] (Lise DeShea) wrote:

 Re robustness of the between-subjects ANOVA, I obtained permission from Dr. 
 Rand Wilcox to copy three pages from his book, New Statistical Procedures 
 for the Social Sciences, and place them on a webpage for my students.  He 
 cites research showing that with four groups of 50 observations each and 
 population standard deviations of 4, 1, 1, and 1, the empirical Type I 
 error rate was .088, which is beyond Bradley's liberal limits on sampling 
 variability [.025 to .075].  You can read this excerpt at 

Well, I suggest that a variance difference of 16 to 1 practically
washes out the usual interest in the means.  Isn't that beyond
the pale of the usual illustrations of what is robust?  I may be remembering
wrong, but it seems to me that Tukey used Monte Carlo with
10% contamination of a sample, where the contaminant
had excessive variances: 10-fold for the variances?

What can you say about an example like that?

Will a Box-Cox transformation equalize the variances? (no?)
Is there a huge outlier or two?  If not -- if the scores are well
scattered -- all the extreme scores in *both*  directions will be 
in that one group.  And a mean difference  will implicitly
be determined by the scaling:  That is, if you spread out the 
low scores (say), then the group with big variance will have the
lower mean.


 www.uky.edu/~ldesh2/stats.htm -- look for the link to Handout on ANOVA, 
 Sept. 19-20, 2001.  Error rates are much worse when sample sizes are 
 unequal and the smaller groups are paired with the larger sigma -- up to an 
 empirical alpha of .309 when six groups, ranging in size from 6 to 25, have 
 sigmas of 4, 1, 1, 1, 1, 1.
 
 The independent-samples t-test has an inoculation against unequal variances 
 -- make sure you have equal n's of at least 15 per group, and it doesn't 
 matter much what your variances are (Ramsey, 1980, I think).  But the ANOVA 
 doesn't have an inoculation.
 
 I tell my students that the ANOVA is not robust to violation of the equal 
 variances assumption, but that it's a stupid statistic anyway.  All it can 
 say is either "These means are equal" or "There's a difference somewhere
 among these means, but I can't tell you where it is."  I tell them to move
 along to a good MCP and don't worry about the ANOVA.  Most MCP's don't 
 require a significant F anyway.  And if you have unequal n's, use 
 Games-Howell's MCP to find where the differences are.

Some of us don't like MCPs.  We think that the overall test is
not (or at least, not always) a bad way to start, if a person *really*
can't be more particular about what they want to test.

And if you have unequal Ns, you are stuck with one approximation 
or another, which has to be ugly when the Ns are too unequal; or
else you are stuck with inconsistent statements, where the smaller
difference in means is 'significant' but the larger one is not.
(I am unfamiliar with Games-Howell's MCP.)

Just another opinion.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Are parametric assumptions important?

2001-10-12 Thread Robert J. MacG. Dawson



Voltolini wrote:
 
 Hi, I am a biologist preparing a class on experiments in ecology including
 a short and simple text about how to use and to choose the most common
 statistical tests (chi-square, t tests, ANOVA, correlation and regression).
 
 I am planning to include the idea that testing the assumptions for
 parametric tests (normality and homoscedasticity) is very important
 to decide between a parametric (e.g., ANOVA) or a nonparametric
 test (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and the Levene
 test for the assumption testing but...

It's not that simple.  Some points:

(1)  normality is rarely important, provided the sample sizes are
largish. The larger the less important.

(2)  The Shapiro-Wilk test is far too sensitive with large samples and
not sensitive enough for small samples. This is not the fault of Shapiro
and Wilk; it's a flaw in the idea of testing for normality.  The
question that such a test answers is "is there enough evidence to
conclude that the population is even slightly non-normal?" whereas what we
*ought* to be asking is "do we have reason to believe that the
population is approximately normal?"  Levene's test has the same
problem, as fairly severe heteroscedasticity can be worked around with a
conservative assumption of degrees of freedom - which is essentially
costless if the samples are large.
In each case, the criterion of "detectability at p=0.05" simply does
not coincide with the criterion "far enough off assumption to matter"
except sometimes by chance.
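
As an aside, here is a sketch of one common adjustment of that general kind
(whether this Welch-Satterthwaite version is exactly the degrees-of-freedom
device meant above is my assumption): compare the pooled two-sample t-test
with the unequal-variance version on heteroscedastic data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=0.0, scale=4.0, size=50)   # high-variance group
y = rng.normal(loc=0.0, scale=1.0, size=20)   # low-variance group

pooled = stats.ttest_ind(x, y, equal_var=True)    # assumes equal variances
welch = stats.ttest_ind(x, y, equal_var=False)    # Welch-Satterthwaite df

print("pooled t-test p-value:", pooled.pvalue)
print("Welch  t-test p-value:", welch.pvalue)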

(3) Approximate symmetry is usually important to the *relevance* of
mean-based testing, no matter how big the sample size is.  Unless the
sum of the data (or of population elements) is of primary importance, or
unless the distribution is symmetric (so that almost all measures of
location coincide) you should not assume that the mean is a good measure
of location.  The median need not be either! 

(4) Most nonparametric tests make assumptions too. The rank-sum test
assumes symmetry; the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests
are usually taken to assume a pure shift alternative (which is actually
rather unlikely for an asymmetric distribution.)  In fact symmetry will
do instead; Potthoff has shown that the WMW is a test for the median if
distributions are symmetric. If there exists a transformation that
renders the populations equally-distributed or symmetric (e.g., lognormal
family) they will work, too.
In the absence of some such assumption strange things can happen.  I
have shown (preprint available on request) that the WMW test is
intransitive for most Behrens-Fisher families (that is, it can
consistently indicate X > Y > Z > X with p -> 1 as n -> infinity), although
the intransitivity is not pronounced for most realistic distributions
and sample sizes.

Note - a Behrens-Fisher family is one differing both by location and by
spread but not by shape.

-Robert Dawson





Re: Are parametric assumptions important?

2001-10-12 Thread Dennis Roberts

At 12:59 PM 10/12/01 -0300, you wrote:

While consulting people from departments of statistics about this, a few of them
were arguing that this assumption testing is just a legend and that
there is no problem in not respecting it!

note: you should NOT respect any stat expert who says that there is no
problem ... and that you need not worry about the so-called classic assumptions

all they are doing is making their consultation with you EASIER for themselves!

every test you might want to do has one or more assumptions about how the
samples were taken and/or about parameters (and other things) of the population

in some cases, violations of one or more of these make little difference in
the validity of the tests (simulation studies can verify this) ... but,
in other cases, violations of one or more can lead to serious consequences
(i.e., yielding a much larger Type I error rate than the one you thought
you were working with, for example) ...

there is no easy way to make some blanket statement as to which assumptions
are important and which are not, because ... this depends on the specific test
(or family of similar tests)

usually, for a particular test ... good texts will enumerate the
assumptions that are made AND will give you some mini capsule of the
impact of violations OF those assumptions




_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Are parametric assumptions important?

2001-10-12 Thread Lise DeShea

Re robustness of the between-subjects ANOVA, I obtained permission from Dr.
Rand Wilcox to copy three pages from his book, "New Statistical Procedures
for the Social Sciences", and place them on a webpage for my students.  He
cites research showing that with four groups of 50 observations each and
population standard deviations of 4, 1, 1, and 1, the empirical Type I
error rate was .088, which is beyond Bradley's liberal limits on sampling
variability [.025 to .075].  You can read this excerpt at
www.uky.edu/~ldesh2/stats.htm -- look for the link to "Handout on ANOVA,
Sept. 19-20, 2001".  Error rates are much worse when sample sizes are
unequal and the smaller groups are paired with the larger sigma -- up to an
empirical alpha of .309 when six groups, ranging in size from 6 to 25, have
sigmas of 4, 1, 1, 1, 1, 1.
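
A sketch of how a reader might check a figure like that .088 by simulation
(an illustrative setup of my own, not Wilcox's code): four groups of 50 drawn
from normal populations with equal means and standard deviations 4, 1, 1, 1,
counting how often the ordinary one-way ANOVA F-test rejects at the nominal
.05 level.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sds = [4, 1, 1, 1]
n, reps, alpha = 50, 10000, 0.05

rejections = 0
for _ in range(reps):
    # equal means, unequal spreads, so any rejection is a Type I error
    groups = [rng.normal(loc=0.0, scale=sd, size=n) for sd in sds]
    if stats.f_oneway(*groups).pvalue < alpha:
        rejections += 1

print("empirical Type I error rate:", rejections / reps)  # compare with the cited .088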

The independent-samples t-test has an inoculation against unequal variances -- make sure you have equal n's of at least 15 per group, and it doesn't matter much what your variances are (Ramsey, 1980, I think). But the ANOVA doesn't have an inoculation.

I tell my students that the ANOVA is not robust to violation of the equal variances assumption, but that it's a stupid statistic anyway. All it can say is either "These means are equal" or "There's a difference somewhere among these means, but I can't tell you where it is." I tell them to move along to a good MCP and don't worry about the ANOVA. Most MCP's don't require a significant F anyway. And if you have unequal n's, use Games-Howell's MCP to find where the differences are.

Cheers.
Lise
~~~
Lise DeShea, Ph.D.
Assistant Professor
Educational and Counseling Psychology Department
University of Kentucky
245 Dickey Hall
Lexington KY 40506
Email: [EMAIL PROTECTED]
Phone: (859) 257-9884
Website for students: www.uky.edu/~ldesh2/stats.htm




RE: Are parametric assumptions important?

2001-10-12 Thread Wuensch, Karl L

Lise advised: "I tell my students that the ANOVA is not
robust to violation of the equal variances assumption, but that it's a stupid
statistic anyway. All it can say is either 'These means are [may be nearly]
equal,' or 'There's a difference somewhere among these
means, but I can't tell you where it is.' I tell them to move along
to a good MCP [procedure for making multiple comparisons, such as REGWQ] and
don't worry about the ANOVA. Most MCP's don't require a significant F
anyway. And if you have unequal n's, use Games-Howell's MCP to find where
the differences are."



Excellent advice, copied to my students (so
they don't hear it only from me).
Now if we could only get our colleagues to listen! ;-)



+
Karl L. Wuensch, Department of Psychology,
East Carolina University, Greenville NC 27858-4353
Voice: 252-328-4102  Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm

Re: Are parametric assumptions important?

2001-10-12 Thread Dennis Roberts

At 01:44 PM 10/12/01 -0400, Lise DeShea wrote:

I tell my students that the ANOVA is not robust to violation of the equal 
variances assumption, but that it's a stupid statistic anyway.  All it can 
say is either "These means are equal" or "There's a difference somewhere
among these means, but I can't tell you where it is."


i don't see that this is any more stupid than many other null hypothesis
tests we do ... if you want to think "stupid" ... then think that it is
stupid to think that the null can REALLY be exactly true ... so, the notion
of doing a TEST to see if you retain or reject ... is rather stupid TOO
since we know that the null is NOT exactly true ... before we even do the test


_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm


