Re: Are parametric assumptions important?
Yes.

<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]...

> Glenn Barnett wrote:

One n in Glen.

> OK, I see what you were getting at - but I still disagree, if it is
> understood that we are talking about large samples.

Your original comment that I was replying to was:

> (1) normality is rarely important, provided the sample sizes are
> largish. The larger the less important.

And I take some issue with that. I guess it depends on what we mean by
large.

> For large effects, and large samples, you have far more power than you
> need; the goal is not to get a p-value so small that you need
> scientific notation to express it!

Correct. If the effect is not so large - and many of the people I help
deal with pretty modest effects - large samples don't always save you,
even with the distribution under the null hypothesis, let alone power.

> If the effect is small, efficiency matters; but a fairly small
> deviation from normality will not have a large effect on efficiency
> either.

Agreed.

> With an effect small enough to be marginally detectable even
> with a large sample, it is likely that a *large* deviation from
> normality will raise much more important questions about which measure
> of location is appropriate.

Yes.

> For smaller samples, your point holds - with the cynical observation
> that the times when it would most benefit us to assume normality are
> precisely the times when we have not got the information that would
> allow us to do so! I might however quibble that for smaller samples it
> is risky to assume that asymptotic relative efficiency will be a good
> indication of relative efficiency for small N.

In many cases it is. And if the samples are nice and small, even when
it's difficult to do the computations algebraically, we can simulate
from some plausible distributions to look at the properties. Or do
something nonparametric that has good power properties when the
population distribution happens to be close to normal.
Permutation tests, for example.

Glen

= Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/ =
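Glen's closing suggestion is easy to make concrete. A minimal two-sample permutation test on the difference in means might look like the sketch below; the distributions, sample sizes, and seed are illustrative choices of mine, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_test(x, y, n_perm=10_000, rng=rng):
    """Two-sided permutation p-value for a difference in means.

    Validity rests on exchangeability of the pooled observations
    under the null, not on normality, which is why this suits small,
    awkwardly distributed samples.
    """
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_x].mean() - shuffled[n_x:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (count + 1) / (n_perm + 1)

# Two small samples from a skewed (exponential) population,
# the second shifted up by one unit:
x = rng.exponential(1.0, size=8)
y = rng.exponential(1.0, size=8) + 1.0
print(perm_test(x, y))
```

With only 8 observations per group one could enumerate all splits exactly rather than sampling them; random resampling is used here just to keep the sketch short.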
Re: Are parametric assumptions important?
Glenn Barnett wrote:
>
> But the larger the sample size, the nearer the r.e. will be to the
> a.r.e., right?
>
> That is, the large sample power properties depend on the original
> distribution, and the CLT does *not* save you from a bad a.r.e.

and again:

> If you're sampling from a distribution for which the procedure in
> question has a very low a.r.e., you're essentially throwing away all
> the information in your data by using that procedure. The fact that
> the sampling distribution under the null is reasonable is a useless
> criterion unless you consider power.

OK, I see what you were getting at - but I still disagree, if it is
understood that we are talking about large samples. For large effects,
and large samples, you have far more power than you need; the goal is
not to get a p-value so small that you need scientific notation to
express it!

If the effect is small, efficiency matters; but a fairly small
deviation from normality will not have a large effect on efficiency
either. With an effect small enough to be marginally detectable even
with a large sample, it is likely that a *large* deviation from
normality will raise much more important questions about which measure
of location is appropriate.

For smaller samples, your point holds - with the cynical observation
that the times when it would most benefit us to assume normality are
precisely the times when we have not got the information that would
allow us to do so! I might however quibble that for smaller samples it
is risky to assume that asymptotic relative efficiency will be a good
indication of relative efficiency for small N.

-Robert Dawson
Re: Are parametric assumptions important?
[EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote in message
news:<[EMAIL PROTECTED]>...

> Glenn Barnett wrote:
> > > (1) normality is rarely important, provided the sample sizes are
> > > largish. The larger the less important.
> >
> > The a.r.e won't change with larger samples, so I disagree here.
>
> I don't follow. Asymptotic relative efficiency is a limit as sample
> sizes go to infinity;

Correct.

> so how does it change or not change "with sample size"?

Clearly it doesn't change with sample size, since it's asymptotic. But
the larger the sample size, the nearer the r.e. will be to the a.r.e.,
right?

That is, the large sample power properties depend on the original
distribution, and the CLT does *not* save you from a bad a.r.e.

So, as I said, I disagree with the assertion that:

> > > (1) normality is rarely important, provided the sample sizes are
> > > largish. The larger the less important.

...specifically because, the larger you get, the nearer you get to the
a.r.e. If you're sampling from a distribution for which the procedure
in question has a very low a.r.e., you're essentially throwing away all
the information in your data by using that procedure. The fact that the
sampling distribution under the null is reasonable is a useless
criterion unless you consider power.

Glen
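Glen's claim that the finite-sample relative efficiency drifts toward the a.r.e., for better or worse, is straightforward to check numerically. A rough sketch comparing the sample mean and sample median as location estimators (the populations and sample size here are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def rel_eff(draw, n, reps=20_000):
    """Monte Carlo estimate of var(sample mean) / var(sample median).

    Values below 1 favor the mean; values above 1 favor the median.
    """
    samples = draw(size=(reps, n))
    return samples.mean(axis=1).var() / np.median(samples, axis=1).var()

n = 200
# Normal population: the median's asymptotic efficiency relative to
# the mean is 2/pi (about 0.64), so this ratio sits below 1.
print(rel_eff(rng.standard_normal, n))
# Heavy-tailed Laplace population: the asymptotic ratio is 2, and no
# amount of additional data rescues the mean here.
print(rel_eff(rng.laplace, n))
```

The same experiment run at several values of n shows the ratios settling toward the asymptotic values 2/pi and 2, which is exactly Glen's point: which procedure is efficient depends on the population, not on how much data you collect.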
Re: Are parametric assumptions important?
Glenn Barnett wrote:
> > (1) normality is rarely important, provided the sample sizes are
> > largish. The larger the less important.
>
> The a.r.e won't change with larger samples, so I disagree here.

I don't follow. Asymptotic relative efficiency is a limit as sample
sizes go to infinity; so how does it change or not change "with sample
size"? Or does that acronym have another expansion that I can't think
of?

I hadn't had efficiency in mind so much as the validity of p-values for
the t test. However, the same point holds for efficiency. For large
samples, I would suggest that the efficiency of both tests is usually
adequate; and a small sample does not tell us enough about the
population distribution to tell much about the relative efficiency
anyway. When you've got lots of data, you also have a choice of lots of
reliable methods of inference; when you haven't got enough, you also
can't trust the methods that look as though they might help. ("Sort of
a metaphor for life", he said cynically.)

You are of course right that when I wrote "rank-sum" I meant
"signed-rank". (Of course, as Potthoff showed, the rank-sum test *is*
valid under the assumption of symmetry, but that is another story.)

-Robert Dawson
Re: Are parametric assumptions important?
"Robert J. MacG. Dawson" wrote:
>
> Voltolini wrote:
> >
> > Hi, I am a biologist preparing a class on experiments in ecology,
> > including a short and simple text about how to use and to choose the
> > most common statistical tests (chi-square, t tests, ANOVA,
> > correlation and regression).
> >
> > I am planning to include the idea that testing the assumptions for
> > parametric tests (normality and homoscedasticity) is very important
> > to decide between a parametric (e.g., ANOVA) or the nonparametric
> > test (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and the
> > Levene test for the assumption testing but..
>
> It's not that simple. Some points:
>
> (1) normality is rarely important, provided the sample sizes are
> largish. The larger the less important.

The a.r.e. won't change with larger samples, so I disagree here.

> (2) The Shapiro-Wilk test is far too sensitive with large samples and
> not sensitive enough for small samples. This is not the fault of
> Shapiro and Wilk, it's a flaw in the idea of testing for normality.
> The question that such a test answers is "is there enough evidence to
> conclude that the population is even slightly non-normal?" whereas
> what we *ought* to be asking is "do we have reason to believe that the
> population is approximately normal?"

Almost. I'd say "Is the deviation from normality so large as to
appreciably affect the inferences we're making?", which largely boils
down to things like: are our estimates consistent? (the answer will be
yes in any reasonable situation) are our standard errors approximately
correct? is our significance level something like what we think it is?
are our power properties reasonable?

You want a measure of the degree of deviation from normality. For
example, the Shapiro-Francia test is based on the squared correlation
in the normal scores plot, and as n increases, the test detects smaller
deviations from normality (which isn't what we want) - but the squared
correlation itself is a measure of the degree of deviation from
normality, and may be a somewhat helpful guide.

As the sample size gets moderate to large, you can more easily assess
the kind of deviation from normality and make some better assessment of
the likely effect. Generally speaking, things like one-way ANOVA aren't
affected much by moderate skewness or thin or somewhat thickish tails.
With heavy skewness or extremely heavy tails you'd be better off with a
Kruskal-Wallis.

> Levene's test has the same problem, as fairly severe
> heteroscedasticity can be worked around with a conservative
> assumption of degrees of freedom - which is essentially costless if
> the samples are large. In each case, the criterion of "detectability
> at p=0.05" simply does not coincide with the criterion "far enough
> off assumption to matter"

Correct.

> (3) Approximate symmetry is usually important to the *relevance* of
> mean-based testing, no matter how big the sample size is. Unless the
> sum of the data (or of population elements) is of primary importance,
> or unless the distribution is symmetric (so that almost all measures
> of location coincide) you should not assume that the mean is a good
> measure of location. The median need not be either!
>
> (4) Most nonparametric tests make assumptions too. The rank-sum test
> assumes symmetry;

You mean the signed-rank test. The rank-sum is the W-M-W...

> the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests are usually taken
> to assume a pure shift alternative (which is actually rather unlikely
> for an asymmetric distribution.) In fact symmetry will do instead;
> Potthoff has shown that the WMW is a test for the median if
> distributions are symmetric. If there exists a transformation that
> renders the populations equally-distributed or symmetric (eg,
> lognormal family) they will work, too.

e.g., the test will work for scale-shift alternatives (since the -
monotonic - log transform would render that as a location-shift
alternative, but of course the monotonic transformation won't affect
the rank structure, so it works with the original data).

Glen
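Glen's idea of reading the normal-scores squared correlation as a descriptive measure rather than a test statistic can be sketched in a few lines. The Blom-type plotting positions and the sample sizes below are my own illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def normal_plot_r2(x):
    """Squared correlation between the sorted data and approximate
    expected normal order statistics - the quantity underlying the
    Shapiro-Francia test, read here as a plain descriptive measure
    of how normal the sample looks (1.0 = perfectly straight plot).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Blom-type plotting positions approximating normal scores.
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    scores = stats.norm.ppf(probs)
    r = np.corrcoef(x, scores)[0, 1]
    return r * r

print(normal_plot_r2(rng.standard_normal(500)))   # near 1
print(normal_plot_r2(rng.exponential(size=500)))  # visibly lower
```

Used this way, the statistic answers "how far from normal does the sample look?" directly, instead of feeding a significance test whose sensitivity depends on n.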
Re: Are parametric assumptions important?
Voltolini wrote:
>
> Hi, I am a biologist preparing a class on experiments in ecology,
> including a short and simple text about how to use and to choose the
> most common statistical tests (chi-square, t tests, ANOVA, correlation
> and regression).
>
> I am planning to include the idea that testing the assumptions for
> parametric tests (normality and homoscedasticity) is very important to
> decide between a parametric (e.g., ANOVA) or the nonparametric test
> (e.g., Kruskal-Wallis).

Since this is a class on experiments in ecology, how about having the
students do an experiment? Would a Monte Carlo simulation of robustness
to certain assumptions be too much to ask of them? (If so, is there a
way you could do some of it to make the rest easier for them?) It need
not be publishable -- just enough to give them some feeling for the
problems involved, rather than considering the assumptions unimportant
except academically.

I'll never forget my first ecology lab, in which we marked beans with
nail polish and recaptured them from a jar. The variability in ensuing
population estimates was an eye-opener, and its impact could not have
been achieved by a lecture on the importance of assumptions.

--
Mike Prager
NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.
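The bean-jar exercise translates directly into a Monte Carlo that students could run themselves. A sketch of a Lincoln-Petersen mark-recapture simulation follows; every number (jar size, marks, recapture sample, replications) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# "Jar" of N beans, m of them marked; recapture a sample of c beans
# and estimate N by the Lincoln-Petersen index N_hat = m * c / r,
# where r is the number of marked beans in the recapture sample.
N, m, c, reps = 500, 60, 60, 10_000

# Number of marked beans recaptured in each simulated draw.
r = rng.hypergeometric(m, N - m, c, size=reps)
r = np.clip(r, 1, None)          # avoid dividing by zero recaptures
estimates = m * c / r

print(estimates.mean())          # biased somewhat above the true N = 500
print(estimates.std())           # and strikingly variable
```

A histogram of `estimates` makes the same point the jar of beans made: a perfectly sensible estimator can be wildly variable at realistic sample sizes, which no lecture conveys as vividly.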
Re: Are parametric assumptions important?
On 12 Oct 2001 11:14:54 -0700, [EMAIL PROTECTED] (Lise DeShea) wrote:

> Re robustness of the between-subjects ANOVA, I obtained permission
> from Dr. Rand Wilcox to copy three pages from his book, "New
> Statistical Procedures for the Social Sciences," and place them on a
> webpage for my students. He cites research showing that with four
> groups of 50 observations each and population standard deviations of
> 4, 1, 1, and 1, the empirical Type I error rate was .088, which is
> beyond Bradley's liberal limits on sampling variability [.025 to
> .075]. You can read this excerpt at

Well, I suggest that a variance difference of 16 to 1 practically
washes out the usual interest in the means. Isn't that beyond the pale
of the usual illustrations of what is robust? I may remember wrong, but
it seems to me that Tukey used Monte Carlo with "10% contamination" of
a sample, where the contaminant had excessive variances: 10-fold for
the variances?

What can you say about an example like that? Will a Box-Cox
transformation equalize the variances? (no?) Is there a huge outlier or
two? If not -- if the scores are well scattered -- all the extreme
scores in *both* directions will be in that one group. And a "mean
difference" will implicitly be determined by the scaling: That is, if
you spread out the low scores (say), then the group with big variance
will have the lower mean.

> www.uky.edu/~ldesh2/stats.htm -- look for the link to "Handout on
> ANOVA, Sept. 19-20, 2001." Error rates are much worse when sample
> sizes are unequal and the smaller groups are paired with the larger
> sigma -- up to an empirical alpha of .309 when six groups, ranging in
> size from 6 to 25, have sigmas of 4, 1, 1, 1, 1, 1.
>
> The independent-samples t-test has an inoculation against unequal
> variances -- make sure you have equal n's of at least 15 per group,
> and it doesn't matter much what your variances are (Ramsey, 1980, I
> think). But the ANOVA doesn't have an inoculation.
>
> I tell my students that the ANOVA is not robust to violation of the
> equal variances assumption, but that it's a stupid statistic anyway.
> All it can say is either, "These means are equal," or "There's a
> difference somewhere among these means, but I can't tell you where it
> is." I tell them to move along to a good MCP and don't worry about
> the ANOVA. Most MCP's don't require a significant F anyway. And if
> you have unequal n's, use Games-Howell's MCP to find where the
> differences are.

Some of us don't like MCPs. We think that the overall test is not (or
at least, not always) a bad way to start, if a person *really* can't be
more particular about what they want to test. And if you have unequal
Ns, you are stuck with one approximation or another, which has to be
ugly when the Ns are too unequal; or else you are stuck with
inconsistent statements, where the smaller difference in means is
'significant' but the larger one is not. (I am unfamiliar with
Games-Howell's MCP.)

Just another opinion.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Are parametric assumptions important?
At 01:44 PM 10/12/01 -0400, Lise DeShea wrote:

> I tell my students that the ANOVA is not robust to violation of the
> equal variances assumption, but that it's a stupid statistic anyway.
> All it can say is either, "These means are equal," or "There's a
> difference somewhere among these means, but I can't tell you where it
> is."

I don't see that this is any more stupid than many other null
hypothesis tests we do ... if you want to think "stupid" ... then think
that it is stupid to think that the null can REALLY be exactly true ...
so, the notion of doing a TEST to see if you retain or reject ... is
rather stupid TOO, since we know that the null is NOT exactly true ...
before we even do the test

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
RE: Are parametric assumptions important?
Lise advised:

> I tell my students that the ANOVA is not robust to violation of the
> equal variances assumption, but that it's a stupid statistic anyway.
> All it can say is either, "These means are equal," or "There's a
> difference somewhere among these means, but I can't tell you where it
> is." I tell them to move along to a good MCP and don't worry about
> the ANOVA. Most MCP's don't require a significant F anyway. And if
> you have unequal n's, use Games-Howell's MCP to find where the
> differences are.

Excellent advice, copied to my students (so they don't hear it only
from me). Now if we could only get our colleagues to listen! ;-)

+
Karl L. Wuensch, Department of Psychology, East Carolina University,
Greenville NC 27858-4353
Voice: 252-328-4102  Fax: 252-328-6283
[EMAIL PROTECTED]
http://core.ecu.edu/psyc/wuenschk/klw.htm
Re: Are parametric assumptions important?
Re robustness of the between-subjects ANOVA, I obtained permission from
Dr. Rand Wilcox to copy three pages from his book, "New Statistical
Procedures for the Social Sciences," and place them on a webpage for my
students. He cites research showing that with four groups of 50
observations each and population standard deviations of 4, 1, 1, and 1,
the empirical Type I error rate was .088, which is beyond Bradley's
liberal limits on sampling variability [.025 to .075]. You can read
this excerpt at www.uky.edu/~ldesh2/stats.htm -- look for the link to
"Handout on ANOVA, Sept. 19-20, 2001." Error rates are much worse when
sample sizes are unequal and the smaller groups are paired with the
larger sigma -- up to an empirical alpha of .309 when six groups,
ranging in size from 6 to 25, have sigmas of 4, 1, 1, 1, 1, 1.

The independent-samples t-test has an inoculation against unequal
variances -- make sure you have equal n's of at least 15 per group, and
it doesn't matter much what your variances are (Ramsey, 1980, I think).
But the ANOVA doesn't have an inoculation.

I tell my students that the ANOVA is not robust to violation of the
equal variances assumption, but that it's a stupid statistic anyway.
All it can say is either, "These means are equal," or "There's a
difference somewhere among these means, but I can't tell you where it
is." I tell them to move along to a good MCP and don't worry about the
ANOVA. Most MCP's don't require a significant F anyway. And if you
have unequal n's, use Games-Howell's MCP to find where the differences
are.

Cheers.
Lise

~~~
Lise DeShea, Ph.D.
Assistant Professor
Educational and Counseling Psychology Department
University of Kentucky
245 Dickey Hall
Lexington KY 40506
Email: [EMAIL PROTECTED]
Phone: (859) 257-9884
Website for students: www.uky.edu/~ldesh2/stats.htm
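The figure Wilcox cites is easy to check by simulation. A rough re-creation of that setting (normal populations with equal means, SDs of 4, 1, 1, 1, and n = 50 per group; the simulation details are my own, not taken from his book):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# H0 is true (all four population means are 0), but one group is
# four times as spread out as the others.
sds = [4.0, 1.0, 1.0, 1.0]
n, reps, alpha = 50, 4_000, 0.05

rejections = 0
for _ in range(reps):
    groups = [rng.normal(0.0, sd, size=n) for sd in sds]
    _, p = stats.f_oneway(*groups)   # classical one-way ANOVA
    rejections += p < alpha

print(rejections / reps)   # runs well above the nominal .05
```

With a few thousand replications the empirical rejection rate lands in the neighborhood of the .088 quoted above, a quick classroom-scale confirmation of the robustness failure being discussed.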
Re: Are parametric assumptions important?
At 12:59 PM 10/12/01 -0300, you wrote:

> While consulting people from depts of statistics about this, a few of
> them were arguing that this assumption testing is just a "legend" and
> that there is no problem in not respecting them!

Note: you should NOT respect any stat expert who says that there is no
problem ... and not to worry about the so-called "classic" assumptions.
All they are doing is making their consultation with you EASIER for
them!

Every test you might want to do has 1 or more assumptions about either
how samples were taken and/or parameters (and other things) about the
population. In some cases, violations of one or more of these make
little difference in the "validity" of the tests (simulation studies
can verify this) ... but, in other cases, violations of one or more can
lead to serious consequences (ie, yielding a much larger type I error
rate, for example, than you thought you were working with) ...

There is no easy way to make some blanket statement as to what
assumptions are important and which are not, because ... this depends
on a specific test (or family of similar tests). Usually, for a
particular test ... "good" texts will enumerate the assumptions that
are made AND will give you some mini capsule of the impact of
violations TO those assumptions.

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Are parametric assumptions important?
Voltolini wrote:
>
> Hi, I am a biologist preparing a class on experiments in ecology,
> including a short and simple text about how to use and to choose the
> most common statistical tests (chi-square, t tests, ANOVA, correlation
> and regression).
>
> I am planning to include the idea that testing the assumptions for
> parametric tests (normality and homoscedasticity) is very important to
> decide between a parametric (e.g., ANOVA) or the nonparametric test
> (e.g., Kruskal-Wallis). I am using the Shapiro-Wilk and the Levene
> test for the assumption testing but..

It's not that simple. Some points:

(1) Normality is rarely important, provided the sample sizes are
largish. The larger, the less important.

(2) The Shapiro-Wilk test is far too sensitive with large samples and
not sensitive enough for small samples. This is not the fault of
Shapiro and Wilk; it's a flaw in the idea of testing for normality. The
question that such a test answers is "is there enough evidence to
conclude that the population is even slightly non-normal?" whereas what
we *ought* to be asking is "do we have reason to believe that the
population is approximately normal?" Levene's test has the same
problem, as fairly severe heteroscedasticity can be worked around with
a conservative assumption of degrees of freedom - which is essentially
costless if the samples are large. In each case, the criterion of
"detectability at p=0.05" simply does not coincide with the criterion
"far enough off assumption to matter" except sometimes by chance.

(3) Approximate symmetry is usually important to the *relevance* of
mean-based testing, no matter how big the sample size is. Unless the
sum of the data (or of population elements) is of primary importance,
or unless the distribution is symmetric (so that almost all measures of
location coincide), you should not assume that the mean is a good
measure of location. The median need not be either!

(4) Most nonparametric tests make assumptions too. The rank-sum test
assumes symmetry; the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests
are usually taken to assume a pure shift alternative (which is actually
rather unlikely for an asymmetric distribution). In fact symmetry will
do instead; Potthoff has shown that the WMW is a test for the median if
distributions are symmetric. If there exists a transformation that
renders the populations equally-distributed or symmetric (e.g., the
lognormal family), they will work, too. In the absence of some such
assumption, strange things can happen. I have shown (preprint available
on request) that the WMW test is intransitive for "most" Behrens-Fisher
families (that is, it can consistently indicate X>Y>Z>X with p -> 1 as
n -> infinity), although the intransitivity is not pronounced for most
realistic distributions and sample sizes. Note: a Behrens-Fisher family
is one differing both by location and by spread but not by shape.

-Robert Dawson