Hello! Why not model the distribution as non-normal, if you think it is not normal? There are plenty of alternatives to the normal distribution: t for symmetric data with heavier tails, gamma for skewed distributions, and so on; no need to make a list here.

Then you can estimate the parameters by maximum likelihood, and if you need asymptotic normal theory for that, with today's computers you can decide for yourself whether a normal approximation to the likelihood is okay: for a few parameters, graph the log-likelihood, and if it looks approximately parabolic in the vicinity of the maximum, a normal approximation will be okay. There is no longer any need for "rules of thumb". The normal approximation will probably not be okay globally, but you are only interested in a region around the maximum whose size is proportional to 1/sqrt(n).

If you cannot graph the log-likelihood, or it does not look very quadratic, you can bootstrap the distribution of the maximum likelihood estimator. All of this is cheap today, at least with a decent language like R or S-Plus. To repeat: no need for rules of thumb; use your computer to find out what is right for your data.
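To make the two checks above concrete, here is a minimal sketch in Python (standard library only) rather than R or S-Plus. Everything in it is an assumption made for illustration: the one-parameter exponential model, the simulated data, and the helper names exp_loglik, exp_mle, parabola_gap and bootstrap_interval. Instead of drawing a graph, it measures numerically how far the log-likelihood strays from its fitted parabola within two standard errors of the maximum, and then bootstraps the maximum likelihood estimator.

```python
import math
import random


def exp_loglik(rate, xs):
    """Log-likelihood of an exponential(rate) model for the sample xs."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def exp_mle(xs):
    """Maximum likelihood estimate of the exponential rate."""
    return len(xs) / sum(xs)


def parabola_gap(xs, width=2.0, steps=41):
    """Largest absolute difference between the log-likelihood and its
    quadratic approximation within `width` standard errors of the MLE."""
    rate_hat = exp_mle(xs)
    info = len(xs) / rate_hat ** 2      # observed information at the MLE
    se = 1.0 / math.sqrt(info)          # shrinks like 1/sqrt(n)
    top = exp_loglik(rate_hat, xs)
    worst = 0.0
    for i in range(steps):
        rate = rate_hat + width * se * (2.0 * i / (steps - 1) - 1.0)
        if rate <= 0:                   # outside the parameter space
            continue
        quad = top - 0.5 * info * (rate - rate_hat) ** 2
        worst = max(worst, abs(exp_loglik(rate, xs) - quad))
    return worst


def bootstrap_interval(xs, reps=2000, seed=1):
    """Nonparametric bootstrap 95% percentile interval for the MLE."""
    rng = random.Random(seed)
    n = len(xs)
    ests = sorted(exp_mle(rng.choices(xs, k=n)) for _ in range(reps))
    return ests[int(0.025 * reps)], ests[int(0.975 * reps) - 1]


# Simulated data (an assumption for this sketch): true rate 2.0.
rng = random.Random(0)
small = [rng.expovariate(2.0) for _ in range(10)]
large = [rng.expovariate(2.0) for _ in range(400)]

print("gap, n=10: ", parabola_gap(small))
print("gap, n=400:", parabola_gap(large))
print("bootstrap 95% interval, n=400:", bootstrap_interval(large))
```

A gap that is small compared with the drop in log-likelihood over the same range (about 2 units at two standard errors) suggests the normal approximation is reasonable for that sample; when the gap is not small, the bootstrap interval is the fallback. With several parameters the same idea would need a profile or a slice of the log-likelihood, which this sketch does not attempt.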
Kjetil Halvorsen

Gaj Vidmar wrote:
>
> During years of passionate practice and round-the-clock chaotic
> learning in the field of applied statistics, I have been desperately longing
> to learn the fundamentals of mathematical statistics, as well as start
> working as a statistician. As the latter recently came true, I simply had to
> make some notable progress in the former as well. Not to extend this
> unnecessary introduction any further, let me just state that I simply
> cannot find adequate words of praise for the role and value of the sci.stat
> newsgroups in the whole story.
>
> Now, to be even more lucky, this week I've been ill and thus found some
> peace for studying, while at the same time the discussion on CLT and t vs. z
> popped up. As a consequence, please allow me to ask for critiques of this
> brief recapitulation of the issue.***
>
> (please view in nonproportional font)
>
> sample size       | distribution(s) | population var | appropriate test
> ------------------------------------------------------------------------------------
> large (say, N>30) | normal          | known          | z (obvious)
> large             | not normal      | known          | z (CLT takes care of numerator)
> small             | not normal      | known          | still z, right??
> large             | normal          | estimated      | t (note 1 below)
> small             | normal          | estimated      | t (the case of Student)
> small             | not normal      | estimated      | mostly t (note 2 below)
>
> Note 1: z before computer era and also OK due to Slutsky's theorem
>
> Note 2: t-test is very robust (BTW, is Boneau, 1960, Psychological Bulletin
> vol. 57, referenced and summarised in Quinn McNemar, Psychological
> Statistics, 4th ed.
> 1969, with the nice introduction "Boneau, with the
> indispensable help of an electronic computer, ...", still an adequate
> reference?), whereby:
> - skewness, even extreme, is not a big problem
> - two-tailed testing increases robustness
> - unequal variances are a serious problem with unequal N's when the larger
>   variance belongs to the smaller sample
>
> Now, what to do if t is inadequate? - This is a whole complex issue in
> itself, so just a few thoughts:
> - in case of extreme skewness, Mann-Whitney is not a good alternative
>   (assumes symmetric distrib.), right?
> - so Kolmogorov-Smirnov? But where to find truly continuous variables,
>   especially in social sciences? Plus not very powerful with small N, right?
> - so exact permutation test, right? (Permutation Test with General Scores in
>   StatXact - the manual says in this special case it's called Pitman's test)
> - solution for the problematic unequal variances case: take a random subsample
>   of the larger sample of the size of the smaller sample?? Or do a kind of
>   bootstrap - do it, say, 1000 times and take the average obtained p??? - Figured
>   these two out by myself, so surely they are utterly wrong. So transformation
>   (in real-life cases usually log, or the Box-Cox, which I am yet to
>   understand)?
>
> A big thanks for any comment,
>
> Gaj Vidmar
> University of Ljubljana, Faculty of Medicine
> Institute of biomedical informatics
>
> *** I try to be fully aware of how fundamentally wrong is the quest and view
> of statistics as a collection of recipes, no matter how diverse and advanced
> they may be; but the fact stays that masses of people still encounter and/or
> are taught statistics precisely in this manner, preferably with the
> collection being very limited, extremely outdated and mainly faulty.
> And I speak from personal experience in its most extreme form here, but in
> spite of having graduated in psychology, I dare at the same time to oppose
> most strongly any authority whatsoever and wherever who claims that this is
> mostly due to or the case of social sciences! - But fortunately, if there is
> any real benefit of Internet to humanity, the wealth of statistics-related
> resources ... Anyhow, putting aside nonproductive debates, let me just do my
> best to make the living case that the possibility that the aforementioned
> approach and circumstances do not always lead to their replication and
> proliferation is not zero. - Or, at least, since we all know that even
> events with zero probability can happen, ... :) - Yes, to exaggerate just a
> little, you can start by mastering hand-computed point-biserial correlation,
> point&click transformations in SPSS after regression diagnostics the next
> year, automate logistic regression with interaction terms the following
> year, then speak about somebody named Tufte to your new girlfriend all night
> long, and so forth and so forth, and wouldya believe it, last month came
> derivation of distribution of minimum of n samples taken from exponential
> distribution! And I'll be damned if in 2002 some S-Plus or R simulations as
> part of serious research in statistics don't happen!
> And yes, I can foresee
> and promise that - health and means permitting - by retirement (i.e., after
> a few decades) even this person doomed to subnormality by the psych degree
> will learn enough mathematics to become a Bayesian :)
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
> http://jse.stat.ncsu.edu/
> =================================================================