Hello!

Why not model the distribution as non-normal, if you think it is not
normal? There are plenty of alternatives to the normal distribution: t
for symmetric data with heavier tails, gamma for skewed distributions,
and so on; no need to make a list here. Then you can estimate the
parameters by maximum likelihood. If you need asymptotic normal theory
for that, with today's computers you can decide for yourself whether a
normal approximation to the likelihood is okay: for a few parameters,
graph the log-likelihood, and if it looks approximately parabolic in
the vicinity of the maximum, a normal approximation will be okay. There
is no longer any need for "rules of thumb". The normal approximation
will probably not be okay globally, but you are only interested in a
region around the maximum whose size is proportional to 1/sqrt(n). If
you cannot graph the log-likelihood, or it doesn't look very quadratic,
you can bootstrap the distribution of the maximum likelihood estimator.
All of this is cheap today, at least with a decent language like R or
S-Plus. To repeat: no need for rules of thumb; use your computer to
find out what is right for your data.
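
To make the recipe concrete, here is a minimal sketch in Python
(standard library only) rather than R or S-Plus. Everything specific in
it is an assumption made for illustration: the exponential model (a
gamma with shape 1, chosen because its maximum likelihood estimate has
a closed form), the simulated data, the function names, and the choice
of +/- 2 standard errors as the "vicinity of the maximum". It compares
the log-likelihood with its parabola numerically instead of drawing a
graph, and then bootstraps the estimator.

```python
import math
import random


def exp_loglik(rate, data):
    """Log-likelihood of an exponential model with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def quadratic_check(data, width_in_se=2.0, grid_points=41):
    """Compare the log-likelihood with its parabola near the maximum.

    Returns (mle, se, max_gap); max_gap is the largest absolute
    difference between the two curves on mle +/- width_in_se * se.
    """
    n = len(data)
    mle = n / sum(data)             # closed-form MLE of the rate
    info = n / mle ** 2             # observed information at the MLE
    se = 1.0 / math.sqrt(info)      # shrinks like 1/sqrt(n)
    if mle - width_in_se * se <= 0:
        raise ValueError("sample too small for this window")
    top = exp_loglik(mle, data)
    max_gap = 0.0
    for i in range(grid_points):
        frac = 2.0 * i / (grid_points - 1) - 1.0    # runs from -1 to 1
        rate = mle + width_in_se * se * frac
        parabola = top - 0.5 * info * (rate - mle) ** 2
        max_gap = max(max_gap, abs(exp_loglik(rate, data) - parabola))
    return mle, se, max_gap


def bootstrap_mle(data, n_boot=2000, seed=0):
    """Sorted MLEs from nonparametric bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [n / sum(rng.choices(data, k=n)) for _ in range(n_boot)]
    estimates.sort()
    return estimates


# Hypothetical data: 200 draws from an exponential with rate 2.
sample_rng = random.Random(1)
sample = [sample_rng.expovariate(2.0) for _ in range(200)]

mle, se, gap = quadratic_check(sample)
boot = bootstrap_mle(sample)
lo = boot[int(0.025 * len(boot))]        # 2.5th percentile
hi = boot[int(0.975 * len(boot)) - 1]    # 97.5th percentile

print("MLE of rate:", mle, "approx. SE:", se)
print("max gap between log-likelihood and parabola:", gap)
print("bootstrap 95% percentile interval:", lo, hi)
```

A gap that is small on the log-likelihood scale suggests the normal
approximation is reasonable in that window; rerunning with a much
smaller sample shows the gap growing, which is the case where the
bootstrap interval is the safer choice.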


Kjetil Halvorsen

Gaj Vidmar wrote:
> 
> During years of passionate practice and round-the-clock chaotic
> learning in the field of applied statistics, I have been desperately longing
> to learn the fundamentals of mathematical statistics, as well as to start
> working as a statistician. As the latter recently came true, I simply had to
> make some notable progress in the former as well. Not to extend this
> unnecessary introduction any further, let me just state that I simply
> cannot find adequate words of praise for the role and value of the sci.stat
> newsgroups in the whole story.
> 
> Now, to be even more lucky, this week I've been ill and thus found some
> peace for studying, while at the same time the discussion on CLT and t vs. z
> popped up. As a consequence, please allow me to ask for critiques of this
> brief recapitulation of the issue.***
> 
> (please view in nonproportional font)
> 
> sample size       | distribution(s) | population var | appropriate test
> ---------------------------------------------------------------------------------------
> large (say, N>30) | normal          | known          | z (obvious)
> large             | not normal      | known          | z (CLT takes care of numerator)
> small             | not normal      | known          | still z, right??
> large             | normal          | estimated      | t (note 1 below)
> small             | normal          | estimated      | t (the case of Student)
> small             | not normal      | estimated      | mostly t (note 2 below)
> 
> Note 1: z before computer era and also OK due to Slutsky's theorem
> 
> Note 2: t-test is very robust (BTW, is Boneau, 1960, Psychological Bulletin
> vol. 57, referenced and summarised in Quinn McNemar, Psychological
> Statistics, 4th ed. 1969, with the nice introduction "Boneau, with the
> indispensable help of an electronic computer, ...", still an adequate
> reference?), whereby:
> - skewness, even extreme, is not a big problem
> - two-tailed testing increases robustness
> - unequal variances are a serious problem when the N's are unequal and the
> smaller sample has the larger variance
> 
> Now, what to do if t is inadequate? - This is a whole complex issue in
> itself, so just a few thoughts:
> - in case of extreme skewness, Mann-Whitney is not a good alternative
> (assumes symmetric distrib.), right?
> - so Kolmogorov-Smirnov? But where to find truly continuous variables,
> especially in social sciences? Plus not very powerful with small N, right?
> - so exact permutation test, right? (Permutation Test with General Scores in
> StatXact - the manual says in this special case it's called Pitman's test)
> - solution for the problematic unequal variances case: take random subsample
> of the larger sample of the size of the smaller sample?? Or do kinda
> bootstrap - do it, say, 1000 times and take average obtained p??? - Figured
> these two out by myself, so surely they are utterly wrong. So transformation
> (in real-life cases usually log, or the Box-Cox, which I am yet to
> understand)?
> 
> A big thanks for any comment,
> 
> Gaj Vidmar
> University of Ljubljana, Faculty of Medicine
> Institute of biomedical informatics
> 
> *** I try to be fully aware of how fundamentally wrong the quest for and view
> of statistics as a collection of recipes is, no matter how diverse and advanced
> they may be; but the fact remains that masses of people still encounter and/or
> are taught statistics precisely in this manner, preferably with the
> collection being very limited, extremely outdated and mainly faulty. And I
> speak from personal experience in its most extreme form here, but in spite
> of having graduated in psychology, I dare at the same time to oppose most
> strongly any authority whatsoever and wherever who claims that this is mostly
> due to or the case of social sciences! - But fortunately, if there is any
> real benefit of Internet to humanity, the wealth of statistics-related
> resources ... Anyhow, putting aside nonproductive debates, let me just do my
> best to make the living case that the possibility that the aforementioned
> approach and circumstances do not always lead to their replication and
> proliferation is not zero. - Or, at least, since we all know that even
> events with zero probability can happen, ... :) - Yes, to exaggerate just a
> little, you can start by mastering hand-computed point-biserial correlation,
> point&click transformations in SPSS after regression diagnostics the next
> year, automate logistic regression with interaction terms the following
> year, then speak about somebody named Tufte to your new girlfriend all night
> long, and so forth and so forth, and wouldya believe it, last month came
> derivation of distribution of minimum of n samples taken from exponential
> distribution! And I'll be damned if in 2002 some S-Plus or R simulations as
> part of serious research in statistics don't happen! And yes, I can foresee
> and promise that - health and means permitting - by retirement (i.e., after
> a few decades) even this person doomed to subnormality by the psych degree
> will learn enough mathematics to become a Bayesian :)
> 
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================

