I have been reading all the back and forth about hypothesis testing with
some degree of fascination. It's a topic of particular interest to me -
I presented a paper called 'Hypothesis testing and the Westminster
System' at the ISI conference in Helsinki last year.

What I find fascinating is the way that hypothesis testing is regarded
as a technique for finding out 'truth'. Just wave a magic wand, and
truth will appear out of a set of data (and mutter the magic number 0.05
while you are waving it...). Hypothesis testing does nothing of the sort
- of course.

First, hypothesis testing is not restricted to statistics or 'research'.
If you are told some piece of news or gossip, you automatically check it
out for plausibility against your knowledge and experience. (This is
known colloquially as a 'shit filter'.) If you are at a seminar, you
listen to the presenter in the same way. If what you hear is consistent
with your knowledge and experience you accept that it is probably true.
with your knowledge and experience you accept that it is probably true.
If it is very consistent, you may accept that it IS true. If it is not
consistent, you will question it, or conclude that it is probably not true.

If the news is something that requires some action on your part, you
will act according to your assessment of the information.

If the news is important to you, and you cannot decide which way to go
on prior knowledge, you will presumably go and get corroborative
information - hopefully information that is in some sense objective.

This describes hypothesis testing almost exactly; the difference is a
matter of formalism.

Next - a statistical hypothesis test compares two probability models of
'reality'. If you are interested in the possible difference between two
populations on some numeric variable - for example, between heights of
men and heights of women in some population group - and you choose to
express the difference in terms of means, you are comparing a model
which says
        height of a randomly chosen individual
                = overall mean + random fluctuation
with one which says
        height of a randomly chosen individual
                = overall mean + factor due to sex + random fluctuation
You then make assumptions about the 'random fluctuations'.
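To make this concrete, here is a rough sketch in Python of the two
models as data-generating recipes. Every number in it is invented
purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100

    # Model 1: height = overall mean + random fluctuation
    overall_mean = 168.0             # cm, invented
    sigma = 7.0                      # s.d. of the fluctuation, invented
    heights_1 = overall_mean + rng.normal(0.0, sigma, size=n)

    # Model 2: height = overall mean + factor due to sex + fluctuation
    sex = rng.choice(['M', 'F'], size=n)
    sex_effect = np.where(sex == 'M', +6.0, -6.0)   # invented effect
    heights_2 = overall_mean + sex_effect + rng.normal(0.0, sigma, size=n)

Setting the sex effect to zero in the second recipe gives back the
first - which is the embedding referred to below.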

Note that one of these models is embedded within the other - the first
model is a particular case of the second. It is only in this situation
that standard hypothesis testing is applicable.

Neither of these models is 'true' - but either or both may be good
descriptions of the two populations. Good in the sense that if you do
start to randomly select individuals, the results agree acceptably well
with what the model predicts. The role of hypothesis testing is to help
you decide which of these is (PROBABLY) the better model - or if neither
is.
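In the same sketchy Python terms (data invented again), the familiar
two-sample t test is exactly this comparison between the common-mean
model and the sex-effect model:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    men   = 174.0 + rng.normal(0.0, 7.0, size=50)   # invented samples
    women = 162.0 + rng.normal(0.0, 7.0, size=50)

    # The t test compares the embedded (common-mean) model with the
    # model that adds a factor due to sex.
    t, p = stats.ttest_ind(men, women)
    print(t, p)   # small p: the sex-effect model is (PROBABLY) better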

In standard hypothesis testing, one of these models is 'privileged' in
that it is assumed 'true' - that is, if neither model is better, then
you will use the privileged model. In most cases, this means the SIMPLER
model.

More accurately - if you decide that the models are equally good (or
bad) you are saying that you cannot distinguish between them on the
basis of the information and the statistical technique used! To decide
between them you will need either to use a different technique, or more
realistically, some other criterion. For example, in a court case, if
you cannot decide between the models 'Guilty' and 'Innocent', you may
always choose 'Innocent'.

There is no reason why one model must be privileged in this way. In my
paper I stressed my belief that this approach reflects our (and
Fisher's) cultural heritage rather than any need for it to be that way.
One can, for example, express the choice as between the usual embedded
model (the null) and the embedded model suggested by the data. For a
test on the difference between two means, this compares the models
mu(diff) = 0 and mu(diff) = xbar. The interesting thing is that this is
what we actually do - although it is dressed up in the language and
technique of the general model mu(diff) not equal to 0. (This dressing
up is a lot of the reason why students have trouble with hypothesis
testing.)
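A quick numerical sketch of that claim (Python again, with invented
data, and sigma treated as known to keep the algebra clean): compare
the likelihoods of the two point models mu(diff) = 0 and mu(diff) =
xbar, and the familiar z statistic falls out.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    sigma = 7.0                                # treated as known
    diff = rng.normal(2.0, sigma, size=60)     # invented differences
    n, xbar = len(diff), diff.mean()
    se = sigma / np.sqrt(n)

    # Log-likelihood of the observed mean under each point model
    loglik_0    = stats.norm.logpdf(xbar, loc=0.0,  scale=se)
    loglik_xbar = stats.norm.logpdf(xbar, loc=xbar, scale=se)

    lr = -2.0 * (loglik_0 - loglik_xbar)   # likelihood-ratio statistic
    z = xbar / se                          # the usual z statistic
    print(lr, z**2)                        # the two agree: lr == z**2

The comparison of the two embedded point models and the usual test
statistic are one and the same calculation.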

To conclude: hypothesis testing is NECESSARY. We do it all the time.
Assessment of effect sizes is also necessary, but the two should not be
confused.

Regards,
Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007



