Some more comments on hypothesis testing:

My impression of the ‘hypothesis test controversy’, which seems to exist
primarily in psychology, education and similar fields, is that it is at
least partly a consequence of the sheer difficulty of carrying out
quantitative research in those fields. (I say this as someone who has
spent an entire working life in education, but with a
scientific/mathematical background.) A root of the problem seems to be
definitional: I am referring here to the definition of the variables
involved.

In, say, an agricultural research problem it is usually easy enough to
define the variables. For a very simple example, if one is interested in
comparing two strains of a crop for yield, it is very easy to define the
variable of interest. It is reasonably easy to design an experiment to
vary fairly obvious factors and to carry out the experiment.

In the ‘soft’ sciences it is easy enough to identify a characteristic of
interest – the problem is how to measure it. If I am interested in the
relationship between ability in statistics and ethnic background, for
example, I measure the statistics ability using a test of some sort; I
measure ethnic background by defining a set of ethnicities. There are
literally an infinite number of combinations that I can use – infinitely
many different tests, all purporting to measure ‘statistics ability’
(even if I change only one word in a test, I cannot be absolutely
certain of its effect, so it is a different test!), and a very large
number of definitions of ‘ethnicity’.

This is of course not news to anyone reading this. But I am coming to my
point. Suppose I carry out an ‘experiment’ – I apply the test to a group
of people of varying ethnicity, score them on the test and analyse the
results, including a hypothesis test to decide if statistics ability is
related to ethnicity. This hypothesis test might be a simple ANOVA, a
Kruskal-Wallis test or a chi-square test, depending on how I score the
ability test.
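
To make the ‘experiment’ concrete, here is a minimal sketch in Python of
one way such an analysis might run. It is not the classical ANOVA with
its F distribution: it computes the usual one-way F statistic but gets
the p-value by permutation (reshuffling scores across groups), which
keeps the sketch free of external libraries. The three groups and all
the scores are invented purely for illustration, as are the function
names.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    pooled = [x for g in groups for x in g]
    grand_mean = mean(pooled)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random regroupings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so that ties are not lost to rounding error.
        if f_statistic(shuffled) >= observed * (1 - 1e-9):
            count += 1
    # Add-one correction: the observed arrangement counts as one permutation.
    return (count + 1) / (n_perm + 1)


# Hypothetical test scores for three hypothetical groups (illustration only).
group_a = [62, 71, 68, 75, 66]
group_b = [70, 74, 69, 80, 77]
group_c = [58, 65, 61, 72, 63]

groups = [group_a, group_b, group_c]
print("F statistic:", round(f_statistic(groups), 3))
print("permutation p-value:", round(permutation_p_value(groups), 4))
```

Note that both the scores and the grouping are inputs to the
calculation: a different ability test or a different set of group
definitions gives a different data set, and so a different comparison.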

As I said earlier, a hypothesis test only helps the user to decide which
of two models is probably better. The point of the above paragraphs is
this: the definition of the models being compared includes the
definition of the variables used. If I reject the null model (a label I
prefer to ‘null hypothesis’), that is, if I decide that the alternative
model is likely to work better, I am NOT saying that there is a
relationship between statistics ability and ethnicity. All I am saying
is that there is a relationship between the two variables I used.

Please note that the test is not saying this – I am. The test merely
gives me a measure of the strength of the evidence provided by the data
(‘significant at 1%’ or ‘p-value of .0135’); this measure is only
relevant if the models I have used are appropriate. I can use other
evidence (experience is what we usually use! but there may be related
tests that help) to decide if the model is appropriate.

So there are three levels at which judgement is used to make decisions:
  1. deciding what variables are to be used to measure the
     characteristics of interest, and how any relationship between them
     relates to the characteristics;
  2. deciding on the model to be used, and how to test it;
  3. deciding the conclusion for the model.

In each of these there is evidence we use to help us make the decision.
The hypothesis test itself provides the evidence for the third.

Finally (at least for the moment) – whether we choose the null or
alternative model, it IS a decision. In research, accepting the null
means that we decide to accept it at least for the moment, so it is not
necessarily a committed decision. On the other hand, if a line of
investigation is not yielding results, the researcher is unlikely to
continue on that line, so it is a decision which does lead to an
action.

For non-research applications such as quality control, accepting the
null model quite clearly is a decision to act on that basis. For
example, with a bottle filling machine whose mean contents are tested
periodically, the null is that the machine is filling the bottles
correctly. Rejecting the null entails stopping the machine; accepting
it means the machine will not be stopped.
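
The bottle filling decision can be sketched in a few lines of Python.
This is only an illustration under assumptions that are mine: fills are
roughly normal with a known standard deviation (so a two-sided z-test
applies), and the target of 500 ml, the standard deviation of 2 ml, the
1% significance level and the sample values are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def filling_check(sample_ml, target_ml, sigma_ml, alpha=0.01):
    """Two-sided z-test of the null model 'mean fill equals target_ml'.

    Returns (p_value, stop_machine). Assumes sigma_ml is known.
    """
    standard_error = sigma_ml / sqrt(len(sample_ml))
    z = (mean(sample_ml) - target_ml) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Rejecting the null model is the decision to stop the machine.
    return p_value, p_value < alpha


# Hypothetical periodic sample of bottle contents, in millilitres.
sample = [501.2, 499.6, 500.8, 502.1, 500.3, 501.7, 499.9, 501.0]

p_value, stop_machine = filling_check(sample, target_ml=500.0, sigma_ml=2.0)
print("p-value:", round(p_value, 4))
print("action:", "stop the machine" if stop_machine else "keep running")
```

Either outcome leads to an action: the function always returns a
decision, never just a p-value.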

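Traditional hypothesis testing does incorporate something that plays
the role of a decision-theoretic loss function, if only implicitly: the
significance level against which the p-value is compared.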

Regards again,
Alan


--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007



