Re: questions on hypothesis

2000-10-23 Thread Jerry Dallal

Herman Rubin wrote:
 
> and until recently,
> scientists believed that their models could be exactly right.
 
but, as you wrote in another context
--
3 Oct 1998 08:07:23 -0500; 
Message-ID:6v57ib$[EMAIL PROTECTED]

"Normality is rarely a tenable hypothesis. Its usefulness as a means
of deriving procedures is that it is often the case, as in
regression, that the resulting procedure is robust in the sense of
having desirable properties without it, while nothing better can be
done uniformly."
-





Re: questions on hypothesis

2000-10-23 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Robert J. MacG. Dawson <[EMAIL PROTECTED]> wrote:


>[EMAIL PROTECTED] wrote (in part):

>> I'm saying that the entire concept of practical significance is not only
>> subjective, but limited to the extent of current knowledge. You may
>> regard a 0.01% effect at this point in time as a trivial and (virtually)
>> artifactual byproduct of hypothesis testing. But if proper controls are
>> in place, then to do so is tantamount to ignoring an effect that, on the
>> balance of probabilities shouldn't be there if all things were equal. I
>> think we need to be cautious in ascribing effects as having little
>> practical significance and hence using this as an argument against
>> hypothesis testing.

>   "Practical significance" is relevant if and only if there is some
>"practice" involved - that is to say, if a real-world decision is going
>to be based on the data.  Such a decision _must_ be based on current
>knowledge, for want of any other; but if the data are preserved, a
>different decision can be based on them in the future if more is known
>then. 

>   (BTW: If a decision *is* to be made, a risk/benefit approach would seem
>more appropriate. Yes, it probably involves subjective decisions; but
>using fixed-level hypothesis testing to avoid that is a little like
>saying "as I might not choose exactly the right size of screwdriver I
>shall hit the screw with a hammer".  If we do take the risks and
>benefits into account in "choosing a p-value", we are not really doing a
>classical hypothesis test, even though the calculations may coincide.)

>   However, if a real-world decision is *not* going to be made, there is
>usually no need to fit the interpretation of marginal data into the
>Procrustean bed of dichotomous interpretation (which is the
>_raison_d'etre_ of the hypothesis test). Until there is overwhelming
>data one way or the other, our knowledge of the situation is in shades
>of gray, and representing it in black and white is a loss of
>information.

This does not seem to be the way that anything is presented in
the scientific literature.  From the standpoint of collecting
information, p-values are of little, if any, value, as they
contribute little to being able to compute, or even approximate,
the likelihood function, which contains the information in the data.
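A minimal sketch of the contrast Rubin is drawing, with invented data: report
the whole (relative) likelihood curve for the parameter rather than a single
p-value.  The sample, the grid, and the simplification of treating the SD as
known are all assumptions made only for illustration.

import numpy as np
from scipy import stats

data = np.array([1.2, 0.4, -0.3, 0.9, 1.5, 0.7, 0.2, 1.1])  # hypothetical sample

mu_grid = np.linspace(-1.0, 2.0, 301)        # candidate values of the mean
sigma = data.std(ddof=1)                     # treat the SD as known, for simplicity
loglik = np.array([stats.norm.logpdf(data, mu, sigma).sum() for mu in mu_grid])
rel_lik = np.exp(loglik - loglik.max())      # likelihood relative to its maximum

# The curve, not one number, carries the information in the data:
for mu, rl in zip(mu_grid[::50], rel_lik[::50]):
    print(f"mu = {mu:5.2f}   relative likelihood = {rl:.3f}")

# Contrast with the single number a significance test reports:
t, p = stats.ttest_1samp(data, popmean=0.0)
print(f"one-sample t = {t:.2f}, p = {p:.4f}")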

The use of p-values is a carryover from the mistaken "alchemy"
period of statistics, and it has always been misinterpreted,
even by the good ones.  They tried for answers before the
appropriate questions had been asked, and until recently,
scientists believed that their models could be exactly right.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-19 Thread Donald Burrill

On Thu, 19 Oct 2000 [EMAIL PROTECTED] wrote:

> In article <[EMAIL PROTECTED]>,
>   Peter Lewycky <[EMAIL PROTECTED]> wrote:
> > I've often been called upon to do a t-test with 5 animals in one 
> > group and 4 animals in the other. The power is abysmally low and 
> > rarely do I get a p less than 0.05. One of the difficulties that 
> > medical researchers have is with the notion of power and concomitant 
> > sample size.  I make it a point of calculating power especially 
> > where Ho has not been rejected.  It gives the researcher some comfort 
> > in that his therapy may indeed be effective.  All he needs for 0.8 
> > power is 28,141 rats per group. 

> 
> This has got to be one of the funniest things I have read on a stats
> newsgroup.  I'm sure it's not really meant to be funny, 

Dunno why you'd be so certain of that.  I've known Peter for a while, 
and certainly would not characterize him as lacking a sense of humour... 

> but the thought of truckloads upon truckloads of rats arriving to 
> satisfy power requirements puts a highly amusing spin on the whole 
> thing. :)
> I am stifling an insane cackle because I know statistics is a serious
> business but really

It may, sometimes, be a serious business;  but that's not to say that 
one should _take_ it seriously.
-- DFB.
 --
 Donald F. Burrill[EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 (603) 535-2597
 Department of Mathematics, Boston University[EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215   (617) 353-5288
 184 Nashua Road, Bedford, NH 03110  (603) 471-7128






Re: questions on hypothesis

2000-10-19 Thread Jerry Dallal

Thom Baguley wrote:
> 
> Robert J. MacG. Dawson wrote:
> 
> > [EMAIL PROTECTED] wrote:
> > >
> > > In article <[EMAIL PROTECTED]>,
> > >   Jerry Dallal <[EMAIL PROTECTED]> wrote:
> > >
> > > > (1) statistical significance usually is unrelated to practical
> > > > importance.
> > >
> > > I don't think so. I can think of many examples in which statistical
> > > inference plays an invaluable role in practical applications and
> > > instrumentation, or indeed any "practical" application of a theory etc.
> > > Not just in science, but engineering, e.g. aircraft design, studying the
> > > brain, electrical engineering. Certainly there are examples of
> > > statistical nonsense, e.g. polls, but i wouldn't go so far as to say it
> > > is usually like this.
> >
> > Chris: That's not what Jerry means. What he's saying is that if your
> > sample size is large enough, a difference may be statistically
> > significant (a term which has a very precise meaning, especially to the
> > Apostles of the Holy 5%) but not large enough to be practically
> > important. [A hypothetical very large sample might show, let us say,
> > that a very expensive diet supplement reduced one's chances of a heart
> > attack by 1/10 of 1%.]  Alternatively, in an imperfectly-controlled
> > study, it may show an effect that - whether large enough to be of
> > interest or not - is too small to ascribe a cause to. [A moderately
> > large study might show that some ethnic group has a 1% higher rate of
> > heart attacks, with a margin of error of +/- 0.2%. But we might have, for
> > an effect of this size, no way of telling whether it's due to genes,
> > diet, socioeconomic factors, recreational drugs, or whatever.]
> 
> I'd add that I think Jerry meant "unrelated" in the sense of independent rather
> than irrelevant (Jerry can correct me if I'm wrong). You can  get important
> significant effects, unimportant significant effects, important non-significant
> effects and unimportant non-significant effects.
> 
> For what it's worth, practical importance also depends on many factors other
> than effect size. These include mutability, generalizability, cost, and so on.
> 
> Thom

Nothing to correct.  You and Robert explained it fine.





Re: questions on hypothesis

2000-10-19 Thread Jerry Dallal

[EMAIL PROTECTED] wrote:
> 
> In article <[EMAIL PROTECTED]>,
>   Jerry Dallal <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] wrote:
> > >
> > I
> > > said before, I don't think this can be seen as a problem with
> hypothesis
> > > testing; but it is a matter for hypothesis *testers*.
> >
> > Nothing wrong with this, but it might be a good time to review the
> > question that started this thread, namely,
> >
> > "What are the limitations of hypothesis testing using significance
> > tests
> > based on p-values?"
> 
> Has the thread really wandered that much?
> My argument is basically that the misuse of hypothesis testing, from
> which most of the difficulties appear to arise, shouldn't be seen as a
> *limitation* of hypothesis testing; it just doesn't seem logical.

I read the question as saying, "There are hypotheses to be tested. 
What are the limitations of using significance tests based on P
values to do this?"

I stand by my response.





Re: questions on hypothesis

2000-10-19 Thread dennis roberts


>This has got to be one of the funniest things I have read on a stats
>newsgroup. I'm sure it's not really meant to be funny, but the thought
>of truckloads upon truckloads of rats arriving to satisfy power
>requirements puts a highly amusing spin on the whole thing. :)
>I am stifling an insane cackle because I know statistics is a serious
>business but really
>
>Cheers,
>Chris


whether you need 21,000 rats ... or 45 ... or anything below or above ... 
depends to a large extent on what kind of impact the treatment has ... 
sure, if the treatment or experimental condition has but a trivial (but 
real) effect ... then n has to be relatively large ... whether it be rats 
or humans ... but if the impact is rather large ... you don't need 21,000 EVER

many eons ago ... i was doing some studies comparing Ss who had access to 
calculators and Ss who did not ... on the solution of statistical problems 
and, i measured things like #correct ... and time to completion ... and a 
ratio of the two which i called efficiency  now, for the time measure 
... the difference was so large ... i could have detected this difference 
with ns of 3 or 4 in each group ... NO problem ...

so, it all depends
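A rough sketch (not from the thread) of "it all depends": the per-group n
needed for 80% power at alpha = .05 as the standardized effect size varies.
It uses statsmodels' power routines; the effect sizes are arbitrary examples,
not figures from any of the studies mentioned.

from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for d in (0.02, 0.2, 0.5, 1.0, 2.0):          # Cohen's d, hypothetical values
    n = power_calc.solve_power(effect_size=d, alpha=0.05, power=0.80,
                               alternative='two-sided')
    print(f"d = {d:4.2f}  ->  about {n:9.0f} subjects per group")

# Tiny effects demand tens of thousands per group; an effect large enough to
# be obvious needs only a handful, which is the point being made above.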






Re: questions on hypothesis

2000-10-19 Thread Chris . Chambers

In article <[EMAIL PROTECTED]>,
  Peter Lewycky <[EMAIL PROTECTED]> wrote:
> I've often been called upon to do a t-test with 5 animals in one group
> and 4 animals in the other. The power is abysmally low and rarely do I
> get a p less than 0.05. One of the difficulties that medical researchers
> have is with the notion of power and concomitant sample size. I make it
> a point of calculating power especially where Ho has not been rejected.
> It gives the researcher some comfort in that his therapy may indeed be
> effective. All he needs for 0.8 power is 28,141 rats per group.


This has got to be one of the funniest things I have read on a stats
newsgroup. I'm sure it's not really meant to be funny, but the thought
of truckloads upon truckloads of rats arriving to satisfy power
requirements puts a highly amusing spin on the whole thing. :)
I am stifling an insane cackle because I know statistics is a serious
business but really

Cheers,
Chris













Re: questions on hypothesis

2000-10-18 Thread Peter Lewycky

I've often been called upon to do a t-test with 5 animals in one group
and 4 animals in the other. The power is abysmally low and rarely do I
get a p less than 0.05. One of the difficulties that medical researchers
have is with the notion of power and concomitant sample size. I make it
a point of calculating power especially where Ho has not been rejected.
It gives the researcher some comfort in that his therapy may indeed be
effective. All he needs for 0.8 power is 28,141 rats per group.
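For what it illustrates, here is a sketch (not Peter's actual calculation) of
where a figure like "28,141 rats per group" comes from.  The standardized
effect size below is a guess chosen only to reproduce that order of magnitude;
the 5-vs-4 design is the one described in the post.

from statsmodels.stats.power import TTestIndPower

d = 0.0236   # hypothetical Cohen's d; an assumption, not a quantity from the post
analysis = TTestIndPower()

n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"about {n_per_group:,.0f} animals per group for 80% power")   # roughly 28,000

# And the power the available 5-vs-4 design actually has for such an effect:
power = analysis.power(effect_size=d, nobs1=5, ratio=4/5, alpha=0.05,
                       alternative='two-sided')
print(f"power with n = 5 and 4: {power:.3f}")   # abysmally low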

[EMAIL PROTECTED] wrote:
> 
> In article <[EMAIL PROTECTED]>,
>   [EMAIL PROTECTED] (dennis roberts) wrote:
> 
> >
> > thus, the idea is that 5% and/or 1% were "chosen" due to the tables that
> > were available and not, some logical reasoning for these values?
> >
> > i don't see any logic to the notion that 5% and/or 1% ... have any special
> > nor simplification properties compared to say ... 9% or 3%
> >
> > given that it appears that these same values apply today ... that is, we
> > have been in a "stuck" mode for all these years ... is not very comforting
> > given that 5% and/or 1% were opted for because someone had worked out these
> > columns in a table
> 
> I agree, and I think perhaps that although the original work focused on
> the 5% and 1% levels for practical reasons, the tradition persists b/c
> it provides a convenient criterion for journal editors in deciding
> between 'important' and 'unimportant' findings. Consequently, to
> increase the chances of being published, researchers sometimes resort to
> terms like "highly significant" in referring to low p values, which is
> really a quite nebulous statement (if not completely misleading- I shall
> leave that determination to the experts). To me, it seems that less
> emphasis on p values per se and more emphasis on power and effect size
> would increase the general quality and replicability of published data.
> 
> Chris
> 





Re: questions on hypothesis

2000-10-18 Thread Chris . Chambers

In article <[EMAIL PROTECTED]>,
  Jerry Dallal <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> >
> > I said before, I don't think this can be seen as a problem with
> > hypothesis testing; but it is a matter for hypothesis *testers*.
>
> Nothing wrong with this, but it might be a good time to review the
> question that started this thread, namely,
>
> "What are the limitations of hypothesis testing using significance
> tests based on p-values?"

Has the thread really wandered that much?
My argument is basically that the misuse of hypothesis testing, from
which most of the difficulties appear to arise, shouldn't be seen as a
*limitation* of hypothesis testing; it just doesn't seem logical.

Chris







Re: questions on hypothesis

2000-10-18 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Thom Baguley  <[EMAIL PROTECTED]> wrote:
>Robert J. MacG. Dawson wrote:

>> [EMAIL PROTECTED] wrote:

>> > In article <[EMAIL PROTECTED]>,
>> >   Jerry Dallal <[EMAIL PROTECTED]> wrote:

>> > > (1) statistical significance usually is unrelated to practical
>> > > importance.

>> > I don't think so. I can think of many examples in which statistical
>> > inference plays an invaluable role in practical applications and
>> > instrumentation, or indeed any "practical" application of a theory etc.
>> > Not just in science, but engineering, e.g. aircraft design, studying the
>> > brain, electrical engineering. Certainly there are examples of
>> > statistical nonsense, e.g. polls, but i wouldn't go so far as to say it
>> > is usually like this.

>> Chris: That's not what Jerry means. What he's saying is that if your
>> sample size is large enough, a difference may be statistically
>> significant (a term which has a very precise meaning, especially to the
>> Apostles of the Holy 5%) but not large enough to be practically
>> important. [A hypothetical very large sample might show, let us say,
>> that a very expensive diet supplement reduced one's chances of a heart
>> attack by 1/10 of 1%.]  Alternatively, in an imperfectly-controlled
>> study, it may show an effect that - whether large enough to be of
>> interest or not - is too small to ascribe a cause to. [A moderately
>> large study might show that some ethnic group has a 1% higher rate of
>> heart attacks, with a margin of error of +/- 0.2%. But we might have, for
>> an effect of this size, no way of telling whether it's due to genes,
>> diet, socioeconomic factors, recreational drugs, or whatever.]

>I'd add that I think Jerry meant "unrelated" in the sense of independent rather
>than irrelevant (Jerry can correct me if I'm wrong). You can  get important
>significant effects, unimportant significant effects, important non-significant
>effects and unimportant non-significant effects.

>For what it's worth, practical importance also depends on many factors other
>than effect size. These include mutability, generalizability, cost, and so on.

This is another reason for not doing something as bad as
significance tests.  

It has been argued that there may be many possible situations
for action based on observations, and that the observations
need to be summarized so that subsequent investigators can
incorporate the studies.  But the significance level, or the
p-value, does not provide this summary; the likelihood 
function does.  Other than the rather ridiculous statement 
of exactly what is accomplished by p-values, of what use is
it, except religion?  





-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-18 Thread Radford Neal

Thom Baguley  <[EMAIL PROTECTED]> wrote:

>> You can get important significant effects, unimportant significant
>> effects, important non-significant effects and unimportant
>> non-significant effects.


Radford Neal wrote:

>I'll go for three out of four of these.  But "important non-significant 
>effects"?
>
>That would be like saying "I think the benefits of this drug are large
>enough to be important, even though I'm not convinced that it has any
>benefit at all".


Richard M. Barton <[EMAIL PROTECTED]> wrote:

> ***I disagree.  It could indicate lack of power.  If your alpha
> level had been higher, or if you had more subjects, you might have
> found statistically significant results.

Yes, if you did an experiment using more subjects you MIGHT obtain
convincing evidence that the drug really does have a benefit.  Or you
might not.  This is no different from what you could have said even
before you did the first experiment.  This POSSIBILITY doesn't justify
saying that you found an "important but non-significant" effect.  If
you're trying to say that the experiment produced some evidence of a
benefit, and that this evidence is enough to persuade you to recommend
use of the drug, even though it's still possible that there is no real
benefit, then I think that p-values are too crude a tool for what you
want to do.  You need to use Bayesian decision theory.
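A toy sketch (not Radford's own calculation) of what a decision-theoretic
treatment looks like: weigh a posterior for the drug's benefit against the
consequences of each action, instead of reading a verdict off a p-value.
Every number below is invented for illustration.

from scipy import stats

# Posterior for the true benefit (arbitrary units), assumed normal:
# a small estimated benefit that is still quite uncertain.
posterior = stats.norm(loc=0.3, scale=0.4)

cost_of_adopting = 0.1                            # assumed fixed cost of using the drug
eu_adopt = posterior.mean() - cost_of_adopting    # expected utility of adopting
eu_skip = 0.0                                     # utility of doing nothing

print(f"P(benefit > 0)     = {1 - posterior.cdf(0):.2f}")   # about 0.77, far from "significant"
print(f"E[utility | adopt] = {eu_adopt:.2f}  vs  {eu_skip:.2f} for doing nothing")

# Adoption can be the better bet even though we are not yet convinced the
# benefit is real, because the cost of being wrong is small in this example.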

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford






Re: questions on hypothesis

2000-10-18 Thread Chris . Chambers

In article <[EMAIL PROTECTED]>,
  [EMAIL PROTECTED] (dennis roberts) wrote:

>
> thus, the idea is that 5% and/or 1% were "chosen" due to the tables that
> were available and not, some logical reasoning for these values?
>
> i don't see any logic to the notion that 5% and/or 1% ... have any special
> nor simplification properties compared to say ... 9% or 3%
>
> given that it appears that these same values apply today ... that is, we
> have been in a "stuck" mode for all these years ... is not very comforting
> given that 5% and/or 1% were opted for because someone had worked out these
> columns in a table

I agree, and I think perhaps that although the original work focused on
the 5% and 1% levels for practical reasons, the tradition persists b/c
it provides a convenient criterion for journal editors in deciding
between 'important' and 'unimportant' findings. Consequently, to
increase the chances of being published, researchers sometimes resort to
terms like "highly significant" in referring to low p values, which is
really a quite nebulous statement (if not completely misleading- I shall
leave that determination to the experts). To me, it seems that less
emphasis on p values per se and more emphasis on power and effect size
would increase the general quality and replicability of published data.

Chris







Re: questions on hypothesis

2000-10-18 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Jerry Dallal  <[EMAIL PROTECTED]> wrote:
>Many posters to this thread have used the phrase "practical
>significance".  I find it only confuses things.  Just so all of us are
>clear on what we're talking about, might we restrict ourselves to
>the terms "statistical significance" and "practical importance"?

As most people consider statistical significance to be
a measure of importance, I think practical significance
should be maintained.  The bad term is "statistical
significance".
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-18 Thread Herman Rubin

In article <8sill5$gvf$[EMAIL PROTECTED]>,
 <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
>  [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote:


.

>> Fair enough: but I would argue that the right question is rarely "if
>> there were no effect whatsoever, and the following model applied, what
>> is the probability that we would observe a value of the following
>> statistic at least as great as what was observed?" and hence that a
>> hypothesis test is rarely the right way to obtain the right answer.
>> Hypothesis testing does what it sets out to do perfectly well- the
>> only question, in most cases, is why one would want that done.

>I agree with this. From what I gauge from your rephrasing of the
>research question, there seems to be no reason why most research
>questions could not be phrased in this manner. Rather, it seems that the
>problems with hypothesis testing result from people misusing it. Like I
>said before, I don't think this can be seen as a problem with hypothesis
>testing; but it is a matter for hypothesis *testers*.

I disagree.  This may be the case for questions of
philosophical belief, but not for action, and publishing an
article, or even discussion with colleagues, is action.

Robert Dawson is quite right; few who understand what
hypothesis testing actually is doing would use it.  Those
who started out using it, more than two centuries ago, had
the mistaken belief that the significance level was, if not
the probability of the truth of the hypothesis, at least a
good indication of that.  The situation, however, is
generally that the hypothesis, stated in terms of the
distribution of the observations, is at least almost always
false.  So why should the probability that we would observe
a value of the statistic at least as great as what was
observed, from a model which we would not believe anyhow,
even be of importance?

This does not mean that we should not do hypothesis
testing.  The null hypothesis might well be the best useful
approximation available, given the observations.  A more
accurate model need not be more useful.  One must consider
all the consequences.

>> Fair enough... I do not argue with your support of proper controls.
>> However, in the real world, insisting on this would be tantamount to
>> ending experimental research in the social sciences and many
>> disciplines within the life sciences. (You may draw your own
>> conclusions as to the advisability of this 

>Certainly, one could argue that anyone who wants to test a hypothesis
>needs to adhere to the same guidelines. The fact that this frequently
>doesn't happen is, again, the fault of people, not principles. One quick
>glance at the social psychology literature, for example, reveals a
>history replete with low power, inadequate controls and spurious
>conclusions based on doubtful stats. (I'm going to annoy somebody here I
>just know it ).

One must also consider the consequences of the action in other
states of nature.  Starting out with classical statistics
makes it much harder to consider the full problem.

Hypothesis testing has become a religion.  The belief that 
there must be something this simple is what is driving it.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-18 Thread Robert J. MacG. Dawson



"Richard M. Barton" wrote:
> 
> --- Radford Neal wrote:
> In article <[EMAIL PROTECTED]>,
> Thom Baguley  <[EMAIL PROTECTED]> wrote:
> 
> > You can get important significant effects, unimportant significant
> > effects, important non-significant effects and unimportant
> > non-significant effects.
> 
> I'll go for three out of four of these.  But "important non-significant
> effects"?
> 
> That would be like saying "I think the benefits of this drug are large
> enough to be important, even though I'm not convinced that it has any
> benefit at all".

Roughly: and if you agree that "convinced" is stronger than "think"
there is no contradiction here. My guess is that early in the
development of new drugs this is often an accurate description of the
researcher's attitude, and the correct response is to do more research.

However, a better phrasing might be

  "I think the benefits of this drug _might_turn_out_to_be_ large
enough to be important, even though I'm not _yet_ convinced that it has
any benefit at all".

In other words, a reasonable interval estimate for the effect size
contains some values of interest and we need more data. (We do need
some  other evidence that these values are plausible, of course; we
cannot go haring off after every conjecture we can't disprove!)
-Robert Dawson
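A small sketch of Dawson's "reasonable interval estimate", using invented
data: look at where the confidence interval for the effect reaches, not just
whether it excludes zero.  The threshold for an "important" benefit is an
assumption chosen for the example.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
drug = rng.normal(0.4, 1.0, size=20)        # hypothetical early-trial data
placebo = rng.normal(0.0, 1.0, size=20)

diff = drug.mean() - placebo.mean()
se = np.sqrt(drug.var(ddof=1)/len(drug) + placebo.var(ddof=1)/len(placebo))
t_crit = stats.t.ppf(0.975, df=len(drug) + len(placebo) - 2)   # rough df
lo, hi = diff - t_crit*se, diff + t_crit*se

important = 0.5   # smallest benefit anyone would care about (an assumption)
print(f"difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("CI includes 0:", lo < 0 < hi, "  CI reaches important values:", hi > important)

# A CI that includes 0 but also reaches values of interest is exactly the
# not-yet-convinced-but-worth-more-data situation described above.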





Re: questions on hypothesis

2000-10-18 Thread Richard M. Barton

--- Radford Neal wrote:
In article <[EMAIL PROTECTED]>,
Thom Baguley  <[EMAIL PROTECTED]> wrote:

> You can get important significant effects, unimportant significant
> effects, important non-significant effects and unimportant
> non-significant effects.

I'll go for three out of four of these.  But "important non-significant 
effects"?


That would be like saying "I think the benefits of this drug are large
enough to be important, even though I'm not convinced that it has any
benefit at all".


***I disagree.  It could indicate lack of power.  If your alpha level had been higher, 
or if you had more subjects, you might have found statistically significant results. 



Of course, real conclusions are not black-and-white.  We might not be
convinced that the drug has an effect, but the benefit if it does
might be so large that we'll use it on the off-chance that it does
have an effect.  But if you're using the black-and-white language of
"significant" versus "not significant", it makes no sense to say that
an effect is "important but not significant".

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford



--- end of quote ---



Richard Barton, Statistical Consultant
Dartmouth College
Peter Kiewit Computing Services
6224 Baker/Berry
Hanover, NH 03755

(603)-646-0255





Re: questions on hypothesis

2000-10-18 Thread Radford Neal

In article <[EMAIL PROTECTED]>,
Thom Baguley  <[EMAIL PROTECTED]> wrote:

> You can get important significant effects, unimportant significant
> effects, important non-significant effects and unimportant
> non-significant effects.

I'll go for three out of four of these.  But "important non-significant 
effects"?

That would be like saying "I think the benefits of this drug are large
enough to be important, even though I'm not convinced that it has any
benefit at all".

Of course, real conclusions are not black-and-white.  We might not be
convinced that the drug has an effect, but the benefit if it does
might be so large that we'll use it on the off-chance that it does
have an effect.  But if you're using the black-and-white language of
"significant" versus "not significant", it makes no sense to say that
an effect is "important but not significant".

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford






Re: questions on hypothesis

2000-10-18 Thread Thom Baguley

Robert J. MacG. Dawson wrote:

> [EMAIL PROTECTED] wrote:
> >
> > In article <[EMAIL PROTECTED]>,
> >   Jerry Dallal <[EMAIL PROTECTED]> wrote:
> >
> > > (1) statistical significance usually is unrelated to practical
> > > importance.
> >
> > I don't think so. I can think of many examples in which statistical
> > inference plays an invaluable role in practical applications and
> > instrumentation, or indeed any "practical" application of a theory etc.
> > Not just in science, but engineering, e.g. aircraft design, studying the
> > brain, electrical engineering. Certainly there are examples of
> > statistical nonsense, e.g. polls, but i wouldn't go so far as to say it
> > is usually like this.
>
> Chris: That's not what Jerry means. What he's saying is that if your
> sample size is large enough, a difference may be statistically
> significant (a term which has a very precise meaning, especially to the
> Apostles of the Holy 5%) but not large enough to be practically
> important. [A hypothetical very large sample might show, let us say,
> that a very expensive diet supplement reduced one's chances of a heart
> attack by 1/10 of 1%.]  Alternatively, in an imperfectly-controlled
> study, it may show an effect that - whether large enough to be of
> interest or not - is too small to ascribe a cause to. [A moderately
> large study might show that some ethnic group has a 1% higher rate of
> heart attacks, with a margin of error of +/- 0.2%. But we might have, for
> an effect of this size, no way of telling whether it's due to genes,
> diet, socioeconomic factors, recreational drugs, or whatever.]

I'd add that I think Jerry meant "unrelated" in the sense of independent rather
than irrelevant (Jerry can correct me if I'm wrong). You can get important
significant effects, unimportant significant effects, important non-significant
effects and unimportant non-significant effects.

For what it's worth, practical importance also depends on many factors other
than effect size. These include mutability, generalizability, cost, and so on.

Thom







Re: questions on hypothesis

2000-10-18 Thread Jerry Dallal

[EMAIL PROTECTED] wrote:
>
> I said before, I don't think this can be seen as a problem with hypothesis
> testing; but it is a matter for hypothesis *testers*.

Nothing wrong with this, but it might be a good time to review the
question that started this thread, namely,

"What are the limitations of hypothesis testing using significance
tests
based on p-values?"





Re: questions on hypothesis

2000-10-18 Thread Jerry Dallal

Many posters to this thread have used the phrase "practical
significance".  I find it only confuses things.  Just so all of us are
clear on what we're talking about, might we restrict ourselves to
the terms "statistical significance" and "practical importance"?





Re: questions on hypothesis

2000-10-18 Thread Robert J. MacG. Dawson



[EMAIL PROTECTED] wrote (in part):

> I'm saying that the entire concept of practical significance is not only
> subjective, but limited to the extent of current knowledge. You may
> regard a 0.01% effect at this point in time as a trivial and (virtually)
> artifactual byproduct of hypothesis testing. But if proper controls are
> in place, then to do so is tantamount to ignoring an effect that, on the
> balance of probabilities shouldn't be there if all things were equal. I
> think we need to be cautious in ascribing effects as having little
> practical significance and hence using this as an argument against
> hypothesis testing.

"Practical significance" is relevant if and only if there is some
"practice" involved - that is to say, if a real-world decision is going
to be based on the data.  Such a decision _must_ be based on current
knowledge, for want of any other; but if the data are preserved, a
different decision can be based on them in the future if more is known
then. 

(BTW: If a decision *is* to be made, a risk/benefit approach would seem
more appropriate. Yes, it probably involves subjective decisions; but
using fixed-level hypothesis testing to avoid that is a little like
saying "as I might not choose exactly the right size of screwdriver I
shall hit the screw with a hammer".  If we do take the risks and
benefits into account in "choosing a p-value", we are not really doing a
classical hypothesis test, even though the calculations may coincide.)

However, if a real-world decision is *not* going to be made, there is
usually no need to fit the interpretation of marginal data into the
Procrustean bed of dichotomous interpretation (which is the
_raison_d'etre_ of the hypothesis test). Until there is overwhelming
data one way or the other, our knowledge of the situation is in shades
of gray, and representing it in black and white is a loss of
information.

-Robert Dawson





Re: questions on hypothesis

2000-10-18 Thread dennis roberts

At 05:38 PM 10/17/00 -0700, David Heiser wrote:
>The 5% is a historical artifact, the result of statistics being invented
>before electronic computers were invented.

an artifact is some anomaly of the data ... but, how could 5% be considered 
an artifact DUE to the lack of electronic computers?


>The work in the early 1900's was severely restricted by the fact that
>computations of the cumulative probability distribution involved tedious
>paper and pencil calculations, and later on the use of mechanical
>calculators. Available tables only gave the values for 5% and in some cases
>1%.


thus, the idea is that 5% and/or 1% were "chosen" due to the tables that 
were available and not, some logical reasoning for these values?

i don't see any logic to the notion that 5% and/or 1% ... have any special 
nor simplification properties compared to say ... 9% or 3%

given that it appears that these same values apply today ... that is, we 
have been in a "stuck" mode for all these years ... is not very comforting 
given that 5% and/or 1% were opted for because someone had worked out these 
columns in a table








Re: questions on hypothesis

2000-10-17 Thread David Heiser


- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, October 16, 2000 4:24 PM
Subject: Re: questions on hypothesis


> In article <[EMAIL PROTECTED]>,
> > Chris: That's not what Jerry means. What he's saying is that if
> > your sample size is large enough, a difference may be statistically
> > significant (a term which has a very precise meaning, especially to
> > the Apostles of the Holy 5%) but not large enough to be practically
> > important. [A hypothetical very large sample might show, let us say,
> > that a very expensive diet supplement reduced one's chances of a heart
> > attack by 1/10 of 1%.]
>
> Firstly, I think we can thank publication pressures for the church of
> the Holy 5%. I go with Keppel's approach in suspending judgement for mid
> range significance levels (although we should do this for nonsignificant
> results anyway as they are inherently indeterminate).
-
The 5% is a historical artifact, the result of statistics being invented
before electronic computers were invented.

The work in the early 1900's was severely restricted by the fact that
computations of the cumulative probability distribution involved tedious
paper and pencil calculations, and later on the use of mechanical
calculators. Available tables only gave the values for 5% and in some cases
1%. R.A. Fisher in his publications consistently referred to values well
below 1% as being "convincing". To illustrate the fundamental test methods,
he had to rely on available tables and chose 5% in most of his examples.
However he did not consider 5% as being "scientifically convincing".

DAH
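A quick sketch of the point about tables: the fixed 5% and 1% columns existed
because cumulative distributions had to be tabulated by hand.  With scipy,
the critical value for any level, or the exact p-value itself, is one call
away; the degrees of freedom and the observed t below are hypothetical.

from scipy import stats

df = 12                                   # hypothetical degrees of freedom
for alpha in (0.05, 0.01, 0.03, 0.09):    # the old "table" levels plus two others
    crit = stats.t.ppf(1 - alpha/2, df)   # two-sided critical value
    print(f"alpha = {alpha:4.2f}  ->  |t| must exceed {crit:.3f}")

t_obs = 2.3                               # a hypothetical observed statistic
print(f"exact two-sided p for t = {t_obs}: {2 * stats.t.sf(t_obs, df):.4f}")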






Re: questions on hypothesis

2000-10-17 Thread Chris . Chambers

In article <[EMAIL PROTECTED]>,
  [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote:
>
>
> > Wrt to your example, it seems that the decision you are making about
> > practical importance is purely subjective.
>
>   What exactly do you mean by this? Are you saying that _my_
> example is purely subjective but that others are not, or that the
> entire concept of practical significance is subjective? And, if so, so
> what? Does it then follow that it is more "scientific" to ignore it
> entirely?

I'm saying that the entire concept of practical significance is not only
subjective, but limited to the extent of current knowledge. You may
regard a 0.01% effect at this point in time as a trivial and (virtually)
artifactual byproduct of hypothesis testing. But if proper controls are
in place, then to do so is tantamount to ignoring an effect that, on the
balance of probabilities shouldn't be there if all things were equal. I
think we need to be cautious in ascribing effects as having little
practical significance and hence using this as an argument against
hypothesis testing.

> Fair enough: but I would argue that the right question is rarely "if
> there were no effect whatsoever, and the following model applied, what
> is the probability that we would observe a value of the following
> statistic at least as great as what was observed?" and hence that a
> hypothesis test is rarely the right way to obtain the right answer.
> Hypothesis testing does what it sets out to do perfectly well- the
> only question, in most cases, is why one would want that done.

I agree with this. From what I gauge from your rephrasing of the
research question, there seems to be no reason why most research
questions could not be phrased in this manner. Rather, it seems that the
problems with hypothesis testing result from people misusing it. Like I
said before, I don't think this can be seen as a problem with hypothesis
testing; but it is a matter for hypothesis *testers*.

> Fair enough... I do not argue with your support of proper controls.
> However, in the real world, insisting on this would be tantamount to
> ending experimental research in the social sciences and many
> disciplines within the life sciences. (You may draw your own
> conclusions as to the advisability of this 

Certainly, one could argue that anyone who wants to test a hypothesis
needs to adhere to the same guidelines. The fact that this frequently
doesn't happen is, again, the fault of people, not principles. One quick
glance at the social psychology literature, for example, reveals a
history replete with low power, inadequate controls and spurious
conclusions based on doubtful stats. (I'm going to annoy somebody here I
just know it ).

> - I will venture an opinion that it ain't a-gonna happen, advisable or
> no.) There are always more experimental variables than we can control
> for, and there are often explanatory variables of interest that it
> would be impossible (eg, ethnic background - unless we can emulate the
> aliens on the Monty Python episode who could turn people into
> Scotsmen!) or unethical to randomize.  The best that  one can hope to
> do in such situations is control for nuisance variables whose effects are
> judged likely to produce a large effect, and accept that any small
> effect is of unknowable origin.

I fully agree, although I would amend unknowable origin to _presently_
unknowable origin. And I think this really hits the core of the issue:
small effects, no matter where they come from, often turn out to be big
effects (or disappear entirely) when greater knowledge allows us to
refine proper control conditions. I think that is a valuable asset of
hypothesis testing. It demands stringent adherence by its users but it
rewards vigilance.

Chris







Re: questions on hypothesis

2000-10-17 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
dennis roberts <[EMAIL PROTECTED]> wrote:
>At 10:06 PM 10/16/00 +, Peter Lewycky wrote:
>>It happens all the time in medicine. If I can show a p value 0.05 or
>>less the researchers are delighted. Whenever I can't produce a p of 0.05
>>or less they start looking for another statistician and will even
>>withhold a paper from publication. 

>gee ... this is too bad ... someone has sold all these folks in medicine a
>bill of goods ... 

>some possibilities are:

>1. those in medicine are not really taking any statistics courses
>2. those in medicine are not really reading statistical material very carefully
>3. those in medicine have had a bad run of luck WHEN taking data analysis
>courses

It is not just in medicine, but you will find that most of
those who have taken a statistical methods course will have
this attitude.  Furthermore, it is hard for most to get
over this.  Many never do.

Another in this line is the belief that things should be
normally distributed.  Some investigators designed an IQ
test with a report ceiling, because they did not have
enough subjects to use the normal distribution to evaluate
higher IQs.  This IS what people in the applied fields have
"learned" in their statistics courses.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-17 Thread Robert J. MacG. Dawson



[EMAIL PROTECTED] wrote:
> 
> In article <[EMAIL PROTECTED]>,
> >   Chris: That's not what Jerry means. What he's saying is that if
> > your sample size is large enough, a difference may be statistically
> > significant (a term which has a very precise meaning, especially to
> > the Apostles of the Holy 5%) but not large enough to be practically
> > important. [A hypothetical very large sample might show, let us say,
> > that a very expensive diet supplement reduced one's chances of a heart
> > attack by 1/10 of 1%.]
> 
> Firstly, I think we can thank publication pressures for the church of
> the Holy 5%. I go with Keppel's approach in suspending judgement for mid
> range significance levels (although we should do this for nonsignificant
> results anyway as they are inherently indeterminate).
> 
> Wrt to your example, it seems that the decision you are making about
> practical importance is purely subjective. 

What exactly do you mean by this? Are you saying that _my_ example
is purely subjective but that others are not, or that the entire concept
of practical significance is subjective? And, if so, so what? Does it 
then follow that it is more "scientific" to ignore it entirely? 

> In any number of alternative
> situations a .01% effect could have major implications, practical and
> theoretical.

It might or it might not. I was referring to a hypothetical situation
in which it seemed reasonable to suppose that it didn't. 

> I regard this as less a fundamental flaw with hypothesis
> testing and more a question of experimental design and asking the right
> questions to begin with.

Fair enough: but I would argue that the right question is rarely "if
there were no effect whatsoever, and the following model applied, what
is the probability that we would observe a value of the following
statistic at least as great as what was observed?" and hence that a
hypothesis test is rarely the right way to obtain the right answer.
Hypothesis testing does what it sets out to do perfectly well- the only
question, in most cases, is why one would want that done.

> >Alternatively, in an imperfectly-controlled
> > study, it may show an effect that - whether large enough to be of
> > interest or not - is too small to ascribe a cause to. [A moderately
> > large study might show that some ethnic group has a 1% higher rate of
> > heart attacks, with a margin of error of +/- 0.2%. But we might have, for
> > an effect of this size, no way of telling whether it's due to genes,
> > diet, socioeconomic factors, recreational drugs, or whatever.]
> 
> Surely the ambiguity of this outcome is the result of the lack of
> experimental control. If the effects of genetics, diet etc. are not
> appropriately controlled, it doesn't matter what sample size is
> used - the outcome will be always be equivocal. What it does suggest is
> that, irrespective of sample size, we must be vigilant in controlling
> for extraneous variables. Is it fair to consider this a flaw of
> hypothesis testing? We can hardly blame the tools for not working
> properly if they are not used correctly.

Fair enough... I do not argue with your support of proper controls.
However, in the real world, insisting on this would be tantamount to
ending experimental research in the social sciences and many disciplines
within the life sciences.  (You may draw your own conclusions as to the
advisability of this - I will venture an opinion that it ain't a-gonna
happen, advisable or no.)  There are always more experimental variables
than we can control for, and there are often explanatory variables of
interest that it would be impossible (eg, ethnic background - unless we
can emulate the aliens on the Monty Python episode who could turn people
into Scotsmen!) or unethical to randomize.  The best that one can hope
to do in such situations is control for nuisance variables whose effects
are judged likely to produce a large effect, and accept that any small
effect is of unknowable origin. 

-Robert Dawson





Re: questions on hypothesis

2000-10-16 Thread dennis roberts

At 10:06 PM 10/16/00 +, Peter Lewycky wrote:
>It happens all the time in medicine. If I can show a p value 0.05 or
>less the researchers are delighted. Whenever I can't produce a p of 0.05
>or less they start looking for another statistician and will even
>withhold a paper from publication. 

gee ... this is too bad ... someone has sold all these folks in medicine a
bill of goods ... 

some possibilities are:

1. those in medicine are not really taking any statistics courses
2. those in medicine are not really reading statistical material very carefully
3. those in medicine have had a bad run of luck WHEN taking data analysis
courses


so, if someone in medicine looks at a paper with findings, and ... the p
value is ok ... REGARDLESS OF THE DESIGN of the study or the way the
investigation was carried out ... then the findings are meaningful ... but
if the p value is greater than that magical cutoff ... even if the study seems
sound ... then it is not worthy of the time of day?


==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm





Re: questions on hypothesis

2000-10-16 Thread chris_david_c

In article <[EMAIL PROTECTED]>,
>   Chris: That's not what Jerry means. What he's saying is that if
> your sample size is large enough, a difference may be statistically
> significant (a term which has a very precise meaning, especially to
> the Apostles of the Holy 5%) but not large enough to be practically
> important. [A hypothetical very large sample might show, let us say,
> that a very expensive diet supplement reduced one's chances of a heart
> attack by 1/10 of 1%.]

Firstly, I think we can thank publication pressures for the church of
the Holy 5%. I go with Keppel's approach in suspending judgement for mid
range significance levels (although we should do this for nonsignificant
results anyway as they are inherently indeterminate).

Wrt your example, it seems that the decision you are making about
practical importance is purely subjective. In any number of alternative
situations a .01% effect could have major implications, practical and
theoretical. I regard this as less a fundamental flaw with hypothesis
testing and more a question of experimental design and asking the right
questions to begin with.

>Alternatively, in an imperfectly-controlled
> study, it may show an effect that - whether large enough to be of
> interest or not - is too small to ascribe a cause to. [A moderately
> large study might show that some ethnic group has a 1% higher rate of
> heart attacks, with a margin of error of +/- 0.2%. But we might have, for
> an effect of this size, no way of telling whether it's due to genes,
> diet, socioeconomic factors, recreational drugs, or whatever.]

Surely the ambiguity of this outcome is the result of the lack of
experimental control. If the effects of genetics, diet etc. are not
appropriately controlled, it doesn't matter what sample size is
used - the outcome will always be equivocal. What it does suggest is
that, irrespective of sample size, we must be vigilant in controlling
for extraneous variables. Is it fair to consider this a flaw of
hypothesis testing? We can hardly blame the tools for not working
properly if they are not used correctly.

Chris









Re: questions on hypothesis

2000-10-16 Thread Peter Lewycky

It happens all the time in medicine. If I can show a p value 0.05 or
less the researchers are delighted. Whenever I can't produce a p of 0.05
or less they start looking for another statistician and will even
withhold a paper from publication. 

"Simon, Steve, PhD" wrote:
> 
> In a post to EDSTAT-L, you wrote:
> 
> >I believe you will find that most researchers in the sciences
> >accept the p-value as religion.  In the report of the recent
> >British study on Type 2 diabetes, there was an effect which
> >was stated as "unimportant" because the p-value was .052.
> 
> Do you have a citation for this? It sounds like an excellent teaching
> example.
> 
> Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
> STATS: STeve's Attempt to Teach Statistics. http://www.cmh.edu/stats
> 





RE: questions on hypothesis

2000-10-16 Thread Simon, Steve, PhD

In a post to EDSTAT-L, you wrote:

>I believe you will find that most researchers in the sciences
>accept the p-value as religion.  In the report of the recent
>British study on Type 2 diabetes, there was an effect which
>was stated as "unimportant" because the p-value was .052.

Do you have a citation for this? It sounds like an excellent teaching
example.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
STATS: STeve's Attempt to Teach Statistics. http://www.cmh.edu/stats






Re: questions on hypothesis

2000-10-16 Thread Robert J. MacG. Dawson



[EMAIL PROTECTED] wrote:
> 
> In article <[EMAIL PROTECTED]>,
>   Jerry Dallal <[EMAIL PROTECTED]> wrote:
> 
> > (1) statistical significance usually is unrelated to practical
> > importance.
> 
> I don't think so. I can think of many examples in which statistical
> inference plays an invaluable role in practical applications and
> instrumentation, or indeed any "practical" application of a theory etc.
> Not just in science, but engineering, e.g. aircraft design, studying the
> brain, electrical engineering. Certainly there are examples of
> statistical nonsense, e.g. polls, but i wouldn't go so far as to say it
> is usually like this.

Chris: That's not what Jerry means. What he's saying is that if your
sample size is large enough, a difference may be statistically
significant (a term which has a very precise meaning, especially to the
Apostles of the Holy 5%) but not large enough to be practically
important. [A hypothetical very large sample might show, let us say,
that a very expensive diet supplement reduced one's chances of a heart
attack by 1/10 of 1%.]  Alternatively, in an imperfectly-controlled
study, it may show an effect that - whether large enough to be of
interest or not - is too small to ascribe a cause to. [A moderately
large study might show that some ethnic group has a 1% higher rate of
heart attacks, with a margin of error of +/- 0.2%. But we might have, for
an effect of this size, no way of telling whether it's due to genes,
diet, socioeconomic factors, recreational drugs, or whatever.]

> I *would* argue that without some method to determine the likelihood of
> a difference b/w two conditions you have no chance of determining
> practical importance at all.
> 
> > (2) absence of evidence is not evidence of absence
> 
> Everyone who has done elementary statistics is aware of this edict. But
> what if your power is very high and/or you have very large N? I have
> always found it surprising that we can't turn it around and develop a
> probability that two groups are the same. 

In a frequentist philosophy, we are not allowed to do this, 
because the nature of the two populations has not been randomized in any
well-defined way, so the concept of "probability" does not apply. 
The Bayesian approach, which permits probabilities to be assigned to
statements about parameters, *does* allow us to answer such questions.
However, it depends, in general, on the "prior distribution" of the
parameters that you select. In many cases, this makes it hard to make
definitive statements (though if you have a lot of data it may well be
that all plausible priors produce similar posterior distributions).
However, here - with continuous parameters - the probability that the
parameters of two disjoint groups are _the_same_ is easy to compute -
it's 0. Like the probability of two people being exactly the same
height.
If you want to ask, in a Bayesian framework, for the probability that
two population parameters are equal to within some specified tolerance,
go right ahead. Alternatively, within a frequentist framework, you can
test the hypothesis that the absolute value of the difference is less
than some specified level. 
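
[A minimal sketch of the Bayesian version of that question, in Python,
assuming flat priors and a normal approximation to the posterior of the
difference; the means, standard errors, and tolerance below are invented
purely for illustration.]

# Posterior probability that two means agree to within a tolerance,
# under flat priors and a normal approximation (invented numbers).
from scipy.stats import norm

def prob_within_tolerance(mean1, se1, mean2, se2, tol):
    """P(|mu1 - mu2| < tol) with normal posteriors centred at the estimates."""
    diff = mean1 - mean2
    se_diff = (se1**2 + se2**2) ** 0.5
    return norm.cdf((tol - diff) / se_diff) - norm.cdf((-tol - diff) / se_diff)

# Sample means 10.3 and 10.0, each with standard error 0.2; ask for the
# probability that the true means differ by less than 0.5 units.
print(prob_within_tolerance(10.3, 0.2, 10.0, 0.2, tol=0.5))

[The frequentist alternative mentioned above - testing that the absolute
difference is below a stated tolerance - is usually carried out as two
one-sided tests against the tolerance limits.]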

-Robert Dawson





Re: questions on hypothesis

2000-10-15 Thread Rich Ulrich

On Sat, 14 Oct 2000 01:56:32 GMT, [EMAIL PROTECTED]
wrote:

 < snip > 
> > (2) absence of evidence is not evidence of absence
> 
> Everyone who has done elementary statistics is aware of this edict. But
> what if your power is very high and/or you have very large N? I have
> always found it surprising that we can't turn it around and develop a
> probability that two groups are the same. Power or beta is surely
> correlated with the certainty of this approach.
> 
Chris,

What you get when you  "turn it around"  is a set of confidence
limits.   The range of the limits may be arbitrarily narrow, as the N
gets arbitrarily large.
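
[For instance, a small Python sketch, with invented proportions and a
normal approximation, of how such limits narrow as N grows:]

# Approximate confidence limits for a difference of two proportions,
# shown for increasing sample sizes per group (illustrative numbers only).
import math
from scipy.stats import norm

def ci_diff_props(p1, p2, n, conf=0.95):
    """Normal-approximation CI for p1 - p2 with n subjects per group."""
    se = math.sqrt(p1*(1 - p1)/n + p2*(1 - p2)/n)
    z = norm.ppf(0.5 + conf/2)
    d = p1 - p2
    return d - z*se, d + z*se

for n in (100, 1000, 10000, 100000):
    print(n, ci_diff_props(0.52, 0.50, n))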

"Bioequivalence"  is a live issue for the (U.S.) Food and Drug
Administration.  Is a generic version of a drug "the same" as the
patented version?  Back in the 1970s (I think I have this straight),
it was enough to have a "suitably powerful study" and fail to show
that it is different.  What was Officially acceptable was revised in
the 1980s to use Confidence limits; and I think what ought to comprise
acceptable studies is under discussion again, right now.  (I say
"officially" because it is my impression that actual decisions were
made by committees, and were not held to that standard.)

But look at how large an N it takes to show that 3% mortality for a
treatment is different from 5%, or from 4% - just for a marginally
significant test, never mind having POWER.
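
[A rough version of that calculation, using the usual normal-approximation
sample-size formula for comparing two proportions; the alpha level, power
values, and rounding below are my own choices, not Rich's.]

# Approximate n per group needed to distinguish 3% from 5% mortality
# with a two-sided alpha = 0.05 two-proportion z-test.
import math
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha/2)
    z_b = norm.ppf(power)
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2*pbar*(1 - pbar))
           + z_b * math.sqrt(p1*(1 - p1) + p2*(1 - p2))) ** 2
    return math.ceil(num / (p1 - p2)**2)

print(n_per_group(0.03, 0.05, power=0.50))  # marginal test only: roughly 740 per group
print(n_per_group(0.03, 0.05, power=0.80))  # with 80% power: roughly 1500 per group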

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: questions on hypothesis

2000-10-14 Thread David Heiser


- Original Message -
From: Ting Ting <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, October 13, 2000 10:57 PM
Subject: Re: questions on hypothesis


> >
> > A good example of a simple situation for which exact P values are
> > unavailable is the Behrens-Fisher problem (testing the equality of
> > normal means from normal populations with unequal variances).  Some
> > might say we have approximate solutions that are good enough.
.
I see this as an imprecise statement of a hypothesis.

From set theory, I can see several different logical constructs, each of
which would arrive at a different probability distribution, and consequently
different p-values. It boils down to just what the hypothesis about the
generator of the data is: is it a statement of logical equality, or a
statement about the value of a difference function?

Does sample "A" come from process "a" and sample "B" come from process "b",
or do both samples come from process "c"?

The problem is simplified when process "a" and process "b" are known. When
processes "a" and "b" are not known, we have that Fisher problem of defining
a set of all "a" parameter values <= a given p1 value and defining a set of
all "b" parameter values <= a given p2 value. When the processes are
one-parameter processes, everything is straightforward. (Fisher, in his
books, very nicely used one-parameter distributions to illustrate his
ideas.) However, for a two-parameter process, the Behrens-Fisher problem
states an equality (intersection) of mean values and a disjointness of
variance values, which cannot be analytically combined (given the normal
distribution function) in terms of a single p-value.

Consequently, one finds in the textbooks all the different approaches to
establishing a "c" process, for which tests can be constructed to determine
whether "A" and "B" come from process "c" or not. The hypothesis being tested
is then based on process "c", not on the original idea.

DAH







Re: questions on hypothesis

2000-10-14 Thread Donald Burrill

On Sat, 14 Oct 2000 [EMAIL PROTECTED] wrote, inter alia:

> I *would* argue that without some method to determine the likelihood of 
> a difference b/w two conditions you have no chance of determining
> practical importance at all.

But hypothesis testing procedures do not establish any such likelihood.  
What they may establish is the likelihood of observing data like these, 
IF the null hypothesis be true.  That is not "the likelihood of a 
difference between two [or more] conditions".

>  But what if your power is very high and/or you have very large N?  I 
> have always found it surprising that we can't turn it around and 
> develop a probability that two groups are the same.  Power or beta is 
> surely correlated with the certainty of this approach.
 
Again, we cannot "determine a probability that two [or more] groups are 
the same".  What we can do is determine the probability (beta) that we 
could NOT reject the null hypothesis, IF the true state of affairs be a 
specified degree of departure from the null hypothesis [of, presumably, 
no difference].
(Or, if you prefer, the probability (power) that we COULD reject 
the null, given that degree of departure from it.)
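
[A small numerical illustration of that definition of beta, assuming a
two-sided one-sample z-test with known sigma; the effect size, sigma, and
n below are invented.]

# beta = P(fail to reject H0: mu = 0 | true mean = delta), two-sided z-test.
from scipy.stats import norm

def beta_for_departure(delta, sigma, n, alpha=0.05):
    z_a = norm.ppf(1 - alpha/2)
    se = sigma / n**0.5
    # Fail to reject when the sample mean falls inside (-z_a*se, +z_a*se).
    return norm.cdf(z_a - delta/se) - norm.cdf(-z_a - delta/se)

beta = beta_for_departure(delta=0.2, sigma=1.0, n=100)
print(beta, 1 - beta)   # beta about 0.48 here, so power just over 0.5
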
-- DFB.
 --
 Donald F. Burrill[EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 (603) 535-2597
 Department of Mathematics, Boston University[EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215   (617) 353-5288
 184 Nashua Road, Bedford, NH 03110  (603) 471-7128





Re: questions on hypothesis

2000-10-14 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
San  <[EMAIL PROTECTED]> wrote:
>Would there be some cases in which the p-value is so difficult to find
>that it's nearly impossible? Is this a kind of limitation of
>hypothesis testing using p-values? Is there any substitute for the
>p-value? 
>Thanks for your reply.


This is often the case, but I do not believe it is what 
is being referred to.

One can have very low p-values and very little importance,
and high p-values and great importance.  When it comes to
deciding what action to take, the p-value without other
information may even be misleading.

For example, suppose there are two treatments for a disease.
One is significant at a p-value of .001, and the other
gives a "nonsignificant" p-value of .2.  From the data, I
might very well prefer the one which is not significant.
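
[A made-up numerical version of that example, using a simple two-proportion
z-test in Python; the trial sizes and counts are invented to reproduce the
contrast described, not taken from any real study.]

# Treatment A: huge trial, tiny absolute benefit, very small p-value.
# Treatment B: small trial, much larger estimated benefit, "nonsignificant" p.
import math
from scipy.stats import norm

def two_prop_z(x1, n1, x2, n2):
    """Estimated difference and two-sided p-value (normal approximation)."""
    p1, p2 = x1/n1, x2/n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p*(1 - p)*(1/n1 + 1/n2))
    z = (p1 - p2) / se
    return p1 - p2, 2*(1 - norm.cdf(abs(z)))

print(two_prop_z(5300, 100000, 5000, 100000))  # A: 0.3% gain, p about .002
print(two_prop_z(12, 40, 7, 40))               # B: 12.5% gain, p about .19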


>Jerry Dallal wrote:

>> I wrote:

>> > (1) statistical significance usually is unrelated to practice
>> > importance.

>> I meant to type "practical importance".


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-14 Thread Herman Rubin

In article <8s8egf$n5f$[EMAIL PROTECTED]>,
 <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
>  Jerry Dallal <[EMAIL PROTECTED]> wrote:

>> (1) statistical significance usually is unrelated to practice
>> importance.

>I don't think so. I can think of many examples in which statistical
>inference plays an invaluable role in practical applications and
>instrumentation, or indeed any "practical" application of a theory etc.
>Not just in science, but engineering, e.g. aircraft design, studying the
>brain, electrical engineering. Certainly there are examples of
>statistical nonsense, e.g. polls, but I wouldn't go so far as to say it
>is usually like this.
>I *would* argue that without some method to determine the likelihood of
>a difference b/w two conditions you have no chance of determining
>practical importance at all.

>> (2) absence of evidence is not evidence of absence

>Everyone who has done elementary statistics is aware of this edict. But
>what if your power is very high and/or you have very large N? I have
>always found it surprising that we can't turn it around and develop a
>probability that two groups are the same. Power or beta is surely
>correlated with the certainty of this approach.

I believe you will find that most researchers in the sciences
accept the p-value as religion.  In the report of the recent
British study on Type 2 diabetes, there was an effect which
was stated as "unimportant" because the p-value was .052.

The likelihood function contains all the information in the
data for the purpose of making a decision.  Without extraneous
information, such as the sample size and which test is being
performed, and more, the p-value cannot be obtained from it
with any amount of work.  And one needs even more to get the
likelihood function from the p-value.
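
[A minimal sketch of working with the likelihood function directly, for
invented binomial data, in Python.]

# Normalized binomial likelihood for a success probability theta after
# observing 7 successes in 20 trials (invented data).
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.001, 0.999, 999)
like = binom.pmf(7, 20, theta)     # L(theta | x = 7, n = 20)
like /= like.max()                 # scale so the maximum is 1

# One conventional summary: the set of theta values whose likelihood is at
# least 1/8 of the maximum.
supported = theta[like >= 1/8]
print(supported.min(), supported.max())
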
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-14 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Ting Ting  <[EMAIL PROTECTED]> wrote:

>> A good example of a simple situation for which exact P values are
>> unavailable is the Behrens-Fisher problem (testing the equality of
>> normal means from normal populations with unequal variances).  Some
>> might say we have approximate solutions that are good enough.

>would you please give some more detailed examples of this?
>thanks

I can give you simple randomized procedures with easily
computable exact p-values, and which only lose in degrees
of freedom compared to known variances.  Also, Linnik has
shown the existence of non-randomized procedures which can
do this.

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: questions on hypothesis

2000-10-14 Thread Donald Macnaughton

Gene Gallagher wrote:
> Can someone recommend a good book on the history of statistics,
> especially one focusing on Fisher's accomplishments?  Fisher's
> contributions and prickly personality are dealt with tangentially
> in Provine's wonderful biography of Sewall Wright.  
> Surely, Fisher has merited one or more scholarly biographies of
> his own.

See 

Box, Joan Fisher (1978), R. A. FISHER: THE LIFE OF A SCIENTIST.
   New York: John Wiley.

Joan Fisher Box is Sir Ronald Aylmer Fisher's daughter.

---
Donald B. Macnaughton   MatStat Research Consulting Inc
[EMAIL PROTECTED]  Toronto, Canada
---





Re: questions on hypothesis

2000-10-13 Thread Ting Ting

> 
> A good example of a simple situation for which exact P values are
> unavailable is the Behrens-Fisher problem (testing the equality of
> normal means from normal populations with unequal variances).  Some
> might say we have approximate solutions that are good enough.
> 
would you please give some more detailed examples of this?
thanks





Re: questions on hypothesis

2000-10-13 Thread Chris . Chambers

In article <[EMAIL PROTECTED]>,
  Jerry Dallal <[EMAIL PROTECTED]> wrote:

> (1) statistical significance usually is unrelated to practice
> importance.

I don't think so. I can think of many examples in which statistical
inference plays an invaluable role in practical applications and
instrumentation, or indeed any "practical" application of a theory etc.
Not just in science, but engineering, e.g. aircraft design, studying the
brain, electrical engineering. Certainly there are examples of
statistical nonsense, e.g. polls, but I wouldn't go so far as to say it
is usually like this.
I *would* argue that without some method to determine the likelihood of
a difference b/w two conditions you have no chance of determining
practical importance at all.

> (2) absence of evidence is not evidence of absence

Everyone who has done elementary statistics is aware of this edict. But
what if your power is very high and/or you have very large N? I have
always found it surprising that we can't turn it around and develop a
probability that two groups are the same. Power or beta is surely
correlated with the certainty of this approach.

Chris







Re: questions on hypothesis

2000-10-13 Thread Gene Gallagher


> As to Observational studies --
>
> http://www.cnr.colostate.edu/~anderson/thompson1.html
>
> This is a short article and long bibliography.   The title is direct:
> "326 Articles/Books Questioning the Indiscriminate Use of
> Statistical Hypothesis Tests in Observational Studies"
> (Compiled by William L. Thompson)

This bibliography has many articles apparently discussing Fisher's
views on p-values.  Can someone recommend a good book on the history of
statistics, especially one focusing on Fisher's accomplishments?
Fisher's contributions and prickly personality are dealt with
tangentially in Provine's wonderful biography of Sewall Wright.  Surely,
Fisher has merited one or more scholarly biographies of his own.

--
Eugene D. Gallagher
ECOS, UMASS/Boston







Re: questions on hypothesis

2000-10-12 Thread Jerry Dallal

San wrote:
> 
> Would there be some cases in which the p-value is so difficult to find
> that it's nearly impossible? 

I'm tempted to say "not under a randomization model" but, yes, there
are many problems for which P values are not readily available. 
Perhaps P values are unavailable for *most* problems--it's just that
we're so good at figuring out new uses for the cases we can solve! 
A good example of a simple situation for which exact P values are
unavailable is the Behrens-Fisher problem (testing the equality of
normal means from normal populations with unequal variances).  Some
might say we have approximate solutions that are good enough.
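
[For what it's worth, one of the standard approximate solutions is Welch's
unequal-variance t-test, which SciPy exposes directly; a minimal example
with invented data follows.]

# Welch's approximation to the Behrens-Fisher problem (not an exact p-value).
from scipy.stats import ttest_ind

a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.3, 5.9]
b = [4.2, 6.8, 5.0, 7.1, 3.9, 6.5]

t, p = ttest_ind(a, b, equal_var=False)   # equal_var=False requests Welch's test
print(t, p)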

> Is this a kind of limitation of
> hypothesis testing using p-values? 

Yes.  Stepwise procedures (regression, in particular) are good
examples.

> Is there any substitute for the
> p-value?

Many.  You could start with likelihood procedures, Bayes methods,
and decision theory.





Re: questions on hypothesis

2000-10-12 Thread Donald Burrill

On Thu, 12 Oct 2000, dennis roberts wrote in part:

> one nice full issue of a journal about this general topic of 
> hull hypothesis testing ...
  
 Dealing with problems in naval architecture, one presumes?
-- Don.
 --
 Donald F. Burrill[EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 (603) 535-2597
 Department of Mathematics, Boston University[EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215   (617) 353-5288
 184 Nashua Road, Bedford, NH 03110  (603) 471-7128






Re: questions on hypothesis

2000-10-12 Thread San

Would there be some cases in which the p-value is so difficult to find
that it's nearly impossible? Is this a kind of limitation of
hypothesis testing using p-values? Is there any substitute for the
p-value? 
Thanks for your reply.



Jerry Dallal wrote:
> 
> I wrote:
> 
> > (1) statistical significance usually is unrelated to practice
> > importance.
> 
> I meant to type "practical importance".





Re: questions on hypothesis

2000-10-12 Thread dennis roberts

one nice full issue of a journal about this general topic of hull 
hypothesis testing that i came across recently is:

Research in the Schools, Vol 5, Number 2, Fall 1998 ...

you could contact jim mclean at ... jmclean@ etsu.edu ... and inquire about 
obtaining a copy

we are in the process of considering uploading these article files to a 
website ... but, the details have to be worked out ...

this issue hits almost every salient issue with respect to this topic and, 
provides (along with the url that had 326 that rich ulrich sent) ... lots 
of good references on this topic


At 01:42 PM 10/12/00 +, Jerry Dallal wrote:
>I wrote:
>
> > (1) statistical significance usually is unrelated to practice
> > importance.
>
>I meant to type "practical importance".
>
>






Re: questions on hypothesis

2000-10-12 Thread Jerry Dallal

I wrote:

> (1) statistical significance usually is unrelated to practice
> importance.

I meant to type "practical importance".





Re: questions on hypothesis

2000-10-11 Thread Rich Ulrich

< also posted to sci.stat.math, sci.stat.consult  where separate
versions of the same question were posted. >

On Wed, 11 Oct 2000 23:25:05 +0800, San <[EMAIL PROTECTED]>
wrote:

> What are the limitations of hypothesis testing using significance tests
> based on p-values?
> 
> Can someone suggest where I can find some reference books related to
> the topics above?
> thank you

As to Observational studies --

http://www.cnr.colostate.edu/~anderson/thompson1.html

This is a short article and long bibliography.   The title is direct: 
"326 Articles/Books Questioning the Indiscriminate Use of
Statistical Hypothesis Tests in Observational Studies"
(Compiled by William L. Thompson)

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





questions on hypothesis

2000-10-11 Thread San

What are the limitations of hypothesis testing using significance tests
based on p-values?

Can someone suggest where I can find some reference books related to
the topics above?
thank you





Re: questions on hypothesis

2000-10-11 Thread Jerry Dallal

San wrote:
> 
> What are the limitations of hypothesis testing using significance tests
> based on p-values?
> 

(1) statistical significance usually is unrelated to practice
importance.
(2) absence of evidence is not evidence of absence
http://www.bmj.com/cgi/content/full/311/7003/485

