Re: Interpreting p-value = .99

2001-12-03 Thread Stan Brown

Jerry Dallal <[EMAIL PROTECTED]> wrote in
>"Robert J. MacG. Dawson" wrote:
>> > > But I don't see why either the advertiser or the consumer advocate
>> > would, or should, do a two-tailed test.
>> The idea is that the "product" of these tests is a p-value to be used
>> in support of an argument. The evidence for the proposal is not made any
>> stronger by the tester's wish for a certain outcome; so the tester
>> should not  artificially halve the reported p-value.
>> Superficially, the idea of halving your p-values, doubling your chance
>> of reporting a "statistically significant" result in your favored
>> direction if there is really nothing there, and as a bonus, doing a
>> David-and-Uriah job ("And he wrote in the letter, saying, Set ye Uriah
>> in the forefront of the hottest battle, and retire ye from him, that he
>> may be smitten, and die.") on any possible finding in the other
>> direction, may seem attractive. A moment's thought should persuade us
>> that it is not ethical.
>> -Robert Dawson
>I'm not sure I understand the argument,

Oh good -- I thought it was just me!

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.

Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at

Re: Interpreting p-value = .99

2001-12-03 Thread Jerry Dallal

"Robert J. MacG. Dawson" wrote:

> > > But I don't see why either the advertiser or the consumer advocate
> > would, or should, do a two-tailed test.
> The idea is that the "product" of these tests is a p-value to be used
> in support of an argument. The evidence for the proposal is not made any
> stronger by the tester's wish for a certain outcome; so the tester
> should not  artificially halve the reported p-value.
> Superficially, the idea of halving your p-values, doubling your chance
> of reporting a "statistically significant" result in your favored
> direction if there is really nothing there, and as a bonus, doing a
> David-and-Uriah job ("And he wrote in the letter, saying, Set ye Uriah
> in the forefront of the hottest battle, and retire ye from him, that he
> may be smitten, and die.") on any possible finding in the other
> direction, may seem attractive. A moment's thought should persuade us
> that it is not ethical.
> -Robert Dawson

I'm not sure I understand the argument, but it may be irrelevant
regardless. Most consumer protection laws are written to punish the
"instance" and have nothing to do with "statistics" in general or
means in particular. This protects you from my friend the shopkeeper
who puts 1 lb in your 5 lb bag of sugar and 9.1 lbs in mine.


Re: Interpreting p-value = .99

2001-12-03 Thread Robert J. MacG. Dawson

Stan Brown wrote:

> I see why the quality controller would want to
> do a two-tailed test: the product should not be outside
> manufacturing parameters in either direction. (Presumably the QC
> person would be testing the pills themselves, not patients taking
> the pills.)

Actually, the quality controller's test is a slight misnomer here,
because we aren't talking in this problem (as you more or less observed)
about standard QC methodology.  Standard QC doctrine, from what I hear,
generally goes for repeatability, and "better than specified" is *not*
good. ("So, how did you do in the QC Methods exam?"  "My score was three
sigmas above the class average... so the prof failed me.")

The question dealt with a situation, though, in which only one
direction of deviation is bad.  Thus, the test might legitimately be
one-sided.  The reason is that the alpha value represents the risk of
unnecessarily stopping the production line, reprinting the labels, or
whatever. You *don't* need to do this if the product works better than
advertised, so a one-sided alpha really is the risk of doing it
unnecessarily.

> But I don't see why either the advertiser or the consumer advocate
> would, or should, do a two-tailed test. 

The idea is that the "product" of these tests is a p-value to be used
in support of an argument. The evidence for the proposal is not made any
stronger by the tester's wish for a certain outcome; so the tester
should not  artificially halve the reported p-value. 

Superficially, the idea of halving your p-values, doubling your chance
of reporting a "statistically significant" result in your favored
direction if there is really nothing there, and as a bonus, doing a
David-and-Uriah job ("And he wrote in the letter, saying, Set ye Uriah
in the forefront of the hottest battle, and retire ye from him, that he
may be smitten, and die.") on any possible finding in the other
direction, may seem attractive. A moment's thought should persuade us
that it is not ethical.

-Robert Dawson


Re: Interpreting p-value = .99

2001-12-02 Thread Rich Ulrich

On Sat, 1 Dec 2001 08:20:45 -0500, [EMAIL PROTECTED] (Stan Brown)

> [cc'd to previous poster]
> Rich Ulrich <[EMAIL PROTECTED]> wrote in
> >I think I could not blame students for floundering about on this one.
> >
> >On Thu, 29 Nov 2001 14:39:35 -0500, [EMAIL PROTECTED] (Stan Brown)
> >wrote:
> >> "The manufacturer of a patent medicine claims that it is 90% 
> >> effective(*) in relieving an allergy for a period of 8 hours. In a 
> >> sample of 200 people who had the allergy, the medicine provided 
> >> relief for 170 people. Determine whether the manufacturer's claim 
> >> was legitimate, to the 0.01 significance level."
> >I have never asked that as a question in statistics, and 
> >it does not have an automatic, idiomatic translation to what I ask.
> How would you have phrased the question, then? Though I took this 
> one from a book, I'm always looking to improve the phrasing of 
> questions I set in quizzes and exams.
 [ snip, rest]

"In a LATER sample of 200 ... relief for ONLY  170 people."

The Query you give after that should not pretend to be ultimate.
Are you willing to ask the students to contemplate that
the new experiment could differ drastically from the original 
sample and its conditions?
"Is this result consistent with the manufacturer's claim?"
 - you might notice that this sounds  'weasel-worded.'

Well, extremely-weasel-worded  *ought*  to be suiting,
for  *proper*  statistical claims from non-randomized trials.  
For the example: I would expect 15%  of a grab-sample
being treated for 'allergy'  would actually have flu or a cold.
Maybe the actual experiment was more sophisticated?

"What do you say about this result? (include a statistical
test using a nominal alpha=.01)."

Also,  "Why do I include the word 'nominal'  here?"  
Ans:  It means 'tabled value'  and it helps to emphasize
that it is hard to frame a non-random trial as a test;  the
problem is not presented with any such framing.

Hope this seems reasonable.


Re: Interpreting p-value = .99

2001-12-01 Thread Dennis Roberts

At 08:29 AM 12/1/01 -0500, Stan Brown wrote:

>How I would analyze this claim is that, when the advertiser says
>"90% of people will be helped", that means 90% or more. Surely if we
>did a large controlled study and found 93% were helped, we would not
>turn around and say the advertiser was wrong! But I think that's
>what would happen with a two-tailed test.
>Can you explain a bit further?

would the advertiser feel he/she was wrong if the 90% value was a little 
less too ... within some margin of error from 90? probably not

perhaps you want to say that the advertiser is claiming around 90% or MORE, 
or at LEAST 90% ...

again ... we are getting far too hung up in how some hypothesis is stated 
... is not the more important matter ... what sort of impact is there? if 
that is the case ... testing a null ... ANY null ... is really not going to 
help you

you need to look at the SAMPLE data ... then ask yourself ... what sort of 
a real effect might there be if i got the sample results that i did? if you 
then want to superimpose on this a question ... i wonder if 90 or more 
could have been the truth ... fine

but that is an after thought

this does not call for a hypothesis test

>Stan Brown, Oak Road Systems, Cortland County, New York, USA
>My reply address is correct as is. The courtesy of providing a correct
>reply address is more important to me than time spent deleting spam.

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]


Re: Interpreting p-value = .99

2001-12-01 Thread Stan Brown

[cc'd to previous poster]

Rich Ulrich <[EMAIL PROTECTED]> wrote in
>I think I could not blame students for floundering about on this one.
>On Thu, 29 Nov 2001 14:39:35 -0500, [EMAIL PROTECTED] (Stan Brown)
>> "The manufacturer of a patent medicine claims that it is 90% 
>> effective(*) in relieving an allergy for a period of 8 hours. In a 
>> sample of 200 people who had the allergy, the medicine provided 
>> relief for 170 people. Determine whether the manufacturer's claim 
>> was legitimate, to the 0.01 significance level."

>I have never asked that as a question in statistics, and 
>it does not have an automatic, idiomatic translation to what I ask.

How would you have phrased the question, then? Though I took this 
one from a book, I'm always looking to improve the phrasing of 
questions I set in quizzes and exams.

>I can expect that it means, "Use a 1% test."  But, for what?

>That claim could NEVER, legitimately, have been *based*  
>on these data.   That is an idea that tries to intrude itself,
>to me, and makes it difficult to address the intended question.

Agreed! My idea, in reading that problem, was that the manufacturer 
claimed something for a product that has been on the market for some 
time, and some independent group, such as a newspaper or TV network, 
did a study to test the claim.

> - By the way, it also bothers me that "90% effective"  is 
>apparently translated as "effective for 90% of the people."
>I wondered if the asterisk was supposed to represent "[sic]".

The asterisk led to my note defining it as relieving symptoms for 
90% of people who use it, and asking students to think whether the 
claim would also be true if it relieved symptoms for more than 90%. 
(I think the real-world answer is clearly Yes: If a product is 
claimed to help 90% of people and it actually helps 93%, we do not 
say the claim was false.)

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.


Re: Interpreting p-value = .99

2001-12-01 Thread Stan Brown

[cc'd to previous poster; please follow up in newsgroup]

Robert J. MacG. Dawson <[EMAIL PROTECTED]> wrote in
>Stan Brown wrote:
>> "The manufacturer of a patent medicine claims that it is 90%
>> effective(*) in relieving an allergy for a period of 8 hours. In a
>> sample of 200 people who had the allergy, the medicine provided
>> relief for 170 people. Determine whether the manufacturer's claim
>> was legitimate, to the 0.01 significance level."

>   A hypothesis test is set up ahead of time so that it can only 
>give a definite answer of one sort. In this case, we have (at least)
>three distinct possibilities.

I really like your presentation of the three possible tests as 
"advertiser's test", "consumer advocate's test", and "quality 
controller's test". I see why the quality controller would want to 
do a two-tailed test: the product should not be outside 
manufacturing parameters in either direction. (Presumably the QC 
person would be testing the pills themselves, not patients taking 
the pills.)

But I don't see why either the advertiser or the consumer advocate 
would, or should, do a two-tailed test. Alan McLean seemed to agree 
that both would be one-tailed, if I understand him correctly.

>   (1) The "consumer advocate's test": we want a definite result that
>makes the manufacturer look bad, so H0 is the manufacturer's
>claim, Ha is that the claim is wrong, and the p-value is to be used 
>as an indication of reason to believe H0 wrong (if so).  Using a
>one-sided test here is akin to saying "I want all my type I errors to be
>ones that make the manufacturer look bad".  Ethical behaviour here is to
>do a two-sided test and report a result in either direction.  

I don't get this. Why is that ethical behavior?

How I would analyze this claim is that, when the advertiser says 
"90% of people will be helped", that means 90% or more. Surely if we 
did a large controlled study and found 93% were helped, we would not 
turn around and say the advertiser was wrong! But I think that's 
what would happen with a two-tailed test.

Can you explain a bit further?

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.


Re: Interpreting p-value = .99

2001-12-01 Thread Stan Brown

Alan McLean <[EMAIL PROTECTED]> wrote in
>Stan, in practical terms, the conclusion 'fail to reject the null' is
>simply not true. You do in reality 'accept the null'. The catch is that
>this is, in the research situation, a tentative acceptance - you
>recognise that you may be wrong, so you carry forward the idea that the
>null may be 'true' but - on the sample evidence - probably is not.
>On the other hand, this should also be the case when you 'reject the
>null' - the rejection may be wrong, so the rejection is also tentative.
>The difference is that the null has this privileged position.

Thanks -- that makes some sense.

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.


Re: Interpreting p-value = .99

2001-11-30 Thread jim clark


On Thu, 29 Nov 2001, Stan Brown wrote:
> But -- and in retrospect I should have seen it coming -- some 
> students framed the hypotheses so that the alternative hypothesis 
> was "the drug is effective as claimed." They had
>   Ho: p <= .9; Ha: p > .9; p-value = .9908.

You might point out to students the possible irrationality of
this framing of the question.  By this reasoning would it not be
the case that the strongest evidence for the claim that p<=.9
would be to have 0 successes (then the p-value would be 1)?  And that the
worst case (i.e., null would be rejected) would be for all cases
to be successful?  This is, quite contrary, I expect, to what we
would normally take to be negative and positive outcomes as far
as the drug company is concerned.  The most (only?) sensible
interpretation of p=.9 is that at least 90% (i.e., p>=.9) would
be successes.
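
To make the point above concrete, here is a minimal Python sketch of how the p-value behaves under the students' framing (Ho: p <= .9; Ha: p > .9). It assumes the usual normal approximation for a proportion; the function name upper_tail_p_value is made up for illustration and is not from the thread.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(successes, n, p0=0.9):
    """p-value for Ho: p <= p0 vs Ha: p > p0 (normal approximation)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)      # standard error under the null value p0
    z = (p_hat - p0) / se
    return 1 - NormalDist().cdf(z)    # area above the observed z


# 0 successes, the observed 170, and a perfect 200 out of 200
for x in (0, 170, 200):
    print(x, upper_tail_p_value(x, 200))
```

Under this framing the p-value is largest when nobody is helped and smallest when everybody is, which is the oddity being described.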

Best wishes

James M. Clark  (204) 786-9757
Department of Psychology  (204) 774-4134 Fax
University of Winnipeg  4L05D
Winnipeg, Manitoba  R3B 2E9 [EMAIL PROTECTED]


Re: Interpreting p-value = .99

2001-11-30 Thread Rich Ulrich

I think I could not blame students for floundering about on this one.

On Thu, 29 Nov 2001 14:39:35 -0500, [EMAIL PROTECTED] (Stan Brown)

> On a quiz, I set the following problem to my statistics class:
> "The manufacturer of a patent medicine claims that it is 90% 
> effective(*) in relieving an allergy for a period of 8 hours. In a 
> sample of 200 people who had the allergy, the medicine provided 
> relief for 170 people. Determine whether the manufacturer's claim 
> was legitimate, to the 0.01 significance level."
> (The problem was adapted from Spiegel and Stevens, /Schaum's
> Outline: Statistics/, problem 10.6.)
[ snip, rest ]

"Determine whether the manufacturer's claim was legitimate,
to the 0.01 significance level."  

I have never asked that as a question in statistics, and 
it does not have an automatic, idiomatic translation to what I ask.

I can expect that it means, "Use a 1% test."  But, for what?

After I notice that the outcome was poorer than the claim, 
then I wonder if the test is, "Are these data consistent with
the Claim? or do they tend to disprove it?"  That seems 
some distance from the tougher, philosophical question of 
whether, at the time it was made, the claim was legitimate.

That claim could NEVER, legitimately, have been *based*  
on these data.   That is an idea that tries to intrude itself,
to me, and makes it difficult to address the intended question.

 - By the way, it also bothers me that "90% effective"  is 
apparently translated as "effective for 90% of the people."
I wondered if the asterisk was supposed to represent "[sic]".



Re: Interpreting p-value = .99

2001-11-30 Thread Robert J. MacG. Dawson

Stan Brown wrote:
> On a quiz, I set the following problem to my statistics class:
> "The manufacturer of a patent medicine claims that it is 90%
> effective(*) in relieving an allergy for a period of 8 hours. In a
> sample of 200 people who had the allergy, the medicine provided
> relief for 170 people. Determine whether the manufacturer's claim
> was legitimate, to the 0.01 significance level."

A hypothesis test is set up ahead of time so that it can only 
give a definite answer of one sort. In this case, we have (at least)
three distinct possibilities.

(0) The "advertiser's test": we want a definite result that makes the
manufacturer look good, so Ha agrees with the manufacturer's claim. To do
this the manufacturer's claim must be slightly restated as "our medicine
is *more*  than 90% effective"; as the exact 90% value has prior
probability 0 this is not a problem. H0 is actually the original claim;
and the hoped-for outcome is to reject it because the number of
successes is too large.  The manufacturer is not entitled to do a
1-tailed test just to  shrink the reported p-value. Using a 1-tailed
test is to say "I want all my Type I errors to be ones that let us get
away with inflated claims."  This is what the students did:

> But -- and in retrospect I should have seen it coming -- some
> students framed the hypotheses so that the alternative hypothesis
> was "the drug is effective as claimed." They had
> Ho: p <= .9; Ha: p > .9; p-value = .9908.

Ethical behaviour is to do a two-tailed test, and report/act on a
rejection in either direction.

It is not necessary to say "there is a difference but we don't know in
which direction"; a two-tailed test can legitimately have three outcomes
(reject low, reject high, not enough data). (There is a potential new
type of error in which we reject in the wrong tail; this *ought* to be
called a Type III error were the name not already taken [as a somewhat
misleading in-joke akin to the "Eleventh Commandment"] to mean "testing
the wrong hypothesis" or something similar.  It is easy to show that the
probability of this is low enough to ignore if alpha is even moderately
low, as the distance between tails is twice the distance from the mean
to the tail.)

(1) The "consumer advocate's test": we want a definite result that
makes the manufacturer look bad, so H0 is the manufacturer's
claim, Ha is that the claim is wrong, and the p-value is to be used 
as an indication of reason to believe H0 wrong (if so).  Using a
one-sided test here is akin to saying "I want all my type I errors to be
ones that make the manufacturer look bad".  Ethical behaviour here is to
do a two-sided test and report a result in either direction.  

(2) the "quality controller's test": H0 is the manufacturer's
claim, Ha is that the claim is wrong, and the p-value is to be used 
to balance risks. Here, I think, a one-tailed test is legitimate. 

I claim that the consumer advocate and the manufacturer *should* be
doing the same test in situations 0 and 1. Both should be reporting a
p-value of 0.0184, both should be interpreting it as "the medicine is
less effective than claimed", and the manufacturer should take action by
either improving the product or modifying the advertisements.
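
A two-tailed test with three possible outcomes (reject low, reject high, not enough data) can be sketched roughly as follows. This is only an illustration under the normal approximation; the function name two_tailed_decision and the outcome labels are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_decision(successes, n, p0, alpha):
    """Return (two-sided p-value, outcome) for Ho: p = p0, normal approximation."""
    se = sqrt(p0 * (1 - p0) / n)          # standard error under Ho
    z = (successes / n - p0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    if p_two_sided >= alpha:
        outcome = "not enough data"
    elif z < 0:
        outcome = "reject low"            # fewer successes than claimed
    else:
        outcome = "reject high"           # more successes than claimed
    return p_two_sided, outcome


p_value, outcome_05 = two_tailed_decision(170, 200, 0.9, alpha=0.05)
_, outcome_01 = two_tailed_decision(170, 200, 0.9, alpha=0.01)
print(round(p_value, 4), outcome_05, outcome_01)
```

With the quiz data the two-sided p-value comes out near 0.0184, so the direction of the rejection is "low" at alpha = .05, while at the quiz's stated alpha = .01 the same data do not reach significance.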

-Robert Dawson


Re: Interpreting p-value = .99

2001-11-29 Thread Stan Brown

Gus Gassmann <[EMAIL PROTECTED]> wrote in
>Stan Brown wrote:
>> "The manufacturer of a patent medicine claims that it is 90%
>> effective(*) in relieving an allergy for a period of 8 hours. In a
>> sample of 200 people who had the allergy, the medicine provided
>> relief for 170 people. Determine whether the manufacturer's claim
>> was legitimate, to the 0.01 significance level."

>> But -- and in retrospect I should have seen it coming -- some
>> students framed the hypotheses so that the alternative hypothesis
>> was "the drug is effective as claimed." They had
>> Ho: p <= .9; Ha: p > .9; p-value = .9908.
>I don't understand where they get the .9908 from. 

x=170, n=200, p'=.85, Ha: p>.9, alpha=.01
z = -2.357
On TI-83, normalcdf(-2.357,1E99) = .9908; i.e., 99.08% of the area 
is above z = -2.357.
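
For readers without a TI-83, the same arithmetic can be reproduced with a short Python sketch. It assumes, as the normalcdf call does, a normal (z) approximation with the standard error computed from the claimed value .9; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 170, 200, 0.9
p_hat = x / n                                   # sample proportion, .85
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # test statistic

p_lower = NormalDist().cdf(z)    # area below z: p-value for Ha: p < .9
p_upper = 1 - p_lower            # area above z: p-value for Ha: p > .9

print(f"z = {z:.3f}, lower tail = {p_lower:.4f}, upper tail = {p_upper:.4f}")
```

The two one-tailed p-values are complements of each other, which is why reversing the hypotheses turns .0092 into .9908.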

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.


Re: Interpreting p-value = .99

2001-11-29 Thread Alan McLean


Stan's two alternatives were correct as stated - they were two one sided
tests, not a one sided and a two sided test.

Stan, in practical terms, the conclusion 'fail to reject the null' is
simply not true. You do in reality 'accept the null'. The catch is that
this is, in the research situation, a tentative acceptance - you
recognise that you may be wrong, so you carry forward the idea that the
null may be 'true' but - on the sample evidence - probably is not.

On the other hand, this should also be the case when you 'reject the
null' - the rejection may be wrong, so the rejection is also tentative.
The difference is that the null has this privileged position.

In areas like quality control, of course, it is quite clear that you
decide, and act as if, the null is true or is not true.


Gus Gassmann wrote:
> Stan Brown wrote:
> > On a quiz, I set the following problem to my statistics class:
> >
> > "The manufacturer of a patent medicine claims that it is 90%
> > effective(*) in relieving an allergy for a period of 8 hours. In a
> > sample of 200 people who had the allergy, the medicine provided
> > relief for 170 people. Determine whether the manufacturer's claim
> > was legitimate, to the 0.01 significance level."
> >
> > (The problem was adapted from Spiegel and Stevens, /Schaum's
> > Outline: Statistics/, problem 10.6.)
> >
> > I believe a one-tailed test, not a two-tailed test, is appropriate.
> > It would be silly to test for "effectiveness differs from 90%" since
> > no one would object if the medicine helps more than 90% of
> > patients.)
> >
> > Framing the alternative hypothesis as "the manufacturer's claim is
> > not legitimate" gives
> > Ho: p >= .9; Ha: p < .9; p-value = .0092
> > on a one-tailed t-test. Therefore we reject Ho and conclude that the
> > drug is less than 90% effective.
> >
> > But -- and in retrospect I should have seen it coming -- some
> > students framed the hypotheses so that the alternative hypothesis
> > was "the drug is effective as claimed." They had
> > Ho: p <= .9; Ha: p > .9; p-value = .9908.
> I don't understand where they get the .9908 from. Whether you test a
> one-or a two-sided alternative, the test statistic is the same. So the
> p-value for the two-sided version of the test should be simply twice
> the p-value for the one-sided alternative, 0.0184. Hence the paradox
> you speak of is an illusion.
> Unfortunately for you, the two versions of the test lead to different
> conclusions. If the correct p-value is given, I would give full marks
> (perhaps, depending on how much the problem is worth overall,
> subtracting 1 out of 10 marks for the nonsensical form of Ha).

Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102    Fax: +61 03 9903 2007


Re: Interpreting p-value = .99

2001-11-29 Thread Dennis Roberts

forget the statement of the null

build a CI ... perhaps 99% (which would correspond to your .01 sig. test) ...

let that help to determine if the claim seems reasonable or not

in this case ... p hat = .85 .. thus q hat = .15

stan error of a proportion (given SRS was done) is about

stan error of p hats = sqrt ((p hat * q hat) / n) = sqrt (.85 * .15 / 200) 
= about .025

approximate 99% CI would be about p hat +/-  2.58 * .025 = .85 +/- .06

CI would be about .79 to .91 ... so, IF you insist on a hypothesis test ... 
retain the null

personally, i would rather say that the pop. proportion might be between 
(about) .79 to .91 ...

doesn't hold me to .9

problem here is that if you have opted for .05 ... you would have rejected 
... (just barely)
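
The interval arithmetic above can be checked with a small Python sketch. It assumes the simple large-sample (Wald) interval described in the post, with the standard error built from p hat; the function name wald_interval is made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence):
    """Large-sample confidence interval for a proportion: p_hat +/- z * SE."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)                        # SE from the sample
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.58 for 99%
    return p_hat - z_crit * se, p_hat + z_crit * se


lo99, hi99 = wald_interval(170, 200, 0.99)
lo95, hi95 = wald_interval(170, 200, 0.95)
print(f"99% CI: {lo99:.3f} to {hi99:.3f}")
print(f"95% CI: {lo95:.3f} to {hi95:.3f}")
```

The 99% interval contains .9 while the 95% interval stops just short of it, which matches the "just barely" remark.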

At 02:39 PM 11/29/01 -0500, you wrote:
>On a quiz, I set the following problem to my statistics class:
>"The manufacturer of a patent medicine claims that it is 90%
>effective(*) in relieving an allergy for a period of 8 hours. In a
>sample of 200 people who had the allergy, the medicine provided
>relief for 170 people. Determine whether the manufacturer's claim
>was legitimate, to the 0.01 significance level."
>(The problem was adapted from Spiegel and Stevens, /Schaum's
>Outline: Statistics/, problem 10.6.)
>I believe a one-tailed test, not a two-tailed test, is appropriate.
>It would be silly to test for "effectiveness differs from 90%" since
>no one would object if the medicine helps more than 90% of
>patients.
>Framing the alternative hypothesis as "the manufacturer's claim is
>not legitimate" gives
> Ho: p >= .9; Ha: p < .9; p-value = .0092
>on a one-tailed t-test. Therefore we reject Ho and conclude that the
>drug is less than 90% effective.
>But -- and in retrospect I should have seen it coming -- some
>students framed the hypotheses so that the alternative hypothesis
>was "the drug is effective as claimed." They had
> Ho: p <= .9; Ha: p > .9; p-value = .9908.
>Now as I understand things it is not formally legitimate to accept
>the null hypothesis: we can only either reject it (and accept Ha) or
>fail to reject it (and draw no conclusion). What I would tell my
>class is this: the best we can say is that p = .9908 is a very
>strong statement that rejecting the null hypothesis would be a Type
>I error. But I'm not completely easy in my mind about that, when
>simply reversing the hypotheses gives p = .0092 and lets us conclude
>that the drug is not 90% effective.
>There seems to be a paradox: The very same data lead either to the
>conclusion "the drug is not effective as claimed" or to no
>conclusion. I could certainly tell my class: "if it makes sense in
>the particular situation, reverse the hypotheses and recompute the
>p-value." Am I being over-formal here, or am I being horribly stupid
>and missing some reason why it _would_ be legitimate to draw a
>conclusion from p=.9908?
>Stan Brown, Oak Road Systems, Cortland County, New York, USA
>My reply address is correct as is. The courtesy of providing a correct
>reply address is more important to me than time spent deleting spam.

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]


Re: Interpreting p-value = .99

2001-11-29 Thread Gus Gassmann

Stan Brown wrote:

> On a quiz, I set the following problem to my statistics class:
> "The manufacturer of a patent medicine claims that it is 90%
> effective(*) in relieving an allergy for a period of 8 hours. In a
> sample of 200 people who had the allergy, the medicine provided
> relief for 170 people. Determine whether the manufacturer's claim
> was legitimate, to the 0.01 significance level."
> (The problem was adapted from Spiegel and Stevens, /Schaum's
> Outline: Statistics/, problem 10.6.)
> I believe a one-tailed test, not a two-tailed test, is appropriate.
> It would be silly to test for "effectiveness differs from 90%" since
> no one would object if the medicine helps more than 90% of
> patients.)
> Framing the alternative hypothesis as "the manufacturer's claim is
> not legitimate" gives
> Ho: p >= .9; Ha: p < .9; p-value = .0092
> on a one-tailed t-test. Therefore we reject Ho and conclude that the
> drug is less than 90% effective.
> But -- and in retrospect I should have seen it coming -- some
> students framed the hypotheses so that the alternative hypothesis
> was "the drug is effective as claimed." They had
> Ho: p <= .9; Ha: p > .9; p-value = .9908.

I don't understand where they get the .9908 from. Whether you test a
one-or a two-sided alternative, the test statistic is the same. So the
p-value for the two-sided version of the test should be simply twice
the p-value for the one-sided alternative, 0.0184. Hence the paradox
you speak of is an illusion.

Unfortunately for you, the two versions of the test lead to different
conclusions. If the correct p-value is given, I would give full marks
(perhaps, depending on how much the problem is worth overall,
subtracting 1 out of 10 marks for the nonsensical form of Ha).


Interpreting p-value = .99

2001-11-29 Thread Stan Brown

On a quiz, I set the following problem to my statistics class:

"The manufacturer of a patent medicine claims that it is 90% 
effective(*) in relieving an allergy for a period of 8 hours. In a 
sample of 200 people who had the allergy, the medicine provided 
relief for 170 people. Determine whether the manufacturer's claim 
was legitimate, to the 0.01 significance level."

(The problem was adapted from Spiegel and Stevens, /Schaum's
Outline: Statistics/, problem 10.6.)

I believe a one-tailed test, not a two-tailed test, is appropriate. 
It would be silly to test for "effectiveness differs from 90%" since 
no one would object if the medicine helps more than 90% of 
patients.

Framing the alternative hypothesis as "the manufacturer's claim is 
not legitimate" gives
Ho: p >= .9; Ha: p < .9; p-value = .0092
on a one-tailed t-test. Therefore we reject Ho and conclude that the 
drug is less than 90% effective.

But -- and in retrospect I should have seen it coming -- some 
students framed the hypotheses so that the alternative hypothesis 
was "the drug is effective as claimed." They had
Ho: p <= .9; Ha: p > .9; p-value = .9908.

Now as I understand things it is not formally legitimate to accept 
the null hypothesis: we can only either reject it (and accept Ha) or 
fail to reject it (and draw no conclusion). What I would tell my 
class is this: the best we can say is that p = .9908 is a very 
strong statement that rejecting the null hypothesis would be a Type 
I error. But I'm not completely easy in my mind about that, when 
simply reversing the hypotheses gives p = .0092 and lets us conclude 
that the drug is not 90% effective.

There seems to be a paradox: The very same data lead either to the 
conclusion "the drug is not effective as claimed" or to no 
conclusion. I could certainly tell my class: "if it makes sense in 
the particular situation, reverse the hypotheses and recompute the 
p-value." Am I being over-formal here, or am I being horribly stupid 
and missing some reason why it _would_ be legitimate to draw a 
conclusion from p=.9908?

Stan Brown, Oak Road Systems, Cortland County, New York, USA
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.
