RE: .05 level of significance

2000-10-23 Thread Simon, Steve, PhD

Jerry Dallal posted an interesting web page about p-values:

http://www.tufts.edu/~gdallal/pval.htm

and I have a few comments about this page and the discussion about
significance testing on edstat-l.

First, it is pretty clear to all of us that the p-value does not answer any
questions about practical significance, but you all might find this example
amusing anyway. The British Medical Journal (BMJ) published two papers back
to back on side effects of vaccination. One paper summarized the results
using p-values, and the other using confidence intervals. So I took the
opportunity to submit a letter to the editor via their web pages. You can
read it at

http://www.bmj.com/cgi/eletters/318/7192/1169

although it did not get published in the paper version of BMJ. I computed a
confidence interval for the odds ratio of 1.06, for which the paper had
reported only the p-value (0.545). The interval was 0.81 to 1.37, and I argued
that if we accepted the interval of 0.67 to 1.50 as a range of clinical
indifference for the odds ratio, then the resulting confidence interval
would give us some assurance that there was not a clinically important
change in the odds of a side effect.

But then I played some "what if" games. Let's suppose that the rate of the
side effect was 20 times larger in both groups. This leads to a confidence
interval of 1.02 to 1.10 and a p-value of 0.0049. Let's suppose that the
rate of side effects was 20 times smaller in both groups. Then the
confidence interval would be 0.46 to 2.4 and the p-value would be 0.90.
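
For anyone who wants to play the same "what if" games, here is a rough
Python sketch. The counts are hypothetical (I do not have the BMJ
paper's actual 2x2 table in front of me), and I use a Woolf (log-scale)
interval; the point is only that rescaling the event counts in both
groups barely moves the odds ratio while dramatically changing the
width of the interval and the p-value.

    import math
    from scipy.stats import norm

    def odds_ratio_ci(a, b, c, d, level=0.95):
        """Odds ratio with a Woolf (log-scale) confidence interval
        and a two-sided z-test p-value for log(OR) = 0."""
        or_ = (a * d) / (b * c)
        se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)
        z = norm.ppf(0.5 + level / 2)
        lo = math.exp(math.log(or_) - z * se)
        hi = math.exp(math.log(or_) + z * se)
        p = 2 * norm.sf(abs(math.log(or_)) / se)
        return or_, lo, hi, p

    n = 20000                       # hypothetical subjects per group
    for a, c in [(220, 200),        # baseline event counts
                 (4400, 4000),      # 20 times more frequent
                 (11, 10)]:         # 20 times less frequent
        print(odds_ratio_ci(a, n - a, c, n - c))

With these made-up numbers the odds ratio stays near 1.1 in all three
cases, but the interval runs from very tight (with a tiny p-value) to
very wide (with a large one), just as in the example above.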

The interesting thing here is that the case with the smallest p-value is the
case where you have the most assurance that there is no clinically
significant increase in the risk of side effects (since the upper confidence
limit is only 1.10). The case where the p-value is the largest is the case
where you have the least assurance that there is no clinically significant
increase (since the upper confidence limit is 2.4).

So you could argue that (at least in this case) the smaller the p-value the
greater the evidence of a statistically significant finding and the lesser
the evidence of a clinically important finding. This is without changing the
sample size or the odds ratio. So, not only does the p-value not inform you
about practical significance, it actually can be completely reversed from
the way that people would be likely to interpret it.

I had to put in some cautionary statements about how you use medical
judgement to define the range of clinical indifference and that I was a
statistician and not a doctor. But that does not detract from my general
point about practical significance.

The second comment is that the issue of practical versus statistical
significance is not the only criticism of p-values. There is an important
issue that Herman Rubin raises frequently about how you need to balance the
risks of various decisions. I probably cannot explain it as well as he can,
but you should probably demand a different level of proof depending on the
nature of the disease and the severity of the proposed therapy. For example,
you might demand a very high level of proof when examining a surgical
intervention for a non-life-threatening condition. On the other hand, there
was a recent study that showed that you could decrease the risk of cataracts
by wearing sunglasses. I would demand a lower level of proof for this type
of research, because wearing sunglasses carries far fewer costs and risks
than a typical surgery. Besides, I would look pretty cool in shades, don't
you think?

Finally, I would argue that combining a p-value with either an a priori
power calculation or a confidence interval (either one implying some
discussion of what a range of clinical indifference might be) overcomes most
of the objections to the use of p-values. In particular, you define the
range of clinical indifference by balancing the severity of the disease
against the cost and side effects of the therapy. Sad to say, very few
researchers touch the issue of clinical indifference when they publish
their findings.

Others may disagree with my perspective, and I look forward to further
discussion of this issue.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
STATS: STeve's Attempt to Teach Statistics. http://www.cmh.edu/stats






Re: .05 level of significance

2000-10-23 Thread John W. Kulig



One of the original questions on this thread had to do with the origin of the
".05" cutoff. I suggested that if naive subjects were placed in a situation in
which they had to detect whether a coin was fair or not, it would correspond
closely to the commonly used .05 level. I just did it with 65 naive subjects
(Intro Psych - mostly freshmen). Three were discarded for not following
instructions or having unreadable answers. I flipped a double-headed coin 10
times, and subjects indicated where in the sequence of Heads they would challenge
the fairness of the coin. The results are as follows - expressed as % and
cumulative % of the 62 I scored.

AFTER this      This % of my
number of       sample challenged
Heads:          fairness:        Cumulative %

  1                 0.00             0.00
  2                 1.61             1.61
  3                11.29            12.90
  4                22.58            35.48
  5                25.81            61.29
  6                24.19            85.48
  7                 9.68            95.16
  8                 3.23            98.39
  9                 1.61           100.00
 10                 0.00           100.00
---------------------------------------------

From the binomial, the probability of 5 heads in 5 flips is .031, and of 6
heads in 6 flips is .016. So, a majority challenged after I got 5 heads in a row.
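
If you want to line the class data up against the fair-coin model, a few
lines of Python (a sketch, using the cumulative percentages from the
table above) will do it:

    # Fair-coin probability of n heads in n flips, next to the class's
    # cumulative challenge rates.
    cumulative = [0.00, 1.61, 12.90, 35.48, 61.29, 85.48,
                  95.16, 98.39, 100.00, 100.00]
    for n, cum in enumerate(cumulative, start=1):
        print(f"{n:2d} heads: P = {0.5 ** n:.4f}   challenged by now: {cum:6.2f}%")

The crossover is easy to see: the majority of challenges arrive right
around the flips where the run probability drops through .05.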

I suggested that the .05 may be rooted in human cognitive heuristics that evolved
to serve everyday decision making - such as catching cheaters (as opposed to
formal statistical training). "Evolutionary psychologists" have marshalled quite a
bit of evidence that many of our cognitive abilities (including deductive logic)
did not evolve "context-free" but to meet the needs of people in everyday
decisions. It's a speculative, but not unreasonable, hypothesis.

I learned a few things doing the demo. Because these were my students, they
expressed great reluctance to challenge the coin I suggested was fair. After I
detected their reluctance I toned down the "challenge" language and simply asked
them to indicate where in the sequence they'd suspect a non-fair coin. It would
be fun (but a lot of work) to have both heads and tails drawn - but in different
proportions. There are a host of other contextual features (including the cost
of making a Type I vs. Type II error) that should matter too.

--
---
John W. Kulig[EMAIL PROTECTED]
Department of Psychology http://oz.plymouth.edu/~kulig
Plymouth State College   tel: (603) 535-2468
Plymouth NH USA 03264fax: (603) 535-2412
---
"What a man often sees he does not wonder at, although he knows
not why it happens; if something occurs which he has not seen before,
he thinks it is a marvel" - Cicero.







Re: .05 level of significance

2000-10-23 Thread Jerry Dallal

Petr Kuzmic wrote:
> 
> Jerry Dallal wrote:
> [...]
> > http://www.tufts.edu/~gdallal/pval.htm
> > http://www.tufts.edu/~gdallal/p05.htm
> 
> Thanks for sharing these links.  However, a lot of pages on the "Little
> Handbook of Statistical Practice" website
> (http://www.tufts.edu/~gdallal/LHSP.HTM) have broken links to image
> files.  Ah, the joys of website authoring... [;)]
> 
> Hope this helps,

Thanks.  All fixed.





Re: .05 level of significance

2000-10-23 Thread Petr Kuzmic



Jerry Dallal wrote:
[...]
> http://www.tufts.edu/~gdallal/pval.htm
> http://www.tufts.edu/~gdallal/p05.htm

Thanks for sharing these links.  However, a lot of pages on the "Little
Handbook of Statistical Practice" website
(http://www.tufts.edu/~gdallal/LHSP.HTM) have broken links to image
files.  Ah, the joys of website authoring... [;)]

Hope this helps,

- Petr Kuzmic

_
P e t r   K u z m i c,  Ph.D.   mailto:[EMAIL PROTECTED]
BioKin Ltd. * Software and Consulting   http://www.biokin.com





Re: .05 level of significance

2000-10-23 Thread dennis roberts

while this may be the case ... in general ... for some decisions we make 
... we would not even allow this level of snickering to suggest to us that 
something foul is afoot ... whereas for others ... it would not bother us (or
should not) if the chances were larger ...

it all depends ...

At 10:36 AM 10/23/00 -0400, David Evans wrote:
>I remember seeing the same thing a year or so ago on this list. I tried
>it for the first time this semester with my "refresher" course in
>statistics for a class of incoming graduate students. I tossed a coin a
>number of times and reported the result as "heads" each time
>irrespective of the actual outcome. At the third call a slight snigger
>went round the room, clearly emerging disbelief at the fourth and
>outright disbelief at the fifth, corresponding to p values of 0.125,
>0.0625, 0.03125 based on a hypothesis of a fair coin and a truthful
>instructor. It appears, indeed, that 0.05 reasonably represents the
>level at which human scepticism begins to emerge.
>
>David Evans
>School of Marine Science
>College of William & Mary
>Gloucester Point, VA
>
>"John W. Kulig" wrote:
> >
> > I have been searching for some "psychological" data on the .05 
> issue - I
> > know it's out there but haven't found it yet. It went something like this:
> > Claim to a friend that you have a fair coin. But the coin is not fair. 
> Flip the
> > coin (you get heads). Flip it again (heads again). Ask the friend if 
> s/he wants
> > to risk $100 (even odds) that the coin is not fair. At what point does the
> > friend (who is otherwise ignorant of p issues) wager a bet that the 
> coin is not
> > fair? I have heard that after 5 or 6 heads the friend is pretty sure 
> it's a bad
> > coin - or at least a trick (at this point we cross .05 on the binomial 
> chart)
> > .05 may be rooted in our general judgment/perception heuristics -
> > understandable in evolutionary terms if we examine the everyday 
> situations we
> > make these judgments in. Of course the relative risks of I versus II would
> > matter (e.g. falsely accusing and starting a brawl vs. losing to a con 
> artist).
> > I will try to locate some research data on this  or I'll flip a few 
> coins
> > in my next statistically naive class.
> >
> > --
> > ---
> > John W. Kulig[EMAIL PROTECTED]
> > Department of Psychology http://oz.plymouth.edu/~kulig
> > Plymouth State College   tel: (603) 535-2468
> > Plymouth NH USA 03264fax: (603) 535-2412
> > ---
> > "What a man often sees he does not wonder at, although he knows
> > not why it happens; if something occurs which he has not seen before,
> > he thinks it is a marvel" - Cicero.
> >
>
>






Re: .05 level of significance

2000-10-23 Thread David Evans

I remember seeing the same thing a year or so ago on this list. I tried
it for the first time this semester with my "refresher" course in
statistics for a class of incoming graduate students. I tossed a coin a
number of times and reported the result as "heads" each time
irrespective of the actual outcome. At the third call a slight snigger
went round the room, clearly emerging disbelief at the fourth and
outright disbelief at the fifth, corresponding to p values of 0.125,
0.0625, 0.03125 based on a hypothesis of a fair coin and a truthful
instructor. It appears, indeed, that 0.05 reasonably represents the
level at which human scepticism begins to emerge.

David Evans
School of Marine Science
College of William & Mary
Gloucester Point, VA

"John W. Kulig" wrote:
> 
> I have been searching for some "psychological" data on the .05 issue - I
> know it's out there but haven't found it yet. It went something like this:
> Claim to a friend that you have a fair coin. But the coin is not fair. Flip the
> coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants
> to risk $100 (even odds) that the coin is not fair. At what point does the
> friend (who is otherwise ignorant of p issues) wager a bet that the coin is not
> fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad
> coin - or at least a trick (at this point we cross .05 on the binomial chart).
> .05 may be rooted in our general judgment/perception heuristics -
> understandable in evolutionary terms if we examine the everyday situations we
> make these judgments in. Of course the relative risks of I versus II would
> matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist).
> I will try to locate some research data on this  or I'll flip a few coins
> in my next statistically naive class.
> 
> --
> ---
> John W. Kulig[EMAIL PROTECTED]
> Department of Psychology http://oz.plymouth.edu/~kulig
> Plymouth State College   tel: (603) 535-2468
> Plymouth NH USA 03264fax: (603) 535-2412
> ---
> "What a man often sees he does not wonder at, although he knows
> not why it happens; if something occurs which he has not seen before,
> he thinks it is a marvel" - Cicero.
> 





Re: .05 level of significance

2000-10-23 Thread Jerry Dallal

I wrote:
 
> I'm preparing some notes for my students on "Why P=0.05?"
> I'll post them in the next few days (so I don't end up writing
> them twice and piecemeal, to boot!).

I'm writing these notes as I'm teaching, so they are necessarily a
series of first drafts.  I don't have time to polish them if I'm not
to fall behind.  Nevertheless, I consider them good enough to
distribute for class discussion.  Some of the references in "Why
P=0.05" are incomplete. I'll get them on my next trip to the library
unless someone knows of a detailed Fisher bibliography online.  I
wasn't able to locate one myself.

http://www.tufts.edu/~gdallal/pval.htm
http://www.tufts.edu/~gdallal/p05.htm





Re: .05 level of significance

2000-10-22 Thread Zina Taran

Pardon my interference, but I think there's some confusion regarding the
events here.

When I toss the first round of coins, I get about 1/2 of them heads. No
problem.
Then, when I toss the second time, 1/2 of *those that fell heads* (1/4 of
the total, .5*.5) have a chance to be heads again.

And also, about 1/2 of those that fell tails before have a chance to fall
heads too (another 1/4 of the total).
So we now have the union of two intersections: 1/4 + 1/4.

Am I on the right track here?
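
A quick brute-force check in Python (just enumerating the four equally
likely outcomes of two fair tosses) seems to confirm the decomposition:

    from itertools import product

    # P(second toss is heads) = P(HH) + P(TH) = 1/4 + 1/4 = 1/2
    outcomes = list(product("HT", repeat=2))
    second_heads = [o for o in outcomes if o[1] == "H"]
    print(second_heads)                          # [('H', 'H'), ('T', 'H')]
    print(len(second_heads) / len(outcomes))     # 0.5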

 - Original Message -
From: Bill Jefferys <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, October 22, 2000 12:19 PM
Subject: Re: .05 level of significance


> In article <[EMAIL PROTECTED]>,
> [EMAIL PROTECTED] (Donald Burrill) wrote:
>
> #On Sat, 21 Oct 2000, Bill Jefferys wrote:
>
> #> However, the combined experiment is 400 heads on 800 trials,
> #
> #This however is not the _intersection_ of the two specified events.
>
> Sure it is. It's the event I get by first getting 220 heads on 400
> trials AND THEN tossing 180 heads on 400 trials. If I toss one head
> (p=1/2) and then toss 1 tail (p=1/2) then the probability that I toss
> one head and then toss 1 tail is (1/2*1/2=1/4). That is a correct use of
> probability, and the intersection of the event of first tossing one head
> and the event of second tossing 1 tail is indeed the event of tossing
> one head followed by one tail.
>
> Similarly, the probability of first tossing 220 heads on 400 trials is
> given by the binomial distribution 0.5^400 * C(400,220). And the
> probability of next tossing 180 heads on 400 trials is also given by the
> binomial distribution 0.5^400 * C(400,180). The probability that I
> accomplish both events in that order is the product of these two, is it
> not? So how can you say that these are not independent events, and how
> can you say that the intersection of the two is not as I say?
>
> It's true that the probability of tossing 400 heads on 800 trials in any
> order is not this product, but that is irrelevant.
>
> Do you claim that there is any situation where it is correct to multiply
> p-values?
>
> #> for which the two-tailed p-value is 1.0, not 0.05^2.
> #
> #> Contrary to popular belief, observed p-values are not probabilities.
> #> They cannot be probabilities because they do not obey the rules of the
> #> probability calculus, as the example shows.  They are, well, p-values.
> #
> #Sorry;  the example does not show that.  It shows only that if one uses
> #"combined" (in the phrase "combined event", or equivalent) to mean
> #something other than "intersection", the rules governing the behavior of
> #intersections may not apply to the behavior of combined events.
>
> Show me that it is in general correct to combine p-values by
> multiplication and I might agree with you.
>
> Best wishes, Bill
>
> --
> Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712
> Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu
> I report spammers to [EMAIL PROTECTED]
> Finger for PGP Key: F7 11 FB 82 C6 21 D8 95  2E BD F7 6E 99 89 E1 82
> Unlawful to use this email address for unsolicited ads: USC Title 47 Sec
227
>
>






Re: .05 level of significance

2000-10-22 Thread Bill Jefferys

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
(dennis roberts) wrote:

#so, what does the multiplicative "law" in probability mean then? 

If A is an event and B is an event, then the probability of the event 
A&B is given by P(A&B)=P(A)P(B) in the case of independence (it is 
P(A)P(B|A)=P(B)P(A|B) if the events aren't independent).

#i was merely indicating ... since i have done this in classes ... that if
#you show to students ... a sequence of (using a coin flip as the exemplar)
#... of heads ... in a row ... when it appears that they came about due to
#"random" flipping ... that when the probability of getting THAT particular
#sequence ... by chance alone (given that they assume that it is a fair
#coin) ... gets somewhere in the vicinity of .05 ... .01  approximately
#... that students start perceiving that something is awry ...
#
#are you saying that i have misrepresented the coin flipping example? take
#one coin ... flip ... observe outcome ... flip same coin again ... observe
#outcome .. etc? 

I'm not saying that your coin flip example is wrong, only that it has 
nothing to do with p-values, which are not probabilities. It's surely 
true that if you have a specific sequence of N heads and tails, under 
the assumption that the coin is fair, then the probability of obtaining 
that particular sequence is 0.5^N.

Correct statement: Under the null hypothesis (and a continuous test
statistic), if 0<=x<=1 then the probability that the p-value is <=x is x.

Incorrect statement: The p-value that I observed is the probability of
(something). The only exception is the trivial and irrelevant fact that a
p-value is by definition between 0 and 1 and thus could be one of the x's
in the above correct statement - but that's after the fact and refers only
to _future_ trials, not to the trial that generated the observed p-value.
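
The correct statement is easy to check by simulation. A small Python
sketch (a z-test on normal data, so the statistic is continuous and the
null is exactly true):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, reps = 50, 100_000
    x = rng.standard_normal((reps, n))     # data generated under H0
    z = x.mean(axis=1) * np.sqrt(n)        # z statistic for mean = 0
    p = 2 * norm.sf(np.abs(z))             # two-sided p-values
    print((p <= 0.05).mean())              # close to 0.05: P(p <= x) = x

Under the null the p-values are uniform on (0, 1); that is the only
probability statement they support.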

Bill

-- 
Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712
Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu
I report spammers to [EMAIL PROTECTED]
Finger for PGP Key: F7 11 FB 82 C6 21 D8 95  2E BD F7 6E 99 89 E1 82
Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227





Re: .05 level of significance

2000-10-22 Thread Bill Jefferys

In article <[EMAIL PROTECTED]>, 
[EMAIL PROTECTED] (Donald Burrill) wrote:

#On Sat, 21 Oct 2000, Bill Jefferys wrote:

#> However, the combined experiment is 400 heads on 800 trials, 
#
#This however is not the _intersection_ of the two specified events. 

Sure it is. It's the event I get by first getting 220 heads on 400 
trials AND THEN tossing 180 heads on 400 trials. If I toss one head 
(p=1/2) and then toss 1 tail (p=1/2) then the probability that I toss 
one head and then toss 1 tail is (1/2*1/2=1/4). That is a correct use of 
probability, and the intersection of the event of first tossing one head 
and the event of second tossing 1 tail is indeed the event of tossing 
one head followed by one tail. 

Similarly, the probability of first tossing 220 heads on 400 trials is 
given by the binomial distribution 0.5^400 * C(400,220). And the
probability of next tossing 180 heads on 400 trials is also given by the
binomial distribution 0.5^400 * C(400,180). The probability that I
accomplish both events in that order is the product of these two, is it 
not? So how can you say that these are not independent events, and how 
can you say that the intersection of the two is not as I say?

It's true that the probability of tossing 400 heads on 800 trials in any 
order is not this product, but that is irrelevant.

Do you claim that there is any situation where it is correct to multiply 
p-values?

#> for which the two-tailed p-value is 1.0, not 0.05^2. 
#
#> Contrary to popular belief, observed p-values are not probabilities. 
#> They cannot be probabilities because they do not obey the rules of the 
#> probability calculus, as the example shows.  They are, well, p-values.
#
#Sorry;  the example does not show that.  It shows only that if one uses 
#"combined" (in the phrase "combined event", or equivalent) to mean 
#something other than "intersection", the rules governing the behavior of 
#intersections may not apply to the behavior of combined events.

Show me that it is in general correct to combine p-values by 
multiplication and I might agree with you.

Best wishes, Bill

-- 
Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712
Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu
I report spammers to [EMAIL PROTECTED]
Finger for PGP Key: F7 11 FB 82 C6 21 D8 95  2E BD F7 6E 99 89 E1 82
Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227





Re: .05 level of significance

2000-10-22 Thread Donald Burrill

On Sat, 21 Oct 2000, Bill Jefferys wrote:

> At 12:56 PM -0500 10/20/00, dennis roberts wrote:
> >randomly independent events have the p value being the multiplication of
> >each event's p value ... so ... p for getting a head in a good coin 
> >is .5 ... 2 in a row = .25 ... etc.
> 
> This is wrong.  In general you cannot multiply the p-values from 
> independent events to obtain the p-value of the combined event.

Surely this depends on how you define "the combined event".  If "the 
combined event" is the intersection of two independent events, the 
probabilities do in general multiply, as Dennis asserts.  If some other 
definition is used (as in Bill's example below), then of course one 
cannot expect the multiplication rule to hold.
 
> Example: You toss 220 heads on 400 trials of a fair coin. The 
> two-tailed p-value for this event is almost exactly 0.05 [J.O. Berger 
> and M. Delampady, Statistical Science 2, 317-352 (1987)]. 

I.e., the probability of observing 220 heads or more, or 180 heads or
fewer, in 400 trials is 0.05.

> Suppose you then independently toss 180 heads on an additional 400 
> trials.  Again, the two-tailed p-value is 0.05. 

Again, the probability of observing 180 heads or fewer, or 220 heads or 
more, in 400 trials is 0.05.   OK so far...

> However, the combined experiment is 400 heads on 800 trials, 

This however is not the _intersection_ of the two specified events. 

> for which the two-tailed p-value is 1.0, not 0.05^2. 

> Contrary to popular belief, observed p-values are not probabilities. 
> They cannot be probabilities because they do not obey the rules of the 
> probability calculus, as the example shows.  They are, well, p-values.

Sorry;  the example does not show that.  It shows only that if one uses 
"combined" (in the phrase "combined event", or equivalent) to mean 
something other than "intersection", the rules governing the behavior of 
intersections may not apply to the behavior of combined events.

The antecedent proposition therefore does not follow.
-- DFB.
 --
 Donald F. Burrill[EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 (603) 535-2597
 Department of Mathematics, Boston University[EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215   (617) 353-5288
 184 Nashua Road, Bedford, NH 03110  (603) 471-7128






Re: .05 level of significance

2000-10-21 Thread dennis roberts

so, what does the multiplicative "law" in probability mean then? 

i was merely indicating ... since i have done this in classes ... that if
you show to students ... a sequence of (using a coin flip as the exemplar)
... of heads ... in a row ... when it appears that they came about due to
"random" flipping ... that when the probability of getting THAT particular
sequence ... by chance alone (given that they assume that it is a fair
coin) ... gets somewhere in the vicinity of .05 ... .01  approximately
... that students start perceiving that something is awry ...

are you saying that i have misrepresented the coin flipping example? take
one coin ... flip ... observe outcome ... flip same coin again ... observe
outcome .. etc? 

At 06:26 PM 10/21/00 -0500, Bill Jefferys wrote:
>At 12:56 PM -0500 10/20/00, dennis roberts wrote:
>>randomly independent events have the p value being the multiplication of
>>each event's p value ... so ... p for getting a head in a good coin 
>>is .5 ... 2 in a row = .25 ... etc.
>
>This is wrong. In general you cannot multiply the p-values from 
>independent events to obtain the p-value of the combined event.
==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm





Re: .05 level of significance

2000-10-21 Thread Alan Mclean

Jerry Dallal wrote:

> I have a note from Frank Anscombe in my files.  It says, "Cardano.
> See the bit from "De Vita Propria" at the head of Chap. 6 of FN
> David's "Games, Gods, and Gambling (1962).  That shows that the idea
> of a test of significance, informally described, is very ancient."
> I don't have David's book with me, but I do recall that Cardano
> flourished around 1650.

We all do significance tests every day! When someone tells you something,
you test its truth against your experience. 'You have to convince me of
the truth of that!'

Well, some of us believe anything we are told, and maybe most of us believe
anything that is told to us by an 'authority'.  I guess the fundamental
feature of the scientific approach is to insist on evidence (not proof)
before we accept 'it' as (probably) valid.


--
Alan McLean (alan.buseco.monash.edu.au)
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102Fax: +61 03 9903 2007







Re: .05 level of significance

2000-10-21 Thread Alan Mclean

Michael Granaas wrote:

> Someone, I think it was on this thread, mentioned Abelson's book
> "Statistics as Principled Argument".  In this book Abelson argues that
> individual studies simply provide pieces of evidence for or against a
> particular hypothesis.  It is the accumulation of the evidence that allows
> us to make a conclusion.  (My apologies to Abelson if I have
> misremembered his arguments.)

It is perfectly true that 'individual studies simply provide pieces of
evidence for or against a
particular hypothesis' - but it is equally true that multiple studies do the
same. Assuming the multiple studies show the same results, the evidence is of
course stronger - but it is still 'only' evidence.

One can legitimately draw a conclusion on one or several studies. One's
confidence (and the confidence of others!) in the conclusion depends on the
strength of the evidence. One well designed, well carried out study with clear
results provides strong evidence which may be enough to convince most people.
Several such studies which support each other provide even stronger evidence.
On the other hand, replications of poorly designed studies leading to unclear
results may give a little more evidence, but not enough to convince people.

In an individual study, the p-value(s) used is a measure of the strength of
the evidence provided by the study - BUT it is totally dependent on the
validity of the design of the study, the choice of variables, the selection of
the sample, and the appropriateness of the models used to obtain the p-value. So
it is important, but certainly only one brick in the wall.

And of course treating 5% as some God-given rule of importance is ridiculous.
(It is nearly as bad as the N>30 'law' for treating a sample as 'large'.) But
it is a useful benchmark figure.

Regards,
Alan



--
Alan McLean (alan.buseco.monash.edu.au)
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel:  +61 03 9903 2102Fax: +61 03 9903 2007







Re: .05 level of significance

2000-10-21 Thread Bill Jefferys

At 12:56 PM -0500 10/20/00, dennis roberts wrote:
>randomly independent events have the p value being the multiplication of
>each event's p value ... so ... p for getting a head in a good coin 
>is .5 ... 2 in a row = .25 ... etc.

This is wrong. In general you cannot multiply the p-values from 
independent events to obtain the p-value of the combined event.

Example: You toss 220 heads on 400 trials of a fair coin. The two-tailed 
p-value for this event is almost exactly 0.05 [J.O. Berger and M. 
Delampady, Statistical Science 2, 317-352 (1987)]. Suppose you then 
independently toss 180 heads on an additional 400 trials. Again, the 
two-tailed p-value is 0.05. However, the combined experiment is 400 
heads on 800 trials, for which the two-tailed p-value is 1.0, not 0.05^2.

Similar examples can be given for one-tailed p-values.
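
Anyone who wants to verify the arithmetic can do so with an exact
binomial calculation; here is a Python sketch:

    from scipy.stats import binom

    def two_tailed_p(k, n):
        """Exact two-tailed binomial p-value against p = 1/2 (n even):
        probability of a count at least as far from n/2 as k is."""
        d = abs(k - n / 2)
        lo = binom.cdf(n / 2 - d, n, 0.5)       # P(X <= n/2 - d)
        hi = binom.sf(n / 2 + d - 1, n, 0.5)    # P(X >= n/2 + d)
        return min(1.0, lo + hi)

    print(two_tailed_p(220, 400))   # ~0.05
    print(two_tailed_p(180, 400))   # ~0.05
    print(two_tailed_p(400, 800))   # 1.0, nowhere near 0.05**2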

If you must use p-values and must combine them from independent 
experiments, you need to use the methods of meta-analysis. Not that I 
recommend using either p-values or meta-analysis (I don't).
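
For the record, the standard way to pool them is Fisher's method, not
multiplication. A minimal sketch:

    import math
    from scipy.stats import chi2

    def fisher_combine(pvals):
        """Fisher's method: under H0, -2 * sum(log p_i) follows a
        chi-squared distribution with 2k degrees of freedom."""
        stat = -2 * sum(math.log(p) for p in pvals)
        return chi2.sf(stat, 2 * len(pvals))

    print(fisher_combine([0.05, 0.05]))   # ~0.017, not 0.05**2 = 0.0025

Note that even a principled combination rule gives an answer quite
different from the naive product.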

Contrary to popular belief, observed p-values are not probabilities. 
They cannot be probabilities because they do not obey the rules of the 
probability calculus, as the example shows. They are, well, p-values.

That said, I wonder if you haven't confused p-values and probabilities. 
It is true that if you toss N heads in a row with a fair coin, the 
probability of that event is 0.5^N. It is also true that this 
probability happens to be numerically equal to the one-tailed p-value 
for tossing N heads in a row. So in this particular case it happens that 
the one-tailed p-value for the combined event is numerically equal to 
the product of the individual p-values. However, this has nothing to do 
with combining p-values. It is a consequence of the fortuitous numerical 
equality between the p-value and the probability in this special case,  
and the fact that independent probabilities do multiply to get the joint 
probability. Put another way, there is really no "tail" in this special 
case. The entire contribution to the p-value comes from the probability 
of obtaining the actually observed data, not from outcomes out in the 
tail that might have been observed but were not.

Bill

-- 
Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712
Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu
I report spammers to [EMAIL PROTECTED]
Finger for PGP Key: F7 11 FB 82 C6 21 D8 95  2E BD F7 6E 99 89 E1 82
Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227





Re: .05 level of significance

2000-10-21 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
John W. Kulig <[EMAIL PROTECTED]> wrote:
>I have been searching for some "psychological" data on the .05 issue - I
>know it's out there but haven't found it yet. It went something like this:
>Claim to a friend that you have a fair coin. But the coin is not fair. Flip the
>coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants
>to risk $100 (even odds) that the coin is not fair. At what point does the
>friend (who is otherwise ignorant of p issues) wager a bet that the coin is not
>fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad
>coin - or at least a trick (at this point we cross .05 on the binomial chart)
>.05 may be rooted in our general judgment/perception heuristics -
>understandable in evolutionary terms if we examine the everyday situations we
>make these judgments in. Of course the relative risks of I versus II would
>matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist).
>I will try to locate some research data on this  or I'll flip a few coins
>in my next statistically naive class.

Is the coin exactly fair?  Could it be exactly fair?
Remember that the coin is a physical object; is it even
possible that the probability of it coming up heads by
tossing it in a particular manner is exactly .5?  Is
it even possible that the probabilities of heads in
different tosses are exactly the same?  Is it possible
that the tosses are exactly independent?

Even if the appropriate modifications are made, the bet
problem above calls for a Bayesian approach.  If the
leeway in "fair" is small enough (small relative to the
usual standard deviation), it is robust to treat it as a
point null.  In a sample that small, the entire prior
distribution comes in; with more data, only the
alternative prior density "at" the null, relative to the
prior probability of the null, is of much importance.
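
As a toy illustration of treating "fair" as a point null (this is only
a sketch with a uniform prior on the alternative, not the loss-weighted
machinery described above):

    from math import comb

    def bf01(k, n):
        """Bayes factor for H0: p = 1/2 against H1: p ~ Uniform(0, 1),
        given k heads in n tosses."""
        m0 = comb(n, k) * 0.5 ** n   # marginal likelihood under the point null
        m1 = 1 / (n + 1)             # uniform prior: marginal is 1/(n+1) for any k
        return m0 / m1

    for n in range(1, 11):
        print(n, round(bf01(n, n), 4))   # a run of n straight heads

After five or six straight heads the Bayes factor is already well below
1, i.e. the evidence has swung noticeably against the fair coin, which
is roughly where the friend starts wanting the bet.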

If the loss function is changed, the above needs to be
modified.  It is the integrated loss-weighted prior over
the null, and the density times the local loss function
under the alternative, which are important.

The subject of statistics is how people should behave when
facing decision problems under uncertainty, not how they do
behave.  Look at the utility chapter in Raiffa's book,
_Decision Analysis_, to see that people do not behave
consistently when offered composite bets with known
probabilities.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: .05 level of significance

2000-10-20 Thread Eric Bohlman

dennis roberts <[EMAIL PROTECTED]> wrote:

[regarding the "point biserial correlation"]
> and it certainly has nothing to do with a "shortcut" formula for 
> calculating r ... it MAY have decades ago but  it has not for the past 
> 20 years ...

While I certainly agree that many textbooks convey the absolutely
misleading impression that the "PBC" is some special form of measure, I
think that the usual formula presented for it is pedagogically useful in a
few ways (not that the typical textbook makes use of them):

1) It demonstrates that a correlation problem in which one variable is
dichotomous is equivalent to a two-group mean-difference problem.

2) It shows that in such a case, the correlation coefficient is a function
of both a standard effect-size measure (Cohen's d) and the relative sizes
of the two groups.

2a) It demonstrates that variations in the relative sizes of the group
will result in variations in the magnitude of the correlation, even if the
effect size is held constant.
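
Points 2 and 2a are easy to make concrete. For a dichotomous X with
group proportion p and a standardized mean difference d (assuming equal
within-group variances), the population correlation works out to
r = d*sqrt(p*q) / sqrt(1 + d^2*p*q). A short Python sketch:

    import math

    def point_biserial(d, p):
        """Population point-biserial correlation for standardized mean
        difference d and group proportion p (equal variances assumed)."""
        q = 1 - p
        return d * math.sqrt(p * q) / math.sqrt(1 + d * d * p * q)

    for p in (0.5, 0.7, 0.9):
        print(p, round(point_biserial(0.8, p), 3))   # same d, shrinking r

Holding Cohen's d fixed at 0.8, r drops from about .37 at a 50/50 split
to about .23 at a 90/10 split, which is exactly point 2a.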






Re: .05 level of significance

2000-10-20 Thread Jerry Dallal

Herman Rubin wrote:

> As I recall, Lagrange (or was it Laplace?) computed the
> exact distribution of the sum of uniform random variables
> so he could use .05 level tests for a sample coming from
> a uniform distribution about 1795.  Physicists use 2 sigma,
> approximately .05.  I do not know of any consideration of
> what happened under the alternative until Neyman and Pearson.
> 
> Significance testing was widely used in the 19th century.
> Student pointed out that significance levels based on
> the normal distribution were wrong when estimated variances
> were used.  It was widely used before Fisher.

I have a note from Frank Anscombe in my files.  It says, "Cardano. 
See the bit from "De Vita Propria" at the head of Chap. 6 of FN
David's "Games, Gods, and Gambling (1962).  That shows that the idea
of a test of significance, informally described, is very ancient." 
I don't have David's book with me, bu t I do recall that Cardano
flourished around 1650. 

The earliest example in my files is in Arbuthnot's "An Argument for
Divine Providence..." from 1710.

According to Frank, Edgeworth formally defined the procedure in
1885, but did not give it the name "significance test", which seems
to have come later.  Edgeworth does use the phrase "significant
difference", though.

I do not find the phrase "significance test" or any of its close
relatives in Pearson (1900).

Some early instances of "significance" and "test" used together are
cited at http://members.aol.com/jeff570/s.html





Re: .05 level of significance

2000-10-20 Thread Christopher Tong

On 19 Oct 2000, Herman Rubin wrote:

> Physicists use 2 sigma, approximately .05.

Not in particle physics and astrophysics, where 5 sigma is
generally used.  Documentation:

C. Seife, 2000:  "CERN's gamble shows perils, rewards of
playing odds".  Science, 289, 2260-2 (29 Sept. 2000).

Quoted in the article is John Bahcall, distinguished physicist at
Princeton:  "Half of all 3-sigma events are wrong".  The article
has numerous examples of even alleged 5-sigma events proving
to be spurious.  On the other hand, "Neutrino mass is taken
seriously even though it's not five sigma currently" according
to P. Igo-Kimenes of CERN.  This is because there are non-statistical
arguments (e.g., physics theories) that argue in favor of it.
Statistical significance alone is not (and shouldn't be) the sole
criterion for scientific conclusions.
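
The sigma-to-p conversion is a one-liner, for anyone who wants the
numbers side by side (a sketch):

    from scipy.stats import norm

    for k in (2, 3, 5):
        p = 2 * norm.sf(k)    # two-sided tail area beyond k sigma
        print(f"{k} sigma  ->  p = {p:.2g}")

Two sigma is p ~ 0.046, three sigma ~ 0.0027, and five sigma ~ 5.7e-7,
which makes it clear just how much stricter the particle physicists'
convention is.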






Re: .05 level of significance

2000-10-20 Thread Jerry Dallal

dennis roberts wrote:

> thus ... when we spend all this time on debating the usefulness or lack of
> usefulness of a p value ... whether it be the .05 level or ANY other ... we
> are totally ignoring the fact that this p value that is reported ... could
> have been the result of many factors having NOTHING to do with sampling
> error ... and nothing to do with the treatments ...

From my class notes:

100% of all disasters are failures of design, not analysis. 
-- Ron Marks, Toronto, August 16, 1994

To propose that poor design can be corrected by subtle analysis 
techniques is contrary to good scientific thinking.
-- Stuart Pocock (Controlled Clinical Trials, p 58) regarding the
use of retrospective adjustment for trials with historical controls.

Issues of design always trump issues of analysis.
-- GE Dallal, 1999, explaining why it would be wasted effort to
focus on the analysis of data from a study under challenge whose
design was fatally flawed.

Bias dominates variability.
-- John C. Bailler, III, Indianapolis, August 14, 2000





Re: .05 level of significance

2000-10-20 Thread dennis roberts

randomly independent events have the p value being the multiplication of 
each event's p value ... so ... p for getting a head in a good coin  is 
.5 ... 2 in a row = .25 ... etc.

here is a table up to 10 in a row of the same side

  Row  numheads    pvalue

    1         1  0.500000
    2         2  0.250000
    3         3  0.125000
    4         4  0.062500
    5         5  0.031250
    6         6  0.015625
    7         7  0.007813
    8         8  0.003906
    9         9  0.001953
   10        10  0.000977

i have argued before ... that a value of .05 makes SOME sense IF we 
consider the observed data to be "derived" from some model ... like a 
sequence of randomly occurring independent events

if one were to flip a coin and then SHOW the result ... AND then ask Ss to 
give their perceptions about whether what they see could have occurred by 
chance ALONE ... what you will find is that IF you present 4 or 5 or 6 in a 
row all being the same ... these are the areas (increasingly so) where 
there becomes more and more SUSpicion ... that you would see this IF THE 
COIN IS GOOD ... OR THE COIN FLIPPER IS NOT CHEATING ...

thus, the nervousness starts to set in around the .05 .01 areas ... ie, the 
times where the probability of that happening ACCORDING TO THE MODEL ... 
(coin is good) ... STARTS GETTING RATHER REMOTE





At 12:01 PM 10/20/00 -0400, you wrote:
> I have been searching for some "psychological" data on the .05 issue - I
>know it's out there but haven't found it yet. It went something like this:
>Claim to a friend that you have a fair coin. But the coin is not fair. 
>Flip the
>coin (you get heads). Flip it again (heads again). Ask the friend if s/he 
>wants
>to risk $100 (even odds) that the coin is not fair. At what point does the
>friend (who is otherwise ignorant of p issues) wager a bet that the coin 
>is not
>fair? I have heard that after 5 or 6 heads the friend is pretty sure it's 
>a bad
>coin - or at least a trick (at this point we cross .05 on the binomial chart)
>.05 may be rooted in our general judgment/perception heuristics -
>understandable in evolutionary terms if we examine the everyday situations we
>make these judgments in. Of course the relative risks of I versus II would
>matter (e.g. falsely accusing and starting a brawl vs. losing to a con 
>artist).
>I will try to locate some research data on this  or I'll flip a few coins
>in my next statistically naive class.
>
>--
>---
>John W. Kulig[EMAIL PROTECTED]
>Department of Psychology http://oz.plymouth.edu/~kulig
>Plymouth State College   tel: (603) 535-2468
>Plymouth NH USA 03264fax: (603) 535-2412
>---
>"What a man often sees he does not wonder at, although he knows
>not why it happens; if something occurs which he has not seen before,
>he thinks it is a marvel" - Cicero.
>
>
>
>






Re: .05 level of significance

2000-10-20 Thread Michael Granaas

On Fri, 20 Oct 2000, David Hardman wrote:

> And it's almost too obvious to be worth stating, but let's 
> not forget the role of replication in science. You may get 
> a p value of p < .0001, but if no-one else can replicate it 
> then your result may well be a fluke. Of course, the 
> failures to replicate may not be so easy to publish...!

This is exactly the point that I was going to add to Dennis's comments,
guess David saved me the trouble.

Unfortunately, I think that replication is probably one of the most
overlooked issues in the discussion of hypothesis testing etc.  We
(frequently) teach, and certainly act, as if we can make a decision based
on the weight of a single research effort.  When we behave as if a
scientific knowledge can be arrived at through a single study it is no
wonder that we have so much trouble with p-values.  

In some disciplines we have a near absence of multi-experiment papers.
Admittedly publication pressures are a great problem here, but at least
some of these single study publications are fueled by the myth that you
can reach a scientific conclusion based on a single study.  

I don't know what to do about this, but I do know that I have changed my
teaching so as to encourage students to think about a study as being a
piece of the puzzle, not the solution to the entire puzzle.  Ultimately
new findings need to be replicated under a variety of circumstances to
validate any new knowledge.

Someone, I think it was on this thread, mentioned Abelson's book
"Statistics as Principled Argument".  In this book Abelson argues that
individual studies simply provide pieces of evidence for or against a
particular hypothesis.  It is the accumulation of the evidence that allows
us to make a conclusion.  (My apologies to Abelson if I have
misremembered his arguments.)

Michael

> 
> 
> Dr. David Hardman
> 
> "Rational - Devoid of all delusions save those 
> of observation, experience and reflection." 
> - Ambrose Bierce (The Devil's Dictionary)
> 
> Department of Psychology
> London Guildhall University
> Calcutta House
> Old Castle Street
> London E1 7NT
> 
> Phone:+44 020 73201256
> Fax:  +44 020 73201236
> E-mail:   [EMAIL PROTECTED]
> Internet: http://www.lgu.ac.uk/psychology/hardman.html
> 
> For information on the London Judgment and Decision Making Group
> visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html
> 
> For information on joining the 'Decision' mailbase list, see
> http://www.mailbase.ac.uk/lists/decision
> 
> 
> 
> 

***
Michael M. Granaas
Associate Professor[EMAIL PROTECTED]
Department of Psychology
University of South Dakota Phone: (605) 677-5295
Vermillion, SD  57069  FAX:   (605) 677-6604
***
All views expressed are those of the author and do not necessarily
reflect those of the University of South Dakota, or the South
Dakota Board of Regents.






Re: .05 level of significance

2000-10-20 Thread John W. Kulig

I have been searching for some "psychological" data on the .05 issue - I
know it's out there but haven't found it yet. It went something like this:
Claim to a friend that you have a fair coin. But the coin is not fair. Flip the
coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants
to risk $100 (even odds) that the coin is not fair. At what point does the
friend (who is otherwise ignorant of p issues) wager a bet that the coin is not
fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad
coin - or at least a trick (at this point we cross .05 on the binomial chart).
.05 may be rooted in our general judgment/perception heuristics -
understandable in evolutionary terms if we examine the everyday situations we
make these judgments in. Of course the relative risks of I versus II would
matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist).
I will try to locate some research data on this  or I'll flip a few coins
in my next statistically naive class.

--
---
John W. Kulig[EMAIL PROTECTED]
Department of Psychology http://oz.plymouth.edu/~kulig
Plymouth State College   tel: (603) 535-2468
Plymouth NH USA 03264fax: (603) 535-2412
---
"What a man often sees he does not wonder at, although he knows
not why it happens; if something occurs which he has not seen before,
he thinks it is a marvel" - Cicero.







Re: .05 level of significance

2000-10-20 Thread Robert J. MacG. Dawson



dennis roberts wrote:


> 4. are all the Ss in the study at the end ... compared to the beginning?


See today's "Wizard of Id"  [Friday Oct. 20]...


-Robert Dawson





Re: .05 level of significance

2000-10-20 Thread David Hardman

And it's almost too obvious to be worth stating, but let's 
not forget the role of replication in science. You may get 
a p value of p < .0001, but if no-one else can replicate it 
then your result may well be a fluke. Of course, the 
failures to replicate may not be so easy to publish...!


On Fri, 20 Oct 2000 10:56:03 -0400 dennis roberts 
<[EMAIL PROTECTED]> wrote:

> what is interesting to me in our discussions of p values ... .05 for 
> example is ... we have failed (generally that is) to put this one piece of 
> information in the context of the total environment of the investigation or 
> study ... we have blown totally out of proportion ... THIS one "fact" to 
> all the other components of the study ... which are FAR more important
> 
> 
> take for example a very simple experimental study where we are doing drug 
> trials ... and have assigned Ss to the experimental, placebo control, and 
> regular control conditions ... and then when the study is over ... we do 
> our ANOVA ... see the printed p value ... then make our decision about the 
> meaningfulness of the results of this study 
> 
> 1. does this p value truly represent what p should be IF the null is true 
> AND nothing but sampling error is producing the result seen?
> 2. have Ss really been assigned at random ... or treatments at random to Ss?
> 3. have the drug therapies been implemented consistently and accurately 
> throughout the study?
> 4. are all the Ss in the study at the end ... compared to the beginning?
> 5. was the dosage level (if that was the thing being examined) really the 
> right one to use?
> 6. if these were humans ... are we totally sure that NO S had ANY contact
> with ANY other S ... thus having the possible contamination effect across 
> experimental conditions?
> 7. have all the data been recorded correctly? if not, would there be ANY 
> way to know if a mistake had been made?
> 8. if humans were involved, and there was some element of self reporting 
> involved in the way the data were collected ... have Ss honestly and 
> accurately reported their data?
> 
> and on and on and on
> 
> 
> there are SO many factors that produce the results ... that we have no way 
> of knowing which of the above or any other ... might have influenced the 
> results ... BUT, the p value only applies IF we are assuming sampling error 
> is the only factor involved ...
> 
> thus ... when we spend all this time on debating the usefulness or lack of 
> usefulness of a p value ... whether it be the .05 level or ANY other ... we 
> are totally ignoring the fact that this p value that is reported ... could 
> have been the result of many factors having NOTHING to do with sampling 
> error ... and nothing to do with the treatments ...
> 
> our persistence in insisting on a p value like .05 as being either the
> magical or agreed to cut point ... is SO FAR OVERSHADOWED by all these 
> other potential problems ... that it makes the interpretation and 
> DEPENDENCE ON ANY reported p value highly suspect ...
> 
> so here we are, arguing about .03 versus .06 ... when we should be arguing 
> about things like items 2 to 8 ... and then ONLY when we have been able to 
> account for and do away with all of those ... then we MIGHT have a look at 
> the p value and see what it is ...
> 
> but until we do, our essentially total fixation on p values is so highly
> misplaced attention ... as to be almost downright laughable behavior
> 
> and this is what we are passing along to our students? and this is what we 
> are passing along to our peers via published documents?
> 
> 
> 
> 
> 

Dr. David Hardman

"Rational - Devoid of all delusions save those 
of observation, experience and reflection." 
- Ambrose Bierce (The Devil's Dictionary)

Department of Psychology
London Guildhall University
Calcutta House
Old Castle Street
London E1 7NT

Phone:+44 020 73201256
Fax:  +44 020 73201236
E-mail:   [EMAIL PROTECTED]
Internet: http://www.lgu.ac.uk/psychology/hardman.html

For information on the London Judgment and Decision Making Group
visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html

For information on joining the 'Decision' mailbase list, see
http://www.mailbase.ac.uk/lists/decision






Re: .05 level of significance

2000-10-20 Thread dennis roberts

what is interesting to me in our discussions of p values ... .05 for 
example is ... we have failed (generally that is) to put this one piece of 
information in the context of the total environment of the investigation or 
study ... we have blown totally out of proportion ... THIS one "fact" to 
all the other components of the study ... which are FAR more important


take for example a very simple experimental study where we are doing drug 
trials ... and have assigned Ss to the experimental, placebo control, and 
regular control conditions ... and then when the study is over ... we do 
our ANOVA ... see the printed p value ... then make our decision about the 
meaningfulness of the results of this study 

1. does this p value truly represent what p should be IF the null is true 
AND nothing but sampling error is producing the result seen?
2. have Ss really been assigned at random ... or treatments at random to Ss?
3. have the drug therapies been implemented consistently and accurately 
throughout the study?
4. are all the Ss in the study at the end ... compared to the beginning?
5. was the dosage level (if that was the thing being examined) really the 
right one to use?
6. if these were humans ... are we totally sure that NO S had ANY contact
with ANY other S ... thus having the possible contamination effect across 
experimental conditions?
7. have all the data been recorded correctly? if not, would there be ANY 
way to know if a mistake had been made?
8. if humans were involved, and there was some element of self reporting 
involved in the way the data were collected ... have Ss honestly and 
accurately reported their data?

and on and on and on


there are SO many factors that produce the results ... that we have no way 
of knowing which of the above or any other ... might have influenced the 
results ... BUT, the p value only applies IF we are assuming sampling error 
is the only factor involved ...
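
a minimal simulation sketch of that last point, with an invented 
contamination scheme (the design, effect size, and "recording error" below 
are hypothetical, purely for illustration):

# under a true null with ONLY sampling error, a .05-level t-test
# rejects about 5% of the time; add one unmodeled artifact (here, a
# recording error shifting a few values in one arm) and the nominal
# p-value no longer means what it appears to mean
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sims, alpha = 30, 10000, 0.05

def rejection_rate(contaminate):
    hits = 0
    for _ in range(sims):
        a = rng.normal(0, 1, n)   # control arm
        b = rng.normal(0, 1, n)   # treatment arm; the null is TRUE
        if contaminate:
            b[:3] += 2.0          # hypothetical recording error
        hits += stats.ttest_ind(a, b).pvalue < alpha
    return hits / sims

print(rejection_rate(False))  # ~0.05 ... sampling error alone
print(rejection_rate(True))   # noticeably above 0.05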

thus ... when we spend all this time debating the usefulness or lack of 
usefulness of a p value ... whether it be the .05 level or ANY other ... we 
are totally ignoring the fact that this p value that is reported ... could 
have been the result of many factors having NOTHING to do with sampling 
error ... and nothing to do with the treatments ...

our persistence in insisting on a p value like .05 as being either the 
magical or the agreed-to cut point ... is SO FAR OVERSHADOWED by all these 
other potential problems ... that it makes the interpretation of and 
DEPENDENCE ON ANY reported p value highly suspect ...

so here we are, arguing about .03 versus .06 ... when we should be arguing 
about things like items 2 to 8 ... and then ONLY when we have been able to 
account for and do away with all of those ... then we MIGHT have a look at 
the p value and see what it is ...

but until we do, our essentially total fixation on p values is attention so 
badly misplaced ... as to be almost downright laughable behavior

and this is what we are passing along to our students? and this is what we 
are passing along to our peers via published documents?








Re: .05 level of significance

2000-10-20 Thread John Hendrickx

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...
> 
> Actually, it often strikes me as curious that so many 
> people continue to report results as p < .05, when they 
> could in fact report the actual value.

Well, the exact value isn't really all that relevant, certainly if 
significance is smaller than .001 (who cares if it's .0009 or 9E-32?). 
Most researchers don't care about the exact p-value as long as it's less 
than .01 -- in that case the results are solid, give them two stars **. 
If the p-value is between .05 and .01, the results are significant, but 
keep an eye on them; there's a real chance these results are a fluke. One 
star only *.
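
A toy sketch of that star convention, in the spirit of the significance 
codes most stats packages print (my mapping, not anything official):

# two stars for p < .01 ("solid"), one star for .01 <= p < .05
def stars(p: float) -> str:
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

for p in (0.0009, 0.03, 0.2):
    print(p, stars(p))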

"A statistician is a person whose lifetime ambition is to be wrong 5% of 
the time". 





Re: .05 level of significance

2000-10-19 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Jerry Dallal  <[EMAIL PROTECTED]> wrote:
>"Karl L. Wuensch" wrote:

>> The origins of the silly .05 criterion of statistical significance are
>> discussed in the article:

>I disagree with the characterization.  If it were silly, it would
>not have persisted for 75 years and be so widely used today.  Anyone
>can introduce anything, but the persistence of an idea requires
>acceptance and agreement to continue using it by the intended
>audience.  The 0.05 level of significance is a comfortable level at
>which to conduct scientific research and it does a good job of
>keeping the noise down (junk out of the scientific literature).

Keeping the noise down is the only justification I have
seen; there have been major attempts to find grounds on
which to justify .05, or any other p-value, from first
principles, without success.

As to the persistence for more than 200 years (the practice
is that old): it was first believed to be a measure of the
probability that the null is correct.  By the time those
using it then, as now, as religion realized this was not
the case, they were too indoctrinated to change.  It is
much the same with other well-established beliefs; showing
the error is not always enough.

>(I am *not* saying 0.05 *should* be used as an *absolute* cutoff. 
>I'm merely saying that if there's a right way to do things, an
>intelligent use of 0.05 seems like a good approximation to that
>right way for a wide range of important problems!)

Even those who try to intelligently consider p-values
recognize that more accurate information should result in
lower p-values; incorrect rejection needs to be balanced
against incorrect acceptance.  If .05 is the appropriate
value for one sample size, it is not for other sizes.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558
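
A rough numerical sketch of the balancing point above -- choose alpha to 
minimize a weighted sum of the two error risks for a one-sided z-test; the 
costs and effect size below are arbitrary assumptions, but the optimal 
alpha falls as n grows:

# risk(alpha) = c0 * P(type I) + c1 * P(type II) for testing
# mu = 0 against mu = delta with known unit variance
import numpy as np
from scipy import stats, optimize

def optimal_alpha(n, delta=0.5, c0=1.0, c1=1.0):
    def risk(alpha):
        z = stats.norm.ppf(1 - alpha)                  # cutoff
        beta = stats.norm.cdf(z - delta * np.sqrt(n))  # type II error
        return c0 * alpha + c1 * beta
    return optimize.minimize_scalar(risk, bounds=(1e-6, 0.5),
                                    method="bounded").x

for n in (10, 50, 200, 1000):
    print(n, round(optimal_alpha(n), 4))   # shrinks as n increases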





Re: .05 level of significance

2000-10-19 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Jerry Dallal  <[EMAIL PROTECTED]> wrote:
>David Hardman wrote:


>> I'm not a statistician so don't have a detailed knowledge
>> of the history of significance testing. However, I've found
>> quite useful a brief summary of the 'philosophies about p'
>> found in Wright, D.B. (1997), "Understanding statistics: An
>> introduction for the social sciences".
>>   Wright explains that although Fisher first suggested that
>> it would be useful to have some cutoff, and suggested 5%
>> for convenience, he never intended for this to be a fixed
>> standard. In his later writings he proposed that
>> researchers report their exact p-values and let the reader
>> judge their worth. Although this didn't used to be
>> possible, because tables only provided p-values for a few
>> critical values, the availability of computers means we can
>> now follow Fisher's advice.


>I'm preparing some notes for my students on "Why P=0.05?"
>I'll post them in the next few days (so I don't end up writing
>them twice and piecemeal, to boot!).


As I recall, Lagrange (or was it Laplace?) computed the
exact distribution of the sum of uniform random variables,
around 1795, so that he could use .05 level tests for a
sample coming from a uniform distribution.  Physicists use
2 sigma, approximately .05.  I do not know of any
consideration of what happened under the alternative until
Neyman and Pearson.
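
A quick numerical check of the "2 sigma, approximately .05" remark -- a 
minimal sketch, assuming only that scipy is available:

# two-sided standard normal tail area beyond 2 sigma is ~0.0455,
# and .05 corresponds to roughly 1.96 sigma
from scipy import stats
print(2 * stats.norm.sf(2.0))    # 0.04550026...
print(2 * stats.norm.sf(1.96))   # 0.04999579... ~ .05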

Significance testing was widely used in the 19th century,
well before Fisher.  Student pointed out that significance
levels based on the normal distribution were wrong when
estimated variances were used.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: .05 level of significance

2000-10-19 Thread Jerry Dallal

David Hardman wrote:
> 
> I'm not a statistician so don't have a detailed knowledge
> of the history of significance testing. However, I've found
> quite useful a brief summary of the 'philosophies about p'
> found in Wright, D.B. (1997), "Understanding statistics: An
> introduction for the social sciences".
>   Wright explains that although Fisher first suggested that
> it would be useful to have some cutoff, and suggested 5%
> for convenience, he never intended for this to be a fixed
> standard. In his later writings he proposed that
> researchers report their exact p-values and let the reader
> judge their worth. Although this didn't used to be
> possible, because tables only provided p-values for a few
> critical values, the availability of computers means we can
> now follow Fisher's advice.

I'm preparing some notes for my students on "Why P=0.05?"
I'll post them in the next few days (so I don't end up writing
them twice and piecemeal, to boot!).





Re: .05 level of significance

2000-10-19 Thread dennis roberts

At 04:57 PM 10/19/00 +0100, David Hardman wrote:


>Actually, it often strikes me as curious that so many
>people continue to report results as p < .05, when they
>could in fact report the actual value.

though true, and generally this is what we do ... that is, give the exact p 
values ... the reality remains that when one SUBMITS PAPERS ... reviewers 
and editors look at these exact p values and superimpose on them the cut 
points of .05 or something similar ... so, readers do NOT get the 
opportunity to judge for themselves ... that has been taken out of their 
hands by the editorial decision that is made (amongst other considerations 
too, of course, like a badly executed study)

in our current review system (and it has been this way for eons), we should 
never UNderestimate the role these arbitrary cut points of .05 or whatever 
play ... in whether papers ever reach the wider audience of potentially 
interested readers






Re: .05 level of significance

2000-10-19 Thread David Hardman


I'm not a statistician so don't have a detailed knowledge 
of the history of significance testing. However, I've found 
quite useful a brief summary of the 'philosophies about p' 
found in Wright, D.B. (1997), "Understanding statistics: An 
introduction for the social sciences".
  Wright explains that although Fisher first suggested that 
it would be useful to have some cutoff, and suggested 5% 
for convenience, he never intended for this to be a fixed 
standard. In his later writings he proposed that 
researchers report their exact p-values and let the reader 
judge their worth. Although this didn't used to be 
possible, because tables only provided p-values for a few 
critical values, the availability of computers means we can 
now follow Fisher's advice.

Actually, it often strikes me as curious that so many 
people continue to report results as p < .05, when they 
could in fact report the actual value.
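
As an aside, a minimal sketch of how easily the actual value comes out of 
modern software (the data here are invented purely for illustration):

# software reports the exact p-value directly, so there is no need
# to fall back on a tabled "p < .05"; made-up data for illustration
from scipy import stats
group_a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0]
group_b = [4.2, 5.0, 4.8, 4.4, 5.1, 4.6]
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.3f}, exact p = {p:.4f}")   # report p itself, not a cutoff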


On Thu, 19 Oct 2000 13:52:10 GMT Jerry Dallal 
<[EMAIL PROTECTED]> wrote:

> "Karl L. Wuensch" wrote:
> > 
> > The origins of the silly .05 criterion of statistical significance are
> > discussed in the article:
> 
> I disagree with the characterization.  If it were silly, it would
> not have persisted for 75 years and be so widely used today.  Anyone
> can introduce anything, but the persistence of an idea requires
> acceptance and agreement to continue using it by the intended
> audience.  The 0.05 level of significance is a comfortable level at
> which to conduct scientific research and it does a good job of
> keeping the noise down (junk out of the scientific literature).
> 
> (I am *not* saying 0.05 *should* be used as an *absolute* cutoff. 
> I'm merely saying that if there's a right way to do things, an
> intelligent use of 0.05 seems like a good approximation to that
> right way for a wide range of important problems!)

Dr. David Hardman

"Rational - Devoid of all delusions save those 
of observation, experience and reflection." 
- Ambrose Bierce (The Devil's Dictionary)

Department of Psychology
London Guildhall University
Calcutta House
Old Castle Street
London E1 7NT

Phone:+44 020 73201256
Fax:  +44 020 73201236
E-mail:   [EMAIL PROTECTED]
Internet: http://www.lgu.ac.uk/psychology/hardman.html

For information on the London Judgment and Decision Making Group
visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html

For information on joining the 'Decision' mailbase list, see
http://www.mailbase.ac.uk/lists/decision






Re: .05 level of significance

2000-10-19 Thread dennis roberts

At 01:52 PM 10/19/00 +, Jerry Dallal wrote:
>"Karl L. Wuensch" wrote:
> >
> > The origins of the silly .05 criterion of statistical significance are
> > discussed in the article:
>
>I disagree with the characterization.  If it were silly, it would
>not have persisted for 75 years and be so widely used today.


jerry ... do you really believe this? a "thing" can persist because it is 
the path of least resistance ... bearing no connection to reality or 
usefulness ... AND THAT IS WHAT HAS HAPPENED IN THIS CASE

take for example ... the continuation of a term like POINT BISERIAL (just 
as an example) in the area of correlation coefficients ... this term has 
persisted for decades ... and for what possible current useful purpose?
not only is there nothing "special" about this term ... which is how it is 
still categorized in some books today ... but it suggests that a person 
needs to think about whether to use the pearson r or the point biserial 
formulas or procedures ... when encountering data where one variable is 
dichotomous (no matter how the variable came to be dichotomous) and the 
other is continuous ... when these are two different names for the same thing

to make matters even more confusing for users ... in hinkle, wiersma, and 
jurs ... 1998 ... there is a table on page 551 that shows variables X and Y 
... and their levels of measurement ... with the cross tab between nominal 
and interval/ratio being the point biserial ... but as far as i know ... 
since the formula has in it a p and q value ... p designating the 
proportion with one of the dichotomous values and q the proportion with the 
other ... and since the dichotomous variable could clearly be continuous 
underneath (graduate students = 1 and undergraduate students = 0) and just 
artificially made dichotomous for practicality ... i don't see that this 
distinction is relevant at all ... what we have is simply a different 
version of the pearson r formula ... one where the data on the dichotomous 
variable let the r formula be "simplified" ...

and it certainly has nothing to do with a "shortcut" formula for 
calculating r ... it MAY have decades ago but  it has not for the past 
20 years ...
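
a quick check of that equivalence -- a minimal sketch, with invented data 
(scipy happens to offer both functions, which makes the point directly):

# the point biserial is just pearson r with one variable coded 0/1;
# the two calls below return the identical correlation
import numpy as np
from scipy import stats

x = np.array([0, 0, 0, 1, 1, 1, 1, 0])                   # dichotomous
y = np.array([2.1, 1.8, 2.5, 3.9, 4.2, 3.5, 4.0, 2.2])   # continuous

print(stats.pointbiserialr(x, y)[0])
print(stats.pearsonr(x, y)[0])   # same value, to machine precision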

i am not suggesting that the persistence of the use of the term point 
biserial is in the same "problematic" league as the persistence of the use 
of .05 ... but the point is that things can persist for NO good reason

finally, where did it become the case that .05 ... is that "comfortable" 
level where above it ... you are now in DIScomfort and at or below it ... 
you are COMfortable?

as was stated before ... .05 appears to persist due to the fact that it was 
a handy TABLED value a long long time ago ... and tables have persisted even 
to this day (though they are no longer needed) ... and it is a far sight 
easier to reprint an EXISTING table than to manufacture a new one

.05 is a totally and irrevocably ARBITRARY VALUE ... there is no way to 
defend this nor ANY OTHER VALUE as somehow being THE cut point between 
comfort and agony ...










Re: .05 level of significance

2000-10-19 Thread Jerry Dallal

"Karl L. Wuensch" wrote:
> 
> The origins of the silly .05 criterion of statistical significance are
> discussed in the article:

I disagree with the characterization.  If it were silly, it would
not have persisted for 75 years and be so widely used today.  Anyone
can introduce anything, but the persistence of an idea requires
acceptance and agreement to continue using it by the intended
audience.  The 0.05 level of significance is a comfortable level at
which to conduct scientific research and it does a good job of
keeping the noise down (junk out of the scientific literature).

(I am *not* saying 0.05 *should* be used as an *absolute* cutoff. 
I'm merely saying that if there's a right way to do things, an
intelligent use of 0.05 seems like a good approximation to that
right way for a wide range of important problems!)


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=