RE: .05 level of significance
Jerry Dallal posted an interesting web page about p-values, http://www.tufts.edu/~gdallal/pval.htm, and I have a few comments about this page and about the discussion of significance testing on edstat-l.

First, it is pretty clear to all of us that the p-value does not answer any questions about practical significance, but you might find this example amusing anyway. The British Medical Journal (BMJ) published two papers back to back on side effects of vaccination. One paper summarized the results using p-values, and the other using confidence intervals. I took the opportunity to submit a letter to the editor via their web pages; you can read it at http://www.bmj.com/cgi/eletters/318/7192/1169 although it did not get published in the paper version of BMJ. For the paper that reported an odds ratio of 1.06 with only a p-value (0.545), I computed a confidence interval: 0.81 to 1.37. I argued that if we accepted 0.67 to 1.50 as a range of clinical indifference for the odds ratio, then this confidence interval would give us some assurance that there was not a clinically important change in the odds of a side effect.

But then I played some "what if" games. Suppose the rate of the side effect were 20 times larger in both groups. This leads to a confidence interval of 1.02 to 1.10 and a p-value of 0.0049. Now suppose the rate of side effects were 20 times smaller in both groups. Then the confidence interval would be 0.46 to 2.4 and the p-value would be 0.90. The interesting thing here is that the case with the smallest p-value is the case where you have the most assurance that there is no clinically significant increase in the risk of side effects (since the upper confidence limit is only 1.10), and the case with the largest p-value is the case where you have the least assurance (since the upper confidence limit is 2.4). So you could argue that, at least in this case, the smaller the p-value, the greater the evidence of a statistically significant finding and the lesser the evidence of a clinically important finding. This is without changing the sample size or the odds ratio. So not only does the p-value fail to inform you about practical significance, it can actually run completely counter to the way people are likely to interpret it. I had to include some cautionary statements about how you use medical judgement to define the range of clinical indifference, and that I am a statistician and not a doctor, but that does not detract from my general point about practical significance.

The second comment is that practical versus statistical significance is not the only criticism of p-values. There is an important issue that Herman Rubin raises frequently about how you need to balance the risks of various decisions. I probably cannot explain it as well as he can, but you should demand a different level of proof depending on the nature of the disease and the severity of the proposed therapy. For example, you might demand a very high level of proof when examining a surgical intervention for a non-life-threatening condition. On the other hand, there was a recent study showing that you could decrease the risk of cataracts by wearing sunglasses. I would demand a lower level of proof for this type of research, because wearing sunglasses carries far fewer costs and risks than a typical surgery. Besides, I would look pretty cool in shades, don't you think?
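To make the arithmetic behind these "what if" games concrete, here is a minimal sketch of the usual log odds ratio calculation (Wald interval, normal approximation). The cell counts are purely hypothetical, chosen only to mimic the qualitative pattern above, since the actual BMJ counts are not given here: making the side effect more common shrinks both the interval and the p-value even though the odds ratio stays near 1.06.

    import math

    def or_ci_p(a, b, c, d, z=1.96):
        """Odds ratio, 95% Wald CI, and two-sided p-value for a 2x2 table
        (a, b = events / non-events in group 1; c, d = same in group 2)."""
        log_or = math.log((a * d) / (b * c))
        se = math.sqrt(1/a + 1/b + 1/c + 1/d)
        lo, hi = math.exp(log_or - z * se), math.exp(log_or + z * se)
        p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal p-value
        return math.exp(log_or), lo, hi, p

    n = 100_000  # hypothetical group size
    for label, a, c in [("baseline rate", 340, 320),
                        ("rate x 20    ", 6800, 6400),
                        ("rate / 20    ", 17, 16)]:
        or_, lo, hi, p = or_ci_p(a, n - a, c, n - c)
        print(f"{label}  OR = {or_:.2f}  95% CI = ({lo:.2f}, {hi:.2f})  p = {p:.4f}")

The exact numbers will not match the letter's, but the pattern does: the tightest interval, the one giving the most assurance of no clinically important increase, is also the one with the smallest p-value.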
Finally, I would argue that combining a p-value with either an a priori power calculation or with a confidence interval (either one implying some discussion of what a range of clinical indifference might be) overcomes most of the objections to the use of p-values. In particular, you define the range of clinical indifference by balancing the severity of the disease against the cost and side effects of the therapy. Sad to say, very few researchers touch the issue of clinical indifference when they publish their findings. Others may disagree with my perspective, and I look forward to further discussion of this issue. Steve Simon, [EMAIL PROTECTED], Standard Disclaimer. STATS: STeve's Attempt to Teach Statistics. http://www.cmh.edu/stats = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
One of the original questions on this thread had to do with the origin of the ".05" cutoff. I suggested that if naive subjects were placed in a situation in which they had to detect whether a coin was fair or not, their threshold would correspond closely to the commonly used .05 level. I just did it with 65 naive subjects (Intro Psych - mostly freshmen). Three were discarded for not following instructions or having unreadable answers. I flipped a double-headed coin 10 times, and subjects indicated where in the sequence of Heads they would challenge the fairness of the coin. The results are as follows, expressed as % and cumulative % of the 62 I scored:

  After this many heads   % of sample challenging fairness   Cumulative %
           1                           0                          0
           2                           1.61                       1.61
           3                          11.29                      12.90
           4                          22.58                      35.48
           5                          25.81                      61.29
           6                          24.19                      85.48
           7                           9.68                      95.16
           8                           3.23                      98.39
           9                           1.61                     100.00
          10                           0                        100.00

From the binomial, 5 heads in 5 flips is .031, 6 heads in 6 flips is .016. So, a majority challenged after I got 5 heads in a row. I suggested that the .05 may be rooted in human cognitive heuristics that evolved to serve everyday decision making - such as catching cheaters (as opposed to formal statistical training). "Evolutionary psychologists" have marshalled quite a bit of evidence that many of our cognitive abilities (including deductive logic) did not evolve "context-free" but to meet the needs of people in everyday decisions. It's a speculative, but not unreasonable, hypothesis. I learned a few things doing the demo. Because these were my students, they expressed great reluctance to challenge the coin I suggested was fair. After I detected their reluctance I toned down the "challenge" language and simply asked them to indicate where in the sequence they'd suspect a non-fair coin. It would be fun (but a lot of work) to have both heads and tails drawn - but in different proportions. There are a host of other contextual features (including the cost of making a Type I vs. Type II error) that should matter too. -- --- John W. Kulig [EMAIL PROTECTED] Department of Psychology http://oz.plymouth.edu/~kulig Plymouth State College tel: (603) 535-2468 Plymouth NH USA 03264 fax: (603) 535-2412 --- "What a man often sees he does not wonder at, although he knows not why it happens; if something occurs which he has not seen before, he thinks it is a marvel" - Cicero. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
Petr Kuzmic wrote: > > Jerry Dallal wrote: > [...] > > http://www.tufts.edu/~gdallal/pval.htm > > http://www.tufts.edu/~gdallal/p05.htm > > Thanks for sharing these links. However, a lot of URLs on the "Little > Handbook of Statistical Practice" website > (http://www.tufts.edu/~gdallal/LHSP.HTM) have broken links to image > files. Ah, the joys of website authoring... [;)] > > Hope this helps, Thanks. All fixed. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
Jerry Dallal wrote: [...] > http://www.tufts.edu/~gdallal/pval.htm > http://www.tufts.edu/~gdallal/p05.htm Thanks for sharing these links. However, a lot of URLs on the "Little Handbook of Statistical Practice" website (http://www.tufts.edu/~gdallal/LHSP.HTM) have broken links to image files. Ah, the joys of website authoring... [;)] Hope this helps, - Petr Kuzmic _ P e t r K u z m i c, Ph.D. mailto:[EMAIL PROTECTED] BioKin Ltd. * Software and Consulting http://www.biokin.com = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
while this may be the case ... in general ... for some decisions we make ... we would not even allow this level of snickering to suggest to us that something is afoul ... whereas for others ... it would not bother us (or should not) if the chances were larger ... it all depends ... At 10:36 AM 10/23/00 -0400, David Evans wrote: >I remember seeing the same thing a year or so ago on this list. I tried >it for the first time this semester with my "refresher" course in >statistics for a class of incoming graduate students. I tossed a coin a >number of times and reported the result as "heads" each time >irrespective of the actual outcome. At the third call a slight snigger >went round the room, clearly emerging disbelief at the fourth and >outright disbelief at the fifth, corresponding to p values of 0.125, >0.0625, 0.03215 based on a hypothesis of a fair coin and a truthful >instructor. It appears, indeed, that 0.05 reasonably represents the >level at which human scepticism begins to emerge. > >David Evans >School of Marine Science >College of William & Mary >Gloucester Point, VA > >"John W. Kulig" wrote: > > > > I have been searching for some "psychological" data on the .05 > issue - I > > know it's out there but haven't found it yet. It went something like this: > > Claim to a friend that you have a fair coin. But the coin is not fair. > Flip the > > coin (you get heads). Flip it again (heads again). Ask the friend if > s/he wants > > to risk $100 (even odds) that the coin is not fair. At what point does the > > friend (who is otherwise ignorant of p issues) wager a bet that the > coin is not > > fair? I have heard that after 5 or 6 heads the friend is pretty sure > it's a bad > > coin - or at least a trick (at this point we cross .05 on the binomial > chart) > > .05 may be rooted in our general judgment/perception heuristics - > > understandable in evolutionary terms if we examine the everyday > situations we > > make these judgments in. Of course the relative risks of I versus II would > > matter (e.g. falsely accusing and starting a brawl vs. losing to a con > artist). > > I will try to locate some research data on this or I'll flip a few > coins > > in my next statistically naive class. > > > > -- > > --- > > John W. Kulig[EMAIL PROTECTED] > > Department of Psychology http://oz.plymouth.edu/~kulig > > Plymouth State College tel: (603) 535-2468 > > Plymouth NH USA 03264fax: (603) 535-2412 > > --- > > "What a man often sees he does not wonder at, although he knows > > not why it happens; if something occurs which he has not seen before, > > he thinks it is a marvel" - Cicero. > > > > = > > Instructions for joining and leaving this list and remarks about > > the problem of INAPPROPRIATE MESSAGES are available at > > http://jse.stat.ncsu.edu/ > > = > > >= >Instructions for joining and leaving this list and remarks about >the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ >= = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
I remember seeing the same thing a year or so ago on this list. I tried it for the first time this semester with my "refresher" course in statistics for a class of incoming graduate students. I tossed a coin a number of times and reported the result as "heads" each time irrespective of the actual outcome. At the third call a slight snigger went round the room, clearly emerging disbelief at the fourth and outright disbelief at the fifth, corresponding to p values of 0.125, 0.0625, 0.03125 based on a hypothesis of a fair coin and a truthful instructor. It appears, indeed, that 0.05 reasonably represents the level at which human scepticism begins to emerge. David Evans School of Marine Science College of William & Mary Gloucester Point, VA "John W. Kulig" wrote: > > I have been searching for some "psychological" data on the .05 issue - I > know it's out there but haven't found it yet. It went something like this: > Claim to a friend that you have a fair coin. But the coin is not fair. Flip the > coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants > to risk $100 (even odds) that the coin is not fair. At what point does the > friend (who is otherwise ignorant of p issues) wager a bet that the coin is not > fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad > coin - or at least a trick (at this point we cross .05 on the binomial chart) > .05 may be rooted in our general judgment/perception heuristics - > understandable in evolutionary terms if we examine the everyday situations we > make these judgments in. Of course the relative risks of I versus II would > matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist). > I will try to locate some research data on this or I'll flip a few coins > in my next statistically naive class. > > -- > --- > John W. Kulig[EMAIL PROTECTED] > Department of Psychology http://oz.plymouth.edu/~kulig > Plymouth State College tel: (603) 535-2468 > Plymouth NH USA 03264fax: (603) 535-2412 > --- > "What a man often sees he does not wonder at, although he knows > not why it happens; if something occurs which he has not seen before, > he thinks it is a marvel" - Cicero. > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
I wrote: > I'm preparing some notes for my students on "Why P=0.05?" > I'll post them in the next few days (so I don't end up writing > them twice and piecemeal, to boot!). I'm writing these notes as I'm teaching, so they are necessarily a series of first drafts. I don't have time to polish them if I'm not to fall behind. Nevertheless, I consider them good enough to distribute for class discussion. Some of the references in "Why P=0.05" are incomplete. I'll get them on my next trip to the library unless someone knows of a detailed Fisher bibliography online. I wasn't able to locate one myself. http://www.tufts.edu/~gdallal/pval.htm http://www.tufts.edu/~gdallal/p05.htm = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
Pardon my interference, but I think there's some confusion regarding the events here. When I toss the first round of coins, I get about 1/2 of them heads. No problema. Then, when I toss the second time, 1/2 of *those ones that fell heads* (1/4 of the total, .5*.5) have a chance to be heads again. and also, about .5 of those that fell tails before have a chance to fall heads too (1/4 of the total more). so, we now have the union of two intersections, 1/4 +1/4 Am I on the right track here? - Original Message - From: Bill Jefferys <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, October 22, 2000 12:19 PM Subject: Re: .05 level of significance > In article <[EMAIL PROTECTED]>, > [EMAIL PROTECTED] (Donald Burrill) wrote: > > #On Sat, 21 Oct 2000, Bill Jefferys wrote: > > #> However, the combined experiment is 400 heads on 800 trials, > # > #This however is not the _intersection_ of the two specified events. > > Sure it is. It's the event I get by first getting 220 heads on 400 > trials AND THEN tossing 180 heads on 400 trials. If I toss one head > (p=1/2) and then toss 1 tail (p=1/2) then the probability that I toss > one head and then toss 1 tail is (1/2*1/2=1/4). That is a correct use of > probability, and the intersection of the event of first tossing one head > and the event of second tossing 1 tail is indeed the event of tossing > one head followed by one tail. > > Similarly, the probability of first tossing 220 heads on 400 trials is > given by the binomial distribution 0.5^400*C^400_220. And the > probability of next tossing 180 heads on 400 trials is also given by the > binomial distribution 0.5^400C^400_180. The probability that I > accomplish both events in that order is the product of these two, is it > not? So how can you say that these are not independent events, and how > can you say that the intersection of the two is not as I say? > > It's true that the probability of tossing 400 heads on 800 trials in any > order is not this product, but that is irrelevant. > > Do you claim that there is any situation where it is correct to multiply > p-values? > > #> for which the two-tailed p-value is 1.0, not 0.05^2. > # > #> Contrary to popular belief, observed p-values are not probabilities. > #> They cannot be probabilities because they do not obey the rules of the > #> probability calculus, as the example shows. They are, well, p-values. > # > #Sorry; the example does not show that. It shows only that if one uses > #"combined" (in the phrase "combined event", or equivalent) to mean > #something other than "intersection", the rules governing the behavior of > #intersections may not apply to the behavior of combined events. > > Show me that it is in general correct to combine p-values by > multiplication and I might agree with you. > > Best wishes, Bill > > -- > Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712 > Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu > I report spammers to [EMAIL PROTECTED] > Finger for PGP Key: F7 11 FB 82 C6 21 D8 95 2E BD F7 6E 99 89 E1 82 > Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227 > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (dennis roberts) wrote: #so, what does the multiplicative "law" in probability mean then? If A is an event and B is an event, then the probability of the event A&B is given by P(A&B)=P(A)P(B) in the case of independence (it is P(A)P(B|A)=P(B)P(A|B) if the events aren't independent). #i was merely indicating ... since i have done this in classes ... that if #you show to students ... a sequence of (using a coin flip as the exemplar) #... of heads ... in a row ... when it appears that they came about due to #"random" flipping ... that when the probability of getting THAT particular #sequence ... by chance alone (given that they assume that it is a fair #coin) ... gets somewhere in the vicinity of .05 ... .01 approximately #... that students start perceiving that something is awry ... # #are you saying that i have misrepresented the coin flipping example? take #one coin ... flip ... observe outcome ... flip same coin again ... observe #outcome .. etc? I'm not saying that your coin flip example is wrong, only that it has nothing to do with p-values, which are not probabilities. It's surely true that if you have a specific sequence of N heads and tails, under the assumption that the coin is fair, then the probability of obtaining that particular sequence is 0.5^N. Correct statement: If 0<=x<=1 then the probability that the p-value is <=x is x. Incorrect statement: The p-value that I observed is the probability of (something). Except for the trivial and irrelevant fact that a p-value is by definition between 0 and 1 and thus could be one of the x's in the above correct statement, but that's after the fact and only refers to _future_ trials, not to the trial that generated the observed p-value. Bill -- Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712 Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu I report spammers to [EMAIL PROTECTED] Finger for PGP Key: F7 11 FB 82 C6 21 D8 95 2E BD F7 6E 99 89 E1 82 Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
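Bill's "correct statement" is easy to check by simulation. The sketch below is my own illustration, not from the thread: it draws many samples for which the null hypothesis is exactly true, runs a one-sample t test on each, and verifies that the fraction of p-values at or below x is about x. (For discrete statistics such as the binomial, the relation holds only approximately.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n = 100_000, 30

    # data generated with the null hypothesis (population mean = 0) exactly true
    samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
    t_stat, p = stats.ttest_1samp(samples, popmean=0.0, axis=1)

    # under the null, P(p <= x) should be about x for any x in [0, 1]
    for x in (0.01, 0.05, 0.10, 0.50):
        print(f"P(p <= {x:.2f}) is about {np.mean(p <= x):.3f}")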
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Donald Burrill) wrote: #On Sat, 21 Oct 2000, Bill Jefferys wrote: #> However, the combined experiment is 400 heads on 800 trials, # #This however is not the _intersection_ of the two specified events. Sure it is. It's the event I get by first getting 220 heads on 400 trials AND THEN tossing 180 heads on 400 trials. If I toss one head (p=1/2) and then toss 1 tail (p=1/2) then the probability that I toss one head and then toss 1 tail is (1/2*1/2=1/4). That is a correct use of probability, and the intersection of the event of first tossing one head and the event of second tossing 1 tail is indeed the event of tossing one head followed by one tail. Similarly, the probability of first tossing 220 heads on 400 trials is given by the binomial distribution 0.5^400*C^400_220. And the probability of next tossing 180 heads on 400 trials is also given by the binomial distribution 0.5^400C^400_180. The probability that I accomplish both events in that order is the product of these two, is it not? So how can you say that these are not independent events, and how can you say that the intersection of the two is not as I say? It's true that the probability of tossing 400 heads on 800 trials in any order is not this product, but that is irrelevant. Do you claim that there is any situation where it is correct to multiply p-values? #> for which the two-tailed p-value is 1.0, not 0.05^2. # #> Contrary to popular belief, observed p-values are not probabilities. #> They cannot be probabilities because they do not obey the rules of the #> probability calculus, as the example shows. They are, well, p-values. # #Sorry; the example does not show that. It shows only that if one uses #"combined" (in the phrase "combined event", or equivalent) to mean #something other than "intersection", the rules governing the behavior of #intersections may not apply to the behavior of combined events. Show me that it is in general correct to combine p-values by multiplication and I might agree with you. Best wishes, Bill -- Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712 Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu I report spammers to [EMAIL PROTECTED] Finger for PGP Key: F7 11 FB 82 C6 21 D8 95 2E BD F7 6E 99 89 E1 82 Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
On Sat, 21 Oct 2000, Bill Jefferys wrote: > At 12:56 PM -0500 10/20/00, dennis roberts wrote: > >randomly independent events have the p value being the multiplication of > >each event's p value ... so ... p for getting a head in a good coin > >is .5 ... 2 in a row = .25 ... etc. > > This is wrong. In general you cannot multiply the p-values from > independent events to obtain the p-value of the combined event. Surely this depends on how you define "the combined event". If "the combined event" is the intersection of two independent events, the probabilities do in general multiply, as Dennis asserts. If some other definition is used (as in Bill's example below), then of course one cannot expect the multiplication rule to hold. > Example: You toss 220 heads on 400 trials of a fair coin. The > two-tailed p-value for this event is almost exactly 0.05 [J.O. Berger > and M. Delampady, Statistical Science 2, 317-352 (1987)]. I.e., the probability of observing 220 heads or more, or 180 heads or fewer, in 400 trials is 0.05. > Suppose you then independently toss 180 heads on an additional 400 > trials. Again, the two-tailed p-value is 0.05. Again, the probability of observing 180 heads or fewer, or 220 heads or more, in 400 trials is 0.05. OK so far... > However, the combined experiment is 400 heads on 800 trials, This however is not the _intersection_ of the two specified events. > for which the two-tailed p-value is 1.0, not 0.05^2. > Contrary to popular belief, observed p-values are not probabilities. > They cannot be probabilities because they do not obey the rules of the > probability calculus, as the example shows. They are, well, p-values. Sorry; the example does not show that. It shows only that if one uses "combined" (in the phrase "combined event", or equivalent) to mean something other than "intersection", the rules governing the behavior of intersections may not apply to the behavior of combined events. The antecedent proposition therefore does not follow. -- DFB. -- Donald F. Burrill[EMAIL PROTECTED] 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED] MSC #29, Plymouth, NH 03264 (603) 535-2597 Department of Mathematics, Boston University[EMAIL PROTECTED] 111 Cummington Street, room 261, Boston, MA 02215 (617) 353-5288 184 Nashua Road, Bedford, NH 03110 (603) 471-7128 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
so, what does the multiplicative "law" in probability mean then? i was merely indicating ... since i have done this in classes ... that if you show to students ... a sequence of (using a coin flip as the exemplar) ... of heads ... in a row ... when it appears that they came about due to "random" flipping ... that when the probability of getting THAT particular sequence ... by chance alone (given that they assume that it is a fair coin) ... gets somewhere in the vicinity of .05 ... .01 approximately ... that students start perceiving that something is awry ... are you saying that i have misrepresented the coin flipping example? take one coin ... flip ... observe outcome ... flip same coin again ... observe outcome .. etc? At 06:26 PM 10/21/00 -0500, Bill Jefferys wrote: >At 12:56 PM -0500 10/20/00, dennis roberts wrote: >>randomly independent events have the p value being the multiplication of >>each event's p value ... so ... p for getting a head in a good coin >>is .5 ... 2 in a row = .25 ... etc. > >This is wrong. In general you cannot multiply the p-values from >independent events to obtain the p-value of the combined event. == dennis roberts, penn state university educational psychology, 8148632401 http://roberts.ed.psu.edu/users/droberts/drober~1.htm = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
Jerry Dallal wrote: > I have a note from Frank Anscombe in my files. It says, "Cardano. > See the bit from "De Vita Propria" at the head of Chap. 6 of FN > David's "Games, Gods, and Gambling" (1962). That shows that the idea > of a test of significance, informally described, is very ancient." > I don't have David's book with me, but I do recall that Cardano > flourished around 1550. We all do significance tests every day! When someone tells you something, you test its truth against your experience. 'You have to convince me of the truth of that!' Well, some of us believe anything we are told, maybe most of us believe anything that is told to us by an 'authority'. I guess the fundamental feature of the scientific approach is to insist on evidence (not proof) before we accept 'it' as (probably) valid. -- Alan McLean (alan.buseco.monash.edu.au) Department of Econometrics and Business Statistics Monash University, Caulfield Campus, Melbourne Tel: +61 03 9903 2102Fax: +61 03 9903 2007 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
Michael Granaas wrote: > Someone, I think it was on this thread, mentioned Abelson's book > "Statistics as Principled Argument". In this book Abelson argues that > individual studies simply provide pieces of evidence for or against a > particular hypothesis. It is the accumulation of the evidence that allows > us to make a conclusion. (My appologies to Abelson if I have > misremembered his arguments.) It is perfectly true that 'individual studies simply provide pieces of evidence for or against a particular hypothesis' - but it is equally true that multiple studies do the same. Assuming the multiple studies show the same results, the evidence is of course stronger - but it is still 'only' evidence. One can legitimately draw a conclusion on one or several studies. One's confidence (and the confidence of others!) in the conclusion depends on the strength of the evidence. One well designed, well carried out study with clear results provides strong evidence which may be enough to convince most people. Several such studies which support each other provide even stronger evidence. On the other hand, replications of poorly designed studies leading to unclear results may give a little more evidence, but not enough to convince people. In an individual study, the p-value(s) used is a measure of the strength of the evidence provided by the study - BUT it is totally dependent on the validity of the design of the study, the choice of variables, the selection of the sample, the appropriateness of the models used to obtain the p-value. So it is important, but certainly only one brick in the wall. And of course treating 5% as some God-given rule of importance is ridiculous. (It is nearly as bad as the N>30 'law' for treating a sample as 'large'.) But it is a useful benchmark figure. Regards, Alan -- Alan McLean (alan.buseco.monash.edu.au) Department of Econometrics and Business Statistics Monash University, Caulfield Campus, Melbourne Tel: +61 03 9903 2102Fax: +61 03 9903 2007 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
At 12:56 PM -0500 10/20/00, dennis roberts wrote: >randomly independent events have the p value being the multiplication of >each event's p value ... so ... p for getting a head in a good coin >is .5 ... 2 in a row = .25 ... etc. This is wrong. In general you cannot multiply the p-values from independent events to obtain the p-value of the combined event. Example: You toss 220 heads on 400 trials of a fair coin. The two-tailed p-value for this event is almost exactly 0.05 [J.O. Berger and M. Delampady, Statistical Science 2, 317-352 (1987)]. Suppose you then independently toss 180 heads on an additional 400 trials. Again, the two-tailed p-value is 0.05. However, the combined experiment is 400 heads on 800 trials, for which the two-tailed p-value is 1.0, not 0.05^2. Similar examples can be given for one-tailed p-values. If you must use p-values and must combine them from independent experiments, you need to use the methods of meta-analysis. Not that I recommend using either p-values or meta-analysis (I don't). Contrary to popular belief, observed p-values are not probabilities. They cannot be probabilities because they do not obey the rules of the probability calculus, as the example shows. They are, well, p-values. That said, I wonder if you haven't confused p-values and probabilities. It is true that if you toss N heads in a row with a fair coin, the probability of that event is 0.5^N. It is also true that this probability happens to be numerically equal to the one-tailed p-value for tossing N heads in a row. So in this particular case it happens that the one-tailed p-value for the combined event is numerically equal to the product of the individual p-values. However, this has nothing to do with combining p-values. It is a consequence of the fortuitous numerical equality between the p-value and the probability in this special case, and the fact that independent probabilities do multiply to get the joint probability. Put another way, there is really no "tail" in this special case. The entire contribution to the p-value comes from the probability of obtaining the actually observed data, not from outcomes out in the tail that might have been observed but were not. Bill -- Bill Jefferys/Department of Astronomy/University of Texas/Austin, TX 78712 Email: replace 'warthog' with 'clyde' | Homepage: quasar.as.utexas.edu I report spammers to [EMAIL PROTECTED] Finger for PGP Key: F7 11 FB 82 C6 21 D8 95 2E BD F7 6E 99 89 E1 82 Unlawful to use this email address for unsolicited ads: USC Title 47 Sec 227 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
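Bill's numbers are easy to reproduce. The sketch below computes two-tailed binomial p-values by doubling the smaller tail (one common convention; Berger and Delampady's definition may differ in detail, but the qualitative point is unaffected): each half of the experiment gives p near 0.05, while the combined 400 heads in 800 tosses gives p = 1.

    from scipy.stats import binom

    def two_tailed_p(k, n, p0=0.5):
        """Two-tailed binomial p-value: double the smaller tail, capped at 1."""
        lower = binom.cdf(k, n, p0)       # P(X <= k)
        upper = binom.sf(k - 1, n, p0)    # P(X >= k)
        return min(1.0, 2 * min(lower, upper))

    print(two_tailed_p(220, 400))   # about 0.05
    print(two_tailed_p(180, 400))   # about 0.05
    print(two_tailed_p(400, 800))   # 1.0 -- nothing like 0.05 * 0.05

One standard way of combining independent p-values, in the meta-analytic spirit Bill mentions, is Fisher's method (refer -2 times the sum of the log p-values to a chi-squared distribution with 2k degrees of freedom); simple multiplication is not it.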
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, John W. Kulig <[EMAIL PROTECTED]> wrote: >I have been searching for some "psychological" data on the .05 issue - I >know it's out there but haven't found it yet. It went something like this: >Claim to a friend that you have a fair coin. But the coin is not fair. Flip the >coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants >to risk $100 (even odds) that the coin is not fair. At what point does the >friend (who is otherwise ignorant of p issues) wager a bet that the coin is not >fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad >coin - or at least a trick (at this point we cross .05 on the binomial chart) >.05 may be rooted in our general judgment/perception heuristics - >understandable in evolutionary terms if we examine the everyday situations we >make these judgments in. Of course the relative risks of I versus II would >matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist). >I will try to locate some research data on this or I'll flip a few coins >in my next statistically naive class. Is the coin exactly fair? Could it be exactly fair? Remember that the coin is a physical object; is it even possible that the probability of it coming up head by tossing it in a particular manner is exactly .5? Is it even possible that the probabilities of heads in different tosses is exactly the same? Is it possible that the tosses are exactly independent? Even if the appropriate modifications are made, the bet problem above calls for a Bayesian approach. If the leeway in "fair" is small enough (small relative to the usual standard deviation), it is robust to treat it as a point null. In a sample that small, the entire prior distribution comes in; with more data, only the alternative prior density "at" the null, relative to the prior probability of the null, is of much importance. If the loss function is changed, the above needs to be modified. It is the integrated loss-weighted prior over the null, and the density times the local loss function under the alternative, which are important. The subject of statistics is how people should behave when facing decision problems under uncertainty, not how they do behave. Look at the utility chapter in Raiffa's book, _Decision Analysis_, to see that people do not behave consistently when offered composite bets with known probabilities. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558 = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
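For readers who want to see what a Bayesian treatment of the coin problem can look like, here is a minimal sketch. The prior choices (a lump of probability 1/2 on an exactly fair coin and a uniform prior on the bias otherwise) are my own illustrative assumptions, not Herman Rubin's, and the sketch ignores his point about loss functions; it only shows how the posterior probability of "fair" drops as heads accumulate.

    from math import comb

    def posterior_prob_fair(k, n, prior_fair=0.5):
        """P(coin exactly fair | a particular sequence with k heads in n tosses),
        with prior mass prior_fair on p = 1/2 and a uniform Beta(1,1) prior on p
        under the alternative."""
        like_fair = 0.5 ** n                     # P(sequence | p = 1/2)
        like_alt = 1.0 / ((n + 1) * comb(n, k))  # integral of p^k (1-p)^(n-k) dp
        odds = (prior_fair / (1 - prior_fair)) * (like_fair / like_alt)
        return odds / (1 + odds)

    for n in range(1, 11):                       # n heads in a row
        print(n, round(posterior_prob_fair(n, n), 3))

With these particular priors the posterior probability of a fair coin is about 0.16 after five heads in a row and about 0.10 after six, which is roughly where the classroom subjects described earlier in the thread started to balk.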
Re: .05 level of significance
dennis roberts <[EMAIL PROTECTED]> wrote: [regarding the "point biserial correlation"] > and it certainly has nothing to do with a "shortcut" formula for > calculating r ... it MAY have decades ago but it has not for the past > 20 years ... While I certainly agree that many textbooks convey the absolutely misleading impression that the "PBC" is some special form of measure, I think that the usual formula presented for it is pedagogically useful in a few ways (not that the typical textbook makes use of them): 1) It demonstrates that a correlation problem in which one variable is dichotomous is equivalent to a two-group mean-difference problem. 2) It shows that in such a case, the correlation coefficient is a function of both a standard effect-size measure (Cohen's d) and the relative sizes of the two groups. 2a) It demonstrates that variations in the relative sizes of the group will result in variations in the magnitude of the correlation, even if the effect size is held constant. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
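A small simulation makes points (1), (2), and (2a) concrete. This is my own sketch, not from the post: it computes the ordinary Pearson correlation between a 0/1 group code and a continuous outcome, holding the population effect size d fixed, and compares it with the usual large-sample conversion r = d / sqrt(d^2 + 1/(pq)), where p and q are the group proportions. The simulated values wobble around the formula, but both show r shrinking as the groups become more unbalanced even though d never changes.

    import numpy as np

    rng = np.random.default_rng(1)

    def point_biserial(n1, n2, d):
        """Pearson r between a 0/1 group code and an outcome whose group means
        differ by d standard deviations."""
        y = np.concatenate([rng.normal(0.0, 1.0, n1), rng.normal(d, 1.0, n2)])
        x = np.concatenate([np.zeros(n1), np.ones(n2)])
        return np.corrcoef(x, y)[0, 1]

    d = 0.5                                   # fixed effect size
    for n1, n2 in [(500, 500), (800, 200), (950, 50)]:
        p, q = n1 / (n1 + n2), n2 / (n1 + n2)
        formula = d / np.sqrt(d**2 + 1.0 / (p * q))   # large-N conversion
        print(n1, n2, round(point_biserial(n1, n2, d), 3), round(formula, 3))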
Re: .05 level of significance
Herman Rubin wrote: > > As I recall, Lagrange (or was it Laplace?) computed the > exact distribution of the sum of uniform random variables > so he could use .05 level tests for a sample coming from > a uniform distribution about 1795. Physicists use 2 sigma, > approximately .05. I do not know of any consideration of > what happened under the alternative until Neyman and Pearson. > > Significance testing was widely used in the 19th century. > Student pointed out that significance levels based on > the normal distribution were wrong when estimated variances > were used. It was widely used before Fisher. I have a note from Frank Anscombe in my files. It says, "Cardano. See the bit from "De Vita Propria" at the head of Chap. 6 of FN David's "Games, Gods, and Gambling" (1962). That shows that the idea of a test of significance, informally described, is very ancient." I don't have David's book with me, but I do recall that Cardano flourished around 1550. The earliest example in my files is in Arbuthnot's "An Argument for Divine Providence..." from 1710. According to Frank, Edgeworth formally defined the procedure in 1885, but did not give it the name "significance test", which seems to have occurred later. Edgeworth does use the phrase "significant difference", though. I do not find the phrase significance test or any of its close relatives in Pearson 1900. Some early instances of "significance" and "test" used together are cited at http://members.aol.com/jeff570/s.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
On 19 Oct 2000, Herman Rubin wrote: > Physicists use 2 sigma, approximately .05. Not in particle physics and astrophysics, where 5 sigma is generally used. Documentation: C. Seife, 2000: "CERN's gamble shows perils, rewards of playing odds". Science, 289, 2260-2 (29 Sept. 2000). Quoted in the article is John Bahcall, distinguished physicist at Princeton: "Half of all 3-sigma events are wrong". The article has numerous examples of even alleged 5-sigma events proving to be spurious. On the other hand, "Neutrino mass is taken seriously even though it's not five sigma currently" according to P. Igo-Kimenes of CERN. This is because there are non-statistical arguments (e.g., physics theories) that argue in favor of it. Statistical significance alone is not (and shouldn't be) the sole criterion for scientific conclusions. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
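For reference, a quick sketch of the normal tail areas behind these conventions (note that particle physicists usually quote the one-sided tail for a "5 sigma" discovery):

    from scipy.stats import norm

    for k in (2, 3, 5):
        print(f"{k} sigma: two-sided p = {2 * norm.sf(k):.2e}, "
              f"one-sided p = {norm.sf(k):.2e}")

This prints roughly 0.046 and 0.023 for 2 sigma, 0.0027 and 0.0013 for 3 sigma, and 5.7e-07 and 2.9e-07 for 5 sigma.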
Re: .05 level of significance
dennis roberts wrote: > thus ... when we spend all this time on debating the usefulness or lack of > usefulness of a p value ... whether it be the .05 level or ANY other ... we > are totally ignoring the fact that this p value that is reported ... could > have been the result of many factors having NOTHING to do with sampling > error ... and nothing to do with the treatments ...

From my class notes:

100% of all disasters are failures of design, not analysis. -- Ron Marks, Toronto, August 16, 1994

To propose that poor design can be corrected by subtle analysis techniques is contrary to good scientific thinking. -- Stuart Pocock (Controlled Clinical Trials, p 58) regarding the use of retrospective adjustment for trials with historical controls.

Issues of design always trump issues of analysis. -- GE Dallal, 1999, explaining why it would be wasted effort to focus on the analysis of data from a study under challenge whose design was fatally flawed.

Bias dominates variability. -- John C. Bailler, III, Indianapolis, August 14, 2000

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
randomly independent events have the p value being the multiplication of each event's p value ... so ... p for getting a head in a good coin is .5 ... 2 in a row = .25 ... etc. here is a table up to 10 in a row of the same side

 Row  numheads    pvalue
   1         1    0.50
   2         2    0.25
   3         3    0.125000
   4         4    0.062500
   5         5    0.031250
   6         6    0.015625
   7         7    0.007813
   8         8    0.003906
   9         9    0.001953
  10        10    0.000977

i have argued before ... that a value of .05 makes SOME sense IF we consider the observed data to be "derived" from some model ... like a sequence of randomly occurring independent events ... if one were to flip a coin and then SHOW the result ... AND then ask Ss to give their perceptions about whether what they see could have occurred by chance ALONE ... what you will find is that IF you present 4 or 5 or 6 in a row all being the same ... these are the areas (increasingly so) where there becomes more and more SUSPICION ... that you would see this IF THE COIN IS GOOD ... OR THE COIN FLIPPER IS NOT CHEATING ... thus, the nervousness starts to set in around the .05 .01 areas ... ie, the times where the probability of that happening ACCORDING TO THE MODEL ... (coin is good) ... STARTS GETTING RATHER REMOTE

At 12:01 PM 10/20/00 -0400, you wrote: > I have been searching for some "psychological" data on the .05 issue - I >know it's out there but haven't found it yet. It went something like this: >Claim to a friend that you have a fair coin. But the coin is not fair. >Flip the >coin (you get heads). Flip it again (heads again). Ask the friend if s/he >wants >to risk $100 (even odds) that the coin is not fair. At what point does the >friend (who is otherwise ignorant of p issues) wager a bet that the coin >is not >fair? I have heard that after 5 or 6 heads the friend is pretty sure it's >a bad >coin - or at least a trick (at this point we cross .05 on the binomial chart) >.05 may be rooted in our general judgment/perception heuristics - >understandable in evolutionary terms if we examine the everyday situations we >make these judgments in. Of course the relative risks of I versus II would >matter (e.g. falsely accusing and starting a brawl vs. losing to a con >artist). >I will try to locate some research data on this or I'll flip a few coins >in my next statistically naive class. > >-- >--- >John W. Kulig[EMAIL PROTECTED] >Department of Psychology http://oz.plymouth.edu/~kulig >Plymouth State College tel: (603) 535-2468 >Plymouth NH USA 03264fax: (603) 535-2412 >--- >"What a man often sees he does not wonder at, although he knows >not why it happens; if something occurs which he has not seen before, >he thinks it is a marvel" - Cicero. > >= >Instructions for joining and leaving this list and remarks about >the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ >= = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
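The table above is just 0.5^n; a two-line check:

    # probability of n heads in a row from a fair coin
    for n in range(1, 11):
        print(f"{n:2d} in a row: p = {0.5 ** n:.6f}")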
Re: .05 level of significance
On Fri, 20 Oct 2000, David Hardman wrote: > And it's almost too obvious to be worth stating, but let's > not forget the role of replication in science. You may get > a p value of p < .0001, but if no-one else can replicate it > then your result may well be a fluke. Of course, the > failures to replicate may not be so easy to publish...! This is exactly the point that I was going to add to Dennis's comments; I guess David saved me the trouble. Unfortunately, I think that replication is probably one of the most overlooked issues in the discussion of hypothesis testing etc. We (frequently) teach, and certainly act, as if we can make a decision based on the weight of a single research effort. When we behave as if scientific knowledge can be arrived at through a single study it is no wonder that we have so much trouble with p-values. In some disciplines we have a near absence of multi-experiment papers. Admittedly publication pressures are a great problem here, but at least some of these single study publications are fueled by the myth that you can reach a scientific conclusion based on a single study. I don't know what to do about this, but I do know that I have changed my teaching so as to encourage students to think about a study as being a piece of the puzzle, not the solution to the entire puzzle. Ultimately new findings need to be replicated under a variety of circumstances to validate any new knowledge. Someone, I think it was on this thread, mentioned Abelson's book "Statistics as Principled Argument". In this book Abelson argues that individual studies simply provide pieces of evidence for or against a particular hypothesis. It is the accumulation of the evidence that allows us to make a conclusion. (My apologies to Abelson if I have misremembered his arguments.) Michael > > > Dr. David Hardman > > "Rational - Devoid of all delusions save those > of observation, experience and reflection." > - Ambrose Bierce (The Devil's Dictionary) > > Department of Psychology > London Guildhall University > Calcutta House > Old Castle Street > London E1 7NT > > Phone:+44 020 73201256 > Fax: +44 020 73201236 > E-mail: [EMAIL PROTECTED] > Internet: http://www.lgu.ac.uk/psychology/hardman.html > > For information on the London Judgment and Decision Making Group > visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html > > For information on joining the 'Decision' mailbase list, see > http://www.mailbase.ac.uk/lists/decision > > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = > *** Michael M. Granaas Associate Professor[EMAIL PROTECTED] Department of Psychology University of South Dakota Phone: (605) 677-5295 Vermillion, SD 57069 FAX: (605) 677-6604 *** All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
I have been searching for some "psychological" data on the .05 issue - I know it's out there but haven't found it yet. It went something like this: Claim to a friend that you have a fair coin. But the coin is not fair. Flip the coin (you get heads). Flip it again (heads again). Ask the friend if s/he wants to risk $100 (even odds) that the coin is not fair. At what point does the friend (who is otherwise ignorant of p issues) wager a bet that the coin is not fair? I have heard that after 5 or 6 heads the friend is pretty sure it's a bad coin - or at least a trick (at this point we cross .05 on the binomial chart) .05 may be rooted in our general judgment/perception heuristics - understandable in evolutionary terms if we examine the everyday situations we make these judgments in. Of course the relative risks of I versus II would matter (e.g. falsely accusing and starting a brawl vs. losing to a con artist). I will try to locate some research data on this or I'll flip a few coins in my next statistically naive class. -- --- John W. Kulig[EMAIL PROTECTED] Department of Psychology http://oz.plymouth.edu/~kulig Plymouth State College tel: (603) 535-2468 Plymouth NH USA 03264fax: (603) 535-2412 --- "What a man often sees he does not wonder at, although he knows not why it happens; if something occurs which he has not seen before, he thinks it is a marvel" - Cicero. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
dennis roberts wrote: > 4. are all the Ss in the study at the end ... compared to the beginning? See today's "Wizard of Id" [Friday Oct. 20]... -Robert Dawson = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
And it's almost too obvious to be worth stating, but let's not forget the role of replication in science. You may get a p value of p < .0001, but if no-one else can replicate it then your result may well be a fluke. Of course, the failures to replicate may not be so easy to publish...! On Fri, 20 Oct 2000 10:56:03 -0400 dennis roberts <[EMAIL PROTECTED]> wrote: > what is interesting to me in our discussions of p values ... .05 for > example is ... we have failed (generally that is) to put this one piece of > information in the context of the total environment of the investigation or > study ... we have blown totally out of proportion ... THIS one "fact" to > all the other components of the study ... which are FAR more important > > > take for example a very simple experimental study where we are doing drug > trials ... and have assigned Ss to the experimental, placebo control, and > regular control conditions ... and then when the study is over ... we do > our ANOVA ... see the printed p value ... then make our decision about the > meaningfulness of the results of this study > > 1. does this p value truly represent what p should be IF the null is true > AND nothing but sampling error is producing the result seen? > 2. have Ss really be assigned at random ... or treatments at random to Ss? > 3. have the drug therapies been implemented consistently and accurately > throughout the study? > 4. are all the Ss in the study at the end ... compared to the beginning? > 5. was the dosage level (if that was the thing being examined) really the > right one to use? > 6. if these were humans are we totally sure that NO S had ANY contact > with ANY other S ... thus having the possible contamination effect across > experimental conditions? > 7. have all the data been recorded correctly? if not, would there be ANY > way to know if a mistake had been made? > 8. if humans were involved, and there was some element of self reporting > involved in the way the data were collected ... have Ss honestly and > accurately reported their data? > > and on and on and on > > > there are SO many factors that produce the results ... that we have no way > of knowing which of the above or any other ... might have influenced the > results ... BUT, the p value only applies IF we are assuming sampling error > is the only factor involved ... > > thus ... when we spend all this time on debating the usefulness or lack of > usefulness of a p value ... whether it be the .05 level or ANY other ... we > are totally ignoring the fact that this p value that is reported ... could > have been the result of many factors having NOTHING to do with sampling > error ... and nothing to do with the treatments ... > > our persistence on insisting on a p value like .05 as being either the > magical or agreed to cut point ... is SO FAR OVERSHADOWED by all these > other potential problems ... that it makes the interpretation and > DEPENDENCE ON ANY reported p value highly suspect ... > > so here we are, arguing about .03 versus .06 ... when we should be arguing > about things like items 2 to 8 ... and then ONLY when we have been able to > account for and do away with all of those ... then we MIGHT have a look at > the p value and see what it is ... > > but until we do, our essentially total fixation of p values is so highly > misplaced attention ... as to be almost downright laughable behavior > > and this is what we are passing along to our students? and this is what we > are passing along to our peers via published documents? 
> > > > > > = > Instructions for joining and leaving this list and remarks about > the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ > = Dr. David Hardman "Rational - Devoid of all delusions save those of observation, experience and reflection." - Ambrose Bierce (The Devil's Dictionary) Department of Psychology London Guildhall University Calcutta House Old Castle Street London E1 7NT Phone:+44 020 73201256 Fax: +44 020 73201236 E-mail: [EMAIL PROTECTED] Internet: http://www.lgu.ac.uk/psychology/hardman.html For information on the London Judgment and Decision Making Group visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html For information on joining the 'Decision' mailbase list, see http://www.mailbase.ac.uk/lists/decision = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
what is interesting to me in our discussions of p values ... .05 for example is ... we have failed (generally that is) to put this one piece of information in the context of the total environment of the investigation or study ... we have blown totally out of proportion ... THIS one "fact" to all the other components of the study ... which are FAR more important

take for example a very simple experimental study where we are doing drug trials ... and have assigned Ss to the experimental, placebo control, and regular control conditions ... and then when the study is over ... we do our ANOVA ... see the printed p value ... then make our decision about the meaningfulness of the results of this study

1. does this p value truly represent what p should be IF the null is true AND nothing but sampling error is producing the result seen?
2. have Ss really been assigned at random ... or treatments at random to Ss?
3. have the drug therapies been implemented consistently and accurately throughout the study?
4. are all the Ss in the study at the end ... compared to the beginning?
5. was the dosage level (if that was the thing being examined) really the right one to use?
6. if these were humans are we totally sure that NO S had ANY contact with ANY other S ... thus having the possible contamination effect across experimental conditions?
7. have all the data been recorded correctly? if not, would there be ANY way to know if a mistake had been made?
8. if humans were involved, and there was some element of self reporting involved in the way the data were collected ... have Ss honestly and accurately reported their data?

and on and on and on

there are SO many factors that produce the results ... that we have no way of knowing which of the above or any other ... might have influenced the results ... BUT, the p value only applies IF we are assuming sampling error is the only factor involved ...

thus ... when we spend all this time on debating the usefulness or lack of usefulness of a p value ... whether it be the .05 level or ANY other ... we are totally ignoring the fact that this p value that is reported ... could have been the result of many factors having NOTHING to do with sampling error ... and nothing to do with the treatments ...

our persistence on insisting on a p value like .05 as being either the magical or agreed to cut point ... is SO FAR OVERSHADOWED by all these other potential problems ... that it makes the interpretation and DEPENDENCE ON ANY reported p value highly suspect ...

so here we are, arguing about .03 versus .06 ... when we should be arguing about things like items 2 to 8 ... and then ONLY when we have been able to account for and do away with all of those ... then we MIGHT have a look at the p value and see what it is ...

but until we do, our essentially total fixation on p values is so highly misplaced attention ... as to be almost downright laughable behavior

and this is what we are passing along to our students? and this is what we are passing along to our peers via published documents? = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says... > > Actually, it often strikes me as curious that so many > people continue to report results as p < .05, when they > could in fact report the actual value. Well, the exact value isn't really all that relevant, certainly if significance is smaller than .001 (who cares if it's .0009 or 9E-32?). Most researchers don't care about the exact p-value as long as it's less than .01 -- in that case the results are solid, give them two stars **. If the p-value is between .05 and .01, the results are significant, but keep an eye on them, there's a real chance these results are a fluke. One star only *. "A statistician is a person whose lifetime ambition is to be wrong 5% of the time". = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, Jerry Dallal <[EMAIL PROTECTED]> wrote:

>"Karl L. Wuensch" wrote:
>> The origins of the silly .05 criterion of statistical significance are
>> discussed in the article:
>
>I disagree with the characterization. If it were silly, it would
>not have persisted for 75 years and be so widely used today. Anyone
>can introduce anything, but the persistence of an idea requires
>acceptance and agreement to continue using it by the intended
>audience. The 0.05 level of significance is a comfortable level at
>which to conduct scientific research and it does a good job of
>keeping the noise down (junk out of the scientific literature).

Keeping the noise down is the only justification I have seen; attempts to justify .05, or any other p-value, from first principles have not succeeded. As to the persistence for more than 200 years (it is that old), it was first believed to be a measure of the probability that the null is correct. By the time those using it then, as now, as religion realized this was not the case, they were too indoctrinated to change. It is not unlike other well-established beliefs; showing the error is not always enough.

>(I am *not* saying 0.05 *should* be used as an *absolute* cutoff.
>I'm merely saying that if there's a right way to do things, an
>intelligent use of 0.05 seems like a good approximation to that
>right way for a wide range of important problems!)

Even those who try to consider p-values intelligently recognize that more accurate information should result in lower p-values; incorrect rejection needs to be balanced against incorrect acceptance. If .05 is the appropriate value for one sample size, it is not for other sizes.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765)494-6054  FAX: (765)494-0558
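(a minimal sketch of that balancing act, under assumptions chosen purely for illustration: a one-sided test of a normal mean with known sigma = 1, an effect of 0.5 under the alternative, and equal unit costs on the two kinds of error)

from scipy.stats import norm
from scipy.optimize import minimize_scalar

delta = 0.5              # assumed effect size under the alternative
for n in (10, 50, 200, 1000):
    se = 1 / n**0.5      # standard error of the sample mean, sigma = 1

    def risk(c):
        alpha = 1 - norm.cdf(c / se)         # P(reject | null true)
        beta = norm.cdf((c - delta) / se)    # P(accept | alternative true)
        return alpha + beta                  # equal unit costs on both errors

    c_opt = minimize_scalar(risk, bounds=(0, delta), method="bounded").x
    alpha_opt = 1 - norm.cdf(c_opt / se)
    print(f"n = {n:5d}: cost-minimizing alpha ~= {alpha_opt:.2g}")

the cost-minimizing alpha drops from roughly .2 at n = 10 to far below .001 at n = 1000 ... which is the point: no single cutoff, .05 included, can be appropriate for every sample size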
Re: .05 level of significance
In article <[EMAIL PROTECTED]>, Jerry Dallal <[EMAIL PROTECTED]> wrote:

>David Hardman wrote:
>> I'm not a statistician so don't have a detailed knowledge
>> of the history of significance testing. However, I've found
>> quite useful a brief summary of the 'philosophies about p'
>> found in Wright, D.B. (1997), "Understanding statistics: An
>> introduction for the social sciences".
>> Wright explains that although Fisher first suggested that
>> it would be useful to have some cutoff, and suggested 5%
>> for convenience, he never intended for this to be a fixed
>> standard. In his later writings he proposed that
>> researchers report their exact p-values and let the reader
>> judge their worth. Although this wasn't possible in the past,
>> because tables only provided p-values for a few critical
>> values, the availability of computers means we can now
>> follow Fisher's advice.
>
>I'm preparing some notes for my students on "Why P=0.05?"
>I'll post them in the next few days (so I don't end up writing
>them twice and piecemeal, to boot!).

As I recall, Lagrange (or was it Laplace?) computed the exact distribution of the sum of uniform random variables, around 1795, so he could carry out .05 level tests for a sample from a uniform distribution. Physicists use 2 sigma, approximately .05. I do not know of any consideration of what happens under the alternative until Neyman and Pearson. Significance testing was widely used in the 19th century, well before Fisher. Student pointed out that significance levels based on the normal distribution were wrong when estimated variances were used.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765)494-6054  FAX: (765)494-0558
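(for reference, the "2 sigma" convention corresponds to the two-sided tail area of the normal distribution, about .0455 ... a one-line check, with scipy used only for convenience)

from scipy.stats import norm
print(2 * (1 - norm.cdf(2)))   # ~= 0.0455, i.e. approximately .05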
Re: .05 level of significance
David Hardman wrote:
>
> I'm not a statistician so don't have a detailed knowledge
> of the history of significance testing. However, I've found
> quite useful a brief summary of the 'philosophies about p'
> found in Wright, D.B. (1997), "Understanding statistics: An
> introduction for the social sciences".
> Wright explains that although Fisher first suggested that
> it would be useful to have some cutoff, and suggested 5%
> for convenience, he never intended for this to be a fixed
> standard. In his later writings he proposed that
> researchers report their exact p-values and let the reader
> judge their worth. Although this wasn't possible in the past,
> because tables only provided p-values for a few critical
> values, the availability of computers means we can now
> follow Fisher's advice.

I'm preparing some notes for my students on "Why P=0.05?" I'll post them in the next few days (so I don't end up writing them twice and piecemeal, to boot!).
Re: .05 level of significance
At 04:57 PM 10/19/00 +0100, David Hardman wrote:
>Actually, it often strikes me as curious that so many
>people continue to report results as p < .05, when they
>could in fact report the actual value.

though true, and generally this is what we do ... that is, give the exact p values ... the reality remains that when one SUBMITS PAPERS ... reviewers and editors look at these exact p values and superimpose on them the cut points of .05 or something similar ... so readers do NOT get the opportunity to judge for themselves ... that has been taken out of their hands by the editorial decision that is made (amongst other things, of course, like a badly executed study)

in our current review system (and it has been this way for eons), we should never UNderestimate the role these arbitrary cut points of .05 or whatever play ... in whether papers ever reach the wider audience of potentially interested readers
Re: .05 level of significance
I'm not a statistician so don't have a detailed knowledge of the history of significance testing. However, I've found quite useful a brief summary of the 'philosophies about p' found in Wright, D.B. (1997), "Understanding statistics: An introduction for the social sciences".

Wright explains that although Fisher first suggested that it would be useful to have some cutoff, and suggested 5% for convenience, he never intended for this to be a fixed standard. In his later writings he proposed that researchers report their exact p-values and let the reader judge their worth. Although this wasn't possible in the past, because tables only provided p-values for a few critical values, the availability of computers means we can now follow Fisher's advice.

Actually, it often strikes me as curious that so many people continue to report results as p < .05, when they could in fact report the actual value.

On Thu, 19 Oct 2000 13:52:10 GMT Jerry Dallal <[EMAIL PROTECTED]> wrote:
> "Karl L. Wuensch" wrote:
> >
> > The origins of the silly .05 criterion of statistical significance are
> > discussed in the article:
>
> I disagree with the characterization. If it were silly, it would
> not have persisted for 75 years and be so widely used today. Anyone
> can introduce anything, but the persistence of an idea requires
> acceptance and agreement to continue using it by the intended
> audience. The 0.05 level of significance is a comfortable level at
> which to conduct scientific research and it does a good job of
> keeping the noise down (junk out of the scientific literature).
>
> (I am *not* saying 0.05 *should* be used as an *absolute* cutoff.
> I'm merely saying that if there's a right way to do things, an
> intelligent use of 0.05 seems like a good approximation to that
> right way for a wide range of important problems!)

Dr. David Hardman
"Rational - Devoid of all delusions save those of observation, experience and reflection." - Ambrose Bierce (The Devil's Dictionary)
Department of Psychology, London Guildhall University
Calcutta House, Old Castle Street, London E1 7NT
Phone: +44 020 73201256  Fax: +44 020 73201236
E-mail: [EMAIL PROTECTED]
Internet: http://www.lgu.ac.uk/psychology/hardman.html
For information on the London Judgment and Decision Making Group visit: http://www.lgu.ac.uk/psychology/hardman/ljdm.html
For information on joining the 'Decision' mailbase list, see http://www.mailbase.ac.uk/lists/decision
Re: .05 level of significance
At 01:52 PM 10/19/00 +, Jerry Dallal wrote:
>"Karl L. Wuensch" wrote:
>>
>> The origins of the silly .05 criterion of statistical significance are
>> discussed in the article:
>
>I disagree with the characterization. If it were silly, it would
>not have persisted for 75 years and be so widely used today.

jerry, do you really believe this? a "thing" can persist because it is the path of least resistance ... bearing no connection to reality or usefulness ... AND THAT IS WHAT HAS HAPPENED IN THIS CASE

take for example ... the continuation of a term like POINT BISERIAL (just as an example) in the area of correlation coefficients ... this term has persisted for decades ... and for what possible current useful purpose? not only is there nothing "special" about this term ... which is how it is still categorized in some books today ... but it suggests that a person needs to think about whether to use the pearson r or the point biserial formulas or procedures when encountering data where one variable is dichotomous (no matter how the variable came to be dichotomous) and the other is continuous ... when these are two different names for the same thing

to make matters even more confusing for users ... in hinkle, wiersma, and jurs (1998) there is a table on page 551 which shows variables X and Y and their levels of measurement ... with the cross tab between nominal and interval/ratio being the point biserial ... as far as i know, the formula has in it a p and a q value ... p designating the proportion with one of the dichotomous values and q the other ... and the dichotomous variable could clearly have been made dichotomous just for practicality (graduate students = 1 and undergraduate students = 0) ... i don't see that this distinction is relevant at all ... what we have is simply a different version of the pearson r formula ... where the data on the dichotomous variable let the r formula be "simplified" ... and it certainly has nothing to do with a "shortcut" formula for calculating r ... it MAY have decades ago, but it has not for the past 20 years

i am not suggesting that the persistence of the term point biserial is in the same "problematic" league as the persistence of .05 ... but the point is that things can persist for NO good reason

finally, where did it become the case that .05 is that "comfortable" level where above it you are now in DIScomfort ... and at or below it you are COMfortable? as was stated before ... it appears to persist because it was a handy TABLED value a long long time ago ... and tables have persisted even to this day (though they are not needed) ... and it is a far sight easier to reprint an EXISTING table than to manufacture a new one

.05 is a totally and irrevocably ARBITRARY VALUE ... there is no way to defend this, nor ANY OTHER VALUE, as somehow being THE cut point between comfort and agony ...
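(a quick numerical check of the "same thing" claim, on made-up data ... the 0/1 coding, the sample size, and the score model are arbitrary; the point is only that pearson's r computed on a 0/1 variable and the point biserial come out as the same number)

import numpy as np
from scipy.stats import pearsonr, pointbiserialr

rng = np.random.default_rng(1)
group = rng.integers(0, 2, 40)                   # 0 = undergraduate, 1 = graduate
score = 50 + 5 * group + rng.normal(0, 10, 40)   # some continuous outcome

r_pearson, _ = pearsonr(group, score)
r_pb, _ = pointbiserialr(group, score)
print(r_pearson, r_pb)   # identical, apart from floating-point rounding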
Re: .05 level of significance
"Karl L. Wuensch" wrote: > > The origins of the silly .05 criterion of statistical significance are > discussed in the article: I disagree with the characterization. If it were silly, it would not have persisted for 75 years and be so widely used today. Anyone can introduce anything, but the persistence of an idea requires acceptance and agreement to continue using it by the intended audience. The 0.05 level of signficance is a comfortable level at which to conduct scientific research and it does a good job of keeping the noise down (junk out of the scientific literature). (I am *not* saying 0.05 *should* be used as an *absolute* cutoff. I'm merely saying that if there's a right way to do things, an intelligent use of 0.05 seems like a good approximation to that right way for a wide range of important problems!) = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =