=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: hyp testing
Dennis Roberts asked, imagining a testing-free universe: what would the vast majority of folks who either do inferential work and/or teach it ... DO? what analyses would they be doing? what would they be teaching?

I wrote:

* students would be told in their compulsory intro stats that "a posterior probability of 95% or greater is called 'statistically significant', and we say 'we believe the hypothesis'. Anything less than that is called 'not statistically significant', and we say 'we disbelieve the hypothesis'."

and Herman Rubin responded:

Why? What should be done is to use the risk of the procedure, not the posterior probability. The term "statistically significant" needs abandoning; it is whether the effect is important enough that it pays to take it into account.

Dennis asked what _would_ happen, not what _should_. Most of the abuses we see around us are not the fault of hypothesis testing _per_se_, but of statistics users who believe: (a) that their discipline ought to be a science; (b) that statistics must be used to make this so; and (c) that it is unreasonable to expect them to _understand_ statistics just because of (a) and (b). Granted, if they did understand statistics, they would not test hypotheses nearly as often as they do.

That said, I am not entirely persuaded that risk calculation is the whole story, either. In many pure research situations, "risk" is just not well defined. What is the risk involved in believing (say) that the universe is closed rather than open? Moreover, suppose we elected Herman to the post of Emperor of Inference (with the power of the "Bars and the Axes"?) to enforce a risk-based approach to statistics (not that he'd take it, but bear with me...): would the situation really improve? My own feeling is that, in many "soft" science papers of the sort where the research is not immediately applied to the real world, but may affect public policy and personal politics, a "risk" approach would be disastrous.
If the researcher had to assign "risks" to outcomes that were merely a matter of correct or incorrect belief, it would be all too tempting to assign a large risk to an outcome that "would set back the cause of X fifty years" and conversely a small risk to accepting a belief that might be considered "if not true, at least a useful myth." (Exercise: provide your own examples.) Everything would be lowered to the level of Pascal's Wager - surely the canonical example of the limitations of a risk-based approach?

One might argue that in such a situation the rare reader who intends to take action, and not the writer, should do the statistics. Unfortunately, in the real world, that won't wash. People want simple answers, and with the flood of information that we have to deal with in keeping up with the literature in any subject today, this is not entirely a foolish or lazy desire. It is considered the author's responsibility to reach a conclusion, not just to present a mass of undigested data for posterity to analyze. Thus, it would be unrealistic to expect any discipline, forced to use risk-based inference, to do other than have the author guess at risks (and work with those guesses) in situations where objective measurements of risk don't exist.

-Robert Dawson
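Robert's quip about a "posterior probability of 95% or greater" convention can be made concrete with a two-hypothesis Bayes calculation. This is a sketch of the imagined convention only, not anyone's actual procedure; the prior and the two likelihoods below are invented numbers.

```python
# Hypothetical illustration of the imagined convention: call a hypothesis
# "statistically significant" when its posterior probability reaches 0.95.
# All numbers are made up for illustration.

def posterior(prior, like_h1, like_h0):
    """Posterior P(H1 | data) for two simple hypotheses, by Bayes' rule."""
    num = prior * like_h1
    return num / (num + (1.0 - prior) * like_h0)

p = posterior(prior=0.5, like_h1=0.30, like_h0=0.01)
verdict = "statistically significant" if p >= 0.95 else "not statistically significant"
print(f"posterior = {p:.4f} -> {verdict}")
```

With these made-up numbers the posterior is about 0.968, so the imagined rule would declare "statistically significant" - which is exactly the kind of mechanical thresholding Robert is poking at.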
Re: hyp testing
On Fri, 7 Apr 2000, Chris Mecklin wrote, among other things:

My point is that I want to show my class an example where they can see the pitfalls of making a decision based solely on a p-value. I don't want them going "Ok, the p-value is .04 in this problem, so I don't reject, no wait, I reject, I think, Ok, yeah I reject, so whatever the treatment is must be good."

My favorite, not contrived, example has to do with vocational advice and gender. It is well known that in high school boys do better on standardized measures of mathematics and girls do better on verbal measures. This led to the "obvious" conclusion that girls should avoid anything mathy in their career choices while boys should avoid the humanities.

But, it turns out that the effect sizes for these results are typically around d=0.1, with individual studies maxing out at about 0.2. (I can't lay my hands on my notes right now, but these convert to an R^2 that is fairly small. Somewhere below .05.) The significant findings all have large sample sizes in common. Typically 1000+ students, so the results are all p < .01.

If you think about these results in terms of variance accounted for, individual variability NOT associated with gender clearly overwhelms any gender effects by about 19:1. If you think about these in terms of overlapping distributions, the top 48-49% of the girls are scoring in roughly the same range as the top 50% of the boys for math, with a similar gender-reversed result for verbal skills.

In other words, this real relation between gender and ability tests is an extremely poor substitute for individual information. Many girls have the ability to do well in mathy areas and many boys lack the ability.

Michael

*** Michael M. Granaas, Associate Professor, [EMAIL PROTECTED], Department of Psychology, University of South Dakota, Vermillion, SD 57069, Phone: (605) 677-5295, FAX: (605) 677-6604 *** All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
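Michael's numbers can be checked directly. The sketch below takes d = 0.1 from his post, assumes a hypothetical per-group sample size of 2000, and shows that the two-sample test is "significant" at p < .01 even though R^2 is tiny and the two distributions overlap almost completely (a large-sample z form is used for simplicity).

```python
# Large n makes a d = 0.1 effect "significant" despite near-total overlap.
# d comes from Granaas's post; the per-group n of 2000 is hypothetical.
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

d = 0.1            # standardized mean difference (Cohen's d)
n = 2000           # hypothetical sample size per group

z = d * math.sqrt(n / 2.0)       # two-sample z statistic for the mean difference
p = 2.0 * (1.0 - phi(z))         # two-sided p-value
r2 = d**2 / (d**2 + 4.0)         # conventional conversion of d to R^2
overlap = 1.0 - phi(d)           # fraction of the lower-scoring group that
                                 # falls above the other group's mean

print(f"p = {p:.4f}, R^2 = {r2:.4f}, fraction above other group's mean = {overlap:.3f}")
```

The p-value is well under .01, yet R^2 is about .0025 and roughly 46% of the lower-mean group scores above the other group's mean - in the same ballpark as the 48-49% overlap Michael describes.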
Re: hyp testing
On 7 Apr 2000, dennis roberts wrote:

i was not suggesting taking away from our arsenal of tricks ... but, since i was one of those old guys too ... i am wondering if we were mostly led astray ...? the more i work with statistical methods, the less i see any meaningful (at the level of dominance that we see it) applications of hypothesis testing ... here is a typical problem ... and we teach students this!

1. we design a new treatment
2. we do an experiment
3. our null hypothesis is that both 'methods', new and old, produce the same results
4. we WANT to reject the null (especially if OUR method is better!)
5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the null ... and in our favor!
6. what has this told us? if this is ALL you do ... what it has told you AT BEST is that ... the methods probably are not the same ... but, is that the question of interest to us? no ... the real question is: how much difference is there in the two methods?

--- 8< ---

In one of his papers, Bob Frick has argued very persuasively that very often (in experimental psychology, at least), this is NOT the real question at all. I think that is especially the case when you are testing theories. Suppose, for example, that my theory of selective attention posits that inhibition of the internal representations of distracting items is an important mechanism of selection. This idea has been tested in so-called "negative priming" experiments. (Negative priming refers to the fact that subjects respond more slowly to an item that was previously ignored, or is semantically related to a previously ignored item, than they do to a novel item.) Negative priming is measured as a response time difference between 2 conditions in an experiment. The difference is typically between about 20 and 40 milliseconds.
I think the important thing to remember about this is that the researcher is not trying to account for variability in response time per se, even though response time is the dependent variable: he or she is just using response time to indirectly measure the object of real interest. If one were trying to account for overall variability in response time, the conditions of this experiment would almost certainly not make the list of important variables. The researcher KNOWS that a lot of other things affect response time, and some of them a LOT more than his experimental conditions do. However, because one is interested in testing a theory of selective attention, this small difference between conditions is VERY important, provided it is statistically significant (and there is sufficient power); and measures of effect size are not all that relevant.

Just my 2 cents.

-- Bruce Weaver [EMAIL PROTECTED] http://www.angelfire.com/wv/bwhomedir/
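Dennis's two-sample example (t = 2.98 with 60 df) can be pushed one step toward his "how much difference" question by converting the same numbers into a confidence interval. The t statistic and df come from his post; the standard error of 1.5 is a hypothetical value supplied only so the arithmetic can be shown, and 2.0003 is the tabled t(.975, 60) critical value.

```python
# Turning a bare t test into an interval estimate of the difference.
# t and df are from Dennis's post; the standard error is hypothetical.

t_stat, df = 2.98, 60
se = 1.5                 # hypothetical standard error of the mean difference
t_crit = 2.0003          # tabled two-sided 95% critical value for 60 df

diff = t_stat * se       # observed mean difference implied by t = diff / se
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval excludes zero (agreeing with the rejected null) but, unlike the bare test, it also says how large the difference plausibly is - which is the question Dennis says we actually care about.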
Re: hyp testing
On Fri, 7 Apr 2000, dennis roberts wrote:

At 04:00 PM 4/7/00 -0500, Michael Granaas wrote: But whatever form hypothesis testing takes it must first and foremost be viewed in the context of the question being asked.

this seems to be the key to REinventing ourselves ... make sure the focus is on the question ... AND, to REshape the question FROM what we traditionally do in hyp test ...

If you look at Psychology you might well see two traditions, one in which the zero-valued null is used in a rather automatic and mindless fashion and another in which researchers work very hard setting up experiments where rejection of the zero-valued null does provide some information.

set up the null, etc. etc. to ... ask the question of real interest ... what effect DOES this new treatment have? what kind of correlation IS there between X and Y?

In the second tradition I spoke of you find people asking exactly these types of questions once they have established that their experimental results are not due to chance. They use the hypothesis test as a step on the road to understanding, not as an end in and of itself. To me this second group acts more like model fitters (emphasis on prediction) than they do like hypothesis testers (emphasis on rejecting nil effects). Even though this second group rejects some nil-valued hypothesis they, unlike the first, ask questions about things like effect size or functional form of an effect rather than simply declaring the effect is not zero and drawing some final conclusion.

For myself I try to get students at all levels asking the types of questions that Dennis suggests as being obvious follow-ups to rejecting some nil hypothesis. I cannot claim a great deal of success, but I am trying.

what IS the difference between the smartness of democrats and republicans? if you ask questions that way ... they do not naturally or sensibly lead to our testing the typical null hypotheses we set up

Yes.
There are a variety of answers to this problem, but rejecting the no-difference hypothesis when it is a priori false is not among them.

Michael
Re: hyp testing
Bruce Weaver wrote (in part): ...Negative priming is measured as a response time difference between 2 conditions in an experiment. The difference is typically between about 20 and 40 milliseconds... The researcher KNOWS that a lot of other things affect response time, and some of them a LOT more than his experimental conditions do. However, because one is interested in testing a theory of selective attention, this small difference between conditions is VERY important, provided it is statistically significant (and there is sufficient power); and measures of effect size are not all that relevant. Where the measure of effect size is relevant here is in answer to the question: Can we rule out all other plausible causes for what we observe? No experimental design is perfect, and in real life one may be forced to work with some that are very imperfect indeed. The experimenter may be able to eliminate some major confounding variables by careful design; and while there are always huge numbers of minor effects that *might* be confounded with what one wants to observe, it is true that most of them are small in size. If one can conclude that the effect size is on the order of 20ms, one can then ask oneself "is there anything else, not controlled for in the experiment, that could cause an effect of that magnitude?" and with luck and good management the answer would be "no". Whereas, if one just rejected the null hypothesis, the corresponding question would be "is there anything else, not controlled for in the experiment, that could cause an effect?" and the answer, if given honestly, would be "yes". In the case of negative priming, had the effect been of the order of 1ms (and the sample size correspondingly vast), I would conjecture that many other plausible causes (lengthened time between trials? more opportunity to become curious about the experiment?) for the difference could be dreamed up that would be difficult to eliminate. 
-Robert Dawson
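Robert's 1 ms thought experiment ("the sample size correspondingly vast") can be quantified with the standard two-group sample-size formula, n per group = 2(z_a + z_b)^2 sigma^2 / delta^2. The response-time standard deviation of 100 ms and the 80% power target are hypothetical choices made only to illustrate the scaling; the 20 ms and 1 ms effects come from the thread.

```python
# How large a sample does a 1 ms effect need, versus a 20 ms one?
# sigma = 100 ms and 80% power (z_b = 0.84) are hypothetical assumptions.

def n_per_group(delta, sigma=100.0, z_a=1.96, z_b=0.84):
    """Per-group n for a two-sample comparison at two-sided alpha = .05."""
    return 2.0 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2

n_20ms = n_per_group(20.0)   # a typical negative-priming effect
n_1ms = n_per_group(1.0)     # Robert's hypothetical 1 ms effect
print(f"n per group: {n_20ms:.0f} for 20 ms vs {n_1ms:.0f} for 1 ms")
```

Because required n scales as 1/delta^2, shrinking the effect by a factor of 20 multiplies the sample size by 400 - roughly 400 versus 157,000 per group under these assumptions, which is why a significant 1 ms effect invites Robert's hunt for uncontrolled confounds.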
Re: hyp testing
the term 'null' does NOT mean 0 (zero) ... though it is misconstrued that way. the term 'null' means a hypothesis that is the straw dog case ... for which we are hoping that sample data will allow us to NULLIFY ... in some cases, the null happens to be 0 ... but in many cases, it does not. cases in point:

1. null hypothesis is that the population variance for IQ is 225
2. null hypothesis is that the population mean for IQ is 100
3. to test the variance of a population ... the null is that the expected value of the chi square statistic equals its degrees of freedom

and on and on and on
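Dennis's second case - a null that the population mean IQ is 100, not zero - can be tested with a one-sample z test, assuming the conventional sigma = 15 for IQ. The sample mean and sample size below are invented for illustration.

```python
# One-sample z test of Dennis's non-zero null H0: mu = 100.
# sigma = 15 is the conventional IQ scale SD; xbar and n are hypothetical.
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu0, sigma = 100.0, 15.0
xbar, n = 104.0, 50          # hypothetical sample mean and sample size

z = (xbar - mu0) / (sigma / math.sqrt(n))
p = 2.0 * (1.0 - phi(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Note that the hypothesized value 100 is nowhere near zero, yet the machinery is the same: the statistic measures how far the data sit from whatever value the null asserts.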
Re: hyp testing
Dennis Roberts wrote: the term 'null' does NOT mean 0 (zero) ... though it is misconstrued that way ... the term 'null' means a hypothesis that is the straw dog case ... for which we are hoping that sample data will allow us to NULLIFY ... in some cases, the null happens to be 0 ... but in many cases, it does not

It always means that _something_ is zero - as does just about any other algebraic or mathematical expression, after a little rearrangement into something logically equivalent. Moreover, in cases in which the null hypothesis has any prior credibility - as should always be the case - that "something" (e.g., the amount by which the IQ of the subject population differs from the standardized value of 100) is usually a sensible thing to study. And that thing can usually be thought of as "effect size". In the classic student blooper cases, "H0: mu = x bar" and "H0: mu equals the nearest round number to x bar", it isn't: and those tests should not be done.

-Robert
Re: hyp testing
On Mon, 10 Apr 2000, Robert Dawson wrote:

Dennis Roberts wrote: the term 'null' does NOT mean 0 (zero) ... though it is misconstrued that way ... in some cases, the null happens to be 0 ... but in many cases, it does not

It always means that _something_ is zero - as does just about any other algebraic or mathematical expression, after a little rearrangement into something logically equivalent. Moreover, in cases in which the null hypothesis has any prior credibility - as should always be the case - that "something" (e.g., the amount by which the IQ of the subject population differs from the standardized value of 100) is usually a sensible thing to study. And that thing can usually be thought of as "effect size".

My grandmother could have told me that the mean height for men and women was not the same (zero difference). So based on prior evidence I hypothesize that the actual difference is 3 inches (mu1 - mu2 = 3) and use that for my null hypothesis. True, I can reduce this to a zero-difference version by using (mu1 - mu2) - 3 = 0, but do I really want to?

The problem is that Fisher meant "hypothesis to be nullified" and chose the term "null", which has a mathematical meaning of "zero". This might have been sensible in Fisher's applications, where you wouldn't use a new fertilizer unless it was different from nothing or some other treatment. So "null" met both meanings. But, I would argue that the height difference hypothesis is better understood and more meaningful in its non-zero form. Perhaps we need to refer to this hypothesis as the "test hypothesis".

Michael
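Michael's two forms - testing mu1 - mu2 against 3 directly, and rearranging to (mu1 - mu2) - 3 = 0 - are algebraically identical, so they must give the same statistic. The sketch below checks this with hypothetical summary numbers, using a large-sample z form for simplicity.

```python
# H0: mu1 - mu2 = 3 versus the rearranged H0: (mu1 - mu2) - 3 = 0.
# Sample means and the standard error are hypothetical.
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

xbar_m, xbar_f = 69.5, 65.0   # hypothetical sample mean heights, inches
se_diff = 0.5                 # hypothetical standard error of the difference
h0_diff = 3.0                 # Michael's hypothesized 3-inch difference

# Form 1: test the observed difference against 3 directly.
z1 = ((xbar_m - xbar_f) - h0_diff) / se_diff

# Form 2: define effect = (mu1 - mu2) - 3 and test it against 0.
effect = (xbar_m - xbar_f) - h0_diff
z2 = (effect - 0.0) / se_diff

p = 2.0 * (1.0 - phi(abs(z1)))
print(f"z1 = {z1:.3f}, z2 = {z2:.3f}, p = {p:.4f}")
```

The statistics agree exactly, so the disagreement in the thread is really about pedagogy and interpretation (which form students understand), not about the mathematics.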
Re: hyp testing
Dennis Roberts wrote: if you are interested in the relationship between heights and weights of people, in the larger population ... the notion that we test this against a null of rho=0 is not credible ... in fact, it is rather stupid ... a more sensible null would be perhaps a rho of .5 ...

No: if you have to start "a more sensible null would be perhaps", you almost surely do not have a hypothesis worth testing. Put it this way: if your possible outcomes are "The correlation is not zero" or "The correlation may be zero", these are both weak statements, but you might not feel silly making either of them. (If you would, then use an interval estimate instead!) "The correlation is not 0.5" or "The correlation may be 0.5" both leave the listener wondering "why 0.5?" If the only answer is "well, it was a round number close enough to x bar [or "to my guesstimate before the experiment"] not to seem silly, but far enough away that I thought I could reject it," then the test is pointless.

-Robert Dawson
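Robert's parenthetical "use an interval estimate instead!" can be made concrete with the Fisher z transformation, which gives a confidence interval for rho; the test of H0: rho = 0.5 is shown alongside for comparison. The sample correlation and sample size below are hypothetical.

```python
# Interval estimate for rho via the Fisher z transformation, plus the
# "why 0.5?" test for comparison. r and n are hypothetical.
import math

r, n = 0.62, 100
zr = math.atanh(r)               # Fisher z transform of the sample correlation
se = 1.0 / math.sqrt(n - 3)      # approximate SE on the transformed scale

# 95% CI on the z scale, back-transformed to the correlation scale
lo, hi = math.tanh(zr - 1.96 * se), math.tanh(zr + 1.96 * se)

# the test Dennis proposed: H0: rho = 0.5
z_stat = (zr - math.atanh(0.5)) / se
print(f"95% CI for rho: ({lo:.3f}, {hi:.3f}); z for H0 rho = .5: {z_stat:.3f}")
```

With these numbers the interval runs from roughly .48 to .73: it answers "what is rho?" directly, while the test of rho = .5 merely fails to reject and leaves the "why 0.5?" question hanging.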
Re: hyp testing
Michael Granaas wrote: My grandmother could have told me that the mean height for men and women was not the same (zero difference). So based on prior evidence I hypothesize that the actual difference is 3 inches (mu1 - mu2 = 3) and use that for my null hypothesis. True, I can reduce this to a zero difference version by using (mu1 - mu2) - 3 = 0 but do I really want to?

If the 3" is credible enough to be worth testing, then yes, you do. Example: 3" is determined from historical large-sample measurements to be the difference for the population overall, and you want to determine whether there is a larger difference in heights within male and female members of a certain ethnic group; or you want to see if the height difference is decreasing over time; or whether it is larger for the armed forces. In these cases (mu1 - mu2 - 3) represents "change in height difference explained by ..." and it is indeed the effect size.

However, if you are simply studying height differences, and do not have any real source for that 3" figure, you would not be justified in pulling it out of thin air just to permit you to do a hypothesis test. I would suggest, in fact, that a very good rule of thumb would be that if you _can't_ cast a null hypothesis meaningfully in the form "effectsize = 0" you should be very, very wary of doing the test at all.

-Robert Dawson
Re: What to do about simple techniques
I am new to the list, so I am jumping into the middle of this. However, we have to start teaching hypothesis testing somewhere. Even if it goes the way of the Edsel, it will be a slow death, because many of us will continue to use it when we feel it is appropriate to the question.

However, I tell my students that there are always more complicated ways to do things. In many instances these take more math ability, computer skills, or time than I have to explain them. So I am going to show them a method that will work, although it won't necessarily be the most powerful or efficient. The key is to first understand inference thoroughly before jumping into more complicated things. You don't teach a first grader multiplication before they understand addition, and I don't teach a nursing student logistic regression before they can understand a chi-square goodness of fit.

If we had the ability to teach all the students what we thought they needed to know, we might do things a little differently. Someone mentioned Joe Ward. His colleague Earl Jennings once told me that the biggest impediment to understanding linear models was learning the t test, anova, and regression techniques separately. When I teach linear models I try to get the students to unlearn a lot of what they know. It seems a waste of time, but we are not always in charge of the curriculum. When you are given 3 hours to teach a student something about statistics, do you start with linear models? Probably not. Well, I did not intend this to be quite so long, so I'll shut up.

At 06:05 PM 4/7/00 -0600, you wrote:

Dear all, I am interested in what others are doing when faced with techniques that appear in standard textbooks that are "simpler" (either computationally and/or conceptually) than better (but more difficult) techniques. My concern is when the "superior" technique is either inaccessible to the audience (for instance, a "stat" 1011 class), or would take considerably longer to teach (and the semester isn't long enough now), or requires use of the computer for almost any sample. Some examples of techniques that I see in lots of stat textbooks but would rarely be used by a statistician are:

1) chi-square goodness of fit to test for normality (when Shapiro-Wilk is much better for the univariate case and the Henze-Zirkler for the multivariate case);
2) paired sample t-tests (usually better options here such as ANCOVA);
3) sign test (randomization tests are much superior).

I'm sure I left out/didn't think of plenty of other cases. My question to the group, as someone at the beginning of a career teaching statistics, is what to do? Should some of these tests be left out (knowing the students may run into the tests in future course work or in some research)? Should the better procedures always be taught, knowing that the additional difficulty due to level of mathematics/concepts/computational load may well lose many students? I don't know yet; what do you think?

___ Christopher Mecklin, Doctoral Student, Department of Applied Statistics, University of Northern Colorado, Greeley, CO 80631, (970) 304-1352 or (970) 351-1684

Paul R. Swank, PhD. Advanced Quantitative Methodologist, UT-Houston School of Nursing, Center for Nursing Research, Phone (713) 500-2031, Fax (713) 500-2033
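Chris's third example - the sign test versus a randomization test - is easy to demonstrate side by side. The sketch below runs an exact two-sided sign test and a sign-flipping randomization (permutation) test on the same paired differences; the data are invented for illustration.

```python
# Sign test versus a randomization test on the same paired differences.
# The data are invented; the randomization test uses a fixed seed.
import math
import random

diffs = [2.1, 0.8, 1.5, 3.0, -0.4, 1.2, 2.6, 0.9, 1.8, 0.5]

# Sign test: exact two-sided binomial p-value on the signs alone.
pos = sum(d > 0 for d in diffs)
n = sum(d != 0 for d in diffs)
tail = sum(math.comb(n, k) for k in range(pos, n + 1)) / 2 ** n
p_sign = min(1.0, 2.0 * tail)

# Randomization test: flip each difference's sign at random and count how
# often the mean is at least as extreme as the observed mean.
random.seed(0)
observed = sum(diffs) / len(diffs)
reps = 10000
count = 0
for _ in range(reps):
    m = sum(d * random.choice((-1, 1)) for d in diffs) / len(diffs)
    if abs(m) >= abs(observed):
        count += 1
p_perm = count / reps

print(f"sign test p = {p_sign:.4f}, randomization p = {p_perm:.4f}")
```

The sign test discards the magnitudes and gives p of about .021; the randomization test, which uses them, comes out near .004 here - a small illustration of why Chris calls randomization tests "much superior" to the sign test in power.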
Re: hyp testing
At 01:16 PM 4/10/00 -0300, Robert Dawson wrote: No: if you have to start "a more sensible null would be perhaps", you almost surely do not have a hypothesis worth testing.

now we get to the crux of the matter ... WHY do we need a null ... or any hypothesis ... (credible and/or sensible) to test??? what is the scientific need for this? what is the rationale within statistical exploration for this?

i am not suggesting that we don't need or must not deal with inferences from sample data to what parameters might be ... but, i fail to see WHY that necessarily means that one has to have a null hypothesis of any kind. perhaps this is what needs to be debated more ... what function does having a hypothesis really have? if any ... it would be useful if we could have some short listing of reasons why, and some examples where WITHOUT a hypothesis, we are unable to make any scientific progress
Re: hyp testing
At 01:16 PM 4/10/00 -0300, Robert Dawson wrote: both leave the listener wondering "why 0.5?" If the only answer is "well, it was a round number close enough to x bar [or "to my guesstimate before the experiment"] not to seem silly, but far enough away that I thought I could reject it." then the test is pointless. -Robert Dawson YOU HAVE made my case perfectly! ... this is why the notion of hypothesis testing is outmoded, no longer useful ... not worth the time we put into teaching it ... in the case above ... i would ask: what is the population rho value ... THAT is the important inferential issue ... there is no reason why we would have to say: i wonder if it is .5 ... let's TEST that, or ... i wonder if it is .7 ... let's TEST that ... we can simply ask the question and try to get an answer to that ... and there is no need to test a pre-formulated null to get some sensible answer to the question no need for ANY null ... therefore no need for any hypothesis test if 0 is absurd ... and, if i hypothesized .5 and you ask why .5??? then we could have asked anywhere from 0 to .5 ... and they would have been just as non functional ... that's it ... hypothesis testing is non functional
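Dennis's "just ask how large it is" alternative, for a correlation, is the textbook Fisher z-transform interval. A minimal sketch using only the standard library; the observed r and n are made up for illustration:

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf=0.95):
    """Approximate confidence interval for a population correlation rho."""
    z = math.atanh(r)                    # Fisher z-transform of observed r
    se = 1.0 / math.sqrt(n - 3)          # large-sample standard error
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Back-transform the interval endpoints to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lo, hi = fisher_ci(r=0.62, n=50)
print(f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

No candidate value for rho ever has to be nominated: the interval answers "what is the population rho?" directly, which is Dennis's point.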
Sensible nulls
I, and I think Dennis, are arguing that when we test a hypothesis we should have a null hypothesis that is plausibly true. A hypothesis that reflects some sort of an effect size estimate where such an estimate is meaningful. If I understand correctly Robert is arguing that we should always phrase the null so that it becomes a hypothesis of no effect. In one case we can do that mathematically by rearranging a hypothesis of (mu1 - mu2 = 3) to the form ((mu1 - mu2) - 3 = 0). If this is what Robert means by saying that only no effect hypotheses are meaningful I think we are in partial agreement.

I personally shudder to think of trying to teach the second form to my students. I think that they will have a much easier time understanding that I am predicting a difference between two groups of 3 units using the first. And they will have an easier time understanding any implications of rejecting/not rejecting a hypothesis in the first case.

If Robert is saying it is not sensible to test (mu1 - mu2 = 3) under any circumstances I disagree. My reading of another message is that he thinks there should be some prior evidence for testing a hypothesis of 3 units of difference. If my reading here is correct I think that we may be differing in what we consider adequate prior evidence, but otherwise are close. I guess I don't wish to argue all three possibilities if only one of them is an actual point of disagreement.

For reasons I am willing to develop fully later I think that specifying a plausibly true value for a null hypothesis (test hypothesis) is more valuable than a null hypothesis where the specified value is not plausibly true. In psychology, and I think education, we see the zero value specified when it is not even remotely plausible way too often. This plausibility judgement is informed by at least some prior evidence.
By plausibly true I am willing to concede some reasonable interval around the tested null value where the interval size is informed by content area knowledge. (I am willing to say that some small effects should be treated as if they were zero. I am willing to say that true values only slightly different from the hypothesized value should be treated as if they were the hypothesized value.)

Michael

***
Michael M. Granaas
Associate Professor  [EMAIL PROTECTED]
Department of Psychology
University of South Dakota
Phone: (605) 677-5295
Vermillion, SD 57069
FAX: (605) 677-6604
***
All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
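The two algebraic forms contrasted above are the same test in practice. A hedged sketch with scipy: shifting one sample by the hypothesized difference of 3 and then running the ordinary "no difference" two-sample t-test is exactly the test of ((mu1 - mu2) - 3 = 0). The data here are invented purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(10.0, 2.0, size=40)  # hypothetical group 1
g2 = rng.normal(7.0, 2.0, size=40)   # hypothetical group 2 (true diff = 3)

# H0: mu1 - mu2 = 3, rewritten as H0: (mu1 - mu2) - 3 = 0.
# Subtract the hypothesized difference from group 1, then test "no effect".
t, p = stats.ttest_ind(g1 - 3.0, g2)
print(f"t = {t:.3f}, p = {p:.3f}")
```

So a teacher can present the intuitive first form to students while the software quietly computes the second; nothing mathematical hangs on the choice.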
Re: Sensible nulls
On Mon, 10 Apr 2000, Robert Dawson wrote: Michael Granaas wrote: H0: being in the target population has no effect on sexual dimorphism in height Ha: being in the target population does affect sexual dimorphism in height I want to see if I am interpreting your meaning correctly. If some value such as "3" comes from some place sensible then your null here would represent the same idea that I have been expressing as (mu1 - mu2) - 3 = 0? Michael which gets to the real heart of the matter.

***
Michael M. Granaas
Associate Professor  [EMAIL PROTECTED]
Department of Psychology
University of South Dakota
Phone: (605) 677-5295
Vermillion, SD 57069
FAX: (605) 677-6604
***
All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
Re: hyp testing
On Mon, 10 Apr 2000, dennis roberts wrote in part: .. the fact that we create a null and test a null does NOT imply that we are therefore testing some effect size ... Of course not. One does not TEST an effect size, one ESTIMATES it. And it is useful to do so only if one has found it not equal to some value (possibly, but not necessarily, zero) that would imply the effect size to be uninteresting. (Sometimes it is convenient to do both of these things at once, as in constructing a confidence interval. But if the interval includes values that are, a priori, uninteresting, there is little utility to pursuing the current estimate of the effect size.) and, if we were interested in an effect size, then we don't have to test for it ... but we could ask the question: how large is it? that is NOT a test of a hypothesis we don't need ANY null to find answers to questions of import that we might have I beg to differ. Strenuously. The whole point of a point null hypothesis is to be able to specify a probability distribution against which one can assert, more or less credibly, that one's conclusion is supported with a suitably limited probability of error. (Error, that is, in drawing the conclusion.) For this reason Lumsden used to call this the "model-distributional" hypothesis, which had the virtue of describing its proper purpose moderately clearly, and had the obvious defect of being too much of a mouthful for us ordinary folks to use in conversation (or in classrooms, or in other contexts beginning with "c"). So long as the logical style of scientific argumentation is argument by elimination, one needs a set of propositions about the world [Note: not about the present sample, nor about statistics measured on the sample.] 
that are both exhaustive and mutually exclusive (viz., the null and alternative hypotheses), and a means of determining either that one proposition is false, or that one's decision that it is false (should one come to that decision) has a low probability of being wrong. THAT is what hypothesis testing is about; and it follows that sensible discussions of the form and/or values associated with a model-distributional hypothesis cannot take place in the absence of the alternative hypothesis(-es) that are to be considered simultaneously.
-- Don.

Donald F. Burrill  [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264  603-535-2597
184 Nashua Road, Bedford, NH 03110  603-471-7128
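The claim that a point null pins down a sampling distribution, so the probability of wrongly rejecting it is known, can be checked by simulation. A sketch with scipy (all numbers arbitrary): with H0: mu = 0 actually true, a .05-level t-test rejects about 5% of the time in the long run.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_sims, rejections = 0.05, 2000, 0

for _ in range(n_sims):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)  # H0 is true here
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    if p < alpha:
        rejections += 1  # a Type I error

rate = rejections / n_sims
print(f"empirical Type I error rate: {rate:.3f}")
```

Without the point null there is no distribution to simulate from, which is the "model-distributional" role described above.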
Re: 3-D regression planes graph,
The free ARC software from the University of Minnesota will do some of this. Look at http://stat.umn.edu/ARCHIVES/archives.html

Jon Cryer

At 01:59 PM 4/10/00 -0500, you wrote: Hello all, I'm looking for software that can display a 3-D regression environment (x, y, and z variables) and draw a regression plane for each of two subgroups. So far, Minitab does a good job of the 3-D scatterplots (regular, wireframe, and surface (plane) plots), but there's no option (as in the regular scatterplots) to either code data points according to categorical variables or to overlay two graphs on the same set of axes. I'm saving the data in both Minitab and SPSS files, and I can easily convert to Excel (as a standard go-between spreadsheet file). Any help will be greatly appreciated. The effect in my research that I'm finding so far is that my two groups look similar in univariate and bivariate settings, but the trivariate regression planes are different. I know I could do what I needed to with regression equations (and will do so), but I'd l-o-v-e to have some graphs to go with it. SPSS will be fine for the actual regression equations-- it can deal with subgroups like that. Thank you very much in advance, Cherilyn Young
Re: hyp testing
My comments are about half-way down. Michael On Mon, 10 Apr 2000, Robert Dawson wrote: Dennis Roberts wrote: now we get to the crux of the matter ... WHY do we need a null ... or any hypothesis ... (credible and/or sensible) to test??? what is the scientific need for this? what is the rationale within statistical exploration for this? My understanding, not perhaps paralleling the historical development very closely, is that the answer is something like this. I'm sure somebody will correct me... (0) People want to be able to make qualitative statements: "Manure makes the roses grow." "Electric shocks make mice do what you want them to." "If I buy kippers it will not rain." (1) In an attempt to be more scientific, instead of making absolute statements, people decide to use the idea of probability. They would _like_ to be able to say "There is a 99% probability that if you put manure on roses they will grow better." However, that does not fit in with traditional frequency-based probability, which only officially assigns probabilities to events which are "random", a phrase usually undefined or defined only inductively: "You know, dice, urns, all that stuff." Roses are not random because you do not get to bet on them at Las Vegas. (Horses are dubious. Few people recommend using the outcome of the 2:30 to assign mice to treatment groups.) (2) If there is going to be a probability involved, then, it has to involve the sampling technique, as that is the only place where the experimenter can introduce (or pretend to) an urn or pair of dice. (3) Even given randomization via sampling, we need to know how things really are to compute a probability. If we *did* know this we wouldn't be doing statistics. But we can make a "conditional" statement that _if_ something were true then the probability of observing something would be ... (4) In order to avoid circular logic, we *cannot* assume what we want to prove, in order to compute the probability.
We can however assume it for a contradiction. Therefore: This (point 4) is certainly what we have been led to believe, but I question the assumption. Do we not in fact teach that we are to act as if the null is true until we can demonstrate otherwise? I'm not sure where this assumption came from (Fisher, someone's interpretation of Fisher, someone other than Fisher) but there is no logical problem with assuming something that might plausibly be true and using it as our null. Isn't that what we do in our experiments all the time? We assume that our experimental manipulation has no effect, which is plausibly true at least for some time, and then we try to disprove that estimate of the effect. Failing to do so we act as if the effect were absent (or so small as to be absent for all practical purposes). In many cases, however, we are only interested in the presence/absence of an effect and this is plenty good enough. But, sometimes we want to estimate, model, an effect. In this case we want a parameter estimate that is reasonably close to the population value for that effect. In this case we might prefer confidence intervals or some such, but we could certainly adopt our best estimate of the parameter and try to disprove it by using it as the value tested in the null hypothesis. There is no logical problem with adopting a plausibly true value for the null and accepting it if it survives efforts to discredit it. That is, there is no logical problem with using a prediction approach. (5) There is some set of observations that will lead us to declare that that contradiction is reached, and others that won't. Hence the rejection region. (6) The only definite outcome is rejecting the hypothesis; the only situation in which we can compute the probability is when the hypothesis is true. Hence alpha. Yes. But, we can assume a genuinely true hypothesis as well as one that is in all likelihood false. That does not pose any problem for the computation of alpha levels.
The problem is that in too many cases our predictions of what is true are too weak to allow point/narrow interval predictions of what is true. We can only predict 0. In this case it seems that we are stuck with rejecting a zero valued null in the correct direction as the strongest form of theory confirmation that we have. Robert is correct, predicting any particular value in this situation is arbitrary and pointless. But, if the theory is strong enough to make narrow predictions those should be used as the null and either disproved (rejected) or corroborated (accepted). (7) Back at the beginning we wanted a yes-or-no answer. Hence fixed alpha testing and the pretense that we "accept" null hypotheses. If the null is plausibly true we need no pretense. We accept the null as true until something better comes along. I personally have accepted the notion that psi powers do not exist despite the fact
scientific method
here are a few urls (quickly found, i admit) about scientific method ... some are quite interesting

http://dharma-haven.org/science/myth-of-scientific-method.htm#Overview
http://teacher.nsrl.rochester.edu/phy_labs/AppendixE/AppendixE.html
http://idt.net/~nelsonb/bridgman.html
http://www.brint.com/papers/science.htm
http://koning.ecsu.ctstateu.edu/Plants_Human/scimeth.html
http://ldolphin.org/SciMeth2.html
http://www.wsu.edu:8080/~meinert/SH.html
http://www.phys.tcu.edu/~dingram/edu/pine.html

now, i know there are tons more ... and, i offer no guarantees about the above ...
Re: hyp testing
Just because Dennis has trouble with the null hypothesis, that does not mean that it is a bad idea to use them. On 10 Apr 2000 08:41:06 -0700, [EMAIL PROTECTED] (dennis roberts) wrote: the term 'null' does NOT mean 0 (zero) ... though it is misconstrued that way the term 'null' means a hypothesis that is the straw dog case ... for which we are hoping that sample data will allow us to NULLIFY ... - this seemed okay in the first sentence. However, I think that "straw dog case" is what I would call a "straw man argument", and that is *not* the quality of argument of the null. The point-null is always false, but we state the null so that it is "reasonable" to accept it, or to require data in order to reject it. in some cases, the null happens to be 0 ... but in many cases, it does not cases in point: 1. null hypothesis is that the population variance for IQ is 225 snip, similar stuff, and other stuff -- I think I reject the null when it is stated, "... is 225". Sure, the point-null is false. But state it, "The difference between the Variance and 225 is zero" -- and then, you require data to show that there is evidence that the difference should be accepted as other than 0. We either *accept* that the difference is (may be) zero, or we *reject* and have some other difference. We do not *conclude* that the difference is zero. There have been posts in the last two weeks on sci.stat.consult concerning the testing of bioequivalence -- and that is a case where the null is more complicated. Generally, the ALTERNATIVE is that the new drug falls between 80% and 125% of the potency of the old drug. (Those are the limits that the FDA cares about.) The null is that the new is Greater than 125% or Less than 80% of the old. If we have rotten evidence, with bad means, or huge standard deviations, then we have to accept the null; the new may be unlike the old. That is not a weak "strawman" -- that is a reasonable, default alternative.
The standard testing is called Two One-Sided Tests, to show that the amount is definitely less than the Upper limit, and definitely greater than the Lower. Basically, you need to construct a confidence interval on the difference and have it fall completely within the limits.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
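The Two One-Sided Tests procedure described above can be sketched as follows, working on the log scale where the 80%-125% limits become symmetric. This is an illustrative sketch with scipy, not a regulatory recipe; the helper name `tost` and all the numbers are invented.

```python
import math
from scipy import stats

def tost(diff, se, df, lower, upper):
    """Two one-sided t-tests for equivalence of a difference `diff`
    (e.g. a mean log-potency difference) against (lower, upper) limits."""
    # Test 1 -- H0: diff <= lower  vs  Ha: diff > lower
    p_lo = stats.t.sf((diff - lower) / se, df)
    # Test 2 -- H0: diff >= upper  vs  Ha: diff < upper
    p_hi = stats.t.cdf((diff - upper) / se, df)
    # Equivalence is declared only if BOTH one-sided nulls are rejected;
    # at alpha = .05 this matches the 90% CI-within-limits rule above.
    return max(p_lo, p_hi)

# Hypothetical numbers: observed log-ratio near 0, tight standard error.
p = tost(diff=0.02, se=0.05, df=28,
         lower=math.log(0.80), upper=math.log(1.25))
print(f"TOST p = {p:.4f}  (reject non-equivalence if < 0.05)")
```

Note how the composite null (too low OR too high) is the default here, exactly the "reasonable, default" null the post describes.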
Re: hyp testing
the logic behind the null hypothesis method is flawed ... IF you are looking for truth AND you keep following the logic of testing AGAINST a null ... first, say you reject the null of rho = 0 ... then, LOGICALLY ... this says that since we don't know what truth is ... just what we think it isn't ... we go second, make the null as rho = .05 ... then .1, then .15 ... on and on UNTIL we reach that magical spot (if ever) ... when we had the null of rho = .65 ... and we suddenly RETAINED the null! i guess we know what the truth is now, or, do we? At 05:20 PM 4/10/00 -0400, Rich Ulrich wrote: Just because Dennis has trouble with the null hypothesis, that does not mean that it is a bad idea to use them. maybe not ... but i don't see that many if any reasons why and the discussions are not swaying me ... (of course, that is not the posters' faults ... maybe just mine)
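Run to completion, the stepping-through-nulls procedure sketched above is confidence-interval construction by test inversion: the set of null values NOT rejected at level alpha is exactly a 1 - alpha confidence interval. A sketch for a one-sample mean (scipy; data and grid are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # invented sample

# Test every candidate null on a fine grid; keep the ones we retain.
retained = [mu0 for mu0 in np.arange(3.0, 7.0, 0.01)
            if stats.ttest_1samp(x, popmean=mu0).pvalue >= 0.05]

# Compare with the textbook 95% t-interval: same endpoints (to grid step).
lo_ci, hi_ci = stats.t.interval(0.95, df=len(x) - 1,
                                loc=x.mean(), scale=stats.sem(x))
print(f"retained nulls span ({min(retained):.2f}, {max(retained):.2f})")
print(f"95% CI:             ({lo_ci:.2f}, {hi_ci:.2f})")
```

So the "magical spot" where nulls start being retained is not mysterious: the two camps in this thread are computing the same interval, one value at a time or all at once.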
Re: hyp testing
- Original Message - From: Michael Granaas [EMAIL PROTECTED] Our current verbal labels leave much to be desired. Depending on who you ask the "null hypothesis" is a) a hypothesis of no effect (nil hypothesis) b) an a priori false hypothesis to be rejected (straw dog hypothesis) c) an a priori plausible hypothesis to be tested and falsified or corroborated (wish I had a term for this usage/real null?)-- The concept of a hypothesis is important. It can be used to teach an important statistical concept. Let us suppose there are many plausible hypotheses. These include the "nil hypothesis", any a priori hypotheses, any idea at all that may be considered. Refer to these in terms of the set of all plausible hypotheses (including that of no effect) that are to be tested. The process is to pick each hypothesis and test it. The outcome of the test is not only a probability, but a reality check (the investigator's belief system). THE OUTCOME CAN ONLY BE BINARY, REJECTION OR NON-REJECTION. Non-rejection is not acceptance. It just means that under non-rejection, the hypothesis is in the set of all hypotheses that were not rejected. The process does not pick out the true hypothesis; it never can do that. It can only reject those hypotheses that have little chance of fitting the data. You can ignore them then. You have to use other techniques to pick the acceptable hypothesis out of all those in the "not rejected" set. Any "verbal or mathematical summary" is acceptable (that is, in the set of non-rejected hypotheses) (Pearson 1892, p 22). As R.A. Fisher said (re. a 0.05 level of significance in testing a hypothesis) "does not mean that he allows himself to be deceived once in twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant test results are not obtained" (Fisher 1929b, p 191). Fisher also said "a test of significance contains no criteria for 'accepting' a hypothesis" (Fisher 1937, p 45). DAHeiser
cluster analysis
Can anyone help with good resources on the web, journals, books, etc. on cluster analysis -- similarity and ordination? Any recommended programs for this type of analysis, too? Cheers Elisa Wood
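As a starting point, hierarchical clustering on a distance (dissimilarity) matrix is the usual entry point for similarity and ordination work. A sketch in modern Python with scipy (the data, the euclidean metric, and the average-linkage choice are all illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
# Two well-separated synthetic blobs in 2-D, 20 points each.
pts = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                 rng.normal(5.0, 0.3, (20, 2))])

dists = pdist(pts, metric="euclidean")   # condensed pairwise distance matrix
tree = linkage(dists, method="average")  # UPGMA, common in ordination work
labels = fcluster(tree, t=2, criterion="maxclust")  # cut tree at 2 clusters
print(labels)
```

Swapping the metric (e.g. a similarity coefficient converted to a dissimilarity) or the linkage method changes the clustering, which is why those two choices dominate the literature the question asks about.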