Forecasting Software
I will be teaching Time Series and Forecasting (an MBA course) in the Fall. I am looking for an inexpensive software package that is good for forecasting. Last year I used Minitab 12 and found it easy to use and accessible to students. It is available on our network with a site license, so we will use it again this year. In addition, I would like to find a package that is available to students at a reasonable price and that includes some more advanced features, particularly the AIC (Akaike's Information Criterion). Any ideas?

Brian
___
Brian E. Smith                  TEL: 514-398-4038 (Work)
McGill University               FAX: 514-398-3876 (Work)
1001 Sherbrooke St. West        FAX: 514-482-1639 (Home)
Montreal, QC, Canada H3A 1G5    EMAIL: [EMAIL PROTECTED]
Url: http://www.management.mcgill.ca/homepage/profs/smithb
___
No human investigation can be called real science if it cannot be demonstrated mathematically. -- Leonardo da Vinci
___
=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
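For readers unfamiliar with the criterion Brian mentions: for a model with k estimated parameters and maximized log-likelihood L, AIC = 2k - 2 ln L, and smaller is better. A minimal sketch of computing it by hand for an AR(1) model fit by least squares (the simulated series and the function name are invented for illustration; any package with the AIC built in would do this for you):

```python
import numpy as np

def ar1_aic(y):
    """Fit y_t = c + phi*y_{t-1} + e_t by OLS and return (c, phi, AIC),
    using the Gaussian log-likelihood at the ML error variance."""
    y = np.asarray(y, dtype=float)
    x, z = y[:-1], y[1:]                    # lagged predictor and response
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    n = len(z)
    sigma2 = resid @ resid / n              # ML estimate of error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = 3                                   # c, phi, sigma^2
    return beta[0], beta[1], 2 * k - 2 * loglik

# toy series: AR(1) with phi = 0.6
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(200):
    y.append(0.6 * y[-1] + rng.normal())
c, phi, aic = ar1_aic(y)
print(f"phi-hat = {phi:.2f}, AIC = {aic:.1f}")
```

Comparing the AIC of, say, AR(1) versus AR(2) fits of the same series is the typical classroom use: the penalty term 2k discourages adding lags that do not improve the likelihood enough.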
Re: hyp testing
> Except for posterior probability, none of these are tools
> for the actual problems. And posterior probability is not
> what is wanted; it is the posterior risk of the procedure.
>
> But even this relies on belief. An approach to rational
> behavior makes the prior a weighting measure, without
> ringing in belief. I suggest we keep it this way, and
> avoid the philosophical aspects

Disagree.

1. Determination of risk requires a model which is based on a belief system (e.g., there is/is not a minimum level of tremolite that causes mesothelioma). Probability is difficult enough to deal with, let alone an additional swamp called "risk". Those of us who have thought about developing outcomes in terms of risk have basically had to give it up. The difference in interpretation of a risk value between different people is much too great.

2. Weighting is again based on a belief system. Everything is not equal. Some things are more important than others.

>The data consists of what has been observed. The likelihood
>principle then mandates that the probabilities of unobserved
>events become irrelevant. This means that the typical test
>procedures (NOT the test STATISTICS) would have to be wrong.

3. This does not make sense. It needs something in addition.

I wrote:

>>Let us suppose there are many plausible hypotheses. These include the
>>"nil hypothesis", any a priori hypotheses, any idea at all that may be
>>considered. Refer to these in terms of the set of all plausible hypotheses
>>(including that of no effect) that are to be tested.

>The set of all plausible hypotheses is generally uncountable,
>even in the discrete case.

>>The process is to pick each hypothesis and test it.

>This cannot be done; there are too many.

4. If this is the result, then you have a really, really bad experiment. You haven't thought about the problem and defined a finite region for exploration. I certainly could not do a PhD thesis and have it accepted if I didn't have a defined region and objectives for the research.
5. Let me quote R. A. Fisher: "he (the investigator) should only claim that a phenomenon is experimentally demonstrated when he knows how to design an experiment so that it will rarely fail to give a significant result" (Fisher 1929b). The experiment is then the means to obtain data to test the chosen hypotheses.

>The outcome of the test is not only a probability, but a reality check
>(the investigator's belief system).

DAHeiser
Re: Hypothesis testing and magic - episode 2
At 09:30 AM 4/13/00 +1000, Alan McLean wrote:

>In the soft sciences it is easy enough to identify a characteristic of
>interest

alan makes good points as usual ... but i totally object to the term 'soft' sciences ... what does soft imply? that the science is bad ... or merely that variables are more 'difficult' to measure ... if that is the case, these ought to be called the 'hard' sciences

the unpleasant associations with the term 'soft' are uncalled for ... there are excellent 'scientists' (whatEVER that means) in all fields ... and some pretty weak ones too (and gee ... BOTH kinds get tenure!) ... science is science ... and some practice it well ... some don't ... should it be some demerit against them that they happen to have opted for a field of interest ... even if many of the variables are difficult to measure? perhaps that makes it even more challenging ...

finally, i would not be so quick to claim that in the areas that are non social science based ... that variables are all that clear and clean cut ... there seems to be tremendous infighting about theories and how to 'validate' them in medicine ... astronomy ... physics ... it is not like everything there is so simple ... maybe don can pop in here with some relevant examples ... i am sure there are 'mean' differences in terms of these things but ... there is a lot more WITHin variation in terms of hardness/softness ... than between disciplines

==
dennis roberts, penn state university
educational psychology, 814-863-2401
http://roberts.ed.psu.edu/users/droberts/droberts.htm
Re: cluster analysis
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>Can anyone help with good resources on the web, journals, books, etc. on
>cluster analysis - similarity and ordination. Any recommended programs
>for this type of analysis too.
>
>Cheers
>Elisa Wood

For a list of cluster analysis programs, go to http://www.recursive-partitioning.com/cluster.shtml

For references, check out the CSNA and Warren Sarle's bibliographies. The links are at http://www.kdcentral.com

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com
Re: Data Mining
Data Mining = Statistics reborn with a new name. You ask the wrong crowd. Go to http://www.kdcentral.com and subscribe to the datamine-l mailing list.

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>I suspect in this forum, almost as bad as the F-word or N-word are the
>DM-words... Data Mining... I agree, but wonder about criteria.
>
>Often in our various research domains we have no choice but to use
>retrospective data. A classic example might be validating an investment
>approach by examining historical data, which some call backtesting.
>
>What are the criteria? How can we know when we have chance findings?
>
>I've argued that if the model is based on an a priori hypothesis, or can
>be justified by previously established theories, the possibility of data
>mining may be ignored. When the pre-existing theory is less substantial,
>one may ask if the discovered model fits data not included in the
>original model (data which occurs after the model was discovered, or data
>which precedes the data originally used to create the model).
>
>I'd like to hear the views of people on this forum.
>
>The specific situation I'm referring to is an investment model called the
>Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm)
>which was found to beat the S&P 500 and Dow 30 over the period from 1973
>through 1993. Since that date, and further backtested to 1961, it has not
>similarly beaten those traditional benchmark indexes, but also has not
>performed worse (both of which could be due to lack of power). The
>Foolish Four is based on a reasonable hypothesis that the worst
>performing Dow Jones Industrial Average companies are poised to turn
>around because they are simply too great to fail over the long term. The
>judgement of poor performance is based on the stock yield (a high
>yielding stock has a relatively high dividend payment compared to price),
>therefore a reasonable hypothesis is used to justify this approach.
>Selection of 4 of the 5 worst performing Dow companies (the worst is
>excluded because often these companies are in actual long term financial
>trouble) is what makes up the Foolish Four.
>
>I am not affiliated with the Motley Fool (where this investment strategy
>is touted) nor am I advertising for them. It is just an interesting
>practical problem which raises a question I think many statisticians face:
>how to tell when someone has conducted data mining and when they might
>have sussed out a valid truth.
>
>Paul Bernhardt
>University of Utah
>Department of Educational Psychology

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com
Hypothesis testing and magic - episode 2
Some more comments on hypothesis testing:

My impression of the hypothesis test controversy, which seems to exist primarily in the areas of psychology, education and the like (this is coming from someone who has been involved in education for all my working life, but with a scientific/mathematical background), is that it is at least partly a consequence of the sheer difficulty of carrying out quantitative research in those fields.

A root of the problem seems to be definitional. I am referring here to the definition of the variables involved. In, say, an agricultural research problem it is usually easy enough to define the variables. For a very simple example, if one is interested in comparing two strains of a crop for yield, it is very easy to define the variable of interest. It is reasonably easy to design an experiment to vary fairly obvious factors and to carry out the experiment.

In the soft sciences it is easy enough to identify a characteristic of interest; the problem is how to measure it. If I am interested in the relationship between ability in statistics and ethnic background, for example, I measure the statistics ability using a test of some sort; I measure ethnic background by defining a set of ethnicities. There are literally an infinite number of combinations that I can use: infinitely many different tests, all purporting to measure statistics ability (even if I change only one word in a test, I cannot be absolutely certain of its effect, so it is a different test!), and a very large number of definitions of ethnicity. This is of course not news to anyone reading this. But I am coming to my point.

Suppose I carry out an experiment: I apply the test to a group of people of varying ethnicity, score them on the test, and analyse the results, including a hypothesis test to decide if statistics ability is related to ethnicity. This test might be a simple ANOVA, or a Kruskal-Wallis or a chi-square test, depending on how I score the test.
As I said earlier, a hypothesis test only helps the user to decide which of two models is probably better. The point of the above paragraphs is this: the definition of the models being compared includes the definition of the variables used. If I reject the null model (a label I prefer to "null hypothesis") - that is, if I decide that the alternative model is likely to work better - I am NOT saying that there is a relationship between statistics ability and ethnicity. All I am saying is that there is a relationship between the two variables I used. Please note that the test is not saying this; I am. The test merely gives me a measure of the strength of the evidence provided by the data (significant at 1%, or a p-value of .0135); this measure is only relevant if the models I have used are appropriate. I can use other evidence (experience is what we usually use! but there may be related tests that help) to decide if the model is appropriate.

So there are three levels at which judgement is used to make decisions:
- deciding what variables are to be used to measure the characteristics of interest, and how any relationship between them relates to the characteristics
- deciding on the model to be used, and how to test it
- deciding the conclusion for the model

In each of these there is evidence we use to help us make the decision. The hypothesis test itself provides the test for the third.

Finally (at least for the moment): whether we choose the null or alternative model, it IS a decision. In research, accepting the null means that we decide to accept it at least for the moment, so it is not necessarily a committed decision. On the other hand, if a line of investigation is not yielding results, the researcher is likely not to continue on that line, so it is a decision which does lead to an action. For non-research applications such as in quality control, accepting the null model quite clearly is a decision to act on the basis of that.
For example, with a bottle filling machine which is periodically tested as to the mean contents, the null is that the machine is filling the bottles correctly. Rejecting the null entails stopping the machine; accepting it means the machine will not be stopped. Traditional hypothesis testing does incorporate a decision-theoretic loss function: the p-value.

Regards again,
Alan

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102    Fax: +61 03 9903 2007
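Alan's bottle-filling example is easy to make concrete as a one-sample t test. A minimal sketch of the decision rule (the fill volumes, the nominal 500 ml target, and the function name are all invented for illustration):

```python
import math

def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# hypothetical fill volumes (ml) from a periodic check; nominal fill is 500 ml
fills = [498.2, 501.1, 499.5, 497.8, 500.3, 498.9, 499.0, 500.7]
t = one_sample_t(fills, 500.0)

# critical value for alpha = .05, two-tailed, df = 7 (from t tables)
if abs(t) > 2.365:
    print("reject H0: stop and recalibrate the machine")
else:
    print("fail to reject H0: leave the machine running")
```

The point Alan makes holds here: "accepting" the null is itself a decision (the machine keeps running), not a proof that the mean fill is exactly 500 ml.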
summing standard errors within polynomial regression
A colleague sent the following to me at work today, and after perusal of various texts (Neter et al., Pedhazur, Cohen, etc.) I am unable to give anything but an opinion... here is what he sent:

"Can you answer me the following question. It concerns what is the appropriate standard error (SE) from a curve fitting program when what one wants to plot is derived from a COMBINATION of certain parameters, each of which has its own SE. Specifically, I fit parabolas to some data, sensitivity (y) as a function of pupil position (x):

    y = ax^2 + bx + c

From trivial calculus, the peak of this function is at -b/2a. Now, I have separate SEs for a, b and c from the fits. What is the best SE to use for -b/2a? For example, do the SEs add?"

At first he proposed an equation that took the square root of the sum of the squared SEs, i.e., sqrt(SE_lin^2 + SE_quad^2). My feeling was that given the partialed nature of standard errors in a multiple regression context, it may be misleading to add the respective standard errors, e.g., the SE for the linear component + the SE for the quadratic component, especially given the collinearity (if centering is not performed) of the terms. However, I also understand that variance components can be additive. Anyway, if anyone has a general opinion I would be most appreciative, as I am a bit stumped on this one... thank you... dale glaser
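For what it's worth, the textbook answer to Dale's colleague's question is the delta method: for g(a, b) = -b/2a, Var(g) is approximately grad(g)' Sigma grad(g), where Sigma is the full covariance matrix of the estimates. So the covariance between a and b matters, not just their separate SEs, which is why simply adding the SEs can mislead. A sketch, with made-up numbers standing in for the fit output:

```python
import numpy as np

def peak_se(a, b, cov_ab):
    """Delta-method SE for the parabola peak x* = -b/(2a).

    cov_ab is the 2x2 covariance matrix of (a, b) from the fit;
    it includes the covariance term, not just the two SEs.
    """
    grad = np.array([b / (2 * a**2),   # d(-b/2a)/da
                     -1 / (2 * a)])    # d(-b/2a)/db
    return np.sqrt(grad @ cov_ab @ grad)

# hypothetical fit output: a = -0.5, b = 2.0 (so the peak is at x* = 2.0),
# with SE(a) = 0.05, SE(b) = 0.2, and correlation -0.8 between the estimates
se_a, se_b, r = 0.05, 0.2, -0.8
cov = np.array([[se_a**2,        r * se_a * se_b],
                [r * se_a * se_b, se_b**2       ]])
print(f"SE of -b/2a: {peak_se(-0.5, 2.0, cov):.3f}")
```

With these illustrative numbers the delta-method SE is about 0.13, whereas ignoring the covariance (using only the two SEs) gives about 0.28, more than double, which is exactly the collinearity effect Dale worries about.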
Data Mining
I suspect in this forum, almost as bad as the F-word or N-word are the DM-words... Data Mining... I agree, but wonder about criteria.

Often in our various research domains we have no choice but to use retrospective data. A classic example might be validating an investment approach by examining historical data, which some call backtesting.

What are the criteria? How can we know when we have chance findings?

I've argued that if the model is based on an a priori hypothesis, or can be justified by previously established theories, the possibility of data mining may be ignored. When the pre-existing theory is less substantial, one may ask if the discovered model fits data not included in the original model (data which occurs after the model was discovered, or data which precedes the data originally used to create the model).

I'd like to hear the views of people on this forum.

The specific situation I'm referring to is an investment model called the Foolish Four (http://www.fool.com/school/dowinvesting/dowinvesting.htm) which was found to beat the S&P 500 and Dow 30 over the period from 1973 through 1993. Since that date, and further backtested to 1961, it has not similarly beaten those traditional benchmark indexes, but also has not performed worse (both of which could be due to lack of power). The Foolish Four is based on a reasonable hypothesis that the worst performing Dow Jones Industrial Average companies are poised to turn around because they are simply too great to fail over the long term. The judgement of poor performance is based on the stock yield (a high yielding stock has a relatively high dividend payment compared to price), therefore a reasonable hypothesis is used to justify this approach. Selection of 4 of the 5 worst performing Dow companies (the worst is excluded because often these companies are in actual long term financial trouble) is what makes up the Foolish Four.
I am not affiliated with the Motley Fool (where this investment strategy is touted) nor am I advertising for them. It is just an interesting practical problem which raises a question I think many statisticians face: how to tell when someone has conducted data mining and when they might have sussed out a valid truth.

Paul Bernhardt
University of Utah
Department of Educational Psychology
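Paul's suggested check (does the discovered model fit data outside the discovery sample?) can be sketched in a few lines. Everything below is simulated: the "excess returns" are random numbers with no real edge built in, standing in for the kind of backtest series he describes, and the period boundaries simply echo his 1973-1993 discovery window:

```python
import random

random.seed(42)

# hypothetical annual excess returns (strategy minus benchmark, in %)
# for 1961-1999, simulated under "no real edge" (mean zero)
years = list(range(1961, 2000))
excess = [random.gauss(0.0, 6.0) for _ in years]

# discovery period: the window in which the rule was "found"
disc = [r for y, r in zip(years, excess) if 1973 <= y <= 1993]
# holdout: everything outside that window (both before and after)
hold = [r for y, r in zip(years, excess) if y < 1973 or y > 1993]

mean_disc = sum(disc) / len(disc)
mean_hold = sum(hold) / len(hold)
print(f"discovery mean excess: {mean_disc:+.1f}%")
print(f"holdout   mean excess: {mean_hold:+.1f}%")
```

A rule mined from the discovery window will tend to look good there by construction; the honest estimate of its value is the holdout mean, which for a chance finding hovers around zero, much as the post-1993 and pre-1973 Foolish Four results did.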
Re: hyp testing
On 12 Apr 2000, Herman Rubin wrote:

> >I have often wondered if an integrated course/course sequence might not be
> >better.
>
> A course sequence of a rather different kind is definitely
> in order. It would be at least three courses.
>
> The first course would be a general probability only course,
> with the emphasis on understanding probability, not in carrying
> out computations. This has nothing to do with the discipline
> of the individual student, although the level should be such
> that it uses as much mathematics as the student is going to know.
> One might, at this stage, introduce the ideas of statistical
> decision making, but most will need a full course in probability
> first to understand probability well enough to use it in any
> sensible manner. If probability is presented as merely the
> limit of relative frequency, this might be quite difficult.
>
> The second course should be a course in probability modeling
> in the student's department of application. The construction
> of probability models, the making of assumptions, and the
> meaning of those assumptions, is almost totally absent in
> those using statistics today. There should be strong warnings
> about the dangers of those assumptions being false, and that
> in practice these assumptions might not be quite true.
>
> Only after this can one reasonably deal with the uncertainties
> of inference.

Dr. Rubin,

Are there any textbooks that you would deem suitable for the 3 courses you describe above?

--
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/
Re: hyp testing
On Wed, 12 Apr 2000, Robert Dawson wrote:

> I'm afraid that I don't follow your definition of a "plausible null".
> On the one hand, you say that my value (in the simulation I included) of
> 102 for the mean IQ of a population is "a priori false"; you then say that
>
> "I like interval estimates because they give me a good
> range for my plausibly true values for the null."
>
> But if I had computed a 95% confidence interval from almost any of those
> simulated data sets, 102 would have been in it.
>
> Had I said that the mean IQ was actually 102 and that I was testing
> the null hypothesis that it was 100, would you have called _that_ a
> plausible null? My point - that repeated failures to reject the null
> should *not* automatically increase one's belief in its truth - would
> be equally valid.

One of the reasons I have been enjoying this discussion is that I am learning about the unshared assumptions that I have been making.

I have in my own mind been using "plausible" to refer to a hypothesis that has not been refuted by data. We may certainly find at some point that the hypothesis is in fact false, but at the time we propose it, it could be true. We may even wish it to be false at the time we propose it. But as of the time we propose it we cannot say with conviction that it is false. So the values identified by a confidence interval would fit my usage of plausible. They are consistent with current knowledge, but most of them will eventually be eliminated. While a value pulled from thin air is arguably plausible given no prior information, I would include a requirement for parsimony. Absent information, a no-effect hypothesis is the most parsimonious.

I have been using "a priori false" to refer to a hypothesis that is known to be inconsistent with current knowledge. It is not even a reasonable guess at the correct value.
If I choose to follow up on research in which a non-zero effect is well established but the parameter estimate has, to me, an unacceptably wide range, I can use the previous estimate as my null and either find that my results are explainable as a chance deviation from the existing estimate, or that my results indicate the existing estimate is too large/small. That is, my results would either tend to support the status quo or refute it.

In a discussion about the estimation of the speed of light, the authors (sorry, I can't remember who or where; if anyone recognizes this example please point me to the reference) describe how the initial estimates of the speed of light were too high and had a very wide CI. Over the course of years, with improved technology, the best-guess estimate of light speed changed and the CI narrowed. While the mechanics that we know as hypothesis testing were absent, the researchers were clearly using the established best estimate as the equivalent of a null, modifying it as better data became available, and leaving it stand otherwise.

The only context for your example was that the data were generated with a specification that they come from a population that was N(100, 15). We therefore have prior knowledge that 102 is not the correct answer. But if I were in fact trying to guess at the IQ of a population, the data from a sample of n = 10 provides precious little information, as you clearly demonstrated. But if I had to try, my likely null for an unknown population would be 100, since that is the normed mean IQ for some population and is therefore consistent with prior knowledge. That is, a null of IQ = 100 is a credible true value until I can get better data (it might even be the correct value).

If n = 10 and I cannot reject a null of 100, I certainly agree that the corroboration value is low. But if n = 100 and I can't reject a null of 100, I am starting to see support for 100 as a correct value.
If n = 500 and I cannot reject a null of 100, would you still maintain that I had no evidence supporting the null? How about if n = 1000? 10,000? How much power has to be present before failure to reject the null is support of the null?

Michael

***
Michael M. Granaas
Associate Professor           [EMAIL PROTECTED]
Department of Psychology
University of South Dakota    Phone: (605) 677-5295
Vermillion, SD 57069          FAX: (605) 677-6604
***
All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents.
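Michael's question (how much power before a non-rejection counts as support?) is easy to make quantitative. The sketch below assumes the IQ setting of the thread: sigma = 15 known, a two-sided z test of H0: mu = 100 at alpha = .05, and asks how often we would reject if the true mean were actually 101, i.e. off by one point (the one-point deviation is my illustrative choice, not from the thread):

```python
import math

def power(n, true_dev=1.0, sigma=15.0, z_crit=1.96):
    """Power of a two-sided z test of H0: mu = 100 at alpha = .05
    when the true mean is 100 + true_dev, for sample size n."""
    d = true_dev * math.sqrt(n) / sigma              # noncentrality
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    return 1 - (Phi(z_crit - d) - Phi(-z_crit - d))

for n in (10, 100, 500, 1000, 10000):
    print(f"n = {n:>6}: power vs mu = 101 is {power(n):.3f}")
```

The pattern supports Michael's intuition: at n = 10 the test would almost never detect even a real one-point deviation (power near .05), so a non-rejection says little; by n = 10,000 the power is essentially 1, so failing to reject 100 genuinely rules out all but tiny deviations from it.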
Re: hyp testing
In article <004101bfa35b$54beb900$[EMAIL PROTECTED]>, David A. Heiser <[EMAIL PROTECTED]> wrote:

>- Original Message -
>From: Michael Granaas <[EMAIL PROTECTED]>

>> Our current verbal labels leave much to be desired.
>> Depending on who you ask the "null hypothesis" is
>> a) a hypothesis of no effect (nil hypothesis)
>> b) an a priori false hypothesis to be rejected (straw dog hypothesis)
>> c) an a priori plausible hypothesis to be tested and falsified or
>> corroborated (wish I had a term for this usage/real null?)

>The concept of a hypothesis is important. It can be used to teach an
>important statistical concept.

>Let us suppose there are many plausible hypotheses. These include the
>"nil hypothesis", any a priori hypotheses, any idea at all that may be
>considered. Refer to these in terms of the set of all plausible hypotheses
>(including that of no effect) that are to be tested.

The set of all plausible hypotheses is generally uncountable, even in the discrete case.

>The process is to pick each hypothesis and test it.

This cannot be done; there are too many.

>The outcome of the
>test is not only a probability, but a reality check (the investigator's
>belief system).

The data consists of what has been observed. The likelihood principle then mandates that the probabilities of unobserved events become irrelevant. This means that the typical test procedures (NOT the test STATISTICS) would have to be wrong.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED]  Phone: (765) 494-6054  FAX: (765) 494-0558
hyp test:better def
it appears to me that we are having the same kinds of discussions on this topic as usual and we go round and round ... and where we stop depends on when people get tired of it

is progress being made? i wonder ... perhaps some of this time would be better spent defining more what a hypothesis is within the general area of doing research ... FORGET ABOUT STATISTICS FOR A MOMENT ...

then, if we agree that there are times ... within the framework of trying to better understand phenomena ... that it is helpful, perhaps vital, for us to formulate AND test (gather data about) one or more researchable hypotheses, then we might get a better handle on

1. what the hypothesis is
2. what is a frameable version(s) of that hypothesis
3. what are some data handling (statistical?) ways of trying to collect and present evidence that will shed some light on how tenable or reasonable it is to keep that hypothesis as a work in progress ... or to decide to abandon it and search for better hypotheses or notions or explanations of phenomena

we need to recognize however that

1. truth will not be found by this method ... that is, the absolute truth
2. our efforts at best will move us only closer to better understandings of phenomena ...
3. no matter what we find ... we always have to take it with a huge grain of salt ...

finally, i WOULD LIKE to offer some summary points that do seem sensible to me

A. the reliance on ... and dominance of ... traditional 'significance' testing ... in almost all of printed scientific literature ... across most disciplines ... is TOTALLY out of whack in terms of what this 'method' can tell us about phenomena

B. the failure of statisticians in general, particularly those (me included) who TEACH students about this stuff, to build into their psyches 'priors', in some form, as herman and others have been preaching ... is tantamount to unethical statistical instructional practice

and C. if we do A and don't do B ...
we do a tremendous disservice to the students we work with

now, how we reinvent our strategies ... is difficult INdeed ... but, we must try
Re: hyp testing
In article, Magill, Brett <[EMAIL PROTECTED]> wrote:

>Seems to me that hypothesis testing remains an essential step. Take for
>instance the following data that I made up just for the purpose of
>illustration and the correlation matrix it produces:
>
>  VAR1  VAR2
>  2.00  2.00
>  3.00  2.00
>  5.00  6.00
>  4.00  2.00
>  3.00  1.00
>
>Correlations
>                               VAR1    VAR2
>  VAR1  Pearson Correlation   1.000    .765
>        Sig. (2-tailed)           .    .132
>        N                         5       5
>  VAR2  Pearson Correlation    .765   1.000
>        Sig. (2-tailed)        .132       .
>        N                         5       5
>
>Now, .77 is probably a respectable correlation (depending of course on the
>application). However, the question here is how much faith we have in this
>estimate. Accepting the traditional alpha level of .05 (because it is not
>real data and so no reason not to), we would say that this is beyond what we
>will accept as the risk of making a Type I error, so we fail to reject the
>null. This is not to say that the correlation is zero, but for practical
>purposes with this sample, we must treat it as no effect (and here probably
>take into consideration our power). Effect size is useless without
>significance. Significance is meaningless without information on effect
>size.

One can make a case for hypothesis testing in SOME situations. However, the above example is one which shows some of what is wrong. Even classical statisticians, faced with the rudiments of decision theory, will agree that the significance level to be used should generally decrease with increasing sample size. While there can be problems with small samples, the converse is that it should increase with decreasing sample size. The choice of a significance level, without consideration of the consequences of incorrect acceptance if the null is false, fails when one considers the real problem. I have seen a paper claim that an effect was not important because it came out at the .052 level. This is bad statistics, and bad science. -- This address is for information only.
I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
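Magill's made-up numbers are easy to check directly. A minimal sketch in Python (assuming SciPy is available; this code is not from the original thread) recovers both the r of .765 and the two-tailed p of .132 from the five pairs:

```python
# Reproducing the made-up example quoted above: n = 5 pairs,
# Pearson r = .765, two-tailed p = .132 (not significant at alpha = .05).
from scipy import stats

var1 = [2, 3, 5, 4, 3]
var2 = [2, 2, 6, 2, 1]

r, p = stats.pearsonr(var1, var2)
print(round(r, 3), round(p, 3))  # 0.765 0.132
```

With only n = 5, a sample correlation of .77 still fails to clear the conventional .05 bar, which is exactly the tension the post is about.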
Re: hyp testing
In article <[EMAIL PROTECTED]>, Michael Granaas <[EMAIL PROTECTED]> wrote: >On Tue, 11 Apr 2000, Robert Dawson wrote: >> and Michael Granaas responded >> > This (point 4) is certainly what we have been led to believe, but I >> > question the assumption. Do we not in fact teach that we are to act as if >> > the null is true until we can demonstrate otherwise? >> I certainly don't. We *compute* as if the null was true, whether we >> believe it or not; then we either conclude that (null + data) is implausible >> or that the data are consistent with the null. >And if the data are consistent with the null we do what? Act as if the >null were true? Act as if nothing were true? In a pure interpretation of >this approach we must act as if there were no knowledge (null not >rejected) or only very weak knowledge (effect is in the >direction). The first is a complete waste of effort and the second >provides only the weakest bit of sketchy knowledge. >Every research project should plausibly add to our knowledge base. But, >if the null is a priori false, failure to reject is just that, a failure and >a waste of time. One might think that if the null is a priori false, then we should just go ahead and reject it without looking at the data. But we are asking the wrong question; what is usually wanted is to decide whether it is better to act as if the null is true. This is the usual situation; if statistical testing had been available and used in science in the early days, most hypotheses would have been rejected. But without accepting a false hypothesis, it would not be possible to draw conclusions, and progress would not be made. In the physical sciences, it was sometimes the case that the data were sufficiently inexact that the error in the model got swamped by the errors in the data; in such a case, the null hypotheses might get through.
Chemists would have had major problems in the early days if they could have weighed their samples accurately enough; the isotope effect would have messed up the theories. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: hyp testing
In article <048a01bfa483$85f46280$[EMAIL PROTECTED]>, Robert Dawson <[EMAIL PROTECTED]> wrote: >Michael Granaas wrote (in part): >> The problem is that interval estimation and null hypothesis testing are >> seen as distinct species. An interval that includes zero leads to the >> same logical problems as failure to reject a false null. Interval estimation at a fixed coverage probability also does not meet any decision concept; at best, it can be considered a descriptive statistic. If there is enough data, and there is no real null, one can use a flat prior, and act as if the posterior distribution of the parameter is essentially the normalized likelihood function. But interval estimation should take into account the size of the interval. The loss easiest to handle from a computational standpoint happens to be linear in this size; the action to be taken then becomes quite simple. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
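Rubin's flat-prior remark can be checked numerically. In this sketch (all values arbitrary, and not from the original thread), the grid-normalized likelihood for a normal mean coincides with the N(xbar, sigma^2/n) density, i.e., the posterior under a flat prior really is the normalized likelihood:

```python
# With a flat prior, the posterior for a normal mean is just the normalized
# likelihood. Normalize L(mu) on a grid and compare with the N(xbar, se^2) pdf.
import numpy as np

sigma, n, xbar = 10.0, 25, 3.2           # arbitrary illustrative values
se = sigma / np.sqrt(n)                   # standard error of the mean (2.0)
mu = np.linspace(xbar - 10, xbar + 10, 2001)
step = mu[1] - mu[0]

lik = np.exp(-0.5 * ((xbar - mu) / se) ** 2)   # likelihood of mu given xbar
post = lik / (lik.sum() * step)                 # normalized to a density

analytic = np.exp(-0.5 * ((mu - xbar) / se) ** 2) / (se * np.sqrt(2 * np.pi))
print(np.max(np.abs(post - analytic)) < 1e-4)   # the two densities agree
```

The agreement is the whole point: with no real null and enough data, interval statements read off the likelihood and off the flat-prior posterior are the same thing.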
Re: hyp testing
In article <[EMAIL PROTECTED]>, dennis roberts <[EMAIL PROTECTED]> wrote: >At 01:16 PM 4/10/00 -0300, Robert Dawson wrote: >>both leave the listener wondering "why 0.5?" If the only answer is "well, >>it was a round number close enough to x bar [or "to my guesstimate before >>the experiment"] not to seem silly, but far enough away that I thought I >>could reject it." then the test is pointless. -Robert Dawson >YOU HAVE made my case perfectly! ... this is why the notion of hypothesis >testing is outmoded, no longer useful ... not worth the time we put into >teaching it ... >in the case above ... i would ask: There are cases where .5 makes sense, or rather approximately .5 makes sense. This happens in genetic regression, if one assumes additivity, random mating, and that the contributions of the parents are equal. However, Rubin's second commandment is that thou shalt not believe thy assumptions. The problem of testing approximate hypotheses is more difficult. From a decision-theoretic standpoint, if the width of the acceptance region in the parameter space is small compared to the standard error of the usual estimator, one can use a point null as a good approximation. If not, the user's assumptions become more important. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
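Rubin's rule of thumb can be put in numbers. A hedged sketch for a binomial proportion (the half-width of .05 for the approximate null is an arbitrary choice, not anything from the thread) compares the width of an approximate null around .5 with the standard error at several sample sizes:

```python
# When is the point null p = .5 a good stand-in for the approximate null
# .45 <= p <= .55?  Compare the interval's half-width with the standard error.
import math

def se(p, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

half_width = 0.05                 # arbitrary half-width of the approximate null
for n in (25, 100, 10_000):
    ratio = half_width / se(0.5, n)
    # ratio well below 1: point null is a fine approximation;
    # ratio well above 1: the width of the null region dominates, and it is not
    print(n, round(se(0.5, n), 4), round(ratio, 2))
```

At n = 25 the standard error (0.1) dwarfs the half-width, so the point null is harmless; at n = 10,000 the standard error (0.005) is tiny compared to it, and treating the approximate null as a point is exactly the mistake being warned against.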
Re: scientific method
On 10 Apr 2000 14:06:32 -0700, [EMAIL PROTECTED] (dennis roberts) wrote: > here are a few (fastly found i admit) urls about scientific method ... some > are quite interesting < snip; so that no one might think that I recommend the citations > I saved this note because it had references, but I was disappointed by them, now that I finally got around to checking -- after the first three, I quit checking. The first one had a tone such that I wondered if the author was going to point me to Biblical Creationism as what he *recommended*. Well, it is not the perspective you will see in your Social Science textbooks. I recommend, instead, that if you want to understand how a scientist's mind works, you might want to read a critique of that neurotic U.S. movement -- try Stephen Jay Gould, when he is writing essays and book reviews (rather than the excellent Naturalist topics, which make up most of his books). I think "An Urchin in the Storm" is a book of reviews. He also wrote an excellent piece about the role of biases in social science history, "The Mismeasure of Man". For deep consideration of the scientific method, I recommend "Criticism and the Growth of Knowledge" (Lakatos, ed.). This book happens to be from the proceedings of a symposium devoted to exploring Thomas Kuhn's thesis about revolutions in scientific discovery, and it is a modern classic. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: hyp testing
In article <[EMAIL PROTECTED]>, Michael Granaas <[EMAIL PROTECTED]> wrote: >In thinking about my own failure to get students to ask follow up >questions to a null hypothesis test I have formulated a couple of possible >reasons. Let me know what you think. >1. Even when we teach statistics in the discipline areas we fail to >integrate it with research. We teach a course in statistics and a course >in research design/methodology as if they were two distinct topics. It >seems to me that this could easily encourage the type of thinking that >leads to substantive questions not being linked to the statistical >hypothesis/procedure selected. >I have often wondered if an integrated course/course sequence might not be >better. A course sequence of a rather different kind is definitely in order. It would be at least three courses. The first course would be a general probability-only course, with the emphasis on understanding probability, not on carrying out computations. This has nothing to do with the discipline of the individual student, although the level should be such that it uses as much mathematics as the student is going to know. One might, at this stage, introduce the ideas of statistical decision making, but most will need a full course in probability first to understand probability well enough to use it in any sensible manner. If probability is presented as merely the limit of relative frequency, this might be quite difficult. The second course should be a course in probability modeling in the student's department of application. The construction of probability models, the making of assumptions, and the meaning of those assumptions are almost totally absent among those using statistics today. There should be strong warnings about the dangers of those assumptions being false, and that in practice these assumptions might not be quite true. Only after this can one reasonably deal with the uncertainties of inference. -- This address is for information only.
I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: hyp testing
I wrote: > >(a) that their discipline ought to be a science; and Herman Rubin responded: > > What is a science? The word means "knowledge". It did once, and does still in certain uses. I _think_ that everybody here is aware that the main meaning today is more restricted. > >Granted, if they did understand statistics, they would not test hypotheses > >nearly as often as they do. However, that said, I am not entirely persuaded > >that risk calculation is the whole story, either. In many pure research > >situations, "risk" is just not well defined. What is the risk involved in > >believing (say) that the universe is closed rather than open? > > Both "hypotheses" are highly composite. In a situation > like this, what is the ADVANTAGE of assuming one rather > than the other? What action is going to be taken? > > There may be a point in investigating the problem, but is > there one in drawing inferences? Yes, surely. Let me turn the question around, Herman: your statements seem to imply that any form of inference designed to do anything but choose between a set of _actions_ is at least pointless and probably immoral; and it seems as if you are advocating a philosophy of science from which the concepts of "fact", "truth", and "falsehood", even in a tentative sense, are to be eliminated, to be replaced by the concept of "utility". What benefit is there in this? As for the risks, I can see definite disadvantages in proposals such as: > The only general conclusion would be a summary of the likelihood > function, or a reduction of the data to a point where the loss of > information is not critical in computing a good approximation to > the "best" action. Part of communicating the results of a piece of research is summarizing them and interpreting them, so that it takes less time to read a scientific paper than it took to write it. 
If scientific writing were restricted as you suggest, the unfortunate person who *did* have to make a decision would have the following Herculean program to carry out: First, make up a list of all possible states of the universe; then do the data analysis for all relevant research yourself, using the list assembled in the first step. In situations where one actually *is* balancing risks, I quite agree that one should analyze the data accordingly. However, I do not understand your apparent implicit claim that no question should be asked or answered in any other situation. For instance, suppose one is trying to decide whether to add fluoride to the drinking water of a town. The final decision should be a risk-benefit analysis. However, the possibility of ever getting to a final decision depends to a large extent on the fact that other researchers in the past did *not* end their papers with phrases like "Does fluorine cause an elevated risk of chicken pox? I'm not going to tell you, but if you plug your personal risk estimates into this calculation [Box IV] you can decide whether you think the risk outweighs the benefits." -Robert Dawson
Re: hyp testing
In article <01e301bfa2ee$b69f42b0$[EMAIL PROTECTED]>, Robert Dawson <[EMAIL PROTECTED]> wrote: >Dennis Roberts asked, imagining a testing-free universe: >>> what would the vast majority of folks who either do inferential work >>and/or >>> teach it ... DO >>> what analyses would they be doing? what would they be teaching? >I wrote: >> >* students would be told in their compulsory intro stats that >> >"a posterior probability of 95% or greater is called >> > "statistically significant", and we say 'we believe >> > the hypothesis'. Anything less than that is called >> >"not statistically significant", and we say 'we disbelieve >> > the hypothesis'". >and Herman Rubin responded: >> Why? What should be done is to use the risk of the procedure, >> not the posterior probability. The term "statistically significant" >> needs abandoning; it is whether the effect is important enough >> that it pays to take it into account. >Dennis asked what _would_ happen, not what _should_. Most of the abuses we >see around us are not the fault of hypothesis testing _per_se_, but of >statistics users who believe: >(a) that their discipline ought to be a science; What is a science? The word means "knowledge". >(b) that statistics must be used to make this so; The problem is that they expect statistics to take in their data and spew out the TRUTH. The capital letters are not an exaggeration. >(c) and that it is unreasonable to expect them to _understand_ >statistics just because of (a) and (b). They have elevated statistics to a religion, and as in many religions, the layman only has to carry out the sacrifices ordered by the priest to get the blessings of the gods. They do not need to understand statistical CALCULATIONS, and they do not have to be able to produce the proofs. What they need to understand are the concepts of probability and decision making, so that they can accurately communicate their problems to those who can help with the mechanics. 
>Granted, if they did understand statistics, they would not test hypotheses >nearly as often as they do. However, that said, I am not entirely persuaded >that risk calculation is the whole story, either. In many pure research >situations, "risk" is just not well defined. What is the risk involved in >believing (say) that the universe is closed rather than open? Both "hypotheses" are highly composite. In a situation like this, what is the ADVANTAGE of assuming one rather than the other? What action is going to be taken? There may be a point in investigating the problem, but is there one in drawing inferences? >Moreover, suppose we elected Herman to the post of Emperor of Inference, >(with the power of the "Bars and the Axes"?) to enforce a risk-based >approach to statistics (not that he'd take it, but bear with me...), would >the situation really improve? >My own feeling is that, in many "soft" science papers of the sort where >the research is not immediately applied to the real world, but may affect >public policy and personal politics, a "risk" approach would be disastrous. >If the researcher had to assign "risks" to outcomes that were merely a >matter of correct or incorrect belief, it would be all too tempting to >assign a large risk to an outcome that "would set back the cause of X fifty >years" and conversely a small risk to accepting a belief that might be >considered "if not true, at least a useful myth." (Exercise: provide your >own examples). Everything would be lowered to the level of Pascal's Wager - >surely the canonical example of the limitations of a risk-based approach? It is precisely in these situations that a risk approach is absolutely necessary. But the input to this, or any other, sound risk approach must be made by those who will benefit or suffer from the decision. In the case of medical procedures, unless there is a public health question like the spread of disease, it is the risk function of the patient which is the one which should be used.
For public policy, it is the risk function of the government which is involved. On this topic, there is a fair book by Clemen, _Making Hard Decisions_. >One might argue that in such a situation the rare reader who intends to >take action, and not the writer, should do the statistics. Unfortunately, in >the real world, that won't wash. People want simple answers, and with the >flood of information that we have to deal with in keeping up with the >literature in any subject today, this is not entirely a foolish or lazy >desire. As Einstein stated, make things as simple as possible, but NO SIMPLER. It is a foolish desire, fanned by ignorance. We should be teaching that statistics is at least as difficult as the Oracle of Delphi, and that understanding the Oracle can be as difficult as solving the problem otherwise. It is considered the author's responsibility to reach a conclusion, not just to present a mass of undigested
Re: hyp testing
On 11 Apr 2000, Donald F. Burrill wrote: > On Mon, 10 Apr 2000, Bruce Weaver wrote in part, quoting Bob Frick: > -- >8 --- > > > > To put this argument another way, suppose the question is whether one > > variable influences another. This is a discrete probability space with > > only two answers: yes or no. Therefore, it is natural that both > > answers receive a nonzero probability. > > It may be (or seem) "natural"; that doesn't mean that it's so, > especially in view of the subsequent refinement: > > > Now suppose the question is changed into > > one concerning the size of the effect. This creates a continuous > > probability space, with the possible answer being any of an infinite > > number of real numbers and each one of these real numbers receiving an > > essentially zero probability. A natural tendency is to include 0 in this > > continuous probability space and assign it an essentially zero > > probability. However, the "no" answer, which corresponds to a size of > > zero, does not change probability just because the question is phrased > > differently. Therefore, it still has its nonzero probability; only the > > nonzero probability of the "yes" answer is spread over the real numbers. > > > > To this I have two objections: (1) It is not clear that the "no" answer > "does not change probability ...", as Bob puts it. If the question is > one that makes sense in a continuous probability space, it is entirely > possible (and indeed more usual than not, one would expect) that > constraining it to a two-value discrete situation ("yes" vs. "no") may > have entailed condensing a range of what one might call "small" values > onto the answer "no". That is, the question may already, and perhaps > unconsciously, have been "coarsened" to permit the discrete expression > of the question with which Bob started. I see your point. But one of the examples Frick gives concerns the existence of ESP. In the discrete space, it does or does not exist. 
For this particular example, I think one could justify using a 1-tailed test when moving to the continuous space; and so the null hypothesis would encompass "less than or equal to 0", and the alternative "greater than 0". It seems to me that with a one-tailed alternative like this, the null hypothesis can certainly be true. > (2) My second objection is that if the positive-discrete > probability is retained for the value "0" (or whatever value the former > "no" is held to represent), the distribution of the observed quantity > cannot be one of the standard distributions. (In particular, it is not > normal.) One then has no basis for asserting the probability of error > in rejecting the null hypothesis (at least, not by invoking the standard > distributions, as computers do, or the standard tables, as humans do > when they aren't relying on computers). Presumably one could derive the > sampling distribution in enough detail to handle simple problems, but > that still looks like a lot more work than one can imagine most > investigators -- psychologists, say -- cheerfully undertaking. This would not be a problem if the alternative were one-tailed, would it? Cheers, Bruce -- Bruce Weaver [EMAIL PROTECTED] http://www.angelfire.com/wv/bwhomedir/
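Burrill's second objection -- a point mass at zero makes the distributions non-standard -- can be sketched directly. In this hypothetical two-sided setup (every number below is illustrative, not from the thread), the marginal for the sample mean is a mixture of two normals, and the posterior probability of the point null follows from the ratio of their densities:

```python
# Point-mass-plus-slab prior for a normal mean: P(mu = 0) = 1/2, and given
# mu != 0, mu ~ N(0, tau^2).  The marginal of xbar is then a 50/50 mixture of
# N(0, se^2) and N(0, tau^2 + se^2) -- not any single standard distribution.
import math

def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

se2 = 1.0 ** 2      # variance of xbar under the null (sigma^2 / n), illustrative
tau2 = 2.0 ** 2     # prior slab variance, illustrative

def post_prob_null(xbar, prior_null=0.5):
    m0 = normal_pdf(xbar, se2)           # marginal density under mu = 0
    m1 = normal_pdf(xbar, tau2 + se2)    # marginal density under the slab
    return prior_null * m0 / (prior_null * m0 + (1 - prior_null) * m1)

print(post_prob_null(0.0))   # > 0.5: data at 0 favor the point null
print(post_prob_null(4.0))   # near 0: data far out favor a nonzero mu
```

This is Bob Frick's "nonzero probability for the 'no' answer" made explicit: the point null keeps its lump of probability, while the "yes" probability is spread over the slab.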
Re: hyp testing
In article <007a01bfa1c7$aa97c460$[EMAIL PROTECTED]>, David A. Heiser <[EMAIL PROTECTED]> wrote: >Lots of interesting replies. >A. The "community" Denis Roberts refers to wants statistics to tell them >which is better, which of two models is the correct one, how much more will >method B cost me than method A, which process do I use that will make me >more money, which is the best advertisement strategy, which of two positions >that my candidate can take will get him the most votes, which (of several >strategies/models) will get me more money when I trade on NASDAQ, what is >the probability that I can get genome U1 patented, which treatment will get >patients out of the hospital quicker, etc., etc., etc. Part of that is because people have been taught that statistics can give such answers. >B. Statistics cannot do any of this. It can only tell you what is the >probability that what you have occurred by chance. It can tell you this for the specific observation, and in each state of nature. >C. Whatever you use, hypothesis tests, confidence intervals, posterior probability, >or any other stat method, they are only tools. The bottom line is a probability, >not a definite answer. Except for posterior probability, none of these are tools for the actual problems. And posterior probability is not what is wanted; it is the posterior risk of the procedure. But even this relies on belief. An approach to rational behavior makes the prior a weighting measure, without ringing in belief. I suggest we keep it this way, and avoid the philosophical aspects. >D. The only definite answer is if the result works as intended. Or as Joe >Ward says, it does a good job of prediction. >E. The success is a human judgement, which the people in A want a machine and >software to do, and to be infallible (because human judgement is not). -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept.
of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: hyp testing
a professor thought that he was producing a test of 50 items at 'about the 50%' difficulty level, that is ... on average, the scores would be about 50%. now, he collected data from a random sample of n=40 of his class ... gave them the test ... and then did a ttest using 25 as the null ... he found (now no fair tossing in other considerations like ... well, this is not planned properly, etc. ... just take it on face value the way we ACTUALLY see it in the vast majority of the literature)

MTB > ttest 25 c1

One-Sample T: C1
Test of mu = 25 vs mu not = 25

Variable    N    Mean   StDev   SE Mean
C1         40   32.20    9.86      1.56

Variable    95.0% CI            T       P
C1          (29.05, 35.35)   4.62   0.000     <<- REJECT THE NULL

0 ------------------ 25 ------------------ 50

where on the number line might their 'real' level of performance be, based on the rejected null?

another prof, looking at the data and keeping in mind what the professor in the course thought, did the following 95% ci ...

MTB > tint c1

One-Sample T: C1

Variable    N    Mean   StDev   SE Mean   95.0% CI
C1         40   32.20    9.86      1.56   (29.05, 35.35)     <- CI

0 ------------------ 25 ------------------ 50

where on the number line do you think the 'real' level of performance is?

now, folks on the list have been trying to argue about what truth is ... or whether we actually could find it ... and i would say that in this case, one might define 'truth' at least two ways:

first truth: is the null true?
second truth: what is mu?

the first truth is of so little value (and only really says, we don't think it is 25) ... but the second gets at the heart of the problem ... what is going on with the students' performance ... the first truth, and our 'proof' of or NOT of it, just says WE were good or BAD at formulating the hypothesis ... but does not really get us closer to the second truth ... which speaks directly to the parameter ...
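The Minitab output above can be reproduced from the summary statistics alone. A sketch in Python (assuming SciPy for the t quantile; not part of the original post) gives the same t, the same reject call, and the same 95% CI:

```python
# Rebuilding the example from the summary stats: n = 40, mean = 32.20,
# sd = 9.86, null mu = 25.  Same t statistic and 95% CI as Minitab.
import math
from scipy import stats

n, mean, sd, mu0 = 40, 32.20, 9.86, 25.0
se = sd / math.sqrt(n)                        # 1.56, Minitab's "SE Mean"
t = (mean - mu0) / se                         # 4.62
tcrit = stats.t.ppf(0.975, n - 1)             # two-sided 95% quantile, df = 39
ci = (mean - tcrit * se, mean + tcrit * se)   # (29.05, 35.35)
print(round(t, 2), (round(ci[0], 2), round(ci[1], 2)))  # 4.62 (29.05, 35.35)
```

Note that the CI answers the "second truth" (where is mu?) as a by-product of the same arithmetic that produced the t test; the test by itself answers only the first.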
Re: hyp testing
In article <[EMAIL PROTECTED]>, dennis roberts <[EMAIL PROTECTED]> wrote: >i was not suggesting taking away from our arsenal of tricks ... but, since >i was one of those old guys too ... i am wondering if we were mostly led >astray ...? >the more i work with statistical methods, the less i see any meaningful (at >the level of dominance that we see it) applications of hypothesis testing ... >here is a typical problem ... and we teach students this! >1. we design a new treatment >2. we do an experiment >3. our null hypothesis is that both 'methods', new and old, produce the >same results I presume you mean the same distribution of results. But this is at least next to impossible. Even if all you are doing is using a new batch of the same old material, there will be SOME difference. This may or may not be important. >4. we WANT to reject the null (especially if OUR method is better!) Some do, and some do not. >5. we DO a two sample t test (our t was 2.98 with 60 df) and reject the >null ... and in our favor! >6. what has this told us? >if this is ALL you do ... what it has told you AT BEST is that ... the >methods probably are not the same ... but, is that the question of interest >to us? >no ... the real question is: how much difference is there in the two methods? >our t test does NOT say anything about that >1 to 6 can be applied to all sorts of hyp tests ... and most lead us >essentially into a dead end One should approach the problem as a decision problem from the beginning. The real main question is, should the new treatment be used? There are many variations on this, and what may be the least useful action is to say either that there is a statistically significant difference, OR that there is no difference. It is easy to give reasonable examples where either variation of the current method is the opposite of what is wanted. -- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
Re: hyp testing
Michael Granaas wrote (in part):

> The problem is that interval estimation and null hypothesis testing are
> seen as distinct species. An interval that includes zero leads to the
> same logical problems as failure to reject a false null.

No; an interval that includes zero has additional information. Not (to open another can of worms) because of being a confidence interval; we can construct a 95% confidence region, the union of two intervals, consisting of precisely the *least* plausible values, and it is possible to construct a 95% CI that contains no information whatsoever about the value of the parameter! But as somebody (Kalbfleisch? George Gabor?) said once, the reason that confidence intervals as usually computed work as well as they do is that they are closely related to maximum-likelihood intervals.

I'm afraid that I don't follow your definition of a "plausible null". On the one hand, you say that my value (in the simulation I included) of 102 for the mean IQ of a population is "a priori false"; you then say that "I like interval estimates because they give me a good range for my plausibly true values for the null." But if I had computed a 95% confidence interval from almost any of those simulated data sets, 102 would have been in it. Had I said that the mean IQ was actually 102 and that I was testing the null hypothesis that it was 100, would you have called _that_ a plausible null? My point - that repeated failures to reject the null should *not* automatically increase one's belief in its truth - would be equally valid.

-Robert Dawson
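The simulation Dawson alludes to — a true mean of 102 tested against the null of 100 — is easy to reproduce in outline. The sample size and SD (15, the conventional IQ scale) here are illustrative assumptions, not taken from the post:

```python
# A small simulation in the spirit of the one described above: the true mean
# is 102, the null hypothesis says 100, yet with modest samples the false
# null is rarely rejected and the CI almost always contains both values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 25, 1000
rejections = 0
covers_null = 0

for _ in range(reps):
    x = rng.normal(loc=102.0, scale=15.0, size=n)
    t, p = stats.ttest_1samp(x, popmean=100.0)
    if p < 0.05:
        rejections += 1
    # 95% confidence interval for the mean
    lo, hi = stats.t.interval(0.95, df=n-1, loc=x.mean(), scale=stats.sem(x))
    if lo <= 100.0 <= hi:
        covers_null += 1

print(f"rejected false null in {rejections/reps:.0%} of runs")
print(f"CI contained 100 in {covers_null/reps:.0%} of runs")
```

Repeated failures to reject here reflect low power, not evidence that the null is true — which is exactly Dawson's point.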
RE: Hypothesis testing and magic
Finally a voice of sanity!!!

Henry M. Silvert Ph.D.
Research Statistician
The Conference Board
845 Third Ave.
New York, NY 10022
Phone: (212)339-0438
Fax: (212)836-3825
Email: [EMAIL PROTECTED]

> -----Original Message-----
> From: Alan McLean [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, April 11, 2000 7:47 PM
> To: EDSTAT list
> Subject: Hypothesis testing and magic
>
> I have been reading all the back and forth about hypothesis testing with
> some degree of fascination. It's a topic of particular interest to me -
> I presented a paper called 'Hypothesis testing and the Westminster
> System' at the ISI conference in Helsinki last year.
>
> What I find fascinating is the way that hypothesis testing is regarded
> as a technique for finding out 'truth'. Just wave a magic wand, and
> truth will appear out of a set of data (and mutter the magic number 0.05
> while you are waving it). Hypothesis testing does nothing of the sort -
> of course.
>
> First, hypothesis testing is not restricted to statistics or 'research'.
> If you are told some piece of news or gossip, you automatically check it
> out for plausibility against your knowledge and experience. (This is
> known colloquially as a 'shit filter'.) If you are at a seminar, you
> listen to the presenter in the same way. If what you hear is consistent
> with your knowledge and experience you accept that it is probably true.
> If it is very consistent, you may accept that it IS true. If it is not
> consistent, you will question it and conclude that it is probably not true.
>
> IF the news is something that requires some action on your part, you
> will act according to your assessment of the information.
>
> If the news is important to you, and you cannot decide which way to go
> on prior knowledge, you will presumably go and get corroborative
> information, hopefully in some sense objective information.
>
> This describes hypothesis testing almost exactly; the difference is a
> matter of formalism.
>
> Next - a statistical hypothesis test compares two probability models of
> 'reality'. If you are interested in the possible difference between two
> populations on some numeric variable - for example, between heights of
> men and heights of women in some population group - and you choose to
> express the difference in terms of means, you are comparing a model
> which says
>     height of a randomly chosen individual = overall mean + random fluctuation
> with one which says
>     height of a randomly chosen individual = overall mean + factor due to sex + random fluctuation
> You then make assumptions about the 'random fluctuations'.
>
> Note that one of these models is embedded within the other - the first
> model is a particular case of the second. It is only in this situation
> that standard hypothesis testing is applicable.
>
> Neither of these models is 'true' - but either or both may be good
> descriptions of the two populations. Good in the sense that if you do
> start to randomly select individuals, the results agree acceptably well
> with what the model predicts. The role of hypothesis testing is to help
> you decide which of these is (PROBABLY) the better model - or if neither
> is.
>
> In standard hypothesis testing, one of these models is 'privileged' in
> that it is assumed 'true' - that is, if neither model is better, then
> you will use the privileged model. In most cases, this means the SIMPLER
> model.
>
> More accurately - if you decide that the models are equally good (or
> bad), you are saying that you cannot distinguish between them on the
> basis of the information and the statistical technique used! To decide
> between them you will need either to use a different technique or, more
> realistically, some other criterion. For example, in a court case, if
> you cannot decide between the models 'Guilty' and 'Innocent', you may
> always choose 'Innocent'.
>
> There is no reason why one model is thus privileged.
> In my paper I stressed my belief that this approach reflects our (and
> Fisher's) cultural heritage rather than any need for it to be that way.
> One can, for example, express the choice as between the embedded model
> and the embedded model suggested by the data. For a test on the
> difference between two means, this considers the models mu(diff) = 0
> and mu(diff) = xbar. The interesting thing is that this is what we
> actually do! although it is dressed up in the language and technique of
> the general model mu(diff) not= 0. (This dressing up is a lot of the
> reason why students have trouble with hypothesis testing.)
>
> To conclude: hypothesis testing is NECESSARY. We do it all the time.
> Assessment of effect sizes is also necessary, but the two should not be
> confused.
>
> Regards,
> Alan
>
> --
> Alan McLean ([EMAIL PROTECTED])
> Department of Econometrics and Business Statistics
> Monash University, Caulfield Campus, Melbourne
> Tel: +61 03 9903 2102  Fax: +61 03 9903 2007
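McLean's two nested height models can be compared directly in code: fit the "grand mean only" model and the "mean + sex effect" model, then ask whether the richer model reduces residual variation more than chance would allow. A sketch on simulated heights (the numbers are illustrative assumptions, not from the post):

```python
# Nested-model comparison: "overall mean + noise" vs
# "overall mean + factor due to sex + noise", as in the post above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
men = rng.normal(178, 7, size=60)      # heights in cm (illustrative)
women = rng.normal(165, 7, size=60)
heights = np.concatenate([men, women])
is_male = np.r_[np.ones(60), np.zeros(60)]

# Model 1 (embedded): height = overall mean + random fluctuation
rss1 = np.sum((heights - heights.mean())**2)

# Model 2: height = sex-specific mean + random fluctuation
fitted2 = np.where(is_male == 1, men.mean(), women.mean())
rss2 = np.sum((heights - fitted2)**2)

# F-test for the one extra parameter; this F equals the square of the
# usual pooled two-sample t statistic.
df2 = len(heights) - 2
F = (rss1 - rss2) / (rss2 / df2)
p = stats.f.sf(F, 1, df2)
print(f"F = {F:.1f}, p = {p:.2g}")
```

Rejecting here means the embedded model is a distinguishably worse description, which is all the test itself says; it does not make either model "true".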
Re: cluster analysis
I distribute a program called COMPAH that does Lance-Williams combinatorial agglomerative clustering with 20-30 different similarity/dissimilarity indices. It is a Fortran program that runs on Windows or DOS. It will cluster very large datasets (thousands of items) quickly. I provide documentation with references, and I also distribute the source code free. It is set up to write and read Matlab binary files so that you can use Matlab for ordination and COMPAH for clustering. The program and documentation can be downloaded at http://www.es.umb.edu/edgwebp.htm

In article <[EMAIL PROTECTED]>, Elisa Wood <[EMAIL PROTECTED]> wrote:
> Can anyone help with good resources on the web, journals, books, etc on
> cluster analysis - similarity and ordination. Any recommended programs
> for this type of analysis too.
>
> Cheers
> Elisa Wood

--
Eugene D. Gallagher
ECOS, UMASS/Boston
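For readers without a Fortran toolchain, the same family of Lance-Williams agglomerative methods is also available in scipy, which may help for a quick comparison run. This is a generic sketch, not COMPAH's own interface:

```python
# Agglomerative hierarchical clustering from a dissimilarity matrix,
# the same family of Lance-Williams methods COMPAH implements.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two well-separated groups of points in the plane
data = np.vstack([rng.normal(0, 0.3, (10, 2)),
                  rng.normal(5, 0.3, (10, 2))])

d = pdist(data, metric="euclidean")        # condensed dissimilarity matrix
Z = linkage(d, method="average")           # UPGMA; also "single", "complete", "ward", ...
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The `method` argument selects the Lance-Williams update rule; swapping it is the scipy equivalent of choosing among COMPAH's combinatorial strategies.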
Support Vector Book: Available Now
The Support Vector Book is now distributed and available (see http://www.support-vector.net for details).

AN INTRODUCTION TO SUPPORT VECTOR MACHINES
(and other kernel-based learning methods)
N. Cristianini and J. Shawe-Taylor
Cambridge University Press, 2000
ISBN: 0 521 78019 5
http://www.support-vector.net

Contents
  Overview
  1 The Learning Methodology
  2 Linear Learning Machines
  3 Kernel-Induced Feature Spaces
  4 Generalisation Theory
  5 Optimisation Theory
  6 Support Vector Machines
  7 Implementation Techniques
  8 Applications of Support Vector Machines
  Pseudocode for the SMO Algorithm
  Background Mathematics
  References
  Index

Description

This book is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation of learning systems based on recent advances in statistical learning theory. The book also introduces Bayesian analysis of learning and relates SVMs to Gaussian Processes and other kernel-based learning methods. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequence analysis, etc. Their introduction in the early 1990s led to a recent explosion of applications and deepening theoretical analysis that has now established Support Vector Machines, along with neural networks, as one of the standard tools for machine learning and data mining.

Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and application of these techniques. The concepts are introduced gradually in accessible and self-contained stages, though in each stage the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study.
These are also available on-line through an associated web site, www.support-vector.net, which will be kept updated with pointers to new literature, applications, and on-line software.
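A minimal taste of the techniques the book covers — a kernel SVM solving a problem that is not linearly separable in the input space. This sketch uses scikit-learn, not any code from the book; the kernel and hyperparameter values are arbitrary illustrative choices:

```python
# Kernel SVM on an XOR-style toy problem: no linear boundary separates the
# classes in the input space, but the RBF kernel's induced feature space does.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # class = sign agreement of coords

clf = SVC(kernel="rbf", gamma=2.0, C=10.0)  # kernel-induced feature space
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

A linear kernel (`kernel="linear"`) on the same data would hover near chance, which is the essential point of Chapter 3's kernel-induced feature spaces.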