Re: Hypothesis testing and magic - episode 2
Jerry Dallal wrote:

> As Tukey has pointed out, the null hypothesis of no effect is not that we think there is no effect, but that we are uncertain of the direction. I wish I knew more about Delaney and its application. One problem, pointed out by David Salsburg, is that a substance that eliminates one of many competing risks would appear to increase the other risks. For example, people no longer subject to heart disease would undoubtedly see an increased incidence of cancer, with all-cause mortality holding steady at 100%.

I would hope that such risks would be measured as probability per unit time, and so the first-order effects of `we all die' would be removed. Which still leaves the second-order effects due to the lengthy induction process of many cancers.

BTW an even greater problem in animal testing seems to be due to using feed-on-demand systems. The little critters are usually bored out of their minds and overeat, causing a variety of health problems. So any drug that makes them mildly unwell can easily spoil their appetite -- and make them look healthier.

Peter

=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Quick Portable Statistics
C., Bayard, Paschall, III wrote:

> I am looking for a source of "portable statistics", i.e. techniques that are easy to remember and use, that can be applied without a calculator or software program and do not need reference tables. Examples are: Tukey-Duckworth two sample test, and the quadrant sum test for association (Olmstead and Tukey). Are there others and is there a reference or source for these types of procedures?

The stem-and-leaf plot springs to mind. There are lots of this sort of thing in Tukey's EDA book (which I find pretty unreadable). I would start with the much more readable

Data Analysis and Regression: A Second Course in Statistics
Frederick Mosteller, John W. Tukey (Contributor)

Peter
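For readers who have not met the Tukey-Duckworth test mentioned in the question, here is a minimal sketch of its end count. It assumes no tied values at the ends (Tukey's treatment of ties is omitted) and the function name is made up for illustration. The usual rule of thumb, for samples of not too different size, is that a count of 7 or more is significant at roughly the two-sided 5% level, 10 at 1%, and 13 at 0.1%.

```python
def tukey_duckworth_count(x, y):
    """Return the Tukey-Duckworth end count, or None if the test does not apply.

    The test applies only when one sample holds the overall maximum and
    the other holds the overall minimum. Ties at the ends are not handled.
    """
    if max(x) > max(y) and min(x) > min(y):
        high, low = x, y
    elif max(y) > max(x) and min(y) > min(x):
        high, low = y, x
    else:
        return None  # one sample contains both extremes (or the ends tie)

    # Values of the "high" sample lying above every value of the "low" sample
    top = sum(1 for v in high if v > max(low))
    # Values of the "low" sample lying below every value of the "high" sample
    bottom = sum(1 for v in low if v < min(high))
    return top + bottom


if __name__ == "__main__":
    a = [10, 11, 12, 13, 14]
    b = [1, 2, 3, 4, 12.5]
    print(tukey_duckworth_count(a, b))
```

The whole computation is two counts and one addition, which is what makes the test usable without tables or a calculator.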
Re: hyp testing -Reply
Robert Dawson wrote:

> As far as random samples are concerned: it is *very* rare for a true random sample, based on an equal-probability sample of the population to which the inference is intended to extend, to be taken. Say a researcher is studying the behaviour of humans. (S)he may take a random sample from the student subject pool, but not from the human race; and yet the paper published will claim to be about "Artificially Inducing The Gag Reflex in Humans", not "Artificially Inducing The Gag Reflex in Students Enrolled in Psych 1000 at Miskatonic U. (Fall '00)". Even if some future world government were to allow researchers access to a list of all humans alive at some moment to use as a sampling frame, most researchers would not disclaim any applicability of their research to those dead or not yet born.
>
> The implicit "Platonic" population larger than that available for study is a problem that is always with us; a bad sample is one in which this causes bias. The situation in which the entire actual population is available for study is an extreme case, of course.

I don't think the problem is as severe as you imply. Scientific hypotheses are about infinite populations, because scientists draw inferences about processes, theories and so on.

The paleontologist example is interesting, because it is obviously true that there is something about those 20 individuals as a group which disposes them to drive certain cars (price, salary, whatever). However, the (more) interesting claim is that being a paleontologist makes you drive a certain kind of car. This claim embraces Fred (presently a window cleaner) who becomes a paleontologist (after night school) and suddenly purchases a new car. The population is effectively infinite if you want to embrace the paleontologists of last year, next year, etc.

A true random sample is rarely possible and may not be a random sample of the population to which you wish to generalize. However, generalization does not rest solely on statistics. In fact statistical generalization is necessary, but less important than generalization with respect to theory in most sciences. If we know about (i.e. have useful theories of) lung (or brain, or ...) function and development then we can generalize from one sample with lungs or brains to another sample with lungs (or brains, or ...) more powerfully than through statistics alone.

Many of the problems with traditional statistics are really problems of weak theory or weak experimental design. Hypothesis testing can't solve these, but neither can any other statistical method. (Indeed some alternatives to hypothesis testing may be more susceptible to these problems. For example, effect size calculation, meta-analysis etc. may place more emphasis on strong theory. This can be good if it forces a researcher back to theory, but I can see little evidence of this, so far.)

Thom
Re: hyp testing -Reply
I thought everyone knew there was a difference in Anatomy between male and female professors! ;)

At 12:19 PM 4/20/00 +0100, you wrote:

> dennis roberts wrote:
>
> > At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:
> >
> > > There's a chapter in J. Utts' mostly wonderful but flawed low-math intro text "Seeing Through Statistics", in which she does much the same. She presents a case study based on some of her own work in which she looked at the question of gender discrimination in pay at her own university, and fails to reject the null hypothesis [no systemic difference in pay between male and female faculty]. She heads the example "Important, but not significant, differences in salaries"; comments (_perhaps_ technically correctly but misleadingly) that "a statistically naive reader could conclude that there is no problem" and in closing states:
> >
> > the flaw here is that ... she has population data i presume ... or about as close as one can come to it ... within the institution ... via the budget or comptroller's office ... THE salary data are known ... so, whatever differences are found ... DEMS are it! the notion of statistical significance in this case seems IRRELEVANT ... the real issue is ... given that there are a variety of factors that might account for such differences (numbers in ranks, time in ranks, etc. etc.) is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...
>
> Yes! This reminds me of a newspaper article and radio news item in the UK this year about female and male professors. They had data to show that there was a large salary difference. However, they went on to say that the largest difference was in Anatomy. I mentioned this to a female colleague of mine (who works in that area) who pointed out there was only one female professor of Anatomy in the UK.
>
> Thom

Jon Cryer [EMAIL PROTECTED]
Department of Statistics and Actuarial Science
The University of Iowa, Iowa City, IA 52242
http://www.stat.uiowa.edu
office 319-335-0819, dept. 319-335-0706, FAX 319-335-3017
Re: How to compare kappas?
On Thu, 20 Apr 2000 09:55:13 +0200, Mats Carlsson [EMAIL PROTECTED] wrote:

> Sorry if this has come up before, but here it goes.

I can't say the precise question has come up here before -- What *is* the precise question?

> Is there a way I can compare kappa-values? The background is as

Well, I think kappa is okay as a number to compare 2x2 tables, and nothing bigger. Generalized kappa is very much like Pearson r, isn't it? What are you trying to learn, or what are you trying to show?

If you were comparing r, it would be comparison of "correlated correlations", but that is a better idea for correlations that are around .8 or lower than for correlations of .95 -- with the latter, N=100, you might be trying to draw *statistical* conclusions from 2 or 3 discrepant judgments (and the failure to meet the assumptions of asymptotic behavior will invalidate testing).

Where you have one rater with his own alternative judgments, do you just have a minor descriptive problem? Or is there some independence between judgments, and something going on that is more complicated than moving a boundary between categories?

> follows: Four physicians have coded 100 surgical notes. Each physician has coded each surgical note using all four different classifications (thus coding the same note in four different ways). The classifications have differing numbers of categories (one has 8, one 10, one 16 and so on). I've calculated the degree of agreement within each classification using generalized kappa. How can I compare these values? I'm not an experienced statistician, so I'm kind of lost here. I've looked at Fleiss and Haas, but they don't seem to help in this issue.

I think you want to compare judgments rather than comparing Kappas, but you need to define a purpose. In what fashion is something expected to be better or worse?

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: power and what it says
dennis roberts wrote:

> what confidence do we have that the treatment effect is AT LEAST 3 lbs?

What Steve said, plus: You can't make a Bayesian omelette without breaking some Bayesian eggs. :)
Re: power and what it says
On 19 Apr 2000 15:22:14 -0700, [EMAIL PROTECTED] (dennis roberts) wrote:

> let's say that one designs a simple experiment about the effectiveness of a weight change program ... you set your sights on a power of .7 ... (beta therefore being .3) ... select a two tailed alpha of .05 ... because the situation is such that this program could actually make you gain weight though you hope that it will help you lose weight
>
> now, let's assume that you want to detect an effect of 3 pounds ... either gain or loss ... and you therefore go about estimating the n needed to

WAIT! What do you mean by "want to detect an effect of 3 pounds"?

1) That might be the critical distance for a t-test -- so the CI just excludes 0, and you are ignoring complicated "power" while being satisfied with 50% power for the point estimate.

2) That might be the underlying effect size which you are willing to assume is expected, or would be important, while testing whether two groups *differ*. That is the usual basis for power computations.

3) That might be an effect size that you want to be SURE of, so you want to test for an effect that has to be GREATER than 3. That will take a somewhat larger N, depending on the standard deviation. If it is weight gain in your pet elephants, then the N won't have to increase much.

Those are at least 3 interpretations that are distinct and practical, and they imply different sample Ns; they are not distinguished by the fuzzily stated intention.

(Maybe this is one of the advantages of relying on a textbook like Cohen's, compared to using a computer program or a shorter "cookbook" -- the book gives you modeling of realistic statements. As Dennis illustrates in this post, a naive approximation to a power requirement is apt to be too vague to be usable.)

> achieve this goal of being able to reject the null with a p of .7 ... if in fact the null is not true ... and the gap between the null and the center of the treatment effect distribution being 3 ...
>
> now, what if you execute your study rigorously with the n you estimated you would need ... and then reject the null with a p = .02 (for illustration purposes only) ... at the moment, don't worry if it is a gain or loss ... just that you reject the null
>
> here is my question (you were wondering when i would get to it, right?)
>
> WHAT CAN WE SAY, BASED ON THIS REJECTION OF THE NULL, about the treatment effect being 3 lbs OR more ... ?

- Of course, that depends on the statement of the Null. If we reject my (3), then the Confidence Interval is all above 3.0.

> what confidence do we have that the treatment effect is AT LEAST 3 lbs?

It depends how much of the CI is above 3.0, doesn't it?

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
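To make the difference between interpretations (1) and (2) concrete, here is a rough sketch of the usual normal-approximation sample size for comparing two group means. Everything beyond the numbers in the thread is an assumption made for illustration: two independent groups of equal size, a known common standard deviation (10 lbs here, purely hypothetical), and a z-test in place of the t-test. The function name is invented.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.7):
    """Approximate n per group for a two-sided, two-sample z-test.

    delta: true difference in means that is assumed (e.g. 3 lbs)
    sd:    common standard deviation, treated as known (hypothetical input)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)          # 0 when power is exactly 0.5
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Interpretation (2): 3 lbs is the assumed true effect, power .7
    print(n_per_group(3, 10, power=0.7))
    # Interpretation (1): 3 lbs is only the critical distance, i.e. power .5
    print(n_per_group(3, 10, power=0.5))
```

With the same alpha, effect, and standard deviation, the two readings of "detect an effect of 3 pounds" ask for noticeably different group sizes, which is the point of the objection above.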
Free Software to Perform IRT Unfolding Analyses
Dear Colleagues:

I am happy to announce that the GGUM2000 software system is now available free of charge. The GGUM2000 system estimates parameters for a family of item response theory models for unfolding. The most general model implemented in the system is the generalized graded unfolding model (GGUM) that was described in the March 2000 issue of Applied Psychological Measurement (pp. 3-32). In addition to this very general model, the GGUM2000 system also estimates seven other models that can be obtained by constraining item parameters from the GGUM in alternative ways. The system estimates item parameters using marginal maximum likelihood, and person parameters are estimated using an expected a posteriori (EAP) technique. The program allows for binary or polytomous responses, up to 100 items with 2-10 response categories, and up to 2000 respondents. The GGUM2000 system is a DOS-based program and is accompanied by an informative user's manual in WordPerfect 6.1 for Windows format.

The program can be downloaded from a web site devoted to IRT models for unfolding. The site is located at:

http://www.education.umd.edu/EDMS/tutorials/index.html

To obtain the software, click on the "Free Software to Construct IRT Unfolding Models" and you will be taken to the GGUM2000 advertisement. Click on "Download GUMIT2.EXE", then do the same thing on the next screen that appears.

The GGUM2000 system is supported by the author. Your feedback is appreciated and will be used to improve subsequent versions of the system.

While you are at the web site, please notice the other features available to you. There is an extensive reference page that provides a current list of books and articles on IRT-based approaches to unfolding. There is also an example data sets page from which illustrative test data may be downloaded. Finally, there is a listing of commercially available IRT-based unfolding software.

I hope you will stop by the web site soon and get your free copy of GGUM2000.
For those readers who may not be familiar with IRT models for unfolding, I have included a clip from the user's manual below. Although it has increased the length of this post substantially, I hope some folks find it useful.

Best Wishes,
Jim Roberts

What is GGUM2000?

The GGUM2000 system is a software package that estimates parameters from a family of item response theory (IRT) models known as "unfolding models". These models assume that persons and items can be jointly represented as locations on a latent unidimensional continuum. A single-peaked, nonmonotonic response function is the key feature that distinguishes unfolding IRT models from traditional, "cumulative" IRT models. This response function suggests that a higher item score is more likely to the extent that an individual is located close to a given item on the underlying continuum. In contrast, cumulative IRT models imply that a higher item score is more likely when the location of the individual exceeds that for the item on the latent continuum.

The unfolding IRT models implemented in the GGUM2000 system are appropriate for measuring a variety of constructs. For example, the models are well suited to measure individual attitudes using data from either Thurstone or Likert attitude questionnaires (Andrich, 1996; Roberts, 1995; Roberts, Laughlin, & Wedell, 1999). With these questionnaires, respondents indicate how much they disagree or agree with each statement. The response may be binary (0=disagree, 1=agree) or graded (0=strongly disagree, 1=disagree, 2=slightly disagree, 3=slightly agree, 4=agree, 5=strongly agree), but in each case, higher levels of agreement are coded with successive integers. In the context of attitude measurement, these unfolding models predict more agreement to the extent that an individual's opinion is similar to the sentiment expressed by the item.
The individual's location on the continuum is a measure of the individual's attitude and the item's location is a measure of its sentiment (i.e., its scale value). These unfolding models are also relevant to preference measurement situations where a respondent indicates how much he/she prefers each stimulus in a set of I stimuli. Suppose preference judgments are obtained from a sample of respondents using a rating scale with 0 to C scale points where a response of C represents the highest degree of preference. In this situation, one might postulate that respondents and stimuli are jointly located on a unidimensional continuum. The location of a given respondent represents the respondent's "ideal point". A respondent is expected to prefer a stimulus to the extent that it is located close to this ideal point. Finally, the unfolding models implemented in the GGUM2000 system can be used to measure developmental processes that occur in stages
Re: Quick Portable Statistics
One of my favorites is "Table Polishing" or "Median Polishing", discussed in Tukey & Mosteller's "Green Book", Data Analysis and Regression.

David Cross

On Wed, 19 Apr 2000 [EMAIL PROTECTED] wrote:

> I am looking for a source of "portable statistics", i.e. techniques that are easy to remember and use, that can be applied without a calculator or software program and do not need reference tables. Examples are: Tukey-Duckworth two sample test, and the quadrant sum test for association (Olmstead and Tukey). Are there others and is there a reference or source for these types of procedures?
>
> Thanks.
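For anyone who has not seen median polish, the following is a small sketch of the idea: sweep row medians and column medians out of a two-way table in turn until what is left is (nearly) free of row and column structure. This is an illustrative implementation with a fixed number of sweeps, not a transcription of the procedure as printed in the Green Book, and the function name and example table are invented.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Decompose a two-way table as overall + row effect + column effect + residual."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):
        # Sweep the median out of each row into the row effects
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        # Re-centre the column effects, moving their median into the overall term
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m

        # Sweep the median out of each column into the column effects
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        # Re-centre the row effects, moving their median into the overall term
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m

    return overall, row_eff, col_eff, resid


if __name__ == "__main__":
    # A perfectly additive table: 10 + row effect + column effect
    example = [[7, 9, 11],
               [8, 10, 12],
               [9, 11, 13]]
    overall, rows, cols, resid = median_polish(example)
    print(overall, rows, cols)
```

By hand the same thing is done with pencil and paper: write the medians in the margins, subtract, and repeat. Only sorting and subtraction are needed, which is why it counts as "portable".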
Re: How to compare kappas?
You might want to check section 10.5 in Agresti, Categorical Data Analysis, 1990, Wiley.

On Thu, 20 Apr 2000, Mats Carlsson wrote:

> Sorry if this has come up before, but here it goes.
>
> Is there a way I can compare kappa-values? The background is as follows: Four physicians have coded 100 surgical notes. Each physician has coded each surgical note using all four different classifications (thus coding the same note in four different ways). The classifications have differing numbers of categories (one has 8, one 10, one 16 and so on). I've calculated the degree of agreement within each classification using generalized kappa. How can I compare these values? I'm not an experienced statistician, so I'm kind of lost here. I've looked at Fleiss and Haas, but they don't seem to help in this issue.
>
> /Mats Carlsson
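As a reference point for the quantity under discussion, here is a minimal sketch of a multi-rater kappa. It assumes that "generalized kappa" in the question means Fleiss' kappa (the thread does not say so explicitly), that every note is coded by the same number of raters, and that categories are unordered. The function name and toy data are invented. It computes one kappa per classification scheme; it does not answer the question of how to compare them.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels (one per rater)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items
    p_items = [
        (sum(c * c for c in item.values()) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the pooled category proportions
    totals = Counter()
    for item in counts:
        totals.update(item)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Two notes, four raters, complete agreement on each note
    print(fleiss_kappa([["a", "a", "a", "a"], ["b", "b", "b", "b"]]))
```

Note that the chance-agreement term depends on the number and prevalence of categories, so kappas from schemes with 8, 10, and 16 categories are not on an obviously common footing.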
Sample Size Question
I am somewhat green when it comes to stats and these may be basic questions but here goes. I am trying to determine the correct sample size for a 1 sample t. The population is 8,000 and I realize the n=30 rule, but what if this is descriptive stats with only two possibilities (y/n answer)? Am I using the wrong tool?

Thanks
Terry
Re: Hypothesis testing and magic - episode 2
On Thu, 20 Apr 2000 10:48:38 +0100, "P.G.Hamer" [EMAIL PROTECTED] wrote:

snip, interesting stuff about, proper age-adjusted life-tables, with proper adjustment of base-line Ns, would not show an increase in competing causes of death

> BTW an even greater problem in animal testing seems to be due to using feed-on-demand systems. The little critters are usually bored out of their minds and overeat, causing a variety of health problems. So any drug that makes them mildly unwell can easily spoil their appetite -- and make them look healthier.

I never knew that! But that might be similar to, or might underlie, another thing that I once was told about laboratory rats. I had been impressed by the newspaper reports that rats lived longer if they were underfed, i.e., on very-low-calorie diets. Then my lab-tech friends told me that the lab rats tended to live to a certain *size* rather than age. The starved ones took 30% longer to reach that same size. So my friends were not at all impressed by those news reports. [ There may be newer data that are more impressive.]

I later realized that humans and dogs are in the minority among mammals, in that we achieve "adult" size and then stop growing. For elephants and moose and bears, etc., the stereotype from childhood nature stories is not all invention. If the clever "old man of the woods/jungle/forest" is the wisest and the oldest, he is likely to be the biggest, because most critters never stop growing. That seemed to tie in to the rat life-spans, too.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Gauss guide on line
In article [EMAIL PROTECTED], [EMAIL PROTECTED] says... Does anybody know a site where there is documentation on Gauss on line like the one we can find on TSP ? On my site (http://faculty.washington.edu/ezivot/gaussfaq.htm) I have links to several free gauss guides. ez
Re: Sample Size Question
On Thu, 20 Apr 2000 terry [EMAIL PROTECTED] wrote:

> I am somewhat green when it comes to stats and these may be basic questions but here goes. I am trying to determine the correct sample size for a 1 sample t. The population is 8,000 and I realize the n=30 rule, but what if this is descriptive stats with only two possibilities (y/n answer)? Am I using the wrong tool?

You need a few more constraints than you have described. By "1 sample t" do you mean that you want to test a null hypothesis using a t-test, or that you want a confidence interval based on the t distribution?

You describe your observed variable as dichotomous, which implies that you're trying to estimate a proportion. If you're testing a hypothesis, what is the value of the population proportion specified by the hypothesis? And what minimum distance from that value do you want to be able to distinguish, with what probability? Equivalently, if you're interested in a confidence interval, how narrow do you want the interval to be, with what degree of confidence (often expressed as a %, like 95%)?

You write, "what if this is descriptive stats". If this is the case, why are you dealing with t at all? Ordinarily one invokes the t distribution (either as a t-test or as the basis for a confidence interval) when one is trying to infer something about a population (your 8,000, I take it), not if the enterprise is only descriptive.

You mention a "n = 30 rule". What, precisely, do you understand by that phrase? (One can imagine a variety of "rules" that might be so described, most of them rather idiosyncratic. It is not clear that any of them would actually apply in your situation; although it is quite possible that some folks would insist on one or another.)

Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128
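Once the missing constraints are supplied, the confidence-interval version of the question reduces to a short calculation. The sketch below uses the normal approximation for a proportion with a finite population correction. The margin of error (plus or minus 5 percentage points), the 95% confidence level, and the worst-case proportion of 0.5 are all assumptions chosen for illustration; only the population size of 8,000 comes from the thread. The function name is invented.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(margin, population, confidence=0.95, p=0.5):
    """Sample size so a CI for a proportion has half-width at most `margin`.

    Uses the normal approximation, then shrinks the answer with the
    finite population correction. p=0.5 is the most conservative guess.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / margin ** 2   # infinite-population answer
    return ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    print(n_for_proportion(0.05, 8000))
```

Tightening the margin or raising the confidence level changes the answer substantially, which is why those two numbers have to be stated before any sample size can be called "correct".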