Heaven help us all!
Want a nice classroom exercise? If I weren't giving you the CNN URL, I wouldn't blame you for accusing me of making this up! http://www.cnn.com/2000/HEALTH/diet.fitness/08/21/fat.supplement/index.html = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: 2-level factorial design manual analysis
I give a tutorial (in the form of two heavily annotated computer programs) that illustrates a simple general matrix algebra approach to computing sums of squares in unbalanced (and balanced) analysis of variance. The tutorial is in terms of Yates' approach to visualizing such computations. (The programs are written in SAS IML, but one need not understand SAS or IML to understand the tutorial.) The tutorial is available at http://www.matstat.com/ss/

---
Donald B. Macnaughton  MatStat Research Consulting Inc
[EMAIL PROTECTED]  Toronto, Canada
---

Brian A. Bucher wrote (on 00/8/22):

Bob Wheeler ([EMAIL PROTECTED]) wrote:
: If you just want an answer, use a statistical package such as MiniTab. This ensures that the computations will at least be correct, and that you will be provided with the appropriate error estimates.

Well, at the same time I want an answer I also want to learn the basic mechanics and have a general understanding of what I'm doing. Since there's a tradeoff between spending my time learning details about DOEs/stats and working on my other projects, it becomes an optimization task in itself! :)

: If you must do it yourself, perhaps the best thing for you and for this particular problem is to use Yates' algorithm. It should be described in BHH, but if not, you will find it in many other statistical texts.

Thanks for the info!
Brian

: Brian A Bucher wrote:
: : I'd like to learn how to analyze 4-factor, 2-level full-factorial (and maybe fractional factorial) designs. My options at this moment are:
: : 1. Use Excel
: : 2. Learn and use R
: :
: : Could someone give me an estimate on how much time it will take for me to accomplish either of these? I'm presently reading/reviewing Statistics for Experimenters by Box, Hunter, Hunter.
: :
: : Thanks for any info,
: : Brian
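For readers who want to see the mechanics of the Yates' algorithm recommended in the quoted reply, here is a minimal sketch in Python. It is not taken from BHH or from the tutorial above; the function names and the sample responses are hypothetical, and it assumes one response per run of a full 2^k design, listed in standard (Yates) order: (1), a, b, ab, c, ac, ...

```python
def effect_label(index):
    """Label for position `index` in standard order, e.g. 1 -> 'A', 3 -> 'AB'."""
    return "".join(
        chr(ord("A") + bit)
        for bit in range(index.bit_length())
        if (index >> bit) & 1
    )


def yates_effects(responses):
    """Return (grand mean, {effect label: estimate}) for a full 2^k design.

    Assumption: `responses` holds one value per run, in standard order.
    """
    n = len(responses)
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("number of runs must be a power of two (at least 2)")
    k = n.bit_length() - 1

    column = list(responses)
    for _ in range(k):
        # One Yates pass: pairwise sums on top, pairwise differences below.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs

    mean = column[0] / n
    # Each remaining entry is a contrast; dividing by n/2 gives the effect.
    effects = {effect_label(i): column[i] / (n / 2) for i in range(1, n)}
    return mean, effects


# Hypothetical 2^2 example: runs (1), a, b, ab.
mean, effects = yates_effects([10, 20, 30, 40])
print(mean, effects)  # 25.0 {'A': 10.0, 'B': 20.0, 'AB': 0.0}
```

For the 4-factor case in the question, the same function would take the 16 responses in standard order and return the 15 main effects and interactions. It gives estimates only; error estimates still need replication or a separate analysis.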
Re: Student fears (was: Histograms etc.)
In article 41890A69A4EAD211A74600805F9F86F90276@ESC-S1, Olsen, Chris [EMAIL PROTECTED] wrote:

: Herman and All --

: : On the other hand, the type of clear measurements and formulation in these fields are not generally available in psychology. So they make the massive mistake of letting statistics do their thinking for them. A typical example of this is to convert their data to normality, and to scale it to a given mean and variance. This means that the data is a descriptive measure on this ONE group of individuals, or possibly even only of the sample taken, and that no conclusions should be drawn (although they are often done) about others. It is especially bad when they allow their moral judgments to cloak their models.

: If you will pardon the interloping of a lurker in this thread, I don't quite understand the above paragraph, or at least I don't see how the elements fit into a cohesive whole. I certainly do not wish to defend psychology, but am wondering why psych is singled out for these alleged sins.

: 1) I am unaware that there is a "natural" metric, ordained by mother nature, that would govern the choice of distribution for a set of data.

There can be many natural metrics, but they are all determined without regard for a frequency distribution. In most cases, metrics are determined by additivity properties. Most of the units I am aware of in physics either have this property, or the reciprocals do, or they appear as coefficients in equations. The others are logarithmic metrics, with the scale so that equal differences correspond to equal ratios on an additive scale. Such is the case with scales for sound level, stellar magnitude, and earthquake intensity. There are some, such as hardness of minerals, but I doubt that anyone would use the Mohs scale in a regression equation.

: If normal works for the task at hand, I don't see the problem.

The choice of origin and size of the unit, such as temperature, is still the specification of a linear scale. At least it was intended to be that way.

: It would seem to me that transformations of variables, while perhaps clouding and making more difficult the interpretation of units, is practiced in the "hard" sciences as well.

The transformations are natural transformations. There are very few kinds of transformations considered at all, and these all are related to absolute considerations, not some probability distribution of the results. The only exception which comes to mind offhand is "half life", and here it is still a value (time) on an absolute scale. The type of values encountered may influence choice of base points and units, but the type of scale is not so affected.

: In psychology one would be hard pressed to justify many of the quantities as possessing units, but presuming such units existed (perhaps time-to-proficiency on some discrimination task) I don't know why the psychology types are performing any worse a scaling sin than, say, using logs to convert concentrations of hydrogen ions into pH.

The psychology types set up their scales so that a certain proportion of the group for norming lies in a certain interval; this is quite different than the choice of linear or logarithmic scale. A chemist is given the one piece of information that the pH scale gives the negative logarithm of the hydrogen ion concentration, and now knows the meaning of any pH value. He will know how much to use to titrate a solution, for example. If a scientist is told that three measurements are such that the middle one is equidistant from the other two, one knows what this means in absolute terms. If a psychologist is told that one score on a scale determined by probabilities is between two others, this does not give that type of information; see further.

: 2) I don't understand how the conversion to normality "means" that "the data is a descriptive measure on this ONE group..." It would seem to me that generalizability is a function of sampling or experimental design, not scaling and transforming. There would be no more or less generalizability of a conclusion by virtue (or lack thereof) of a re-scaling or reexpression of the data.

Suppose psychologist A sets up a scale by studying college graduates, and psychologist B by studying elementary school students. There will not be a simple relationship between these scales. On the other hand, if one geographer measures length in miles, another in kilometers, another in versts, there is no major problem; twice as long is still twice as long. If we encounter extraterrestrial physicists or chemists or astronomers, setting up the relations between the scales will be relatively simple. With one psychologist setting up a scale based on the normal distribution in China and another in Algeria, who knows?

: 3) I don't quite understand how the psychologists are "letting the statistics do their thinking for them," or at least how this might distinguish psychologists from, say, geologists or
Re: Which statistical test?
On Sun, 20 Aug 2000, jkroger wrote in part:

: I want to show that in some conditions, the difference between the length of A's response and B's response is greater than in other conditions: duration(A) - duration(B) is significantly greater in some conditions. I tried a t-test for each condition, subtracting B from A at each interval and using a t-test to determine if the resulting sample differed from 0.

Yes, but this does not address what you said you want to show, which is not that d(A) - d(B) differs from zero, but that (d(A) - d(B)) in condition 1 (say) > (d(A) - d(B)) in condition 2. (As an aside, using a t-test would be arguably appropriate for a planned comparison; but it is much too sensitive for pursuing comparisons that were suggested by the fall of the data, so to speak, which I gather is the case in the present instance.)

Presumably you have the mean durations for each cell of the design from the ANOVA you mentioned in a subsequent post, and appropriate error mean squares for testing assorted null hypotheses (or constructing confidence intervals, or both). Plug these into a post hoc contrast analysis (I'd recommend the method of Scheffé, since the phenomenon appears to be one you noticed in analyzing the data, not one you anticipated) for the contrast d(A1) - d(B1) - d(A2) + d(B2) (where for the hypotheses the d's represent population means, and for the analysis one would substitute the observed sample means), for which the null hypothesis is that the value of the contrast is zero and your conjecture is that the value is positive (although, since it IS a post hoc contrast, you should test it against a two-sided alternative hypothesis). You may in fact have a number of such conjectures that you want to pursue; the virtue of the Scheffé method (and criterion) is that the Type I error rate is "experimentwise".

On Sun, 20 Aug 2000, jkroger wrote:

: I have two timecourse measures, A and B. At 20 consecutive intervals, A and B are measured, and the results are plotted. Both signals rise quickly to about the same height, then fall. Sometimes A stays elevated longer. There are eight separate trials (representing eight conditions), producing eight pairs of curves. I want to show that in some conditions, the difference between the length of A's response and B's response is greater than in other conditions: duration(A) - duration(B) is significantly greater in some conditions.
:
: I tried a t-test for each condition, subtracting B from A at each interval and using a t-test to determine if the resulting sample differed from 0. Unfortunately, in a couple of conditions it appears the A response is about the same as the B response, but the t-test is so sensitive that even small differences between A and B produce significance. The t value for the condition (#1) which it is important to demonstrate has a longer A duration (as is clearly obvious on inspection) is over 38. The conditions in which A - B is minimal still have significant t's of 5 or 8 (when a p of .05 requires a t of around 2). So according to the test I've chosen, A-B in almost all of the conditions is significant.
:
: What test will allow me to reveal the much greater significance of condition #1 relative to the others? I thought of chi-square (sum(A), sum(B) for all intervals; crossed with 1-8), but as chi-square is for frequency data, I'm not sure if it's applicable here.
:
: Thanks for any guidance,
: Jim

Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128
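The Scheffé contrast described in the reply above can be sketched roughly as follows. This is an illustration under stated assumptions, not the analysis the poster ran: it assumes independent cells of equal size and a single error mean square taken from the ANOVA, which may not be the right error term when A and B are measured at the same intervals. The function name and all numbers are hypothetical.

```python
def scheffe_contrast(means, coeffs, n_per_cell, mse):
    """Estimate a contrast and its Scheffe F ratio.

    Assumptions (hypothetical setup): `means` are the g cell means,
    `coeffs` sum to zero, every cell has `n_per_cell` observations,
    and `mse` is the error mean square from the ANOVA.
    """
    if len(means) != len(coeffs):
        raise ValueError("need one coefficient per cell mean")
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")

    g = len(means)
    estimate = sum(c * m for c, m in zip(coeffs, means))
    # Estimated variance of the contrast under the assumptions above.
    variance = mse * sum(c * c / n_per_cell for c in coeffs)
    # Scheffe criterion: divide the squared contrast by (g - 1) times its
    # variance, then compare with the ordinary F critical value.
    f_ratio = estimate ** 2 / ((g - 1) * variance)
    return estimate, f_ratio


# Hypothetical cell means in the order A1, B1, A2, B2.
estimate, f_ratio = scheffe_contrast(
    means=[12.0, 5.0, 8.0, 7.0],
    coeffs=[1, -1, -1, 1],  # d(A1) - d(B1) - d(A2) + d(B2)
    n_per_cell=20,
    mse=4.0,
)
print(estimate, f_ratio)
```

The contrast would be declared significant if the returned ratio exceeds the upper-tail F critical value with (g - 1, error df) degrees of freedom, looked up in a table or a statistics package. With all eight conditions in play, the cells not involved in a given contrast would simply get coefficient zero.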
Re: Plotting Distribution!!!
For openers, you're going to have to describe your problem with a good deal more precision, in order for anyone to provide any kind of useful help.

On Fri, 18 Aug 2000, Veeral Patel wrote:

: I have a data whose histogram has a unique distribution exhibited by it. I am trying to fit different curves to the data and to see which one has the best fit.

How did you have in mind assessing goodness of fit? None of your subsequent remarks address this point.

: The first one I am trying is gamma, i got my optimum alpha and beta values. And then simply fed my data (x values) into the gamma distribution function and got my f(x) values.

At this point it would be reasonable to ask how well f(x) for each x agrees with the data h(x) [for your initial histogram of x].

: Now the question is how do I plot these.

It is not entirely clear what you mean by "these". What information do you want to display, and what utility do you want the display to have? If you mean "plot f(x) vs. x", what do you expect the plot to tell you?

: I looked at books and for the distribution plots they like have f(x) on the vertical axis and quantile values on the horizontal. Now how do I obtain the quantiles or is there another way to do the plot of f(x) and x?? because if i plot f(x) and x i get weird looking lines on the graph.

What do you mean by "plot f(x) and x"? If you were plotting f(x) vs. x (as one would expect), you should produce a plot of the gamma density. On the other hand, such a plot would provide no obvious information about how well the gamma density fits the original histogram. Perhaps there are some salient features of your analysis that you haven't yet told us. It is impossible to diagnose problems that are described merely as "weird looking".

Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128
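One way to make the comparison of f(x) with the histogram h(x) concrete is sketched below in Python, using only the standard library. It assumes alpha is the shape and beta the scale of the gamma density (the poster's parameterization is not stated), and the function names are hypothetical. The histogram is rescaled to a density so that its bar heights and f(x) are on the same vertical scale.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density; alpha = shape, beta = scale (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def histogram_density(data, n_bins):
    """Bin midpoints and bar heights scaled so the total bar area is 1."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        # The maximum value is placed in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    mids = [lo + (i + 0.5) * width for i in range(n_bins)]
    heights = [c / (len(data) * width) for c in counts]
    return mids, heights


def compare_fit(data, alpha, beta, n_bins=10):
    """Rows of (bin midpoint, histogram density h, fitted density f)."""
    mids, heights = histogram_density(data, n_bins)
    return [(m, h, gamma_pdf(m, alpha, beta)) for m, h in zip(mids, heights)]
```

Because the rows come back ordered by bin midpoint, drawing f against the midpoints as a line over the histogram bars gives a curve that can be compared with the bars directly. Plotting f(x) against the raw x values in their original, unsorted order is one possible source of criss-crossing lines, though that is only a guess about the problem described above.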
Re: Student fears (was: Histograms etc.)
At 02:22 PM 8/22/00 -0500, Herman Rubin wrote:

: No geographer would take the heights of mountains and convert them to a probability scale.

i beg to differ ... for, it is not totally an uninteresting question that someone might ask ... for all mountains ... what is the p value for selecting at random from all mountains ... one that is a height of 10,000 feet or more ... or, just to ask: what would a frequency distribution look like for all mountains ... say ... where the scale on the baseline goes from 0 to 2500, 2501 to 5000, etc. ...

good geographers ... would/should have some knowledge of this ... not that they would spend their lives doing these tabulations but, it is part of the knowledge base in which they work

if you have lived ONLY in pennsylvania ... some mountains look pretty TALL while others seem rather short ... while, to those who have lived in florida all their lives (and never seen pictures or surfed the WWW) ... ALL of these look like the alps ... but, if you lived in the alps ... the mountains of pennsylvania, even the tallEST ones, look like little bumps on the horizon
teaching software for stats/maths
Hi All

Here at my university, the Student Learning Centre (SLC) provides additional maths and statistics tuition for those struggling with first year courses. (They teach many other things as well.) The SLC wants to know if there are any software teaching packages that they could get for students, for example on a CD, that the student could take away and work on themselves (or run from the university network).

In particular, the SLC is interested in packages or tutorials that are useful for bridging from high school level (years 11 and 12) to first year university level. Of particular interest are the subjects calculus and statistics. Price is not important at this stage in the search.

Does anyone have experience using such software? Can anyone suggest products or URLs? Any help would be greatly appreciated. Thanks in advance.

--
Andrew McLachlan, PhD Student. [EMAIL PROTECTED]
Ecology Entomology Group, P.O. Box 84, Lincoln University, Canterbury, New Zealand.
ph. +64 3 325-2811, fax. +64 3 325-3844.
lowess
Hi, can someone explain to me the strengths and weaknesses of lowess regression? In particular, what have its applications been in the biological sciences? Also, how good are the bootstrap methods of computing confidence regions?

Thanks,
Ming Hsu
Re: Book for second course in undergrad stats.
From experience, the Edwards textbooks are quite appropriate for your needs (at least I believe they are). My choice would be the Fifth Edition of his Experimental Design. I am not, however, familiar with how readily available the books are. Further, it does not cover MANOVA, but it does cover all the other topics you wanted and is quite good in doing so.

K2

"Paul W. Jeffries" wrote:

: Dear list members,
:
: I would appreciate recommendations for a text to use in an advanced stats course. The students are undergraduate psychology majors who have taken the department's intro stats course. Since their math background is limited, I would like a book that develops ideas intuitively rather than mathematically. I would like to cover at least multiple regression/correlation, factorial ANOVA, repeated-measures ANOVA, MANOVA, ANCOVA/MANCOVA. Any suggestions for textbooks are welcome.
:
: Thank you,
: Paul W. Jeffries
: Department of Psychology
: SUNY--Stony Brook
: Stony Brook NY 11794-2500
Re: teaching software for stats/maths
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Andrew McLachlan) wrote:

: Hi All
:
: Here at my university, the Student Learning Centre (SLC) provides additional maths and statistics tuition for those struggling with first year courses. (They teach many other things as well.) The SLC wants to know if there are any software teaching packages that they could get for students, for example on a CD, that the student could take away and work on themselves (or run from the university network).
:
: In particular, the SLC is interested in packages or tutorials that are useful for bridging from high school level (years 11 and 12) to first year university level. Of particular interest are the subjects calculus and statistics. Price is not important at this stage in the search.
:
: Does anyone have experience using such software? Can anyone suggest products or URLs? Any help would be greatly appreciated. Thanks in advance.
:
: --
: Andrew McLachlan, PhD Student. [EMAIL PROTECTED]
: Ecology Entomology Group, P.O. Box 84, Lincoln University, Canterbury, New Zealand.
: ph. +64 3 325-2811, fax. +64 3 325-3844.

You could try "Fathom: Dynamic Statistics Software". The URL is listed at http://www.kdcentral.com

--
T.S. Lim [EMAIL PROTECTED]
www.Recursive-Partitioning.com
within group agreement for nominal/ordinal data
I'm trying to test whether a variable measures a group-level property, and so I'm looking for an analog to eta-squared, intra-class correlation, etc. for nominal or ordinal data. I have data comprising 2000 workplaces, with samples of individuals drawn from each (n=20,000). One variable has 4 categories (agree, neutral, disagree, don't know).

1. How can I estimate how much of the total variability derives from between groups (workplaces) and how much from within groups?
2. Is there a rule-of-thumb for what would be evidence of strong within-group agreement?
3. Can I do this in SPSS?