RE: Student's t vs. z tests
On 24 Apr 2001, Mark W. Humphries wrote:
> I concur. As I mentioned at the start of this thread, I am self-learning statistics from books. I have difficulty telling what is being taught as necessary theoretical 'scaffolding' or 'superseded procedures', and what one would actually apply in a realistic case. I would love a textbook which walks through a realistic analysis step by step, while providing the 'theoretical scaffolding' as insets within this flow. It's frustrating to read 50 pages only to find that 'one never actually does it this way'.

Jim Clark responded:
> My gut feeling is that this would be a terribly confusing way to _teach_ anything. Students would be started with a (relatively) advanced procedure and at various points have to be taken aside for lessons on sampling distributions, probability, whatever, and then brought back somehow to the flow of the current lesson. There is a logic to the way that statistics is developed in most intro texts (although some people might not agree with that logic in the absence of a direct empirical test of its efficacy). It would be an interesting study of course, and not that difficult to set up with some hypertext-like instruction. Students could be led through the material in a hierarchical manner or entered at some upper level with recursive links to foundational material. We might find some kind of interaction, with better students doing OK by either procedure (and perhaps preferring the latter) and weaker students doing OK by the hierarchical procedure but not the unstructured (for want of a better word) method. At least, that is my prediction.

[snip]

You're likely right. Currently, as I learn each new concept or statistical procedure, I test my understanding by writing small snippets of code (in awk, would you believe). I get perplexed when I come across descriptions which seem heuristic rather than algorithmic. For example, I just started the chapter on the analysis of category data.
The description of the chi-squared statistic ends with: "The approximation is very good provided all expected cell frequencies are 5 or greater. This is a conservative rule, and even smaller expected frequencies have resulted in good approximations." Such a statement makes me wonder if modern statistical methods actually use this particular approximation-cum-heuristic, or is there a more 'definite' algorithm. Am I learning 'real world' statistics, or a sanitized textbook version? And how can I tell? :)

Cheers,
Mark

=
Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/
=
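One way to probe the rule of thumb Mark quotes is to compute the chi-squared approximation and an exact p-value side by side. The sketch below is only an illustration in Python (standard library only), not the textbook's method: the function names and the example tables are made up, and the exact test used for comparison is Fisher's exact test for a 2x2 table, which conditions on the table margins.

```python
from math import comb, erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, approximate p-value, smallest expected count).
    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    stat = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_approx = erfc(sqrt(stat / 2))
    return stat, p_approx, min(min(row) for row in expected)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value: total probability of all tables with
    the same margins that are no more likely than the observed one."""
    n, r1, c1 = a + b + c + d, a + b, a + c

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    p_obs = prob(a)
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

if __name__ == "__main__":
    # Hypothetical tables: one with tiny expected counts, one with large ones.
    for table in [(3, 1, 1, 3), (30, 10, 10, 30)]:
        stat, p_approx, min_expected = chi2_2x2(*table)
        print(table, "min expected:", min_expected,
              "chi2 p:", round(p_approx, 4),
              "exact p:", round(fisher_exact_2x2(*table), 4))
```

Running it on tables with small and large expected counts shows how far the approximation drifts from the exact answer in each case, which is the kind of check that turns the heuristic into something testable.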
Re: Student's t vs. z tests
> Date: Fri, 20 Apr 2001 13:02:57 -0500
> From: Jon Cryer [EMAIL PROTECTED]
>
> Could you please give us an example of such a situation?
>> Consider first a set of measurements taken with a measuring instrument whose sampling errors have a known standard deviation (and approximately normal distribution).

Sure. Suppose we use an instrument such as a micrometer, electronic balance or ohmmeter to measure a series of similar items. (For concreteness, suppose they are components coming off a mass production machine such as a screw machine.) As long as the measuring instrument isn't broken, we don't have to conduct an extensive series of repeated measurements every time we use it to determine its error variance with a part of the given conformation. Normality is also reasonably likely under those circumstances.

Slightly more sophisticated version of the same: Suppose the operating characteristics of such a machine can be characterized by slow drift (due to tool wear, heat expansion of machine parts, settings that gradually shift, etc.) plus independent random noise that is approximately normal. It is plausible in that setting that the variance of measurements on a short series of parts would be fairly constant. (I'm not just making this up; it's consistent with my own experience in my former career as a machinist.) Again, you don't have to calibrate the error variance of the measurement (in this case, average measurement of several successive parts to estimate the current system mean) every time you do it.
Re: Student's t vs. z tests
These examples come the closest I have seen to having a known variance. However, often measuring instruments, such as micrometers, quote their accuracy as a percentage of the size of the measurement. Thus, if you don't know the mean you also don't know the variance.

Jon Cryer

At 09:28 AM 4/23/01 -0400, you wrote:
> Sure. Suppose we use an instrument such as a micrometer, electronic balance or ohmmeter to measure a series of similar items.
> [snip]
Re: Student's t vs. z tests
At 1:18 PM -0500 23/4/01, Jon Cryer wrote:
> These examples come the closest I have seen to having a known variance. However, often measuring instruments, such as micrometers, quote their accuracy as a percentage of the size of the measurement. Thus, if you don't know the mean you also don't know the variance.

Certainly many measurements do have errors that are best given as a percent of the reading. In such cases, the error usually is a constant percent, not a constant absolute amount. To put it another way, the log of the readings has a normally distributed error that is independent of the reading. So you should perform all your analyses on the log-transformed variable, and express all your outcomes as percent differences or changes. Otherwise your analyses are riddled with non-uniform error (heteroscedasticity).

Will
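A small simulation can make Will's point concrete. This is a minimal sketch in Python under stated assumptions: the error is modelled as multiplicative and lognormal, the 1% error size and the part sizes 10 and 100 are invented for illustration, and the function names are hypothetical.

```python
import math
import random
import statistics

def simulate_readings(true_size, percent_error=1.0, n=5000, seed=0):
    """Simulate n readings of an item whose measurement error is a constant
    percentage of its size (multiplicative, lognormal error)."""
    rng = random.Random(seed)
    sigma_log = percent_error / 100.0
    return [true_size * math.exp(rng.gauss(0.0, sigma_log)) for _ in range(n)]

def sd_raw_and_log(readings):
    """Return the standard deviation of the readings and of their natural logs."""
    logs = [math.log(r) for r in readings]
    return statistics.stdev(readings), statistics.stdev(logs)

if __name__ == "__main__":
    for size, seed in [(10.0, 1), (100.0, 2)]:
        sd_raw, sd_log = sd_raw_and_log(simulate_readings(size, seed=seed))
        print(f"size {size:6.1f}: raw SD {sd_raw:.4f}, log SD {sd_log:.4f}")
```

On the raw scale the standard deviation grows in proportion to the size of the item, while on the log scale it stays near 0.01 for both sizes. That is the sense in which the variance is "known" after the transformation even when the mean is not.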
Re: Student's t vs. z tests
Jon Cryer wrote:
> These examples come the closest I have seen to having a known variance. However, often measuring instruments, such as micrometers, quote their accuracy as a percentage of the size of the measurement. Thus, if you don't know the mean you also don't know the variance.

You do if you log-transform...

-Robert Dawson
Re: Student's t vs. z tests
the fundamental issue here is ... is it reasonable to expect ... that when you are making some inference about a population mean ... that you will KNOW the variance in the population?

i suspect that the answer is no ... in all but the most convoluted cases ... or, to say it another way ... in 99.99% (or more) of the cases where we talk about making an inference about the mean in a population ... we have no more info about the variance than we do the mean ... ie, X bar is the best we can do as an estimate of mu ... and, S^2 is the best we can do as an estimate of sigma squared ...

this is why i personally don't like to start with the case where you assume that you know sigma ... as a simplification ... since it is totally unrealistic

start with the realistic case ... even if it takes a bit more doing to explain it
Re: Student's t vs. z tests
dennis roberts wrote:
> the fundamental issue here is ... is it reasonable to expect ... that when you are making some inference about a population mean ... that you will KNOW the variance in the population?

No, Dennis, of course it isn't - at least in the social sciences and usually elsewhere as well. That's why I don't recommend teaching this (recall my comments about dangerous scaffolding) to the average life-sciences student who needs to know how to use the test and what it _means_, but not the theory behind it.

In the case of the student with some mathematical background, who may actually need to do something theoretical with the distribution one day (and may actually have the ability to do so) I would introduce t by way of Z.

A rough guide: if this group of students know what a maximum-likelihood estimator is, and have been or will be expected to derive, from first principles, a hypothesis test or confidence interval for (say) a singleton sample from an exponential distribution, then they ought to be introduced to t by way of Z. If not, then:

(a) don't do it at all, or
(b) put your chalk down and talk your way through it as an Interesting Historical Anecdote without giving them anything to write down. Draw a few pictures if you must. Or
(c) give them a handout with DO NOT USE THIS TECHNIQUE! written on it in big letters.

(I've tried all four approaches, as well as the wrong one.)

-Robert Dawson
Re: Student's t vs. z tests
I can't help but be reminded of learning to ride a bicycle. 99.% of people ride one with two wheels (natch!) - but many children do start to learn with training wheels.

Alan

dennis roberts wrote:
> the fundamental issue here is ... is it reasonably to expect ... that when you are making some inference about a population mean ... that you will KNOW the variance in the population?
> [snip]

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102 Fax: +61 03 9903 2007
Re: Student's t vs. z tests
Hi On Fri, 20 Apr 2001, dennis roberts wrote: At 10:58 AM 4/20/01 -0500, jim clark wrote: What does a t-distribution mean to a student who does not know what a binomial distribution is and how to calculate the probabilities, and who does not know what a normal distribution is and how to obtain the probabilities? good question but, NONE of us have an answer to this ... i know of NO data that exists about going through various different routes and then assessing one's understanding at the end Just a couple of comments. (1) Not having specific evidence on a pedagogical question does not mean that any approach is just as justified as any other approach. We should base our practice on what information is available, appreciating its possible limitations (e.g., personal experience, cognitive models of concept learning, general principles of teaching, principles of task analysis, logic, feedback from students, ...). Only the very naivest sort of crude empiricism would dictate that specific findings are the only worthwhile factors in a science-based practice. (2) In general I suspect that there is much evidence supportive of a task-analytic approach to teaching mathematics, although I have not looked at the literature for many years. That is, mathematics, perhaps more than many other areas, requires a sensitivity to the kinds of prior knowledge presumed by the new knowledge to be acquired. to say that we know that IF we want students to learn about and understand something about t and its applications ... one must: 1. do binomial first ... 2. then do normal 3. then do t is mere speculation Only if you completely devalue many years of experience teaching a subject matter, a background in cognitive and educational psychology, the possibility that there might be certain logical entailments involved among the topics, and so on. Your statement makes it sound as though one is equally justified to promote any of the 3! = 6 possible permutations of all 3 tasks + the 3x2! 
= 6 permutations of 2 tasks + the 3 possible single tasks (+ the 1 possible 0 tasks, if one wants to be comprehensive). without some kind of an experiment where we try various combinations and orderings ... and see what happens to student's understandings, we know not of what we assert (including me) This is just too nihilistic a view of knowledge and teaching. There are certain constraints. For example, one normally expects that learning the alphabet is better done before learning words. Would you want an experiment before concluding that presenting the calculus of statistics is probably not the best approach to intro stats in non-mathematical disciplines? off the top of my head, i would say that one could learn alot about a t distribution studying it ... are you suggesting that one could not learn about calculating probabilities within a t distribution without having worked and learned about calculating probabilities in a normal distribution? as far as i know, the way students learn about calculating probabilities is NOT by any integrative process ... rather, they are shown a nice drawing of the normal curve, with lines up at -3 to +3 ... with values like .02, .14, .34 ... etc. within certain whole number boundaries under the curve, and then are shown tables on how to find areas (ps) for various kinds of problems (areas between points, below points, above points) if there is something real high level and particularly intuitive about this, let me know. you make it sound like there is some magical learning here ... some INductive principle being established ... and, i don't see it Of course you left off my starting point. For the binomial distribution, students can readily be shown how to actually calculate the probabilities in the sampling distribution. They do not have to take it purely on faith. 
Then when we move to the normal or t or F or whatever, we can say that these distributions are produced by more sophisticated mathematical techniques that are beyond our capabilities, but _analogous_ to what students did for the binomial. This is the foundation (with its own foundation in an adequate understanding of probability and counting principles). The normal distribution is the bridge between this foundation and the t-distribution (then F, whatever).

I can't speak for other disciplines, but at least in psychology and education, it is probably worth noting that an understanding of the normal distribution is valuable in and of itself, irrespective of its role in hypothesis testing. Examples of normal distributions would occur in testing (e.g., understanding different test score transformations, such as T-scores, computed percentiles, and the like), in understanding certain transformations (e.g., of skewed reaction time distributions), and in perception (e.g., d-prime measures of sensitivity).

> i don't see one whit of difference between this and ... showing some t
Re: Student's t vs. z tests
Alan:

Could you please give us an example of such a situation? "Consider first a set of measurements taken with a measuring instrument whose sampling errors have a known standard deviation (and approximately normal distribution)."

Jon

At 01:10 PM 4/20/01 -0400, you wrote:
> (This note is largely in support of points made by Rich Ulrich and Paul Swank.)
>
> I disagree with the claim (expressed in several recent postings) that z-tests are in general superseded by t-tests. The t-test (in simple one-sample problems) is developed under the assumption that independent observations are drawn from a normal distribution (and hence the mean and sample SD are independent and have specific distributional forms). It is widely applicable because it is fairly robust against violations of these assumptions.
>
> However, there are also situations in which the t-test is clearly inferior to a z-test. Consider first a set of measurements taken with a measuring instrument whose sampling errors have a known standard deviation (and approximately normal distribution). In this case, with a few observations (let's say 1 or 2, if you want to make it very clear), the z-based procedure that uses the known SD will give much more useful tests or intervals than a t-based procedure (which estimates the SD from the data at hand).
>
> snip
>
> Alan Zaslavsky
> Harvard Med School
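The two-observation case in the quoted note can be worked through numerically. The sketch below is an illustration only, in Python with the standard library: the two readings and the "known" SD of 0.02 are invented, and the function names are hypothetical. It relies on the fact that the t distribution with 1 degree of freedom is the standard Cauchy, whose quantile function has the closed form tan(pi * (p - 0.5)).

```python
from math import pi, sqrt, tan
from statistics import NormalDist, mean, stdev

def z_interval(xs, sigma, level=0.95):
    """Confidence interval for the mean when the error SD sigma is known."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * sigma / sqrt(len(xs))
    centre = mean(xs)
    return centre - half_width, centre + half_width

def t_crit_1df(level=0.95):
    """Two-sided critical value of t with 1 df (the standard Cauchy).

    The p-quantile is tan(pi * (p - 0.5)); with p = 0.5 + level / 2 this
    simplifies to tan(pi * level / 2).
    """
    return tan(pi * level / 2)

def t_interval_two_obs(xs, level=0.95):
    """t-based interval for exactly two observations (SD estimated, 1 df)."""
    if len(xs) != 2:
        raise ValueError("this sketch only handles two observations")
    half_width = t_crit_1df(level) * stdev(xs) / sqrt(2)
    centre = mean(xs)
    return centre - half_width, centre + half_width

if __name__ == "__main__":
    readings = [10.02, 10.05]   # hypothetical micrometer readings
    known_sd = 0.02             # hypothetical known instrument SD
    print("z interval:", z_interval(readings, known_sd))
    print("t interval:", t_interval_two_obs(readings))
    print("critical values: z =", NormalDist().inv_cdf(0.975), " t(1 df) =", t_crit_1df())
```

With only one degree of freedom the t critical value is several times larger than the z critical value, so the t interval is much wider whenever the sample SD is of the same order as the known SD. That is the practical content of the claim that the z procedure is more useful here.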
Re: Student's t vs. z tests
Alan:

I don't understand your comments about the estimation of a proportion. It sounds to me as if you are using the estimated standard error. (Surely you are not assuming a known standard error.) You are presumably also using the normal approximation to the binomial (or perhaps the hypergeometric). To do so requires a "large" sample size, in which case it doesn't matter whether you use the normal or t distribution. Both would be acceptable approximations (and both would be approximations). So what is your point?

Once more I think you need to separate the issues of what statistic to use and what distribution to use.

Jon

At 01:10 PM 4/20/01 -0400, you wrote:
> (This note is largely in support of points made by Rich Ulrich and Paul Swank.)
>
> snip
>
> Now consider estimation of a proportion. Using the information that the data consist only of 0's and 1's, and an approximate value of the proportion, we can calculate an approximate standard error more accurately (for p near 1/2) than we could without this information. The interval based on the usual variance formula p(1-p) and the z distribution is therefore better than the one based on the t distribution. This is why (as Paul pointed out) everybody uses z tests in comparing proportions, not t tests. The same applies to generalizations of tests of proportions as in logistic regression.
>
> snip
>
> Alan Zaslavsky
> Harvard Med School
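For readers following the exchange, here is a hedged sketch of the interval under discussion: the usual large-sample (Wald) interval for a proportion, built from the variance formula p(1-p) and a z critical value. It is written in Python with the standard library, the counts (40 successes in 100 trials) are made up, and the function names are hypothetical. The second function shows that for 0/1 data the ordinary sample variance is just n/(n-1) times p_hat(1 - p_hat), so the estimated standard error depends on the data only through the sample proportion.

```python
from math import sqrt
from statistics import NormalDist, variance

def wald_interval(successes, n, level=0.95):
    """Large-sample z interval for a proportion, using the standard error
    sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    std_err = sqrt(p_hat * (1 - p_hat) / n)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return p_hat - z_crit * std_err, p_hat + z_crit * std_err

def sample_variance_of_binary(successes, n):
    """Ordinary (n - 1 denominator) sample variance of the raw 0/1 data."""
    data = [1] * successes + [0] * (n - successes)
    return variance(data)

if __name__ == "__main__":
    successes, n = 40, 100   # hypothetical data
    p_hat = successes / n
    print("Wald interval:", wald_interval(successes, n))
    print("sample variance of 0/1 data:", sample_variance_of_binary(successes, n))
    print("n/(n-1) * p_hat * (1 - p_hat):", n / (n - 1) * p_hat * (1 - p_hat))
```

This does not settle the disagreement between the two posters; it only makes explicit which quantities each of them is talking about.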
Re: Student's t vs. z tests
alan and others ... perhaps what my overall concern is ... and others have expressed this from time to time in varying ways ... is that 1. we tend to teach stat in a vacuum ... 2. and this is not good the problem this creates is a disconnect from the question development phase, the measure development phase, the data collection phase, and THEN the analysis phase, but finally the "what do we make of it" phase. this disconnect therefore means that ... in the context of our basic stat course(s) ... we more or less have to ASSUME that the data ARE good ... because if we did not, like you say we would go dig ditches ...at this point, we are not in much of a position to question the data too much since, whether it be in a book we are using or, some of our own data being used for illustrative examples ... there is NOTHING we can do about it at this stage. it is not quite the same as when a student comes in with his/her data to YOU and asks for advice ... in this case, we can clearly say ... your data stink and, there is not a method to "cleanse" it but in a class about statistical methods, we plod on with examples ... always as far as i can tell making sufficient assumptions about the goodness of the data to allow us to move forward bottom line: i guess the frustration i am expressing is a more general one about the typical way we teach stat ... and that is in isolation from other parts of the question development, instrument construction, and data collection phases ... what i would like to see .. which is probably impossible in general (and has been discussed before) ... it a more integrated approach to data collection ... WITHIN THE SAME COURSE OR A SEQUENCE OF COURSES ... so that when you get to the analysis part ... 
that we CAN make some realistic assumptions about the quality of the data, quality of the data collection process, and make sense of the question or questions being investigated

At 02:01 PM 4/20/01 +1000, Alan McLean wrote:
> All of your observations about the deficiencies of data are perfectly valid. But what do you do? Just give up because your data are messy, and your assumptions are doubtful and all that? Go and dig ditches instead? You can only analyse data by making assumptions - by working with models of the world. The models may be shonky, but they are presumably the best you can do. And within those models you have to assume the data is what you think it is.
Re: Student's t vs. z tests
nice note mike Impossible? No. Requiring a great deal of effort on the part of some cluster of folks? Definitely! absolutely! There is some discussion of this very possibility in Psychology, although I've yet to see evidence of fruition. A very large part of the problem, in my mind, is breaking out of established stereotypes of what a Stats and Methods sequence should look like, and then finding the materials to support that vision. i think it may ONLY be possible within a large unit that requires their students to take their methods courses ... design, testing, statistics, etc. i think it will be very hard for a unit that PROVIDES SUBSTANTIAL cross unit service courses ... to do this for example, in our small edpsy program at penn state, most of the courses in research methods, measurement, and stat ... are for OTHERS ... even though our own students take most of them too. if we redesigned a sequence that would be more integrative ... for our own students, students from outside would NOT enroll for sure ... because they are looking for (or their advisors are) THE course in stat ... or THE course in research methods ... etc. they are not going to sit still for say a two/3 course sequence If I could find good materials that were designed specifically to support the integrated sequence, I might be able to get others to go along with it. i think the more serious problem would be agreeing what should be contained in what course ... that is, the layout of this more integrative approach if that could be done, i don't think it would be that hard to work on materials that fit the bill ... by having different faculty write some modules ... by finding good web links ... and, gathering a book of readings what you want is NOT necessarily a BOOK that does it this way but, a MANUAL you have developed over time that accomplishes the goals of this approach It can be done, but it will require someone with more energy and force of will than I. i doubt i have the energy either ... 
Mike *** Michael M. Granaas Associate Professor[EMAIL PROTECTED] Department of Psychology University of South Dakota Phone: (605) 677-5295 Vermillion, SD 57069 FAX: (605) 677-6604 *** All views expressed are those of the author and do not necessarily reflect those of the University of South Dakota, or the South Dakota Board of Regents. _ dennis roberts, educational psychology, penn state university 208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] http://roberts.ed.psu.edu/users/droberts/drober~1.htm = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Student's t vs. z tests
However, rather than do that why not right on to F? Why do t at all when you can do anything with F that t can do plus a whole lot more? At 10:58 PM 4/19/01 -0400, you wrote: >students have enough problems with all the stuff in stat as it is ... but, >when we start some discussion about sampling error of means ... for use in >building a confidence interval and/or testing some hypothesis ... the first >thing observant students will ask when you say to them ... > >assume SRS of n=50 and THAT WE KNOW THAT THE POPULATION SD = 4 ... is: if >we are trying to do some inferencing about the population mean ... how come >we know the population sd but NOT the mean too? most find this notion >highly illogical ... but we and books trudge on ... > >and they are correct of course in the NON logic of this scenario > >thus, it makes a ton more sense to me to introduce at this point a t >distribution ... this is NOT hard to do ... then get right on with the >reality case > >asking something about the population mean when everything we have is an >estimate ... makes sense ... and is the way to go > >in the moore and mccabe book ... the way they go is to use z first ... >assume population is normal and we know sd ... spend alot of time on that >... CI and logic of hypothesis testing ... THEN get into applications of t >in the next chapter ... > >i think that the benefit of using z first ... then switching to reality ... >is a misguided order > >finally, if one picks up a SRS random journal and looks at some SRS random >article, the chance of finding a z interval or z test being done is close >to 0 ... rather, in these situations, t intervals or t tests are almost >always reported ... > >if that is the case ... why do we waste our time on z? > > > >At 08:52 PM 4/18/01 -0300, Robert J. MacG. 
Dawson wrote: >>David J Firth wrote: >> > >> > : You're running into a historical artifact: in pre-computer days, >> using the >> > : normal distribution rather than the t distribution reduced the size >> of the >> > : tables you had to work with. Nowadays, a computer can compute a t >> > : probability just as easily as a z probability, so unless you're in the >> > : rare situation Karl mentioned, there's no reason not to use a t test. >> > >> > Yet the old ways are still actively taught, even when classroom >> > instruction assumes the use of computers. >> >> The z test and interval do have some value as a pedagogical >>scaffold with the better students who are intended to actually >>_understand_ the t test at a mathematical level by the end of the >>course. >> >> For the rest, we - like construction crews - have to be careful >>about leaving scaffolding unattended where youngsters might play on it >>in a dangerous fashion. >> >> One can also justify teaching advanced students about the Z test so >>that they can read papers that are 50 years out of date. The fact that >>some of those papers may have been written last year - or next- is, >>however, unfortunate; and we should make it plain to *our* students that >>this is a "deprecated feature included for reverse compatibility only". >> >> -Robert Dawson >> >> >>= >>Instructions for joining and leaving this list and remarks about >>the problem of INAPPROPRIATE MESSAGES are available at >> http://jse.stat.ncsu.edu/ >>= > >_ >dennis roberts, educational psychology, penn state university >208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED] >http://roberts.ed.psu.edu/users/droberts/drober~1.htm > > > >= >Instructions for joining and leaving this list and remarks about >the problem of INAPPROPRIATE MESSAGES are available at > http://jse.stat.ncsu.edu/ >= > Paul R. Swank, PhD. 
Professor Advanced Quantitative Methodologist UT-Houston School of Nursing Center for Nursing Research Phone (713)500-2031 Fax (713) 500-2033 soon to be moving to the Department of Pediatrics UT Houston School of Medicine = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Student's t vs. z tests
At 10:39 AM 4/19/01 -0500, Paul Swank wrote:
> However, rather than do that why not right on to F? Why do t at all when you can do anything with F that t can do plus a whole lot more?

don't necessarily disagree with this but, i don't ever see in the literature in two group situations comparing means ... F tests done ...

so, part of this has to do with educating students about what they will see in the journals, etc.
Re: Student's t vs. z tests
Paul Swank wrote:
> However, rather than do that why not right on to F? Why do t at all when you can do anything with F that t can do plus a whole lot more?

Because the mean, normalized using the hypothesized mean and the observed standard deviation, has a t distribution and not an F distribution. I am aware that the two are algebraically related (and simply), but trying to get through statistics with only one table (or only one menu item on your stats software) seems pointless - like trying to do all your logic with NAND operations just because you can.

-Robert Dawson
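The algebraic relation mentioned here is that the square of a t statistic with v degrees of freedom is an F statistic with (1, v) degrees of freedom. A minimal sketch in Python (standard library only) checks this for the two-group case raised earlier in the thread; the data values are invented and the function names are hypothetical.

```python
from statistics import mean

def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss_within = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss_within / (nx + ny - 2)
    return (mx - my) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

if __name__ == "__main__":
    group_a = [5.1, 4.9, 5.6, 5.8, 6.0]   # hypothetical measurements
    group_b = [4.1, 4.7, 4.4, 5.0, 4.3]
    t_stat = pooled_t(group_a, group_b)
    print("t =", t_stat, " t squared =", t_stat ** 2)
    print("F =", anova_f(group_a, group_b))
```

The two printed values agree up to rounding, so either table gives the same two-sided test. What the F form loses is the sign of the difference, which is one practical reason a t statistic is still reported for two groups.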
Re: Student's t vs. z tests
I agree. I still teach the t test also because of this, but at the same time I realize that what goes around, comes around, so what we are doing is ensuring that we will continue to see t tests in the literature. However, I find linear models easier to teach (once I erase the old stuff from their memories) than the basic inference course. It is so much more logical. At 12:41 AM 4/20/01 -0400, you wrote: >At 10:39 AM 4/19/01 -0500, Paul Swank wrote: >>However, rather than do that why not right on to F? Why do t at all when >>you can do anything with F that t can do plus a whole lot more? > > >don't necessarily disagree with this but, i don't ever see in the >literature in two group situations comparing means ... F tests done ... > >so, part of this has to do with educating students about what they will see >in the journals, etc. > > > Paul R. Swank, PhD. Professor Advanced Quantitative Methodologist UT-Houston School of Nursing Center for Nursing Research Phone (713)500-2031 Fax (713) 500-2033 soon to be moving to the Department of Pediatrics UT Houston School of Medicine = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Student's t vs. z tests
They are more than just related. One is a natural extension of the other, just as chi-square is a natural extension of Z. With linear models, one can begin with a simple one-sample model and build up to multiple factors and covariates using the same basic framework, which I find easier to make sense of logically and easier to teach.

At 01:58 AM 4/19/01 -0300, you wrote:
>Paul Swank wrote:
>> However, rather than do that why not right on to F? Why do t at all when
>> you can do anything with F that t can do plus a whole lot more?
>
> Because the mean, normalized using the hypothesized mean and the
>observed standard deviation, has a t distribution and not an F
>distribution. I am aware that the two are algebraically related (and
>simply), but trying to get through statistics with only one table (or
>only one menu item on your stats software) seems pointless - like trying
>to do all your logic with NAND operations just because you can.
>
> -Robert Dawson

Paul R. Swank, PhD.
Re: Student's t vs. z tests
I agree. I normally start inference by using the binomial and then the normal approximation to the binomial for large n. It might be best to begin all graduate students with nonparametric statistics followed by linear models. Then we could get them to where they can do something interesting without taking four courses.

At 01:28 PM 4/19/01 -0500, you wrote:
>Why not introduce hypothesis testing in a binomial setting where there are
>no nuisance parameters and p-values, power, alpha, beta,... may be obtained
>easily and exactly from the Binomial distribution?
>
>Jon Cryer
>
>At 01:48 AM 4/20/01 -0400, you wrote:
>>At 11:47 AM 4/19/01 -0500, Christopher J. Mecklin wrote:
>>>As a reply to Dennis' comments:
>>>
>>>If we deleted the z-test and went right to t-test, I believe that
>>>students' understanding of p-value would be even worse...
>>
>>i don't follow the logic here ... are you saying that instead of their
>>understanding being "bad" it will be worse? if so, not sure that this
>>is a decrement other than trivial
>>
>>[snip]

Paul R. Swank, PhD.
Re: Student's t vs. z tests
At 04:42 PM 4/19/01 +, Radford Neal wrote:
>In article [EMAIL PROTECTED], dennis roberts [EMAIL PROTECTED] wrote:
>
>I don't find this persuasive.

nor the reverse ... since we have NO data on any of this ... only our own notions of how it MIGHT play itself out inside the heads of students

>I think that any student who has the abstract reasoning ability needed to understand the concepts involved will not have any difficulty accepting a statement that "this situation doesn't come up often in practice, but we'll start with it because it's simpler".

this in and of itself sounds strange ... "this situation doesn't come up often in practice ... but we will begin with it ... (forget the reason why) ... "

when does it EVER come up in practice, really? i know there must be some good examples out there for when it does but ... i have yet to see one ... where one would KNOW the sd but not the mean too ... for sure, it would not be based on data the investigator gathered ... since, to get the sd you would have to have the mean ... so, it must be (once again) one of those where you say "assume the sd in the population is ... " ... and hope the students buy that ...

>I have my doubts that introducing the t distribution is "NOT hard", if by that you mean that it's not hard to get them to understand what's actually happening. Of course, it's not very hard to get them to understand how to plug the numbers into the formula.

just as i have doubts that the converse ... that introducing the z approach is easy ... as far as i can tell (again, no data ... just conjecture) the only thing that could make it easier is that (if one sticks to 95% CIs or .05 as a p value level criterion for a hypothesis test) ... you only have to remember 1.96 ...

can someone elaborate on why fundamentally, using z would be easier OTHER than only 1 CV to remember? i don't see how it makes the basic notions of what CIs are and what you do to conduct hypothesis tests ... easier in some ideational or cognitive way

what would the train of cognitive thought BE in the z approach that would make this easier?

>I think one could argue that introducing the z test first is MORE realistic.

this seems inconsistent with your earlier suggestion that " ... this does not come up in practice very often ... "

>After seeing the z test, students will realize how lucky one is to have such a statistic, ...

this is a real stretch for most students, being "lucky" is finding out that he/she does NOT have to take a stat course and therefore can avoid all this mess!

none of this applies to really good students ... you can introduce almost any notion to them and they will catch on to it AND quickly ... the problem is with the general batch which is usually 90% or more of all these students you have ... especially in first level intro stat courses ...

>Radford Neal

dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Student's t vs. z tests
In article [EMAIL PROTECTED], dennis roberts [EMAIL PROTECTED] wrote:

> students have enough problems with all the stuff in stat as it is ... but, when we start some discussion about sampling error of means ... for use in building a confidence interval and/or testing some hypothesis ... the first thing observant students will ask when you say to them ... assume SRS of n=50 and THAT WE KNOW THAT THE POPULATION SD = 4 ... is: if we are trying to do some inferencing about the population mean ... how come we know the population sd but NOT the mean too? most find this notion highly illogical ... but we and books trudge on ... and they are correct of course in the NON logic of this scenario
>
> thus, it makes a ton more sense to me to introduce at this point a t distribution ... this is NOT hard to do ... then get right on with the reality case

I don't find this persuasive. I think that any student who has the abstract reasoning ability needed to understand the concepts involved will not have any difficulty accepting a statement that "this situation doesn't come up often in practice, but we'll start with it because it's simpler".

I have my doubts that introducing the t distribution is "NOT hard", if by that you mean that it's not hard to get them to understand what's actually happening. Of course, it's not very hard to get them to understand how to plug the numbers into the formula.

I think one could argue that introducing the z test first is MORE realistic. The situation where there are "nuisance" parameters that affect the distribution of the test statistic but are in practice unknown is TYPICAL. It's just a lucky break that the t statistic doesn't depend on sigma. After seeing the z test, students will realize how lucky one is to have such a statistic, and will realize that one shouldn't expect that to happen all the time. (Well, the really good ones might realize all this.)

Radford Neal
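The "lucky break" in the message above can be checked numerically. The sketch below is illustrative only: Python, the function names, and the made-up error values are assumptions, not anything posted to the list. It builds samples as mu0 + sigma * e from one fixed set of errors and shows that the t statistic comes out the same for every sigma, while the z statistic is only right if the sigma plugged into it is the true one.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t: (mean - mu0) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

def z_statistic(sample, mu0, sigma):
    """One-sample z: same numerator, but sigma has to be supplied from outside."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

# Hypothetical standardized errors; each sample is mu0 + sigma * e.
errors = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 2.0]
mu0 = 10.0

def t_for_sigma(sigma):
    """t statistic for the sample generated with the given sigma."""
    sample = [mu0 + sigma * e for e in errors]
    return t_statistic(sample, mu0)

# The t statistic is identical whatever sigma produced the data.
t_small = t_for_sigma(0.5)
t_large = t_for_sigma(40.0)

# The z statistic is off by the ratio of sigmas if the assumed sigma is wrong.
sample = [mu0 + 4.0 * e for e in errors]
z_right = z_statistic(sample, mu0, sigma=4.0)
z_wrong = z_statistic(sample, mu0, sigma=2.0)

print(t_small, t_large, z_right, z_wrong)
```

Because sigma cancels between the numerator and the estimated standard error, no nuisance parameter is left in t under the null hypothesis; z has no such cancellation.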
Re: Student's t vs. z tests
At 11:47 AM 4/19/01 -0500, Christopher J. Mecklin wrote:
>As a reply to Dennis' comments:
>
>If we deleted the z-test and went right to t-test, I believe that
>students' understanding of p-value would be even worse...

i don't follow the logic here ... are you saying that instead of their understanding being "bad" it will be worse? if so, not sure that this is a decrement other than trivial

what makes using a normal model ... and say zs of +/- 1.96 ... any "more meaningful" to understand p values ... ? is it that they only learn ONE critical value? and that is simpler to keep neatly arranged in their mind?

as i see it, until we talk to students about the normal distribution ... being some probability distribution where, you can find subpart areas at various baseline values and out (or inbetween) ... there is nothing inherently sensible about a normal distribution either ... and certainly i don't see anything that makes this discussion based on a normal distribution more inherently understandable than using a probability distribution based on t ... you still have to look for subpart areas ... beyond some baseline values ... or between baseline values ...

since t distributions and unit normal distributions look very similar ... except when df is really small (and even there, they LOOK the same it is just that ts are somewhat wider) ... seems like whatever applies to one ... for good or for bad ... applies about the same for the other ...

i would be appreciative of ANY good logical argument or empirical data that suggests that if we use unit normal distributions and z values ... z intervals and z tests ... to INTRODUCE the notions of confidence intervals and/or simple hypothesis testing ... that students somehow UNDERSTAND these notions better ...

i contend that we have no evidence of this ... it is just something that we think ... and thus we do it that way
Re: Student's t vs. z tests
Why not introduce hypothesis testing in a binomial setting where there are no nuisance parameters and p-values, power, alpha, beta,... may be obtained easily and exactly from the Binomial distribution?

Jon Cryer

At 01:48 AM 4/20/01 -0400, you wrote:
>At 11:47 AM 4/19/01 -0500, Christopher J. Mecklin wrote:
>>As a reply to Dennis' comments:
>>
>>If we deleted the z-test and went right to t-test, I believe that
>>students' understanding of p-value would be even worse...
>
>i don't follow the logic here ... are you saying that instead of their
>understanding being "bad" it will be worse? if so, not sure that this
>is a decrement other than trivial
>
>[snip]

Jon Cryer, Professor               [EMAIL PROTECTED]
Dept. of Statistics                www.stat.uiowa.edu/~jcryer
  and Actuarial Science            office 319-335-0819
The University of Iowa             dept. 319-335-0706
Iowa City, IA 52242                FAX 319-335-3017

"It ain't so much the things we don't know that get us into trouble.
It's the things we do know that just ain't so." --Artemus Ward
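To make the binomial suggestion concrete, here is a minimal sketch of a one-sided exact test of p = 0.5 against p > 0.5. The language (Python), the sample size of 20, the alternative p = 0.8, and the observed count of 16 are all assumptions chosen for illustration. Size, power, and p-value all come straight from binomial sums, with no table lookup and no nuisance parameter.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | p0) <= alpha (reject when X >= c)."""
    for c in range(n + 2):
        if upper_tail(c, n, p0) <= alpha:
            return c

n, p0, alpha = 20, 0.5, 0.05
c = critical_value(n, p0, alpha)   # rejection region is X >= c
size = upper_tail(c, n, p0)        # exact type I error rate, at most alpha
power = upper_tail(c, n, 0.8)      # exact power if the true p were 0.8
beta = 1 - power                   # exact type II error rate at p = 0.8
p_value = upper_tail(16, n, p0)    # exact p-value for 16 successes observed

print(c, size, power, beta, p_value)
```

Because the distribution is discrete, the attained size is usually strictly below the nominal alpha, which is itself a useful teaching point.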
Re: Student's t vs. z tests
All of your observations about the deficiencies of data are perfectly valid. But what do you do? Just give up because your data are messy, and your assumptions are doubtful and all that? Go and dig ditches instead?

You can only analyse data by making assumptions - by working with models of the world. The models may be shonky, but they are presumably the best you can do. And within those models you have to assume the data is what you think it is.

I agree that we do not, in general, make it sufficiently clear to students that all statistical analysis deals with models, and those models involve assumptions which are frequently heroic - but you do have to get down to doing some analysis at some time, you can't just whinge about the lousy data, and to do that analysis you pick the techniques appropriate to the models you are working with.

Alan

dennis roberts wrote:
> At 08:46 AM 4/20/01 +1000, Alan McLean wrote:
>> So the two good reasons are - that the z test is the basis for the t, and the understanding that knowledge has a very direct value. I hasten to add that 'knowledge' here is always understood to be 'assumed knowledge' - as it always is in statistics.
>>
>> My eight cents worth.
>>
>> Alan
>
> the problem with all these details is that ... the quality of data we get and the methods we use to get it ... PALE^2 in comparison to what such methods might tell us IF everything were clean
>
> DATA ARE NOT CLEAN!
>
> but, we prefer it seems to emphasize all this minutiae .. rather than spend much much more time on formulating clear questions to ask and, designing good ways to develop measures and collect good data
>
> every book i have seen so casually says: assume a SRS of n=40 ... when SRS are nearly impossible to get
>
> we dust off assumptions (like normality) with the flick of a cigarette ash ...
>
> we pay NO attention to whether some measure we use provides us with reliable data ...
>
> the lack of random assignment in even the simplest of experimental designs ... seems to cause barely a whimper
>
> we pound statistical significance into the ground when, it has such LIMITED application
>
> and the list goes on and on and on
>
> but yet, we get in a tizzy (me too i guess) and fight tooth and nail over such silly things as should we start the discussion of hypothesis testing for a mean with z or t?
>
> WHO CARES? ... the difference is trivial at best
>
> in the overall process of research and gathering data ... the process of analysis is the LEAST important aspect of it ... let's face it ... errors that are made in papers/articles/research projects are rarely caused by faulty analysis applications ... though sure, now and then screw ups do happen ...
>
> the biggest (by a light year) problem is bad data ... collected in a bad way ... hoping to chase answers to bad questions ... or highly overrated and/or unimportant questions
>
> NO analysis will salvage these problems ... and to worry and agonize over z or t ... and a hundred other such things is putting too much weight on the wrong things
>
> AND ALL IN ONE COURSE TOO! (as some advisors are hoping is all that their students will EVER have to take!)
>
> dennis roberts, penn state university
> educational psychology, 8148632401
> http://roberts.ed.psu.edu/users/droberts/drober~1.htm

--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102 Fax: +61 03 9903 2007
Re: Student's t vs. z tests
"Mark W. Humphries" wrote: If I understand correctly the t test, since it takes into account degrees of freedom, is applicable whatever the sample size might be, and has no drawbacks that I could find compared to the z test. Have I misunderstood something? From my class notes (which, in this case, are a reporting of comments made by Mosteller and Tukey)... Frederick Mosteller and John Tukey, on pages 5-7 of Data Analysis and Regression [Reading, MA: Addison-Wesley Publishing Company, Inc., 1997] provide insight into what Student really did and how it should affect our choice of test. The value of Student's work lay not in great numerical change, but in: recognition that one could, if appropriate assumptions held, make allowances for the "uncertainties" of small samples, not only in Student's original problem, but in others as well; provision of a numerical assessment of how small the necessary numerical adjustment of confidence points were in Student's problem... presentation of tables that could be used--in setting confidence limits, in making significance tests--to assess the uncertainty associated with even very small samples. Besides its values, Student's contribution had its drawbacks, notably: it made it too easy to neglect the proviso "if appropriate assumptions held"; it overemphasized the "exactness of Student's solution for his idealized problem"; it helped to divert the attention of theoretical statisticians to the development of "exact" ways of treating other problems; and it failed to attack the "problem of multiplicity": the difficulties and temptation associated with the application of large numbers of tests to the same data. = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Student's t vs. z tests
Eric --

Good comment! Also, it is helpful to keep in mind that:

    t^2 (df2) = F(1,df2)

-- Joe

Joe Ward
167 East Arrowhead Dr.
San Antonio, TX 78228-2402
Home phone: 210-433-6575
Home fax: 210-433-2828
Email: [EMAIL PROTECTED]
http://www.ijoa.org/joeward/wardindex.html

Health Careers High School
4646 Hamilton Wolfe
San Antonio, TX 78229
Phone: 210-617-5400
Fax: 210-617-5423

- Original Message -
From: "Eric Bohlman" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, April 16, 2001 3:43 PM
Subject: Re: Student's t vs. z tests

> Mark W. Humphries [EMAIL PROTECTED] wrote:
> > Hi, I am attempting to self-study basic multivariate statistics using Kachigan's "Statistical Analysis" (which I find excellent btw). Perhaps someone would be kind enough to clarify a point for me: If I understand correctly the t test, since it takes into account degrees of freedom, is applicable whatever the sample size might be, and has no drawbacks that I could find compared to the z test. Have I misunderstood something?
>
> You're running into a historical artifact: in pre-computer days, using the normal distribution rather than the t distribution reduced the size of the tables you had to work with. Nowadays, a computer can compute a t probability just as easily as a z probability, so unless you're in the rare situation Karl mentioned, there's no reason not to use a t test.
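The identity above is a statement about distributions, but it has a companion at the level of the statistics themselves: for two groups, the square of the pooled-variance t statistic equals the one-way ANOVA F statistic with 1 and n1 + n2 - 2 degrees of freedom. A minimal sketch follows; the data values, function names, and the use of Python are assumptions made purely for illustration.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for two groups.
group_a = [12.1, 14.3, 11.8, 13.5, 12.9]
group_b = [15.2, 14.8, 16.1, 13.9, 15.5, 16.4]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)

print(t, t ** 2, f)  # t squared and F agree up to rounding
```

The sign of t is lost in F, which is why the F test corresponds to the two-sided t test.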
RE: Student's t vs. z tests
If you knew the population SD (not likely if you are estimating the population mean), you would have more power with the z statistic (which requires that you know the population SD rather than estimating it from the sample) than with t.

-Original Message-
> If I understand correctly the t test, since it takes into account degrees of freedom, is applicable whatever the sample size might be, and has no drawbacks that I could find compared to the z test. Have I misunderstood something?
>
> Mark
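The power claim can be illustrated by simulation. Everything specific in this sketch is an assumption chosen for the example: Python, a sample size of 5, a true mean one standard deviation away from the hypothesized one, a fixed random seed, and the tabled two-sided 5% critical value of 2.776 for t with 4 degrees of freedom. It counts how often each test rejects when the null hypothesis is false.

```python
import math
import random
from statistics import NormalDist

N = 5                                   # sample size, so df = 4 for the t test
Z_CRIT = NormalDist().inv_cdf(0.975)    # two-sided 5% critical value, about 1.96
T_CRIT_DF4 = 2.776                      # tabled two-sided 5% value for df = 4 only

def simulate_power(mu0=0.0, mu_true=1.0, sigma=1.0, reps=20000, seed=1):
    """Estimate rejection rates of the z test (sigma known) and t test (sigma estimated)."""
    rng = random.Random(seed)
    z_rejects = 0
    t_rejects = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_true, sigma) for _ in range(N)]
        mean = sum(sample) / N
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (N - 1))
        z = (mean - mu0) / (sigma / math.sqrt(N))   # uses the known sigma
        t = (mean - mu0) / (s / math.sqrt(N))       # uses the sample estimate
        z_rejects += abs(z) > Z_CRIT
        t_rejects += abs(t) > T_CRIT_DF4
    return z_rejects / reps, t_rejects / reps

power_z, power_t = simulate_power()
print(power_z, power_t)
```

The gap is large here because n is tiny; as n grows the t critical value approaches 1.96 and the two tests become nearly indistinguishable.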
Re: Student's t vs. z tests
Mark W. Humphries [EMAIL PROTECTED] wrote:
> Hi, I am attempting to self-study basic multivariate statistics using Kachigan's "Statistical Analysis" (which I find excellent btw). Perhaps someone would be kind enough to clarify a point for me: If I understand correctly the t test, since it takes into account degrees of freedom, is applicable whatever the sample size might be, and has no drawbacks that I could find compared to the z test. Have I misunderstood something?

You're running into a historical artifact: in pre-computer days, using the normal distribution rather than the t distribution reduced the size of the tables you had to work with. Nowadays, a computer can compute a t probability just as easily as a z probability, so unless you're in the rare situation Karl mentioned, there's no reason not to use a t test.
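As a rough illustration of how little work a t probability is for a computer, the sketch below integrates the t density with Simpson's rule and compares two-sided p-values for the same test statistic under t and under the normal. It uses only the Python standard library; the function names, the statistic value of 2.0, and the degrees of freedom shown are assumptions for the example, and a statistics package would normally supply a ready-made t distribution function instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule (steps even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_p_t(stat, df):
    """Two-sided p-value from the t distribution."""
    return 2 * (1 - t_cdf(abs(stat), df))

def two_sided_p_z(stat):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))

stat = 2.0
p_z = two_sided_p_z(stat)
p_t = {df: two_sided_p_t(stat, df) for df in (4, 9, 29, 200)}

print(f"z: {p_z:.4f}")
for df, p in p_t.items():
    print(f"t, df={df}: {p:.4f}")
```

The t p-values are larger than the z p-value at every df and shrink toward it as df grows, which is the sense in which the old tables could be collapsed to a single normal table for large samples.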