Re: Logarithms (was: When to Use t and When to Use z Revisited)
On Tue, 11 Dec 2001, Vadim and Oxana Marmer wrote:

   besides, who needs those tables? we have computers now, don't we? I was told that there were tables for logarithms once. I have not seen one in my life. Isn't it the same kind of thing?

If you _want_ to see one, you need go no farther than Sterling Library and look up what is shelved under mathematical tables. (Unless, in the years since I worked there as an undergraduate, they've thrown them all out, which I would hope to be unlikely.)

-- DFB.
Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: When to Use t and When to Use z Revisited
Ronny Richardson wrote:

   { Snip: the original post, quoted in full; see "When to Use t and When to Use z Revisited" below. In brief: textbook flow charts say to use z whenever sigma is known or n >= 30, yet published t tables run well past 28 degrees of freedom, and Berenson and Levine both say the t distribution arises precisely because sigma is estimated by s. Are the textbooks wrong, or just oversimplifying? }

They are not oversimplifying, they are complexifying. To quote Polya (How to Solve It): if you need rules, use this one first: use your own brains first. Sigma is hardly ever known, so you must use t. Then why not simply tell the students: use the t table as far as it goes (usually to around n = 120), and after that use the n = infinity line (which corresponds to the normal distribution)? Then there is no need for a rule for when to use z and when to use t.

Kjetil Halvorsen
Re: When to Use t and When to Use z Revisited
At 04:14 AM 12/10/01 +0000, Jim Snow wrote:

   Ronny Richardson wrote in message news:[EMAIL PROTECTED]... A few weeks ago, I posted a message about when to use t and when to use z.

   I did not see the earlier postings, so forgive me if I repeat advice already given. :-)

   1. The consequences of using the t distribution instead of the normal distribution for sample sizes greater than 30 are of no importance in practice.

what's magical about 30? i say 33 ... no actually, i amend that to 28

   2. There is no good reason for statistical tables for use in practical analysis of data to give figures for t on numbers of degrees of freedom over 30, except that it makes it simple to routinely use one set of tables when the variance is estimated from the sample.

with software, there is no need for tables ... period!

   3. There are situations where the error variance is known. They generally arise when the errors in the data come from the use of a measuring instrument with known accuracy, or when the figures available are known to be truncated to a certain number of decimal places. For example: Several drivers use cars in a car pool. The distance travelled on each trip by a driver is recorded, based on the odometer reading. Each observation has an error which is uniformly distributed on (0, 0.2). The variance of this error is (0.2)^2/12 = 0.0033 and the standard deviation is 0.0577. To calculate confidence limits for the average distance travelled by each driver, the z statistic should be used.

this is pure speculation ... i have yet to hear of any convincing case where the variance is known but the mean is not

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
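The odometer arithmetic in Jim Snow's point 3 (garbled in transit above) can be checked directly: a Uniform(a, b) error has variance (b - a)^2/12. A minimal sketch; the simulation is added purely as an illustration and is not part of the original post.

```python
import math
import random

# Variance of a Uniform(a, b) error is (b - a)^2 / 12.
a, b = 0.0, 0.2
var = (b - a) ** 2 / 12          # 0.00333..., not ".00" as garbled above
sd = math.sqrt(var)              # about 0.0577

# Sanity-check by simulation (illustrative only).
random.seed(1)
draws = [random.uniform(a, b) for _ in range(200_000)]
mean = sum(draws) / len(draws)
sim_var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print(round(var, 6), round(sd, 4), round(sim_var, 6))
```

With this known sigma, a z interval for a driver's mean trip error is xbar +/- 1.96 * sd / sqrt(n), with nothing estimated from the sample.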
Re: When to Use t and When to Use z Revisited
Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not

What about that other application used so prominently in texts of business statistics, testing for a proportion?
Re: When to Use t and When to Use z Revisited
But then you should use a binomial (or hypergeometric) distribution.

Jon Cryer

p.s. Of course, you might approximate by an appropriate normal distribution.

At 11:39 AM 12/10/01 -0400, you wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion?

Jon Cryer, Professor Emeritus
Dept. of Statistics and Actuarial Science www.stat.uiowa.edu/~jcryer
The University of Iowa, Iowa City, IA 52242
office 319-335-0819 home 319-351-4639 FAX 319-335-3017

"It ain't so much the things we don't know that get us into trouble. It's the things we do know that just ain't so." --Artemus Ward
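Jon Cryer's point — exact binomial first, normal only as an approximation — can be sketched in a few lines. The numbers here (n = 50 trials, null proportion p0 = 0.4, x = 27 successes) are made up for illustration.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical example: upper-tail P(X >= x) under Binomial(n, p0).
n, p0, x = 50, 0.4, 27

# Exact binomial tail probability.
exact = sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
            for k in range(x, n + 1))

# Normal approximation with continuity correction:
# X is approximately N(n*p0, n*p0*(1-p0)).
mu = n * p0
sigma = math.sqrt(n * p0 * (1 - p0))
approx = 1.0 - norm_cdf((x - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```

With n*p0 = 20 and n*(1-p0) = 30, the two tail probabilities agree to a few thousandths, which is the sense in which "you might approximate."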
Re: When to Use t and When to Use z Revisited
Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not

A scale (weighing device) with known precision.
Re: When to Use t and When to Use z Revisited
I always thought that the precision of a scale was proportional to the amount weighed. So don't you have to know the mean before you know the standard deviation? But wait a minute - we are trying to assess the size of the mean!

Jon Cryer

At 03:42 PM 12/10/01 +0000, you wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not A scale (weighing device) with known precision.
Re: When to Use t and When to Use z Revisited
The sample mean of the dichotomous (one-zero, dummy) variable is known: it is the proportion.

Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion?
Re: When to Use t and When to Use z Revisited
Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion.

Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
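Gus Gassmann's observation can be written out: under Ho: p = p0 the standard error uses p0 itself, so no variance is estimated from the data. The numbers (60 successes in 100 trials against p0 = 0.5) are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def proportion_z_test(x: int, n: int, p0: float):
    """Two-sided z test of Ho: p = p0.  Under Ho the variance
    p0*(1-p0)/n is known -- nothing is estimated, hence z, not t."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # known under the null
    z = (p_hat - p0) / se
    p_value = 2.0 * (1.0 - norm_cdf(abs(z)))
    return z, p_value

z, p = proportion_z_test(60, 100, 0.5)
print(round(z, 3), round(p, 4))
```

Here z = (0.6 - 0.5)/0.05 = 2.0, with a two-sided p-value near 0.0455 — exactly the "pretend to know the variance" situation, valid to the extent the CLT approximation holds.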
Re: When to Use t and When to Use z Revisited
Usually I would use software. As I tried to show in the sample syntax I posted earlier, it doesn't usually make much difference whether you use z or t.

Gus Gassmann wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
Re: When to Use t and When to Use z Revisited
Only as an approximation.

At 12:57 PM 12/10/01 -0400, you wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
Re: When to Use t and When to Use z Revisited
At 03:42 PM 12/10/01 +0000, Jerry Dallal wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not A scale (weighing device) with known precision.

as far as i know ... the precision of a scale is expressed in terms of 'accurate to within' ... and if there is ANY 'within' attached ... then the value for SURE is not known

_
dennis roberts, educational psychology, penn state university
Re: When to Use t and When to Use z Revisited
On Mon, 10 Dec 2001 12:57:29 -0400, Gus Gassmann [EMAIL PROTECTED] wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?

That is the textbook justification for chi-squared and z tests in the sets of 'nonparametric tests' which are based on rank-order transformations and dichotomizing. The variance is known, so the test statistic has the shorter tails. (It works for ranks when you don't have ties.)

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
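Rich Ulrich's point about dichotomized tests with known variance can be made concrete with the large-sample sign test: under the null the count above the hypothesized median is Binomial(n, 1/2), so its variance n/4 is known exactly. The counts below are hypothetical.

```python
import math

def sign_test_z(n_above: int, n: int) -> float:
    """Large-sample sign test of Ho: median = m0.
    Under Ho the number of non-tied observations above m0 is
    Binomial(n, 1/2): mean n/2, variance n/4 -- known, not estimated,
    so the reference distribution is z, not t."""
    mean = n / 2.0
    sd = math.sqrt(n / 4.0)
    return (n_above - mean) / sd

# Hypothetical data: 24 of 36 non-tied observations above m0.
z = sign_test_z(24, 36)
print(round(z, 3))
```

Here z = (24 - 18)/3 = 2.0; ties would have to be dropped first, as noted above for ranks.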
Re: When to Use t and When to Use z Revisited
   3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z.

You are wrong. You use the t distribution not because you don't know sigma, but because your statistic has an EXACT t distribution under certain conditions. I know the textbook says: if we knew sigma the distribution would be normal, but because we used s instead the distribution turned out to be t. It does not say how exactly it becomes t, so you draw the conclusion: use t instead of normal whenever you use s instead of sigma. But it is wrong; it does not go like this.

When you don't know the underlying distribution of the sample, you may use the normal distribution (under certain regularity conditions) as an APPROXIMATION to the actual distribution of your statistic. The approximate distribution in most cases is not parameter-free; it may depend, for example, on the unknown sigma. In such a situation you may replace the unknown parameter by a consistent estimator, and the approximate distribution is still normal. Think of it as an iterated approximation: first you approximate the actual distribution by N(0, sigma^2), then you approximate that by N(0, S^2), where S^2 is a consistent estimator of sigma^2. There are formal theorems (Slutsky-type results) that allow you to do this kind of thing.

The essential difference between the two approaches is that the first tries to derive the EXACT distribution, while the second says: I will use an APPROXIMATION. The number 30 has no importance at all; throw away all the tables you have. I cannot believe they still teach you this stuff. I wish it were that simple: 30!

Your confusion is the result of oversimplification and of the desire, present in basic statistics textbooks, to provide students with simple strategies. I guess it makes teaching very simple, but it misleads students; your confusion is an example. The problem is that there are no simple strategies, and things are much, much more complicated than they appear in basic textbooks. Basic textbooks don't tell you the whole story, and they don't even try, because you simply cannot do that at their level. Don't draw any strong conclusions after reading only basic textbooks.

In practice, in business and economics statistics, nobody uses t-tests, but normal and chi-square approximations are used a lot. The assumptions that you have to make for a t-test are too strong.
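The "iterated approximation" described above (CLT plus a consistent plug-in S for sigma) can be checked by simulation. A sketch under an assumed Exponential(1) population, chosen arbitrarily: with n = 50 the plug-in normal interval covers the true mean roughly, though not exactly, 95% of the time.

```python
import math
import random

random.seed(0)

def coverage(n: int, reps: int) -> float:
    """Coverage of xbar +/- 1.96 * s / sqrt(n) for the mean (= 1)
    of an Exponential(1) population: the normal approximation with
    the unknown sigma replaced by its consistent estimator s."""
    true_mean = 1.0
    hits = 0
    for _ in range(reps):
        xs = [random.expovariate(1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half = 1.96 * s / math.sqrt(n)
        if xbar - half <= true_mean <= xbar + half:
            hits += 1
    return hits / reps

cov = coverage(n=50, reps=2000)
print(cov)
```

The shortfall below 0.95 comes from the skewness of the parent population, not from using s in place of sigma — switching 1.96 to a t critical value barely changes it, which is the point being made above.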
Re: When to Use t and When to Use z Revisited
   Sigma is hardly ever known, so you must use t. Then why not simply tell the students: use the t table as far as it goes (usually to around n = 120), and after that use the n = infinity line (which corresponds to the normal distribution). Then there is no need for a rule for when to use z, when to use t.

But the data is not normal either, in 99.9(9)% of cases. Furthermore, the data that you see in economics/business is very often not an iid sample either. So, one way or another, you end up with the normal or chi-square approximation.

Actually, there is an alternative to both approaches: the bootstrap. But it does not always work and should not be used blindly.
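The bootstrap alternative mentioned above, as a minimal percentile-bootstrap sketch. The data are made up, and the percentile interval is only the simplest bootstrap variant — the caveat "should not be used blindly" applies to it in particular.

```python
import random

random.seed(42)

def bootstrap_ci(data, reps=5000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean:
    resample the data with replacement, take the empirical
    alpha/2 and 1 - alpha/2 quantiles of the resampled means."""
    n = len(data)
    means = sorted(sum(random.choices(data, k=n)) / n for _ in range(reps))
    lo = means[int((alpha / 2) * reps)]
    hi = means[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical skewed sample.
sample = [0.1, 0.4, 0.5, 0.9, 1.2, 1.3, 2.1, 2.8, 4.5, 7.9]
lo, hi = bootstrap_ci(sample)
print(round(lo, 2), round(hi, 2))
```

No normality, no t table, and no known sigma is assumed; the price is that the interval inherits the quirks of the resampling scheme.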
When to Use t and When to Use z Revisited
A few weeks ago, I posted a message about when to use t and when to use z. In reviewing the responses, it seems to me that I did a poor job of explaining my question/concern, so I am going to try again. I have included a few references this time, since one responder doubted the items to which I was referring. The specific references are listed at the end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student "When to Use the z or t Distribution." I have seen a similar figure in several different textbooks. The figure is a logic diagram and the first question is "Is sigma known?" If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation, and I only have experience with business statistics books. If the answer is no, the next question is "Is n >= 30?" If the answer is yes, the diagram says to use z and estimate sigma with s. This is the option I question, and I will return to it shortly. In the diagram, if the answer is no to the question about n >= 30, you are to use t. I do not question this either.

Now, regarding using z when n >= 30. If we always use z when n >= 30, then you would never need a t table with more than 28 degrees of freedom. (n = 29 would always yield df = 28.) Bluman cuts his off at 28, except for the infinity row, so he is consistent. (The infinity row shows that t becomes z at infinity.) However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others.

Berenson states (1, page 373), "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z." So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s. Levine (4, page 424) says roughly the same thing: "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z."

So, I conclude: 1) we use z when we know sigma and either the data is normally distributed or the sample size is greater than 30, so we can use the central limit theorem. 2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis tests or computing confidence intervals when n >= 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or am I overlooking something?

Ronny Richardson

References
--
(1) Basic Business Statistics, Seventh Edition, Berenson and Levine.
(2) Elementary Statistics: A Step by Step Approach, Third Edition, Bluman.
(3) Complete Business Statistics, Fourth Edition, Aczel.
(4) Statistics for Managers Using Microsoft Excel, Second Edition, Levine, Berenson, Stephan.
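For reference, the Bluman-style flow chart described above reads as code like this. It encodes the rule being questioned, not a recommendation — the thread's eventual consensus is to use t whenever sigma is estimated.

```python
def which_statistic(sigma_known: bool, n: int) -> str:
    """The textbook logic diagram as a function:
    first ask 'Is sigma known?', then 'Is n >= 30?'."""
    if sigma_known:
        return "z"
    if n >= 30:
        return "z, estimating sigma with s"   # the disputed branch
    return "t"

print(which_statistic(sigma_known=False, n=25))
print(which_statistic(sigma_known=False, n=100))
```

Written out this way, the oddity is plain: the middle branch estimates sigma with s yet still reaches for the z table.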
Re: When to Use t and When to Use z Revisited
On Sun, 9 Dec 2001, Ronny Richardson wrote in part: Bluman has a figure (2, page 333) that is supposed to show the student When to Use the z or t Distribution. I have seen a similar figure in several different textbooks. So have I, sometimes as a diagram or flow chart, sometimes in paragraph or outline form. The figure is a logic diagram and the first question is Is sigma known? If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation and I only have experience with business statistics books. Depends partly on what parameter one is addressing (either as a hypothesis test or as a confidence interval). For the mean of an unknown empirical distribution, I expect you're right. But for the proportion of persons in a population who would want to purchase (for a currently topical example) a Segway, the population variance is a known function of the proportion (the underlying distribution being, presumably, binomial), and for this case the t distribution is simply inappropriate, and one ought to use either the proper binomial distribution function, or else the normal approximation to the binomial (perhaps after satisfying oneself that N is sufficiently large for the approximation to be credible with the hypothesized (or observed) value of the proportion; various textbook authors offer assorted recipes for this purpose). { Snip, discourse on N = 30, although I'd think it were rather on df = 30. } However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others. { Snip, assorted quotes ... 
} So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s. Levine (4, page 424) says roughly the same thing, ... So, I conclude {slightly edited -- DB} 1) we use z when we know the sigma and either the data are normally distributed or the sample size is greater than 30 so we can use the central limit theorem. I would amend this to "the sample size is large enough that we can..." Whether 30 is in fact large enough or not depends rather heavily on what the true shape of the parent population actually is. (If it's roughly symmetrical and bell-shaped, 30 may be O.K.) 2) When n < 30 and the data are normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z. Now, every single business statistics book I have examined, including the four referenced below, use z values when performing hypothesis testing or computing confidence intervals when n >= 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or am I overlooking something? I vote for both 1. and 2., since 2. is in my view a subset of 1, although others may not share this opinion. I would add 3. Outdated. on the grounds that when sigma is unknown, the proper distribution is t (unless N is small and the parent population is screwy) regardless how large the sample size may be. The main (if not the only) reason for the apparent logical bifurcation at N = 30 or thereabouts was that, when one's only sources of information about critical values were printed tables, 30 lines was about what fit on one page (plus maybe a few extra lines for 40, 60, 120 d.f.) and one could not (or at any rate did not) expect one's business students to have convenient access to more extensive tables of the t distribution. And, one suspects latterly, authors were skeptical that students would pay attention to (or perhaps be able to master?)
the technique of interpolating by reciprocals between 30 df and larger numbers of df (particularly including infinity). But currently, _I_ would not expect business students to carry out the calculations for hypothesis tests, or confidence intervals, by hand, except maybe half a dozen times in class for the good of their souls: I'd expect them to learn to invoke a statistical package, or else something like Excel that pretends to supply adequate statistical routines. And for all the packages I know of, there is a built-in function for calculating, or approximating, the cumulative distribution of t for ANY number of df. The advice in any _current_ business- statistics text ought to be, therefore, to use t _whenever_ sigma is not known. And if the textbook isn't up to that standard, the instructor jolly well should be. { Snip, references. See the original post for more details. } -- DFB.
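The "interpolating by reciprocals" DFB mentions can be sketched with two anchors from a standard printed table (two-sided 5% points: 2.042 at 30 df, 1.960 at infinity). Tabled t critical values are close to linear in 1/df, which is why tables could skip from 30 to 40, 60, 120, infinity.

```python
# Harmonic (reciprocal) interpolation of t critical values:
# linear in 1/df between the 30-df and infinity rows of the table.
T30, T_INF = 2.042, 1.960   # standard t_{.025} table values

def t_crit_harmonic(df: float) -> float:
    """Interpolated t_{.025}(df) for df >= 30: at df = 30 the
    weight 30/df is 1 (returns T30); as df grows it falls to 0."""
    return T_INF + (T30 - T_INF) * (30.0 / df)

print(t_crit_harmonic(40))   # tabled value: 2.021
print(t_crit_harmonic(60))   # tabled value: 2.000
print(t_crit_harmonic(120))  # tabled value: 1.980
```

The interpolated values land within about 0.001 of the printed table entries, so the "missing" rows between 30 and infinity really were redundant.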
Re: When to Use t and When to Use z Revisited
Ronny Richardson wrote in message news:[EMAIL PROTECTED]... A few weeks ago, I posted a message about when to use t and when to use z.

I did not see the earlier postings, so forgive me if I repeat advice already given. :-)

1. The consequences of using the t distribution instead of the normal distribution for sample sizes greater than 30 are of no importance in practice. The differences in the numbers given as confidence limits are so small that no sensible person would change their course of action based on that minuscule variation. In the case of a significance test, a result just over or just under, say, the 5% level should always be examined in the knowledge that 5% is an arbitrary level and that a level of 4.9% or 5.1% could equally well have been chosen.

2. There is no good reason for statistical tables for use in practical analysis of data to give figures for t on numbers of degrees of freedom over 30, except that it makes it simple to routinely use one set of tables when the variance is estimated from the sample. Another reason that books of tables do not include t values for degrees of freedom between 30, 60, sometimes 120, and infinity is that there is no need, even for the extreme tails of the distribution: when, for whatever reason, high accuracy is required, the intermediate values can be obtained by harmonic interpolation. That is, the tail entries in the distribution can be obtained by linear interpolation on 1/n.

3. There are situations where the error variance is known. They generally arise when the errors in the data come from the use of a measuring instrument with known accuracy, or when the figures available are known to be truncated to a certain number of decimal places. For example: Several drivers use cars in a car pool. The distance travelled on each trip by a driver is recorded, based on the odometer reading. Each observation has an error which is uniformly distributed on (0, 0.2).
The variance of this error is (0.2)^2/12 = 0.0033, and the standard deviation is 0.0577. To calculate confidence limits for the average distance travelled by each driver, the z statistic should be used. A similar situation could arise in dealing with data in which the error comes from the rounding of all numbers to the nearest thousand. This is an uncommon situation in a business context, but it arises quite often in scientific work, where the inherent accuracy of a measuring instrument may be known from long experience and need not be estimated from the small sample currently being examined.

4. You seem to think the Central Limit Theorem is behind the validity of t vs z tables. This is not so. The CLT bears only on the Normal shape and on the relation of the variance of an average or sum to the population variance.

Commenting specifically on points in your posting:

   So, I conclude 1) we use z when we know the sigma and either the data is normally distributed or the sample size is greater than 30

Yes, but the difference if you use t is tiny and of no importance.

   so we can use the central limit theorem.

No. The CLT is not the reason. The CLT ensures that the average and sum are Normally distributed for large enough n. Unless the data is very skewed or bimodal, n = 5 is usually large enough in practice. This is a separate issue from the choice of Normal or t distribution for inference.

   2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z.

but the difference in the resulting numbers is minuscule and of no importance.
Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis testing or computing confidence intervals when n > 30. Are they 1. Wrong, or 2. Just oversimplifying it without telling the reader, or am I overlooking something? Ronny Richardson

I hope that helps.

Jim Snow
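The odometer example in point 3 above can be checked numerically. This is my own sketch, not from the thread: the only facts taken from the post are the Uniform(0, 0.2) error and the variance formula (b - a)^2/12; the trip distances are made-up illustrative data.

```python
import math

# Variance of a Uniform(a, b) measurement error is (b - a)**2 / 12.
a, b = 0.0, 0.2
sigma = math.sqrt((b - a) ** 2 / 12)   # ~0.0577, as stated in the post

# Hypothetical trip distances in miles (made-up data for illustration).
trips = [12.4, 8.7, 15.1, 9.9, 11.3, 14.0, 10.8, 13.2]
n = len(trips)
mean = sum(trips) / n

# Because sigma is KNOWN, a z interval is appropriate; 1.959964 is the
# 97.5% point of the standard normal.
half_width = 1.959964 * sigma / math.sqrt(n)
print(f"sigma = {sigma:.4f}")
print(f"95% CI for mean trip length: ({mean - half_width:.3f}, {mean + half_width:.3f})")
```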
Re: When to Use t and When to Use z Revisited
[EMAIL PROTECTED] (Ronny Richardson) wrote in message news:[EMAIL PROTECTED]...

A few weeks ago, I posted a message about when to use t and when to use z. In reviewing the responses, it seems to me that I did a poor job of explaining my question/concern, so I am going to try again. I have included a few references this time, since one responder doubted the items to which I was referring. The specific references are listed at the end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student When to Use the z or t Distribution. I have seen a similar figure in several different textbooks. The figure is a logic diagram, and the first question is Is sigma known? If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation, and I only have experience with business statistics books. If the answer is no, the next question is Is n >= 30? If the answer is yes, the diagram says to use z and estimate sigma with s. This is the option I question, and I will return to it shortly. In the diagram, if the answer is no to the question about n >= 30, you are to use t. I do not question this either.

Now, regarding using z when n >= 30. If we always use z when n >= 30, then you would never need a t table with more than 28 degrees of freedom. (n = 29 would always yield df = 28.) Bluman cuts his table off at 28 except for the infinity row, so he is consistent. (The infinity row shows that t becomes z at infinity.) However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others.
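The harmonic-interpolation point made earlier in the thread can be checked against exactly the sparse degrees of freedom Aczel tables (30, 60, 120). A sketch of mine, assuming the standard two-sided 95% t-table entries:

```python
# Harmonic interpolation: interpolate t critical values linearly in 1/df.
# Standard two-sided 95% t-table entries, df -> t critical value.
table = {30: 2.042, 60: 2.000, 120: 1.980}

def t_interp(df, lo=30, hi=60):
    """Linear interpolation on 1/df between two tabled entries."""
    x, x0, x1 = 1 / df, 1 / lo, 1 / hi
    frac = (x0 - x) / (x0 - x1)
    return table[lo] + frac * (table[hi] - table[lo])

# df = 40 lies exactly halfway between df = 30 and df = 60 on the 1/df scale:
print(round(t_interp(40), 3))  # ~2.021, matching the usual tabled value for df = 40
```

This is why a table can safely jump from 30 to 60 to 120: the reader can recover the skipped entries to three decimals by interpolating on 1/df rather than on df.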
Berenson states (1, page 373), However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z. So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s.

Yes, but as n becomes large the difference becomes extremely small. The question is, when is small small enough?

Levine (4, page 424) says roughly the same thing, However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z.

So, I conclude 1) we use z when we know sigma and either the data is normally distributed or the sample size is greater than 30 so we can use the central limit theorem. 2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

Uh, wait a sec.

i) The CLT doesn't kick in at the same point for every distribution. If the distribution is close to normal, you don't need anything like n = 30. If the distribution is (say) highly skewed, then n = 30 may not be anywhere near close enough.

ii) Even for a given distribution, a sample size that's close enough for one application won't necessarily be close enough for another application.

iii) How much accuracy you get also depends on how far into the tails you need precision. There's no point knowing the 2.5% points aren't far out if you need it (for your application) to be accurate near the 0.25% points.

iv) The rate at which the sample variance approaches the appropriate multiple of a chi-square depends on the distribution you're sampling from.
It's possible it may never do so, but with a large sample size you should generally still get normality because of Slutsky's theorem. Even if n = 30 were right when we're talking about the mean, it won't in general also be just right when we're dealing with what's happening with the variance (see above).

v) The degree to which the dependence between the mean and variance affects the distribution of the t statistic itself depends on the distribution you're sampling from (but again, Slutsky should save you eventually).

For these sorts of reasons, n = 30 is oversimplistic. Sometimes it's far too stringent, sometimes too weak. Better to make some assessment of the effect of what you regard as possible situations and see if the consequences are okay for your situation.

Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis testing or computing confidence intervals when n > 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or