A few weeks ago, I posted a message about when to use t and when to use z. In reviewing the responses, it seems to me that I did a poor job of explaining my question/concern so I am going to try again.
I have included a few references this time, since one responder doubted the items to which I was referring. The specific references are listed at the end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student "When to Use the z or t Distribution." I have seen a similar figure in several different textbooks. The figure is a logic diagram, and the first question is "Is sigma known?" If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation, and I only have experience with business statistics books. If the answer is no, the next question is "Is n >= 30?" If the answer is yes, the diagram says to use z and estimate sigma with s. This is the option I question, and I will return to it shortly. If the answer to the n >= 30 question is no, the diagram says to use t. I do not question this either.

Now, regarding using z when n >= 30: if we always use z when n >= 30, then we would never need a t table with more than 28 degrees of freedom (n <= 29 would always yield df <= 28). Bluman cuts his table off at 28 except for the infinity row, so he is consistent. (The infinity row shows that t becomes z at infinity.) However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others.

Berenson states (1, page 373), "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z."
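(An aside on the table question: the gap between t and z past df = 28 is easy to check numerically. Below is a small Python sketch, my own illustration rather than anything from the referenced texts, that recovers the two-tailed 95% critical values of t by numerically integrating the t density and inverting with bisection.)

```python
import math

def t_pdf(x, df):
    # Student's t density; lgamma avoids overflow in the gamma ratio for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=4000):
    # P(T <= x) for x >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, x]
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3

def t_crit(df, alpha=0.05):
    # Two-tailed critical value: solve P(T <= t) = 1 - alpha/2 by bisection
    lo, hi = 0.5, 15.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (10, 28, 29, 40, 60, 120, 1000):
    print(f"df = {df:>4}: t = {t_crit(df):.3f}")
print("z (normal):  1.960")
```

Running this reproduces the familiar table values (2.048 at df = 28, 2.000 at df = 60, 1.980 at df = 120), which makes the point numerically: the t critical value is still visibly above 1.960 well past n = 30, so the extra table rows are not redundant.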
So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s. Levine (4, page 424) says roughly the same thing: "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z."

So, I conclude:
1) We use z when we know sigma and either the data are normally distributed or the sample size is greater than 30, so that we can invoke the central limit theorem.
2) When n < 30 and the data are normally distributed, we use t.
3) When n > 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis tests or computing confidence intervals when n > 30. Are they (1) wrong, or (2) just oversimplifying without telling the reader? Or am I overlooking something?

Ronny Richardson

References
----------
(1) Basic Business Statistics, Seventh Edition, Berenson and Levine.
(2) Elementary Statistics: A Step by Step Approach, Third Edition, Bluman.
(3) Complete Business Statistics, Fourth Edition, Aczel.
(4) Statistics for Managers Using Microsoft Excel, Second Edition, Levine, Berenson, and Stephan.

=================================================================
Instructions for joining and leaving this list and remarks about the
problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================