A few weeks ago, I posted a message about when to use t and when to use z.
In reviewing the responses, it seems to me that I did a poor job of
explaining my question/concern so I am going to try again.

I have included a few references this time since one responder doubted the
items to which I was referring. The specific references are listed at the
end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student "When
to Use the z or t Distribution." I have seen a similar figure in several
different textbooks. The figure is a logic diagram and the first question
is "Is sigma known?" If the answer is yes, the diagram says to use z. I do
not question this; however, I doubt that sigma is ever known in a business
situation and I only have experience with business statistics books.

If the answer is no, the next question is "Is n>=30?" If the answer is yes,
the diagram says to use z and estimate sigma with s. This is the option I
question, and I will return to it shortly.

In the diagram, if the answer is no to the question about n>=30, you are to
use t. I do not question this either.

Now, regarding using z when n>=30: if we always use z when n>=30, then we
would never need a t table with more than 28 degrees of freedom. (n<=29
would always yield df<=28.) Bluman cuts his off at 28 except for the
infinity row, so he is consistent. (The infinity row shows that t becomes z
at infinity.)

However, other authors go well beyond 30. Aczel (3, inside cover) has
values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4,
pages E7-E8) has values for 29-100 and then 110 and 112, along with
infinity. I could go on, but you get the point. If you always switch to z
at 30, then why have t tables that go above 28? Again, the infinity entry I
understand, just not the others.
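
To put numbers on how far those extra table rows sit from z, here is a
minimal Python sketch. It is only an illustration and is not taken from any
of the referenced books. The helper names t_pdf, t_cdf, and t_quantile are
hypothetical, and the t quantile is approximated by numerical integration
and bisection so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-sided 95% critical values: z versus t at some df found in the tables.
z_crit = NormalDist().inv_cdf(0.975)
t_crit = {df: t_quantile(0.975, df) for df in (28, 29, 30, 40, 60, 120)}

print(f"z      : {z_crit:.3f}")
for df, value in t_crit.items():
    print(f"df={df:>3} : {value:.3f}")
```

Every t value printed is larger than the z value, and the gap shrinks as df
grows, which is presumably why some authors keep printing rows past 28.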

Berenson states (1, page 373), "However, the t distribution has more area
in the tails and less in the center than does the normal distribution. This
is because sigma is unknown and we are using s to estimate it. Because we
are uncertain of the value of sigma, the values of t that we observe will
be more variable than for Z." So, Berenson seems to me to be saying that
you always use t when you must estimate sigma using s.

Levine (4, page 424) says roughly the same thing, "However, the t
distribution has more area in the tails and less in the center than does
the normal distribution. This is because sigma is unknown and we are using
s to estimate it. Because we are uncertain of the value of sigma, the values
of t that we observe will be more variable than for Z."

So, I conclude 1) we use z when we know sigma and either the data is
normally distributed or the sample size is greater than 30 so we can use
the central limit theorem.

2) When n<30 and the data is normally distributed, we use t.

3) When n is greater than 30 and we do not know sigma, we must estimate
sigma using s so we really should be using t rather than z.
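
One way to see what is at stake in point 3 is to ask what confidence level
a nominal 95% z interval really delivers when s stands in for sigma. The
sketch below is my own illustration under a stated assumption: the data are
normally distributed, so (xbar - mu)/(s/sqrt(n)) follows t with n-1 degrees
of freedom exactly. The names t_pdf, t_cdf, and z_interval_coverage are
hypothetical helpers, and the t CDF is computed by numerical integration.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


Z_CRIT = NormalDist().inv_cdf(0.975)  # the usual 1.96


def z_interval_coverage(n):
    """Actual coverage of xbar +/- 1.96*s/sqrt(n), assuming normal data."""
    return 2 * t_cdf(Z_CRIT, n - 1) - 1


coverage = {n: z_interval_coverage(n) for n in (10, 30, 60, 120)}
for n, c in coverage.items():
    print(f"n={n:>3}: nominal 95% z interval covers {100 * c:.1f}%")
```

Under that normality assumption the coverage is always somewhat below 95%
and creeps toward it as n grows, so substituting z at n>=30 looks like an
approximation rather than an exact rule.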

Now, every single business statistics book I have examined, including the
four referenced below, uses z values when performing hypothesis testing or
computing confidence intervals when n>30.

Are they

1. Wrong
2. Just oversimplifying it without telling the reader

or am I overlooking something?

Ronny Richardson



References
----------
(1) Basic Business Statistics, Seventh Edition, Berenson and Levine.

(2) Elementary Statistics: A Step by Step Approach, Third Edition, Bluman.

(3) Complete Business Statistics, Fourth Edition, Aczel.

(4) Statistics for Managers Using Microsoft Excel, Second Edition, Levine,
Berenson, Stephan.



