Ronny Richardson wrote:
> A few weeks ago, I posted a message about when to use t and when to use z.
> In reviewing the responses, it seems to me that I did a poor job of
> explaining my question/concern so I am going to try again.
> 
> I have included a few references this time since one responder doubted the
> items to which I was referring. The specific references are listed at the
> end of this message.
> 
> Bluman has a figure (2, page 333) that is supposed to show the student "When
> to Use the z or t Distribution." I have seen a similar figure in several
> different textbooks. The figure is a logic diagram and the first question
> is "Is sigma known?" If the answer is yes, the diagram says to use z. I do
> not question this; however, I doubt that sigma is ever known in a business
> situation and I only have experience with business statistics books.
> 
> If the answer is no, the next question is "Is n>=30?" If the answer is yes,
> the diagram says to use z and estimate sigma with s. This is the option I
> question and I will return to it briefly.
> 
> In the diagram, if the answer is no to the question about n>=30, you are to
> use t. I do not question this either.
> 
> Now, regarding using z when n>=30. If we always use z when n>=30, then you
> would never need a t table with more than 28 degrees of freedom. (n<=29
> would always yield df<=28.) Bluman cuts his table off at 28 except for the
> infinity row, so he is consistent. (The infinity row shows that t becomes z
> at infinity.)
> 
> However, other authors go well beyond 30. Aczel (3, inside cover) has
> values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4,
> pages E7-E8) has values for 29-100 and then 110 and 112, along with
> infinity. I could go on, but you get the point. If you always switch to z
> at 30, then why have t tables that go above 28? Again, the infinity entry I
> understand, just not the others.
> 
> Berenson states (1, page 373), "However, the t distribution has more area
> in the tails and less in the center than does the normal distribution. This
> is because sigma is unknown and we are using s to estimate it. Because we
> are uncertain of the value of sigma, the values of t that we observe will
> be more variable than for Z." So, Berenson seems to me to be saying that
> you always use t when you must estimate sigma using s.

Yes, but as n becomes large the difference becomes extremely small.

The question is, when is small "small enough"?
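
For instance, here's a rough check (Python with scipy -- my choice of
tool, nothing from the books cited above) of how quickly the two-sided
5% critical value of t approaches the z value of about 1.96; the df
values are just illustrative:

  # How quickly do t critical values approach the z critical value?
  from scipy.stats import t, norm

  z_crit = norm.ppf(0.975)          # two-sided 5% point for z, ~1.960
  for df in (9, 29, 40, 60, 120):
      t_crit = t.ppf(0.975, df)     # corresponding t critical value
      print(f"df={df:>3}  t={t_crit:.3f}  z={z_crit:.3f}  diff={t_crit - z_crit:.3f}")

At df=29 the gap is still roughly 0.085; whether that matters is exactly
the "small enough" question.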

> Levine (4, page 424) says roughly the same thing, "However, the t
> distribution has more area in the tails and less in the center than does
> the normal distribution. This is because sigma is unknown and we are using
> s to estimate it. Because we are uncertain of the value of sigma, the values
> of t that we observe will be more variable than for Z."
> 
> So, I conclude: 1) we use z when we know sigma and either the data is
> normally distributed or the sample size is greater than 30, so we can
> invoke the central limit theorem.
>
> 2) When n<30 and the data is normally distributed, we use t.
> 
> 3) When n is greater than 30 and we do not know sigma, we must estimate
> sigma using s so we really should be using t rather than z.


Uh, wait a sec. 

i) The CLT doesn't kick in at the same point for every distribution.
If the distribution is close to normal, you don't need anything like
n=30. If the distribution is (say) highly skewed, then n=30 may not be
anywhere near close enough.
ii) Even for a given distribution, a sample size that's "close enough"
for one application won't necessarily be close enough for another
application.
iii) How much accuracy you get also depends on how far into the tails
you need precision. There's no point knowing the 2.5% points aren't far
off if your application needs accuracy near the 0.25% points (see the
quick check after this list).
iv) the rate at which the distribution of the sample variance approaches
the appropriate multiple of a chi-square depends on the distribution
you're sampling from. It's possible it may never do so, but with large
sample sizes you should generally still get normality because of
Slutsky's theorem. Even if n=30 were about right when we're talking
about the mean, it won't in general also be right when we're dealing
with what's happening to the variance (see above).
v) the degree to which the dependence between the mean and variance
affects the distribution of the t statistic itself depends on the
distribution you're sampling from (but again, Slutsky should save you
eventually).
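
To illustrate (iii): at a fixed df, the t/z gap is larger the further
out in the tail you look. A quick check (scipy again, as an assumed
tool; df=29 is just an example):

  # t/z gap at the 2.5% point versus the 0.25% point, for df = 29
  from scipy.stats import t, norm

  df = 29
  for p in (0.975, 0.9975):         # upper 2.5% and 0.25% points
      print(f"upper {100 * (1 - p):.2f}% point:  t_{df} = {t.ppf(p, df):.3f}   z = {norm.ppf(p):.3f}")

The discrepancy roughly triples as you move from the 2.5% point out to
the 0.25% point.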


For these sorts of reasons, n=30 is an oversimplification. Sometimes it's
far too stringent, sometimes far too weak. Better to make some assessment
of the effect under the situations you regard as plausible and see whether
the consequences are acceptable for your application.
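
One way to make that assessment is to simulate a situation you think you
might be in. A minimal sketch (Python/numpy; the exponential parent, the
sample sizes and the replication count are all just illustrative
assumptions) of the actual rejection rate of a nominal 5% two-sided
one-sample t-test under a skewed parent:

  # Actual type I error of a nominal 5% two-sided one-sample t-test when
  # sampling from a skewed parent (exponential with true mean 1).
  import numpy as np
  from scipy.stats import t

  rng = np.random.default_rng(1)
  reps = 50_000
  for n in (10, 30, 100):
      x = rng.exponential(scale=1.0, size=(reps, n))   # true mean is 1
      se = x.std(axis=1, ddof=1) / np.sqrt(n)
      tstat = (x.mean(axis=1) - 1.0) / se
      crit = t.ppf(0.975, n - 1)
      size = np.mean(np.abs(tstat) > crit)             # actual rejection rate
      print(f"n={n:>4}  actual size ~ {size:.3f}  (nominal 0.05)")

With an exponential parent you'd expect the n=30 rate to come out
noticeably above 5%, drifting toward 5% only as n grows; with a parent
close to normal it would already be essentially on target, which is the
point of (i) and (ii) above.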



> Now, every single business statistics book I have examined, including the
> four referenced below, uses z values when performing hypothesis tests or
> computing confidence intervals when n>30.
> 
> Are they
> 
> 1. Wrong
> 2. Just oversimplifying it without telling the reader
> 
> or am I overlooking something?

Mostly (2). 

Almost every rule of thumb like this will be nonsense at some point.

Best to decide for yourself, with knowledge of the circumstances. You're
the one who can tell whether you can bear your significance level perhaps
being 7% when it's nominally 5%, or your power being 30% when normality
indicates it's 50%.
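
For the first of those: if the parent really is normal and you plug s
into the z procedure (critical value 1.96), the actual level is just the
probability that |t| with n-1 df exceeds 1.96, so you can compute it
directly (scipy again, as an assumed tool; the n values are illustrative):

  # Actual two-sided level of a nominal 5% "z" test that uses s in place
  # of sigma, assuming the parent really is normal.
  from scipy.stats import t

  for n in (5, 10, 30, 100):
      actual = 2 * t.sf(1.96, df=n - 1)   # P(|t with n-1 df| > 1.96)
      print(f"n={n:>3}  actual level ~ {actual:.3f}  (nominal 0.05)")

That's roughly 8% at n=10 and about 6% at n=30, even with no
non-normality in the picture at all.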

Glen

