Re: Logarithms (was: When to Use t and When to Use z Revisited)
On Tue, 11 Dec 2001, Vadim and Oxana Marmer wrote:

   besides, who needs those tables? we have computers now, don't we? I was told that there were tables for logarithms once. I have not seen one in my life. Isn't it the same kind of thing?

If you _want_ to see one, you need go no farther than Sterling Library and look up what is shelved under mathematical tables. (Unless, in the years since I worked there as an undergraduate, they've thrown them all out, which I would hope to be unlikely.)

-- DFB.
Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: When to Use t and When to Use z Revisited
Ronny Richardson wrote:

   { Snip: the original post, quoted in full; see "When to Use t and When to Use z Revisited" below. In brief: textbook flow charts say to use z whenever sigma is known or n >= 30, yet published t tables run well past 28 degrees of freedom, and Berenson and Levine both say the t distribution arises precisely because sigma is estimated by s. Are the textbooks wrong, or just oversimplifying? }

They are not oversimplifying, they are complexifying. To quote Polya (How to Solve It): if you need rules, use this one first: use your own brains first. Sigma is hardly ever known, so you must use t. Then why not simply tell the students: use the t table as far as it goes (usually to around n = 120), and after that use the n = infinity line (which corresponds to the normal distribution)? Then there is no need for a rule for when to use z and when to use t.

Kjetil Halvorsen
Re: When to Use t and When to Use z Revisited
At 04:14 AM 12/10/01 +0000, Jim Snow wrote:

   Ronny Richardson wrote in message news:[EMAIL PROTECTED]... A few weeks ago, I posted a message about when to use t and when to use z.

   I did not see the earlier postings, so forgive me if I repeat advice already given. :-)

   1. The consequences of using the t distribution instead of the normal distribution for sample sizes greater than 30 are of no importance in practice.

what's magical about 30? i say 33 ... no actually, i amend that to 28

   2. There is no good reason for statistical tables for use in practical analysis of data to give figures for t on numbers of degrees of freedom over 30, except that it makes it simple to routinely use one set of tables when the variance is estimated from the sample.

with software, there is no need for tables ... period!

   3. There are situations where the error variance is known. They generally arise when the errors in the data come from the use of a measuring instrument with known accuracy, or when the figures available are known to be truncated to a certain number of decimal places. For example: Several drivers use cars in a car pool. The distance travelled on each trip by a driver is recorded, based on the odometer reading. Each observation has an error which is uniformly distributed on (0, 0.2). The variance of this error is (0.2)^2/12 = 0.0033 and the standard deviation is 0.0577. To calculate confidence limits for the average distance travelled by each driver, the z statistic should be used.

this is pure speculation ... i have yet to hear of any convincing case where the variance is known but the mean is not

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
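The odometer arithmetic in Jim Snow's point 3 (garbled in transit above) can be checked directly: a Uniform(a, b) error has variance (b - a)^2/12. A minimal sketch; the simulation is added purely as an illustration and is not part of the original post.

```python
import math
import random

# Variance of a Uniform(a, b) error is (b - a)^2 / 12.
a, b = 0.0, 0.2
var = (b - a) ** 2 / 12          # 0.00333..., not ".00" as garbled above
sd = math.sqrt(var)              # about 0.0577

# Sanity-check by simulation (illustrative only).
random.seed(1)
draws = [random.uniform(a, b) for _ in range(200_000)]
mean = sum(draws) / len(draws)
sim_var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print(round(var, 6), round(sd, 4), round(sim_var, 6))
```

With this known sigma, a z interval for a driver's mean trip error is xbar +/- 1.96 * sd / sqrt(n), with nothing estimated from the sample.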
Re: When to Use t and When to Use z Revisited
Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not

What about that other application used so prominently in texts of business statistics, testing for a proportion?
Re: When to Use t and When to Use z Revisited
But then you should use a binomial (or hypergeometric) distribution.

Jon Cryer

p.s. Of course, you might approximate by an appropriate normal distribution.

At 11:39 AM 12/10/01 -0400, you wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion?

Jon Cryer, Professor Emeritus
Dept. of Statistics and Actuarial Science www.stat.uiowa.edu/~jcryer
The University of Iowa, Iowa City, IA 52242
office 319-335-0819 home 319-351-4639 FAX 319-335-3017

"It ain't so much the things we don't know that get us into trouble. It's the things we do know that just ain't so." --Artemus Ward
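Jon Cryer's point — exact binomial first, normal only as an approximation — can be sketched in a few lines. The numbers here (n = 50 trials, null proportion p0 = 0.4, x = 27 successes) are made up for illustration.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical example: upper-tail P(X >= x) under Binomial(n, p0).
n, p0, x = 50, 0.4, 27

# Exact binomial tail probability.
exact = sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
            for k in range(x, n + 1))

# Normal approximation with continuity correction:
# X is approximately N(n*p0, n*p0*(1-p0)).
mu = n * p0
sigma = math.sqrt(n * p0 * (1 - p0))
approx = 1.0 - norm_cdf((x - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```

With n*p0 = 20 and n*(1-p0) = 30, the two tail probabilities agree to a few thousandths, which is the sense in which "you might approximate."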
Re: When to Use t and When to Use z Revisited
Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not

A scale (weighing device) with known precision.
Re: When to Use t and When to Use z Revisited
I always thought that the precision of a scale was proportional to the amount weighed. So don't you have to know the mean before you know the standard deviation? But wait a minute - we are trying to assess the size of the mean!

Jon Cryer

At 03:42 PM 12/10/01 +0000, you wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not A scale (weighing device) with known precision.
Re: When to Use t and When to Use z Revisited
The sample mean of the dichotomous (one-zero, dummy) variable is known: it is the proportion.

Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion?
Re: When to Use t and When to Use z Revisited
Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion.

Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
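Gus Gassmann's observation can be written out: under Ho: p = p0 the standard error uses p0 itself, so no variance is estimated from the data. The numbers (60 successes in 100 trials against p0 = 0.5) are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def proportion_z_test(x: int, n: int, p0: float):
    """Two-sided z test of Ho: p = p0.  Under Ho the variance
    p0*(1-p0)/n is known -- nothing is estimated, hence z, not t."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # known under the null
    z = (p_hat - p0) / se
    p_value = 2.0 * (1.0 - norm_cdf(abs(z)))
    return z, p_value

z, p = proportion_z_test(60, 100, 0.5)
print(round(z, 3), round(p, 4))
```

Here z = (0.6 - 0.5)/0.05 = 2.0, with a two-sided p-value near 0.0455 — exactly the "pretend to know the variance" situation, valid to the extent the CLT approximation holds.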
Re: When to Use t and When to Use z Revisited
Usually I would use software. As I tried to show in the sample syntax I posted earlier, it doesn't usually make much difference whether you use z or t.

Gus Gassmann wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
Re: When to Use t and When to Use z Revisited
Only as an approximation.

At 12:57 PM 12/10/01 -0400, you wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?
Re: When to Use t and When to Use z Revisited
At 03:42 PM 12/10/01 +0000, Jerry Dallal wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not A scale (weighing device) with known precision.

as far as i know ... the precision of a scale is expressed in terms of 'accurate to within' ... and if there is ANY 'within' attached ... then the value for SURE is not known

_
dennis roberts, educational psychology, penn state university
Re: When to Use t and When to Use z Revisited
On Mon, 10 Dec 2001 12:57:29 -0400, Gus Gassmann [EMAIL PROTECTED] wrote: Art Kendall wrote: (putting below the previous quotes for readability) Gus Gassmann wrote: Dennis Roberts wrote: this is pure speculation ... i have yet to hear of any convincing case where the variance is known but, the mean is not What about that other application used so prominently in texts of business statistics, testing for a proportion? the sample mean of the dichotomous (one_zero, dummy) variable is known, It is the proportion. Sure. But when you test Ho: p = p0, you know (or pretend to know) the population variance. So if the CLT applies, you should use a z-table, no?

That is the textbook justification for chi-squared and z tests in the sets of 'nonparametric tests' which are based on rank-order transformations and dichotomizing. The variance is known, so the test statistic has the shorter tails. (It works for ranks when you don't have ties.)

-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
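Rich Ulrich's point about dichotomized tests with known variance can be made concrete with the large-sample sign test: under the null the count above the hypothesized median is Binomial(n, 1/2), so its variance n/4 is known exactly. The counts below are hypothetical.

```python
import math

def sign_test_z(n_above: int, n: int) -> float:
    """Large-sample sign test of Ho: median = m0.
    Under Ho the number of non-tied observations above m0 is
    Binomial(n, 1/2): mean n/2, variance n/4 -- known, not estimated,
    so the reference distribution is z, not t."""
    mean = n / 2.0
    sd = math.sqrt(n / 4.0)
    return (n_above - mean) / sd

# Hypothetical data: 24 of 36 non-tied observations above m0.
z = sign_test_z(24, 36)
print(round(z, 3))
```

Here z = (24 - 18)/3 = 2.0; ties would have to be dropped first, as noted above for ranks.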
Re: When to Use t and When to Use z Revisited
   3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z.

You are wrong. You use the t distribution not because you don't know sigma, but because your statistic has an EXACT t distribution under certain conditions. I know the textbook says: if we knew sigma the distribution would be normal, but because we used s instead the distribution turned out to be t. It does not say how exactly it becomes t, so you draw the conclusion: use t instead of normal whenever you use s instead of sigma. But it is wrong; it does not go like this.

When you don't know the underlying distribution of the sample, you may use the normal distribution (under certain regularity conditions) as an APPROXIMATION to the actual distribution of your statistic. The approximate distribution in most cases is not parameter-free; it may depend, for example, on the unknown sigma. In such a situation you may replace the unknown parameter by a consistent estimator, and the approximate distribution is still normal. Think of it as an iterated approximation: first you approximate the actual distribution by N(0, sigma^2), then you approximate that by N(0, S^2), where S^2 is a consistent estimator of sigma^2. There are formal theorems (Slutsky-type results) that allow you to do this kind of thing.

The essential difference between the two approaches is that the first tries to derive the EXACT distribution, while the second says: I will use an APPROXIMATION. The number 30 has no importance at all; throw away all the tables you have. I cannot believe they still teach you this stuff. I wish it were that simple: 30!

Your confusion is the result of oversimplification and of the desire, present in basic statistics textbooks, to provide students with simple strategies. I guess it makes teaching very simple, but it misleads students; your confusion is an example. The problem is that there are no simple strategies, and things are much, much more complicated than they appear in basic textbooks. Basic textbooks don't tell you the whole story, and they don't even try, because you simply cannot do that at their level. Don't draw any strong conclusions after reading only basic textbooks.

In practice, in business and economics statistics, nobody uses t-tests, but normal and chi-square approximations are used a lot. The assumptions that you have to make for a t-test are too strong.
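The "iterated approximation" described above (CLT plus a consistent plug-in S for sigma) can be checked by simulation. A sketch under an assumed Exponential(1) population, chosen arbitrarily: with n = 50 the plug-in normal interval covers the true mean roughly, though not exactly, 95% of the time.

```python
import math
import random

random.seed(0)

def coverage(n: int, reps: int) -> float:
    """Coverage of xbar +/- 1.96 * s / sqrt(n) for the mean (= 1)
    of an Exponential(1) population: the normal approximation with
    the unknown sigma replaced by its consistent estimator s."""
    true_mean = 1.0
    hits = 0
    for _ in range(reps):
        xs = [random.expovariate(1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half = 1.96 * s / math.sqrt(n)
        if xbar - half <= true_mean <= xbar + half:
            hits += 1
    return hits / reps

cov = coverage(n=50, reps=2000)
print(cov)
```

The shortfall below 0.95 comes from the skewness of the parent population, not from using s in place of sigma — switching 1.96 to a t critical value barely changes it, which is the point being made above.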
Re: When to Use t and When to Use z Revisited
   Sigma is hardly ever known, so you must use t. Then why not simply tell the students: use the t table as far as it goes (usually to around n = 120), and after that use the n = infinity line (which corresponds to the normal distribution). Then there is no need for a rule for when to use z, when to use t.

But the data is not normal either, in 99.9(9)% of cases. Furthermore, the data that you see in economics/business is very often not an iid sample either. So, one way or another, you end up with the normal or chi-square approximation.

Actually, there is an alternative to both approaches: the bootstrap. But it does not always work and should not be used blindly.
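The bootstrap alternative mentioned above, as a minimal percentile-bootstrap sketch. The data are made up, and the percentile interval is only the simplest bootstrap variant — the caveat "should not be used blindly" applies to it in particular.

```python
import random

random.seed(42)

def bootstrap_ci(data, reps=5000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean:
    resample the data with replacement, take the empirical
    alpha/2 and 1 - alpha/2 quantiles of the resampled means."""
    n = len(data)
    means = sorted(sum(random.choices(data, k=n)) / n for _ in range(reps))
    lo = means[int((alpha / 2) * reps)]
    hi = means[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical skewed sample.
sample = [0.1, 0.4, 0.5, 0.9, 1.2, 1.3, 2.1, 2.8, 4.5, 7.9]
lo, hi = bootstrap_ci(sample)
print(round(lo, 2), round(hi, 2))
```

No normality, no t table, and no known sigma is assumed; the price is that the interval inherits the quirks of the resampling scheme.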
When to Use t and When to Use z Revisited
A few weeks ago, I posted a message about when to use t and when to use z. In reviewing the responses, it seems to me that I did a poor job of explaining my question/concern, so I am going to try again. I have included a few references this time, since one responder doubted the items to which I was referring. The specific references are listed at the end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student "When to Use the z or t Distribution." I have seen a similar figure in several different textbooks. The figure is a logic diagram and the first question is "Is sigma known?" If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation, and I only have experience with business statistics books. If the answer is no, the next question is "Is n >= 30?" If the answer is yes, the diagram says to use z and estimate sigma with s. This is the option I question, and I will return to it shortly. In the diagram, if the answer is no to the question about n >= 30, you are to use t. I do not question this either.

Now, regarding using z when n >= 30. If we always use z when n >= 30, then you would never need a t table with more than 28 degrees of freedom. (n = 29 would always yield df = 28.) Bluman cuts his off at 28, except for the infinity row, so he is consistent. (The infinity row shows that t becomes z at infinity.) However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others.

Berenson states (1, page 373), "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z." So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s. Levine (4, page 424) says roughly the same thing: "However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z."

So, I conclude: 1) we use z when we know sigma and either the data is normally distributed or the sample size is greater than 30, so we can use the central limit theorem. 2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis tests or computing confidence intervals when n >= 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or am I overlooking something?

Ronny Richardson

References
--
(1) Basic Business Statistics, Seventh Edition, Berenson and Levine.
(2) Elementary Statistics: A Step by Step Approach, Third Edition, Bluman.
(3) Complete Business Statistics, Fourth Edition, Aczel.
(4) Statistics for Managers Using Microsoft Excel, Second Edition, Levine, Berenson, Stephan.
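For reference, the Bluman-style flow chart described above reads as code like this. It encodes the rule being questioned, not a recommendation — the thread's eventual consensus is to use t whenever sigma is estimated.

```python
def which_statistic(sigma_known: bool, n: int) -> str:
    """The textbook logic diagram as a function:
    first ask 'Is sigma known?', then 'Is n >= 30?'."""
    if sigma_known:
        return "z"
    if n >= 30:
        return "z, estimating sigma with s"   # the disputed branch
    return "t"

print(which_statistic(sigma_known=False, n=25))
print(which_statistic(sigma_known=False, n=100))
```

Written out this way, the oddity is plain: the middle branch estimates sigma with s yet still reaches for the z table.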
Re: When to Use t and When to Use z Revisited
On Sun, 9 Dec 2001, Ronny Richardson wrote in part: Bluman has a figure (2, page 333) that is supposed to show the student When to Use the z or t Distribution. I have seen a similar figure in several different textbooks. So have I, sometimes as a diagram or flow chart, sometimes in paragraph or outline form. The figure is a logic diagram and the first question is Is sigma known? If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation and I only have experience with business statistics books. Depends partly on what parameter one is addressing (either as a hypothesis test or as a confidence interval). For the mean of an unknown empirical distribution, I expect you're right. But for the proportion of persons in a population who would want to purchase (for a currently topical example) a Segway, the population variance is a known function of the proportion (the underlying distribution being, presumably, binomial), and for this case the t distribution is simply inappropriate, and one ought to use either the proper binomial distribution function, or else the normal approximation to the binomial (perhaps after satisfying oneself that N is sufficiently large for the approximation to be credible with the hypothesized (or observed) value of the proportion; various textbook authors offer assorted recipes for this purpose). { Snip, discourse on N = 30, although I'd think it were rather on df = 30. } However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others. { Snip, assorted quotes ... 
} So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s. Levine (4, page 424) says roughly the same thing, ... So, I conclude {slightly edited -- DB} 1) we use z when we know the sigma and either the data are normally distributed or the sample size is greater than 30 so we can use the central limit theorem. I would amend this to "the sample size is large enough that we can..." Whether 30 is in fact large enough or not depends rather heavily on what the true shape of the parent population actually is. (If it's roughly symmetrical and bell-shaped, 30 may be O.K.) 2) When n < 30 and the data are normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z. Now, every single business statistics book I have examined, including the four referenced below, use z values when performing hypothesis testing or computing confidence intervals when n >= 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or am I overlooking something? I vote for both 1. and 2., since 2. is in my view a subset of 1, although others may not share this opinion. I would add 3. Outdated. on the grounds that when sigma is unknown, the proper distribution is t (unless N is small and the parent population is screwy) regardless how large the sample size may be. The main (if not the only) reason for the apparent logical bifurcation at N = 30 or thereabouts was that, when one's only sources of information about critical values were printed tables, 30 lines was about what fit on one page (plus maybe a few extra lines for 40, 60, 120 d.f.) and one could not (or at any rate did not) expect one's business students to have convenient access to more extensive tables of the t distribution. And, one suspects latterly, authors were skeptical that students would pay attention to (or perhaps be able to master?)
the technique of interpolating by reciprocals between 30 df and larger numbers of df (particularly including infinity). But currently, _I_ would not expect business students to carry out the calculations for hypothesis tests, or confidence intervals, by hand, except maybe half a dozen times in class for the good of their souls: I'd expect them to learn to invoke a statistical package, or else something like Excel that pretends to supply adequate statistical routines. And for all the packages I know of, there is a built-in function for calculating, or approximating, the cumulative distribution of t for ANY number of df. The advice in any _current_ business- statistics text ought to be, therefore, to use t _whenever_ sigma is not known. And if the textbook isn't up to that standard, the instructor jolly well should be. { Snip, references. See the original post for more details. } -- DFB.
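The "interpolating by reciprocals" DFB mentions can be sketched with two anchors from a standard printed table (two-sided 5% points: 2.042 at 30 df, 1.960 at infinity). Tabled t critical values are close to linear in 1/df, which is why tables could skip from 30 to 40, 60, 120, infinity.

```python
# Harmonic (reciprocal) interpolation of t critical values:
# linear in 1/df between the 30-df and infinity rows of the table.
T30, T_INF = 2.042, 1.960   # standard t_{.025} table values

def t_crit_harmonic(df: float) -> float:
    """Interpolated t_{.025}(df) for df >= 30: at df = 30 the
    weight 30/df is 1 (returns T30); as df grows it falls to 0."""
    return T_INF + (T30 - T_INF) * (30.0 / df)

print(t_crit_harmonic(40))   # tabled value: 2.021
print(t_crit_harmonic(60))   # tabled value: 2.000
print(t_crit_harmonic(120))  # tabled value: 1.980
```

The interpolated values land within about 0.001 of the printed table entries, so the "missing" rows between 30 and infinity really were redundant.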
Re: When to Use t and When to Use z Revisited
Ronny Richardson wrote in message news:[EMAIL PROTECTED]... A few weeks ago, I posted a message about when to use t and when to use z.

I did not see the earlier postings, so forgive me if I repeat advice already given. :-)

1. The consequences of using the t distribution instead of the normal distribution for sample sizes greater than 30 are of no importance in practice. The differences in the numbers given as confidence limits are so small that no sensible person would change their course of action based on that minuscule variation. In the case of a significance test, a result just over or just under, say, the 5% level should always be examined in the knowledge that 5% is an arbitrary level and that a level of 4.9% or 5.1% could equally well have been chosen.

2. There is no good reason for statistical tables for use in practical analysis of data to give figures for t on numbers of degrees of freedom over 30, except that it makes it simple to routinely use one set of tables when the variance is estimated from the sample. Another reason that books of tables do not include t values for degrees of freedom between 30, 60, sometimes 120, and infinity is that there is no need, even for the extreme tails of the distribution: when, for whatever reason, high accuracy is required, the intermediate values can be obtained by harmonic interpolation. That is, the tail entries in the distribution can be obtained by linear interpolation on 1/n.

3. There are situations where the error variance is known. They generally arise when the errors in the data come from the use of a measuring instrument with known accuracy, or when the figures available are known to be truncated to a certain number of decimal places. For example: Several drivers use cars in a car pool. The distance travelled on each trip by a driver is recorded, based on the odometer reading. Each observation has an error which is uniformly distributed on (0, 0.2).
The variance of this error is (0.2)^2/12 = 0.0033, and the standard deviation is 0.0577. To calculate confidence limits for the average distance travelled by each driver, the z statistic should be used. A similar situation could arise in dealing with data in which the error comes from the rounding of all numbers to the nearest thousand. This is an uncommon situation in a business context, but it arises quite often in scientific work, where the inherent accuracy of a measuring instrument may be known from long experience and need not be estimated from the small sample currently being examined.

4. You seem to think the Central Limit Theorem is behind the validity of t vs z tables. This is not so. The CLT bears only on the Normal shape and on the relation of the variance of an average or sum to the population variance.

Commenting specifically on points in your posting:

   So, I conclude 1) we use z when we know the sigma and either the data is normally distributed or the sample size is greater than 30

Yes, but the difference if you use t is tiny and of no importance.

   so we can use the central limit theorem.

No. The CLT is not the reason. The CLT ensures that the average and sum are Normally distributed for large enough n. Unless the data is very skewed or bimodal, n = 5 is usually large enough in practice. This is a separate issue from the choice of Normal or t distribution for inference.

   2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s so we really should be using t rather than z.

but the difference in the resulting numbers is minuscule and of no importance.
Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis testing or computing confidence intervals when n > 30. Are they 1. Wrong, or 2. Just oversimplifying it without telling the reader, or am I overlooking something? Ronny Richardson

I hope that helps.

Jim Snow
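The odometer example in point 3 above can be checked numerically. This is my own sketch, not from the thread: the only facts taken from the post are the Uniform(0, 0.2) error and the variance formula (b - a)^2/12; the trip distances are made-up illustrative data.

```python
import math

# Variance of a Uniform(a, b) measurement error is (b - a)**2 / 12.
a, b = 0.0, 0.2
sigma = math.sqrt((b - a) ** 2 / 12)   # ~0.0577, as stated in the post

# Hypothetical trip distances in miles (made-up data for illustration).
trips = [12.4, 8.7, 15.1, 9.9, 11.3, 14.0, 10.8, 13.2]
n = len(trips)
mean = sum(trips) / n

# Because sigma is KNOWN, a z interval is appropriate; 1.959964 is the
# 97.5% point of the standard normal.
half_width = 1.959964 * sigma / math.sqrt(n)
print(f"sigma = {sigma:.4f}")
print(f"95% CI for mean trip length: ({mean - half_width:.3f}, {mean + half_width:.3f})")
```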
Re: When to Use t and When to Use z Revisited
[EMAIL PROTECTED] (Ronny Richardson) wrote in message news:[EMAIL PROTECTED]...

A few weeks ago, I posted a message about when to use t and when to use z. In reviewing the responses, it seems to me that I did a poor job of explaining my question/concern, so I am going to try again. I have included a few references this time, since one responder doubted the items to which I was referring. The specific references are listed at the end of this message.

Bluman has a figure (2, page 333) that is supposed to show the student When to Use the z or t Distribution. I have seen a similar figure in several different textbooks. The figure is a logic diagram, and the first question is Is sigma known? If the answer is yes, the diagram says to use z. I do not question this; however, I doubt that sigma is ever known in a business situation, and I only have experience with business statistics books. If the answer is no, the next question is Is n >= 30? If the answer is yes, the diagram says to use z and estimate sigma with s. This is the option I question, and I will return to it shortly. In the diagram, if the answer is no to the question about n >= 30, you are to use t. I do not question this either.

Now, regarding using z when n >= 30. If we always use z when n >= 30, then you would never need a t table with more than 28 degrees of freedom. (n = 29 would always yield df = 28.) Bluman cuts his table off at 28 except for the infinity row, so he is consistent. (The infinity row shows that t becomes z at infinity.) However, other authors go well beyond 30. Aczel (3, inside cover) has values for 29, 30, 40, 60, and 120, in addition to infinity. Levine (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with infinity. I could go on, but you get the point. If you always switch to z at 30, then why have t tables that go above 28? Again, the infinity entry I understand, just not the others.
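The harmonic-interpolation point made earlier in the thread can be checked against exactly the sparse degrees of freedom Aczel tables (30, 60, 120). A sketch of mine, assuming the standard two-sided 95% t-table entries:

```python
# Harmonic interpolation: interpolate t critical values linearly in 1/df.
# Standard two-sided 95% t-table entries, df -> t critical value.
table = {30: 2.042, 60: 2.000, 120: 1.980}

def t_interp(df, lo=30, hi=60):
    """Linear interpolation on 1/df between two tabled entries."""
    x, x0, x1 = 1 / df, 1 / lo, 1 / hi
    frac = (x0 - x) / (x0 - x1)
    return table[lo] + frac * (table[hi] - table[lo])

# df = 40 lies exactly halfway between df = 30 and df = 60 on the 1/df scale:
print(round(t_interp(40), 3))  # ~2.021, matching the usual tabled value for df = 40
```

This is why a table can safely jump from 30 to 60 to 120: the reader can recover the skipped entries to three decimals by interpolating on 1/df rather than on df.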
Berenson states (1, page 373), However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z. So, Berenson seems to me to be saying that you always use t when you must estimate sigma using s.

Yes, but as n becomes large the difference becomes extremely small. The question is, when is small small enough?

Levine (4, page 424) says roughly the same thing, However, the t distribution has more area in the tails and less in the center than does the normal distribution. This is because sigma is unknown and we are using s to estimate it. Because we are uncertain of the value of sigma, the values of t that we observe will be more variable than for Z.

So, I conclude 1) we use z when we know sigma and either the data is normally distributed or the sample size is greater than 30 so we can use the central limit theorem. 2) When n < 30 and the data is normally distributed, we use t. 3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

Uh, wait a sec.

i) The CLT doesn't kick in at the same point for every distribution. If the distribution is close to normal, you don't need anything like n = 30. If the distribution is (say) highly skewed, then n = 30 may not be anywhere near close enough.

ii) Even for a given distribution, a sample size that's close enough for one application won't necessarily be close enough for another application.

iii) How much accuracy you get also depends on how far into the tails you need precision. There's no point knowing the 2.5% points aren't far out if you need it (for your application) to be accurate near the 0.25% points.

iv) The rate at which the sample variance approaches the appropriate multiple of a chi-square depends on the distribution you're sampling from.
It's possible it may never do so, but with a large sample size you should generally still get normality because of Slutsky's theorem. Even if n = 30 were right when we're talking about the mean, it won't in general also be just right when we're dealing with what's happening with the variance (see above).

v) The degree to which the dependence between the mean and variance affects the distribution of the t statistic itself depends on the distribution you're sampling from (but again, Slutsky should save you eventually).

For these sorts of reasons, n = 30 is oversimplistic. Sometimes it's far too stringent, sometimes too weak. Better to make some assessment of the effect of what you regard as possible situations and see if the consequences are okay for your situation.

Now, every single business statistics book I have examined, including the four referenced below, uses z values when performing hypothesis testing or computing confidence intervals when n > 30. Are they 1. Wrong 2. Just oversimplifying it without telling the reader or