Hi Don,

Thanks for your reply - as usual, thorough, with lots to think about.

As far as I can make out, the address for the ANZStat list must have changed - my own email did not go through to it.

On Monday, November 18, 2002, at 02:32 PM, Donald Burrill wrote:

[I suspect this reply will not be broadcast to ANZstat, as I am not a
member of that list;  you may (or not!) want to forward it, Alan.]

On Mon, 18 Nov 2002, Alan McLean wrote:

I have a couple of questions, one of which has been bubbling round
in my mind for some years, the other is more recent. The recent one
is the following:

The use of the t distribution in inference on the mean is on the
whole straightforward;  my question relates to the theory underlying
this use.  If Z = (X - mu)/sigma is ~ N(0, 1), then is
 T = (X - mu)/s (where s is the sample SD based on a simple random
sample of size n) ~ t(n-1)?
Short answer: Yes.
Longer answer: the number of degrees of freedom for the t distribution
for such a statistic is the number of degrees of freedom associated with
the variance estimate (well, with its square root) in the denominator.
This is certainly the case. My uncertainty, however, is that the sample on which the variance estimate s^2 is based is not in itself linked to the value of X. A related question: in the usual application T = (Xbar - mu)/[s/sqrt(n)], does the sample SD, s, have to come from the same sample as Xbar?
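
One way to probe the "same sample or not" question is by simulation. The sketch below is only an illustration under stated assumptions: a normal population, n = 5, and the tabulated two-sided 95% critical value 2.776 for t with 4 degrees of freedom. The function name t_coverage and all parameter values are hypothetical, not anything from the thread. It computes T = (Xbar - mu)/[s/sqrt(n)] with s taken either from the same sample as Xbar or from a separate, independent sample of the same size, and reports how often |T| falls below the critical value.

```python
import random
import statistics


def t_coverage(n, same_sample, reps=20000, mu=10.0, sigma=2.0,
               t_crit=2.776, seed=1):
    """Fraction of replications with |T| < t_crit,
    where T = (xbar - mu) / (s / sqrt(n)).

    t_crit=2.776 is the 97.5% point of t with 4 df, so it matches n=5.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if same_sample:
            # Usual textbook case: s comes from the sample that gave xbar.
            s = statistics.stdev(sample)
        else:
            # s comes from a second, independent sample of the same size.
            other = [rng.gauss(mu, sigma) for _ in range(n)]
            s = statistics.stdev(other)
        t = (xbar - mu) / (s / n ** 0.5)
        if abs(t) < t_crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("s from same sample:     ", t_coverage(5, same_sample=True))
    print("s from separate sample: ", t_coverage(5, same_sample=False))
```

Under normality both fractions should come out close to 0.95, since Xbar and s are independent even when they come from the same normal sample. What fixes the degrees of freedom is the sample behind s, not the sample behind Xbar.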


My second question is on the matter of confidence intervals.  <snip>

The expression P(Xbar - 1.96 x SE < mu < Xbar + 1.96 x SE) = 0.95 is
a perfectly good prediction interval - it expresses the probability
of getting a sample mean which satisfies this inequality.

Now replace the RV Xbar by the observed sample value to give the
interval:  xbar - 1.96 x SE < mu < xbar + 1.96 x SE. This is of course
the confidence interval on the population mean mu.
Minor quibble:  Provided "SE" is the population standard error of the
mean (and supposing that you're specifying a 95% C.I.), which is
consistent with the notation you specified.
That was to be understood.


Whatever is said in the text books, this is understood by most
people as a statement that "mu lies in the interval with probability
0.95" - or something very close to this.  In effect, we define a
secondary notional variable Y which imagines that we could find out
the 'true' value of mu; Y = 1 if this true value is in the
confidence interval, = 0 otherwise - and we estimate the probability
that Y = 1 as 0.95.
Interesting concept;  I'll have to think about that "notional variable"
a bit.
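
The notional variable Y can be made concrete in a simulation, where (unlike in practice) the 'true' mu is known. This is a minimal sketch assuming a normal population with known sigma, so that SE = sigma/sqrt(n) is the population standard error. The function name estimate_coverage and the values n = 25, mu = 50, sigma = 8 are arbitrary choices for illustration only.

```python
import random
import statistics


def estimate_coverage(n=25, mu=50.0, sigma=8.0, reps=20000, seed=2):
    """Estimate P(Y = 1), where Y = 1 if the interval
    xbar +/- 1.96 * SE contains the true mu, and Y = 0 otherwise."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # population standard error of the mean
    y_values = []
    for _ in range(reps):
        # Draw one sample and compute its observed mean.
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
        y_values.append(1 if lower < mu < upper else 0)
    # The mean of the 0/1 indicator is the observed coverage rate.
    return statistics.fmean(y_values)


if __name__ == "__main__":
    print("Estimated P(Y = 1):", estimate_coverage())
```

Each replication yields a different interval and one value of Y; the long-run average of Y should sit near 0.95. For any single observed interval, Y is already either 0 or 1; we just do not know which.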

I have been teaching statistics for 30-odd years and have become
more and more disillusioned with the treatment of confidence
intervals in the text books!
I think the earlier approaches that began with hypothesis testing before
introducing the idea of confidence intervals were superior to what I've
been encountering of late, where C.I.s are introduced first (often, one
suspects, before many students have managed to internalize the idea of
probability at all thoroughly), and hypothesis testing appears later.

So my question is:  how do YOU explain to students what a confidence
interval REALLY is?
A C.I. is an observed instance of a random variable, representing the
range of values that one might specify in a null hypothesis and NOT have
the hypothesis rejected.
How can this be? The acceptance region of the test is based on the hypothesised value, while the confidence interval is based on the observed value. This is, I think, another way of expressing the root of my uncertainty.

Where an acceptance region (which I hope I'll
have had occasion to explain previously!) is an interval centered on the
value specified in a null hypothesis, and represents the range of
possible values of the sample mean that would NOT lead to rejection of
that hypothesis; a C.I. is an interval centered on the observed sample
mean (which is why it is a value of a random variable: tomorrow, if you
went out and looked again, you'd probably find a different value of the
sample mean, hence a different C.I.), and represents (as above) the
range of values of possible null hypotheses that could NOT be rejected
under the conditions of the current experiment.
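
The relation between the two intervals described above can be checked numerically. The sketch below assumes a two-sided z test with known SE and critical value 1.96. The function names and the numbers xbar = 103 and SE = 2 are made up for illustration. For a grid of candidate null values mu0, it asks two questions: does xbar fall in the acceptance region centered on mu0, and does mu0 fall in the C.I. centered on xbar?

```python
def acceptance_region(mu0, se, z_crit=1.96):
    """Interval centered on the hypothesised value mu0."""
    return (mu0 - z_crit * se, mu0 + z_crit * se)


def confidence_interval(xbar, se, z_crit=1.96):
    """Interval centered on the observed sample mean xbar."""
    return (xbar - z_crit * se, xbar + z_crit * se)


def not_rejected(xbar, mu0, se):
    """True if xbar lies in the acceptance region for H0: mu = mu0."""
    lower, upper = acceptance_region(mu0, se)
    return lower <= xbar <= upper


def inside_ci(xbar, mu0, se):
    """True if mu0 lies in the confidence interval built from xbar."""
    lower, upper = confidence_interval(xbar, se)
    return lower <= mu0 <= upper


xbar, se = 103.0, 2.0                         # hypothetical observed values
grid = [90 + 0.5 * k for k in range(51)]      # candidate mu0: 90.0 ... 115.0

retained = [mu0 for mu0 in grid if not_rejected(xbar, mu0, se)]
agree = all(not_rejected(xbar, m, se) == inside_ci(xbar, m, se) for m in grid)

if __name__ == "__main__":
    print("C.I.:", confidence_interval(xbar, se))
    print("Retained mu0 values run from", min(retained), "to", max(retained))
    print("Test and C.I. agree on every grid value:", agree)
```

Both checks reduce to |xbar - mu0| <= 1.96 x SE, which is symmetric in xbar and mu0. That symmetry is why an interval centered on the observed value collects exactly those hypothesised values whose acceptance regions contain it.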

I usually added, in introducing a C.I. in the first place, that
sometimes one has an obvious null hypothesis (that is, an obvious null-
hypothetical value of a parameter) to test, and in that circumstance an
hypothesis test is clearly appropriate. But sometimes there isn't an
obvious value to specify for mu (or sigma, or rho, or beta, or ...), and
then one might be interested in knowing what (potential) values of mu
(or whatever) would be consistent with the data in hand.

Is this any help? -- Don.
It certainly stimulates the aging brain cells....

Regards,
Alan

-----------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
[was: 184 Nashua Road, Bedford, NH 03110 (603) 471-7128]

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
