On Thu, 14 Nov 2002, Jon Cryer wrote in part:

> Actually, if e is normal so is y (and vise [sic] versa)

This is true if the predictor x is itself normally distributed, but it
need not be true (and in general will not be) otherwise, unless what you
mean by "y" here is "the conditional distribution of y for a given value
of x", which I suspect is not what most of our readers would take you to
have meant.

Counterexample:  let x take two values only, say x_1 and x_2, near the
extremes of the range of possible values of x (as Draper & Smith (1966,
page 18) point out, this is a desirable set of values IF the model is
correct and one wishes to minimize the variance of b_1).  Suppose
further that the distance between the mean of y at x_1 and the mean of y
at x_2 is large compared to sigma (= sqrt(var(e))).  Then the
distribution of y will be distinctly bimodal.

(Of course, the (conditional) distribution of the y's at x = x_1 is
normal, as is the distribution at x = x_2.  But all the values of y
taken together are distributed as a mixture of two normal distributions
with different means.  Such a distribution is not itself normal.)
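
A quick numerical illustration of this counterexample, in Python.  The
particular values of beta_0, beta_1, x_1, x_2, and sigma are mine,
chosen only to make the bimodality obvious:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    beta0, beta1, sigma = 0.0, 1.0, 1.0
    x = np.repeat([0.0, 10.0], 500)      # x_1 = 0, x_2 = 10: means 10*sigma apart
    e = rng.normal(0.0, sigma, size=x.size)
    y = beta0 + beta1 * x + e

    # The errors are normal by construction; the pooled y's are not.
    print(stats.normaltest(e).pvalue)    # typically large: e consistent with normal
    print(stats.normaltest(y).pvalue)    # essentially zero: a two-normal mixture

A histogram of y shows the two humps directly.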

> but the regression assumptions account for the changing mean of y
> with x.

Indeed;  but the assumptions you list below are not the basic
assumptions to be found in regression textbooks (see, e.g., Draper &
Smith (op.cit.), page 17):  "that, in the model Y_i = beta_0 +
beta_1*x_i + e_i, i = 1,2,...,n,
  (1) e_i is a random variable with mean zero and variance sigma^2
(unknown), that is, E(e_i) = 0, V(e_i) = sigma^2;
  (2) e_i and e_j are uncorrelated for i <> j:  cov(e_i,e_j) = 0;
  (3) e_i is normally distributed."
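
For concreteness, a minimal sketch in Python of examining (1)-(3) in
the residuals from a fitted line.  The simulated data are mine, not
Draper & Smith's:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)  # model obeys (1)-(3)

    b1, b0 = np.polyfit(x, y, 1)         # least-squares estimates of beta_1, beta_0
    resid = y - (b0 + b1 * x)

    print(resid.mean())                  # (1) mean ~0 (exactly so, by the normal equations)
    print(resid @ resid / (x.size - 2))  # (1) estimate of sigma^2, on n-2 df
    print(np.corrcoef(resid[:-1], resid[1:])[0, 1])  # (2) lag-1 correlation ~0
    print(stats.normaltest(resid).pvalue)            # (3) no evidence against normality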

Assumption (3) is not necessary for any of the preliminary algebra, of
course (this includes finding the values to be entered into the ANOVA
summary table), but it (or a logically equivalent distributional
assumption) is needed for inference (e.g., assigning a p-value to the
calculated F value in the ANOVA table, finding confidence intervals
around b_1 and b_0, etc.).
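
As a sketch of that inferential step (Python again, continuing with my
own simulated data; assumption (3) is what licenses the t and F
distributions used here):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)

    n = x.size
    s2 = resid @ resid / (n - 2)             # MSE, n-2 degrees of freedom
    sxx = ((x - x.mean()) ** 2).sum()
    se_b1 = np.sqrt(s2 / sxx)
    tcrit = stats.t.ppf(0.975, df=n - 2)
    print(b1 - tcrit * se_b1, b1 + tcrit * se_b1)   # 95% CI for beta_1

    F = b1 ** 2 * sxx / s2                   # ANOVA F for H0: beta_1 = 0
    print(stats.f.sf(F, 1, n - 2))           # p-value from F(1, n-2)

In simple regression this F is just the square of the t statistic for
b_1, so the F test and the t test of beta_1 = 0 agree exactly.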

> The assumptions:
>
> 1) y's are normal
> 2) y has mean beta_0 + beta_1*x
> 3) y's have common variance
> 4) y's are independent
>
> are equivalent to the usual assumption about e's in the model y =
> beta_0 + beta_1*x + e.
>
> Jon Cryer

I believe this can be true only if by "y" you mean the conditional
distribution of y at a specified x.  Your assumption 2), for example,
cannot yield a single value for the mean of y unless it refers to a
single value of x.
     -- DFB.
 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
 [was:  184 Nashua Road, Bedford, NH 03110               (603) 471-7128]
