On Mon, 15 May 2000, Eric Scharin wrote:
> I was confused by many of the responses to Mike's post... maybe because
> I'm not a statistician. But I'm guessing Mike isn't a statistician
> either, so maybe he is as confused as I.
>
> My (limited) understanding is that the homogeneity of variances
> requirement for a regression is in the response variable.
There are two misunderstandings here, I think. One is in the usage
"require", which makes it sound as though homogeneity of variance is a
*necessary* condition. This is not so: it is a *sufficient* condition
(taken together with the other usual assumptions of independent, normally
distributed errors), and essentially what it suffices for is that the
usual regression F statistic follow Snedecor's F-distribution. (I am
using "necessary" and
"sufficient" here in their logical/mathematical sense: one can prove a
theorem stating that, if the variance be homogeneous, the statistic is
properly F. One cannot prove the converse proposition: that if the
statistic follows Snedecor's F, the variance must be homogeneous.)
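A quick way to see the provable direction, for anyone with Python handy (the
libraries, sample size, and seed below are my own illustrative choices, not
anything from the original post): simulate the model with a truly constant
error variance and no real slope, and check that the regression F statistic
behaves like Snedecor's F.

import numpy as np
from scipy import stats

# Sketch: under y = a + b*x + e with i.i.d. normal e (constant variance)
# and true slope b = 0, the regression F statistic should follow F(1, n-2).
rng = np.random.default_rng(0)
n, reps = 30, 5000
x = np.linspace(0, 10, n)
f_stats = np.empty(reps)
for i in range(reps):
    y = 2.0 + 0.0 * x + rng.normal(scale=1.5, size=n)  # true slope 0, error SD constant
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_reg = np.sum((fitted - y.mean()) ** 2)   # regression SS, 1 df
    ss_res = np.sum((y - fitted) ** 2)          # residual SS, n - 2 df
    f_stats[i] = (ss_reg / 1.0) / (ss_res / (n - 2))

# Proportion of simulated F statistics beyond the nominal F(1, n-2) 95th
# percentile; should come out near 0.05.
print(np.mean(f_stats > stats.f.ppf(0.95, 1, n - 2)))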
The second misunderstanding is that this property (homog. of var.)
applies to the response variable. It does not: it applies to the
residual variable, the "e" in the model y = a + bx + e
(or, more generally, y = f(x1, x2, ...) + e).
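To make the distinction concrete (a small sketch of my own; the numbers and
library are purely illustrative): the variance of y itself can be enormous
simply because the mean of y changes with x, while the residual variance,
which is the quantity the assumption actually concerns, stays small and
constant.

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 100, 200)
e = rng.normal(scale=2.0, size=x.size)    # the "e" in y = a + b*x + e; SD = 2 everywhere
y = 5.0 + 3.0 * x + e

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)       # estimated residuals

print(np.var(y, ddof=1))       # variance of the response: large, driven by the trend in x
print(np.var(resid, ddof=1))   # residual variance: close to 2**2 = 4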
> Basically, when you do a least squares regression, the technique gives a
> "pooled" estimate for the variance in the response variable.
Not the response variable, but the residual.
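(To put a formula to it: the pooled quantity is the residual mean square,
   s^2 = SUM(e_i^2) / (n - 2)
for the straight-line model, or with n - p in the denominator when p
parameters are estimated. It estimates the common variance of e, not the
variance of y.)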
> If the response variance changes across the range of the regressor
> variable, this pooled estimate is not appropriate.
"Not appropriate" for what purpose(s)? We may agree that under the
condition you specify the estimate is not constant across the range of
the predictor. This circumstance may or may not be a problem, depending
in part on what you want to use the residual mean square for after the
regression analysis, and in part on just how variable the variance
appears to be.
> There are many examples in my area of background (analytical chemistry)
> where the response variance is not constant, but the %CV ("coefficient
> of variation", also called the RSD or "relative standard deviation") is
> constant. In cases like this, the analyst often uses a transformation
> of the response variable (such as ln(Y) or Y^0.5) to make the variances
> homogeneous prior to the regression.
Yes, such transformations may be useful, especially if the transformed
variable has some theoretical meaning (as may rather more often be the
case in analytical chemistry than it is in, say, public policy...).
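Here is a small sketch of the constant-%CV situation (the concentration
levels, the 5% CV, and the use of Python are my own illustrative choices,
not from the original post): replicate measurements whose SD is proportional
to the mean show wildly different spreads on the raw scale, but essentially
the same spread after taking ln(y).

import numpy as np

rng = np.random.default_rng(2)
levels = [1.0, 10.0, 100.0]   # e.g. three concentration levels in a calibration
reps = 50
for c in levels:
    y = 50.0 * c * (1 + rng.normal(scale=0.05, size=reps))   # constant 5% CV
    print(c, y.std(ddof=1), np.log(y).std(ddof=1))
# Raw SDs grow roughly 100-fold across the levels; the SDs of ln(y) stay
# near 0.05 throughout.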
> As for tests for homogeneity of variance, ...
There are in general two problems with tests for homogeneity of variance.
(1) The tests may in fact be applied to the overall raw (or for that
matter the transformed) response variable, rather than to the residuals
representing the deviation of that variable from the regression model
(or, equivalently, to the conditional distributions of the response
variable given fixed values of the predictor; your comment below about
JMP requiring the predictor to be specified as a set of ordered
categories rather than as a continuous variable suggests that the tests
may be properly applied in that program).
(2) Many, if not most, such tests are rather more sensitive to
departures of the variables from normality (that is, from a Gaussian
distribution) than they are to heterogeneity of variance; a small
simulation below illustrates the point.
(When I was first studying statistics as a graduate student,
I was much impressed by the comment (whose source I cannot now recall)
that to use Bartlett's test before undertaking an analysis of variance
was rather like sending out a rowboat to see if the ocean was calm
enough for the Queen Mary to sail.)
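To illustrate point (2) with a small simulation (my own sketch; the t
distribution with 3 df, the group sizes, and the scipy functions are
illustrative assumptions): with heavy-tailed but equal-variance groups,
Bartlett's test rejects far more often than its nominal 5% level, while the
Brown-Forsythe variant of Levene's test (scipy's default, center='median')
stays close to it.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps, k, n_per_group = 2000, 4, 30
bartlett_rej = levene_rej = 0
for _ in range(reps):
    # Four groups with identical variances but heavy tails (t with 3 df).
    groups = [stats.t.rvs(df=3, size=n_per_group, random_state=rng) for _ in range(k)]
    bartlett_rej += stats.bartlett(*groups).pvalue < 0.05
    levene_rej += stats.levene(*groups, center='median').pvalue < 0.05

print("Bartlett rejection rate:      ", bartlett_rej / reps)   # well above 0.05
print("Brown-Forsythe rejection rate:", levene_rej / reps)     # near 0.05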
> I know that in JMP (the low-end SAS program) there are several tests
> for homogeneity within the Fit Y by X - ANOVA platform. (To get this
> to work in JMP, you need to change the X variable from a continuous to
> an ordinal variable). The tests available include O'Brien,
> Brown-Forsythe, Levene, and Bartlett. I am unable to comment on the
> strengths and weaknesses of these tests, but would be interested in
> hearing what the more experienced list-members have to say with regard
> to them. (I am also interested to find out how off-the-mark my comments
> are!)
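For anyone working without JMP: Levene's test, the Brown-Forsythe variant,
and Bartlett's test all have counterparts in scipy.stats. The sketch below
(the data, the bin cut-points, and the choice of three groups are mine and
purely illustrative) applies them, per point (1) above, to the residuals
grouped by ordered categories of X rather than to the raw response.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 120)
y = 2.0 + 1.5 * x + rng.normal(scale=0.8, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Cut the predictor into ordered categories, much as the JMP procedure
# described above requires.
bins = np.digitize(x, [4.0, 7.0])                 # three ordered groups of x
groups = [resid[bins == b] for b in np.unique(bins)]

print(stats.levene(*groups, center='mean'))       # Levene's original test
print(stats.levene(*groups, center='median'))     # Brown-Forsythe variant
print(stats.bartlett(*groups))                    # Bartlett's test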
I hope this hasn't added to your confusion! If I have inadvertently
erred in any of the particulars above, doubtless someone on the list
will correct my error(s).
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128