Hello all,
when discussing linear regression assumptions with a colleague, we
noticed that we were unable to explain WHY heteroscedasticity has
its well-known ill effects on the estimators' properties. I know
WHAT the consequences are (loss of efficiency, tendency to
underestimate the standard errors) and I also know why these
consequences are undesirable. What I'm lacking is a substantial
understanding of HOW the presence of inhomogeneous error variances
increases the variability of the coefficients, and HOW the
estimation of the standard errors fails to reflect this.
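To make the two consequences concrete for myself, I tried a small
simulation (Python with numpy; the design, the true coefficients,
and the variance pattern sigma_i = 0.1*X_i^2 are just made-up
examples). The empirical SD of the slope across replications is its
'true' SE, and the average of the textbook OLS standard errors comes
out smaller:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 5000
    x = np.linspace(1, 10, n)
    X = np.column_stack([np.ones(n), x])
    sigma = 0.1 * x**2                    # error SD grows with X: het.sc.

    slopes, naive_ses = [], []
    for _ in range(reps):
        y = 2.0 + 1.0 * x + rng.normal(0.0, sigma)   # true slope = 1
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - 2)                 # pooled residual variance
        cov = s2 * np.linalg.inv(X.T @ X)            # textbook OLS covariance
        slopes.append(beta[1])
        naive_ses.append(np.sqrt(cov[1, 1]))

    print("empirical SD of slope (its 'true' SE):", np.std(slopes))
    print("mean OLS-reported SE:", np.mean(naive_ses))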
I consulted a number of (obviously too basic) textbooks; all but
one merely state the problems that arise from het.sc. The one that
isn't a total blank (Kmenta's Elements of Econometrics, 1986) tries
to give an intuitive explanation (along with a proof of the
inefficiency of the β estimators with het.sc.), but I don't fully
understand that.
Kmenta writes:
"The standard least squares principle involves minimizing
Σ e_i^2 [the sum of squared errors], which means that each squared
disturbance is given equal weight. This is justifiable when each
disturbance comes from the same distribution. Under het.sc.,
however, different disturbances come from different distributions
with different variances. Clearly, those disturbances that come
from distributions with a smaller variance give more precise
information about the regression line than those coming from
distributions with a larger variance. To use sample information
efficiently, one should give more weight to the observations with
less dispersed disturbances than to those with more dispersed
disturbances." p. 272
I see that the conditional distributions of the disturbances
obviously differ if het.sc. is present (well, this is the
definition of het.sc., right?), and that, IF I want to compensate
for this, I can weight the data accordingly (Kmenta goes on to
explain WLS estimation). But firstly, I still don't see why the
standard errors increase in the first place... And secondly, is it
really legitimate to claim that OLS is 'wrong' if it treats
disturbances from differing conditional distributions with equal weight?
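For what it's worth, the weighting Kmenta describes is easy to try
in the same simulated setup as above (again only a sketch: here the
sigma_i are known by construction, which they never are in
practice). Dividing each observation by its sigma_i and running OLS
on the rescaled data is exactly WLS, and its slope varies less
across replications:

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 50, 5000
    x = np.linspace(1, 10, n)
    X = np.column_stack([np.ones(n), x])
    sigma = 0.1 * x**2                    # same het.sc. pattern as above

    ols_slopes, wls_slopes = [], []
    for _ in range(reps):
        y = 2.0 + 1.0 * x + rng.normal(0.0, sigma)
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        # WLS = OLS on data rescaled by 1/sigma_i
        b_wls, *_ = np.linalg.lstsq(X / sigma[:, None], y / sigma,
                                    rcond=None)
        ols_slopes.append(b_ols[1])
        wls_slopes.append(b_wls[1])

    print("SD of OLS slope:", np.std(ols_slopes))
    print("SD of WLS slope:", np.std(wls_slopes))   # smaller: efficiency gain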
Assume the simple case where the variance of Y increases with
increasing values of X, so that het.sc. is present. With differing
precision of prediction for different X values, the standard error
(SE) of the regression coefficient (b) should become conditional on
the value of X: the higher X, the higher the SE, with E(b) constant
over all values of X - correct? Then, isn't the standard error as
estimated by OLS implicitly an _average_ over all these conditional
SEs (just following intuition here)? How can we claim that the
specific SE at the X value with the least dispersed disturbances is
the 'true' one? (Exception: het.sc. is due to uneven measurement
error for Y - there I can see that the respective data points are
less reliable.)
Regarding the first question: Can this be answered at all without
the formal proof?
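Trying to push the intuition with nothing more than the standard
simple-regression algebra (notation is mine), the slope estimator is
a fixed-weight linear combination of the Y values:

    \[
      b = \sum_i c_i Y_i, \qquad
      c_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad
      S_{xx} = \sum_j (x_j - \bar{x})^2,
    \]
    \[
      \mathrm{Var}(b) = \sum_i c_i^2 \sigma_i^2
      \qquad \text{(exact, with or without het.sc.),}
    \]
    \[
      \widehat{\mathrm{Var}}(b) = \frac{s^2}{S_{xx}} = \sum_i c_i^2 s^2
      \qquad \text{(the usual OLS estimate).}
    \]

So the true variance weights each sigma_i^2 by c_i^2, which is
largest for X values far from the mean, while the usual formula
replaces every sigma_i^2 by the single pooled s^2 - an unweighted
average, much as I suspected above. If the large sigma_i^2 sit where
c_i^2 is large (variance increasing with X), the pooled formula will
typically come out too small. Is that the right way to see it?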
Thanks for your patience, MQ
--
________________________________________________________________
Markus Quandt