On Tue, 2 May 2000, Alan McLean wrote:
> 'No collinearity' *means* the X variables are uncorrelated!
This is not my understanding. "Uncorrelated" means that the correlation
between two variables is zero, or that the intercorrelations among
several variables are all zero. "Not collinear" means that there is not
a linear dependency lurking among the variables (or some subset of them).
"Uncorrelated" is a much stronger condition than "not collinear".
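To make the distinction concrete, here is a small sketch (my own illustration, not from Draper & Smith): X and X^2 over a positive range are strongly correlated, yet the two columns carry no exact linear dependency, so the design matrix still has full rank -- correlated, but not collinear.

```python
import numpy as np

# x2 = x1**2 is strongly correlated with x1 over a positive range,
# yet there is no exact linear dependency between the two columns.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 ** 2

r = np.corrcoef(x1, x2)[0, 1]      # sample correlation, well above zero
X = np.column_stack([x1, x2])
rank = np.linalg.matrix_rank(X)    # rank 2: the columns are NOT collinear

print(r, rank)
```

Collinearity would mean rank < 2 here, i.e., one column expressible exactly as a linear combination of the other (plus a constant, if an intercept is included).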
> The basic OLS method assumes the variables are uncorrelated
> (as you say).
Not as presented in, e.g., Draper & Smith, who go to some trouble to
show how one can produce from a set of correlated variables a set of
orthogonal (= mutually uncorrelated) variables, and remark on the
advantages that accrue if the X-matrix is orthogonal. But it is clear
that they expect predictors to be correlated as a general rule.
> In practice there is usually some correlation, but the estimates are
> reasonably robust to this. If there is *substantial* collinearity you
> are in trouble.
If there is collinearity _at_all_ you are in trouble; further, if the
correlations among some of the predictors are high enough (= close enough
to unity), a computing system with finite precision may be unable to
detect the difference between a set of variables that are technically not
collinear but are highly correlated, and a set of variables that _are_
collinear. (E.g., X and X^4 are not collinear; but if the range of X
in the data is, say, 101 to 110, a plot of X^4 vs X will look very much
like a straight line.)  For this reason various safety features are
usually built into regression programs: variables whose tolerance value
with respect to the other predictors is lower than a certain threshold
(or whose variance inflation factor -- the reciprocal of tolerance -- is
above a corresponding threshold) are usually excluded from an analysis;
although it is often possible to override the system defaults if one
thinks it necessary. The existence of such defaults is clear evidence
that at least the persons responsible for system packages expected that
variables would often have substantial intercorrelations.
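One can check the X vs X^4 illustration numerically. The sketch below (assuming the usual definitions: tolerance of a predictor = 1 - R^2 from regressing it on the other predictors, VIF = 1/tolerance) shows that over the range 101 to 110 the tolerance of X^4 with respect to X is tiny and the VIF correspondingly huge:

```python
import numpy as np

# X ranging over 101..110, as in the example above.
x = np.arange(101.0, 111.0)
x4 = x ** 4

# Regress x4 on x (with intercept) to get R^2, then tolerance and VIF.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, x4, rcond=None)
resid = x4 - X @ beta
r2 = 1.0 - resid.var() / x4.var()

tolerance = 1.0 - r2
vif = 1.0 / tolerance
print(tolerance, vif)
```

A common default cutoff in packages is tolerance below about 0.01 (VIF above about 100); this pair sails past it, even though X and X^4 are not, strictly speaking, collinear.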
And if it were a requirement (= assumption) that predictors be
uncorrelated, it would not be necessary to worry about inverting a p x p
submatrix of predictors: the simple linear regression coefficient for
predicting Y from X_j alone would be unaffected by the presence of other
predictors in the model.
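That last point is easy to verify numerically. In the sketch below (my own construction), two centered, exactly orthogonal predictors are used; the coefficient each gets in the multiple regression coincides with the slope from the simple regression of Y on that predictor alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two centered, mutually orthogonal predictors.
x1 = np.array([-3.0, -1.0, 1.0, 3.0])
x2 = np.array([1.0, -1.0, -1.0, 1.0])
assert abs(x1 @ x2) < 1e-12          # uncorrelated by construction

y = 2.0 * x1 - 1.5 * x2 + rng.normal(0.0, 0.1, size=4)

# Multiple regression with intercept.
X = np.column_stack([np.ones(4), x1, x2])
b_multi, *_ = np.linalg.lstsq(X, y, rcond=None)

# Simple regression slopes (each predictor alone, with intercept).
b1_simple = (x1 @ (y - y.mean())) / (x1 @ x1)
b2_simple = (x2 @ (y - y.mean())) / (x2 @ x2)
print(b_multi[1], b1_simple, b_multi[2], b2_simple)
```

With correlated predictors the two sets of coefficients would differ, which is exactly why the p x p inversion matters in the general case.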
-- Don.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128