Thursday, May 18, 2000
Subject: Re: Correlation


>  (2)  Otherwise, an errors-in-variables regression may be called for, of
> the kind that simultaneously deals with uncertainty in Y and uncertainty
> in X.  All such approaches suffer from a common problem:  one must decide
> (or let the program decide ! ) how to weigh the deviations in Y and the
> deviations in X.  For problems where Y and X are in the same units, it
> may be reasonable to weigh the two deviations equally in generating sums
> of squares (or the equivalent of SS).  But if Y and X are in different
> units, the solution one obtains depends on the units in which one chose
> to measure the variables, and what counts as "equal weighting" is VERY
> poorly defined.  (Consider Y in pounds-mass and X in inches;  now think
> of pounds & feet;  now think of kilograms & cm;  ...  See?  A
> least-squares solution cannot be invariant with respect to changes in
> scale [i.e., changes in unit of measurement] -- unless, of course, the
> same change in scale is imposed on both variables.  This disadvantage
> alone may be enough to drive one to ordinary regression [or to drink, or
> both] as one contemplates explaining a set of results to a client.)
> NOTE, by the way:  standardizing both variables (to, say, zero
> mean and unit variance) may seem like a way out of this impasse;  but all
> THAT does is to specify a certain change of scale in each variable, and
> to specify it in a way that may not be reproducible in subsequently
> observed samples, depending as it does on the sample means and variances
> in the present sample.
This is a good comment.
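
A small numeric illustration of the scale problem in the quoted text, in Python. The data values and the helper names (centered_sums, ols_slope, orthogonal_slope) are made up for illustration. The orthogonal slope uses the standard closed form for the major axis of the (x, y) scatter, which weights X and Y deviations equally. Re-expressing X in feet instead of inches should multiply a pounds-per-unit-X slope by exactly 12. OLS does this, but orthogonal regression does not:

```python
import math


def centered_sums(x, y):
    """Return (Sxx, Syy, Sxy): sums of squares and cross-products about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares of y on x: only the deviations in y are minimized."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    """Orthogonal regression: perpendicular distances, so x and y deviations count equally."""
    sxx, syy, sxy = centered_sums(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)


x_in = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up X values, in inches
y_lb = [2.0, 1.0, 4.0, 3.0, 6.0]  # made-up Y values, in pounds
x_ft = [v / 12 for v in x_in]     # the same X values, in feet

# Converting X from inches to feet should multiply the slope by exactly 12.
ols_ratio = ols_slope(x_ft, y_lb) / ols_slope(x_in, y_lb)
orth_ratio = orthogonal_slope(x_ft, y_lb) / orthogonal_slope(x_in, y_lb)
print(f"OLS ratio:        {ols_ratio:.2f}")   # 12.00
print(f"orthogonal ratio: {orth_ratio:.2f}")  # 13.98
```

The orthogonal fit in feet is therefore a different line, not just the same line relabelled. This is exactly the unit dependence described in the quote.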

Another common term for this, in the equal-weighting case, is "orthogonal
regression". I would recommend Carroll and Ruppert's article in The American
Statistician (Feb. 1996, vol. 50, no. 1, p. 1) as good background on the
problem of doing a regression with assumed errors in X. As they point out, the
problem is that the nonlinear aspects of the X-Y relationship confound the
results, even if you can make reasonably valid prior estimates, such as the
ratio of the Y error variance to the X error variance.
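
One standard way to use such a prior error-variance ratio in a straight-line fit is Deming regression. The sketch below is a generic illustration, not the method of the cited article. The function deming_fit and the data are hypothetical. It assumes the ratio lam = var(error in Y) / var(error in X) is known in advance. Note that the ratio carries units: when X changes from inches to feet it must be multiplied by 12^2, and only then does the fitted line stay the same:

```python
import math


def deming_fit(x, y, lam):
    """Deming regression (straight-line errors-in-variables fit).

    lam is the assumed ratio var(error in y) / var(error in x), supplied from
    outside the data. lam = 1 is orthogonal regression. Returns (intercept, slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    d = syy - lam * sxx
    slope = (d + math.sqrt(d * d + 4 * lam * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope


x_in = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up X values, in inches
y_lb = [2.0, 1.0, 4.0, 3.0, 6.0]  # made-up Y values, in pounds
x_ft = [v / 12 for v in x_in]     # the same X values, in feet

lam_in = 1.0               # assumed error-variance ratio in (lb/in)^2
lam_ft = lam_in * 12 ** 2  # the same assumption restated in (lb/ft)^2
_, b_in = deming_fit(x_in, y_lb, lam_in)
_, b_ft = deming_fit(x_ft, y_lb, lam_ft)
print(b_ft / b_in)  # 12, up to floating-point rounding
```

Setting lam = 1 reproduces orthogonal regression. As lam grows (X errors negligible), the slope approaches the ordinary least-squares slope of Y on X. The whole burden therefore falls on how well lam is known.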

If X is a matrix, one can use total least squares (TLS) methods to obtain
estimates of the errors in the X variables. TLS methods are computationally
involved (especially for a mixed model in which some of the X variables have
no error) and depend on certain characteristics of the augmented [X Y] matrix,
such as a gap between its smallest singular values, in order to find a
reasonable solution. They rely on rank reduction of [X Y], usually via the
singular value decomposition. A nice aspect of this method is that Y can itself
be a matrix representing several different Y variables. The current problem is
that the statistics of the parameter estimates are known only asymptotically,
for large samples, and they do not converge uniformly over the parameter space.
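
The classical SVD-based construction of total least squares can be sketched in Python with NumPy. The function name tls and the example data are hypothetical. Every column of X is treated as error-prone with equal error variance. The mixed case mentioned above is therefore not handled, and the unit dependence from the quoted text applies to every column:

```python
import numpy as np


def tls(X, Y):
    """Total least squares fit of X @ B ~= Y, allowing errors in both X and Y.

    X is (m, n) and Y is (m, d) with m >= n + d. All columns are weighted
    equally, so the answer depends on their units. Returns B (n, d) and the
    estimated corrections dX, dY such that (X + dX) @ B = Y + dY.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, np.newaxis]                 # a single response becomes one column
    n = X.shape[1]
    C = np.hstack([X, Y])                    # the augmented [X Y] matrix
    _, s, Vt = np.linalg.svd(C, full_matrices=False)  # s is in descending order
    V = Vt.T
    if s[n - 1] <= s[n]:
        raise ValueError("no gap between singular values n and n+1: TLS solution not unique")
    V12, V22 = V[:n, n:], V[n:, n:]          # blocks of the d smallest right singular vectors
    B = -V12 @ np.linalg.inv(V22)            # raises LinAlgError if V22 is exactly singular
    # Rank reduction: strip the d smallest singular components from [X Y].
    V2 = V[:, n:]
    correction = -C @ V2 @ V2.T
    return B, correction[:, :n], correction[:, n:]


X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
B_true = np.array([[2.0, 0.5], [-1.0, 3.0]])  # two Y variables at once
Y = np.asarray(X, dtype=float) @ B_true       # noise-free, so TLS should recover B_true
B, dX, dY = tls(X, Y)
print(np.round(B, 6))
```

With noisy data, dX and dY are the estimated errors: the smallest perturbation of [X Y] (in the Frobenius norm) that makes the fitted relation hold exactly.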


