Re: Correlation

Donald F. Burrill Thu, 18 May 2000 10:56:25 -0700
On Wed, 17 May 2000, mbattagl wrote in part:
 
> The regression analysis is also somewhat confusing.  Regression analysis
> is based on the fact that the Y (dependent variable) is random and the X
> (independent variable) is fixed with no error. 

Not so much "on the fact that ..." as "on assigning all random and 
measurement error to the measurement of Y".  The alleged "fact" is not 
always a fact...

> For my case, both X and Y are random and have some measurement error. 
> Is it correct to use simple linear regression for this analysis or is 
> there another type of analysis to obtain predictions? 

It is not really INcorrect;  there exist alternatives that may be more 
appropriate, depending on the answer to one question:  how large is the 
measurement error in X?  If 
measurement error is small compared to the distance between adjacent 
values in X, use regression analysis without qualms.  (In most designed 
experiments, the nominal values of X are deliberately chosen to be fairly 
widely spaced, partly so that one may assume that the measurement error 
in X, if not zero, is at any rate negligible in context.  There is then 
no particular advantage to be had in using analytical methods that work 
with a random-errors-in-X model.)
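
A small simulation may make the "small compared to the spacing" criterion 
concrete.  This is only a sketch under assumed numbers that are not from 
the original question:  nominal X values spaced 1 unit apart, a true slope 
of 2, and error standard deviations of 0.05 and 3.0 in X.  The helper name 
ols_slope is hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

rng = np.random.default_rng(0)

# Assumed design: X values 1, 2, ..., 10 (spacing 1), 200 cases at each value.
true_x = np.repeat(np.arange(1.0, 11.0), 200)
# Assumed model: true slope 2, random error in Y with SD 1.
y = 2.0 * true_x + rng.normal(0.0, 1.0, true_x.size)

# Measurement error in X that is small relative to the spacing (SD 0.05) ...
slope_small = ols_slope(true_x + rng.normal(0.0, 0.05, true_x.size), y)
# ... and large relative to the spacing (SD 3.0).
slope_large = ols_slope(true_x + rng.normal(0.0, 3.0, true_x.size), y)

print("slope, small error in X:", round(slope_small, 3))
print("slope, large error in X:", round(slope_large, 3))
```

With the small error the fitted slope should stay close to 2;  with the 
large error it is pulled noticeably toward zero, which is the situation 
the two approaches described next are meant to address.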

If measurement error is large (compared to adjacent distances), two 
approaches are possible:
 (1)  Divide the data (visualized on a scatterplot) into vertical slices 
(that is, segments of nearly-constant X).  Replace the observed X values 
with a single nominal value for each slice, possibly (but not 
necessarily) the center value for the slice.  This will introduce some 
random error into the X values, but the resulting standard error (of the 
combination of random & measurement error) for the mean of the  n_j  
cases (in slice  j ) may now be small compared to the difference between 
adjacent nominal values of X.  (This depends of course on how many cases 
there are in a slice.)  If you have a LOT of data, it may even be 
sensible to discard slices that are sparsely populated.  If this 
procedure works, you're back in Plan A above (so to speak) and ordinary 
regression is appropriate.
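
Approach (1) might be sketched as follows.  The function name slice_x, the 
slice width, the min_count cutoff for "sparsely populated", and the toy 
data are all assumptions made for illustration;  the slice center is used 
as the nominal value, which is one choice among several.

```python
import numpy as np

def slice_x(x, y, width, min_count=1):
    """Replace each observed x by the center of its vertical slice and
    drop slices holding fewer than min_count cases."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.floor(x / width).astype(int)            # slice index per case
    labels, counts = np.unique(idx, return_counts=True)
    keep = np.isin(idx, labels[counts >= min_count]) # discard sparse slices
    nominal = (idx[keep] + 0.5) * width              # slice center as nominal X
    return nominal, y[keep]

# Toy data (made up):  three well-populated slices and one stray case.
x_obs = np.array([0.2, 0.7, 0.9, 1.1, 1.6, 1.8, 2.3, 2.4, 2.9, 5.5])
y_obs = np.array([1.0, 1.4, 1.1, 3.2, 2.9, 3.1, 5.0, 4.8, 5.3, 9.0])

x_nom, y_kept = slice_x(x_obs, y_obs, width=1.0, min_count=2)

# Ordinary regression of Y on the nominal X values.
slope, intercept = np.polyfit(x_nom, y_kept, 1)
print(sorted(set(x_nom.tolist())), round(float(slope), 3))
```

Whether the procedure "works" in the sense described above still has to be 
judged from the data:  the sketch only carries out the slicing and the 
refit, and does not check that the within-slice standard error is small 
relative to the slice spacing.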

 (2)  Otherwise, an errors-in-variables regression may be called for, of 
the kind that simultaneously deals with uncertainty in Y and uncertainty 
in X.  All such approaches suffer from a common problem:  one must decide 
(or let the program decide!) how to weight the deviations in Y and the 
deviations in X.  For problems where Y and X are in the same units, it 
may be reasonable to weight the two deviations equally in generating sums 
of squares (or the equivalent of SS).  But if Y and X are in different 
units, the solution one obtains depends on the units in which one chose 
to measure the variables, and what counts as "equal weighting" is VERY 
poorly defined.  (Consider Y in pounds-mass and X in inches;  now think 
of pounds & feet;  now think of kilograms & cm;  ...  See?  Such a 
least-squares solution cannot be invariant with respect to changes in 
scale [i.e., changes in unit of measurement] -- unless, of course, the 
same change in scale is imposed on both variables.  This disadvantage 
alone may be enough to drive one to ordinary regression [or to drink, or 
both] as one contemplates explaining a set of results to a client.)
        NOTE, by the way:  standardizing both variables (to, say, zero 
mean and unit variance) may seem like a way out of this impasse;  but all 
THAT does is to specify a certain change of scale in each variable, and 
to specify it in a way that may not be reproducible in subsequently 
observed samples, depending as it does on the sample means and variances 
in the present sample.
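
The units problem in (2) can be seen numerically.  The sketch below uses 
equally weighted (orthogonal) regression as one example of an 
errors-in-variables fit;  the simulated inches/pounds data and the helper 
names are assumptions for illustration only.  Each slope is converted back 
to pounds per inch so the fits can be compared directly.

```python
import numpy as np

def orthogonal_slope(x, y):
    """Slope of the line minimizing squared perpendicular distances,
    i.e. deviations in X and in Y weighted equally."""
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    d = syy - sxx
    return float((d + np.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy))

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

rng = np.random.default_rng(1)
# Simulated (made-up) data:  X in inches, Y in pounds.
inches = rng.normal(68.0, 3.0, 500)
pounds = 4.0 * (inches - 68.0) + 160.0 + rng.normal(0.0, 15.0, 500)
feet = inches / 12.0                     # same X, different unit

# Fit in each unit, then express every slope in pounds per inch.
ols_in = ols_slope(inches, pounds)
ols_ft_as_in = ols_slope(feet, pounds) / 12.0
tls_in = orthogonal_slope(inches, pounds)
tls_ft_as_in = orthogonal_slope(feet, pounds) / 12.0

print("ordinary:  ", ols_in, ols_ft_as_in)    # agree
print("orthogonal:", tls_in, tls_ft_as_in)    # disagree
```

The ordinary regression slopes agree after conversion, while the two 
orthogonal slopes describe different lines through the same data.  For the 
standardizing option in the NOTE, the same formula gives a slope of +1 or 
-1 in standardized units (d is zero), so the fitted line in original units 
is fixed by the ratio of the two sample standard deviations and the sign 
of the correlation.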

I hope this has not been too confusing;  but you did ask!
                                                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  


