[EMAIL PROTECTED] (Sangdon Lee) wrote in message 
news:<[EMAIL PROTECTED]>...
> [...]
> I remember seeing, but could not find, references showing the
> relationship between the Mahalanobis distance and principal
> component analysis.  I would appreciate it if anybody could explain
> the relationship or give references.
> 
> Also, I'm wondering what the right way is to cluster observations
> when variables are highly collinear:
> 1) Run PCA and use all of the principal components for cluster
>    analysis, or
> 2) Use the Mahalanobis distance.
> [...]

If d is a column vector of differences between two cases on several
variables, then the Euclidean distance between the cases is sqrt(d'd),
and the Mahalanobis distance between the cases is sqrt(d'Ad), where '
denotes transposition, and A is the inverse of the covariance matrix
of the variables.
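
Here is a minimal sketch of those two formulas in Python/NumPy. It is
not from the original post; the data, sample size, and the induced
collinearity are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))       # 100 cases on 3 variables
    X[:, 2] += 2 * X[:, 0]              # make the variables collinear

    d = (X[0] - X[1])[:, None]          # column vector of differences
    A = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance

    euclidean   = np.sqrt(d.T @ d).item()      # sqrt(d'd)
    mahalanobis = np.sqrt(d.T @ A @ d).item()  # sqrt(d'Ad)
    print(euclidean, mahalanobis)

With collinear variables the two values differ noticeably, because the
Mahalanobis distance discounts differences along directions where the
variables co-vary strongly.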

If you keep all the components from a PCA -- I'm referring to unit-variance
components, not the kind whose variance is an eigenvalue -- then the
Euclidean distances between cases in the component space will equal
their Mahalanobis distances in the observed-variable space. However,
you may want to drop components whose eigenvalues are small, because
variability on those components is likely to be mostly error.
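
A continuation of the same illustrative sketch: if you keep all the
components and scale each to unit variance, the ordinary Euclidean
distance between two cases in component space reproduces their
Mahalanobis distance exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X[:, 2] += 2 * X[:, 0]              # same collinear data as above

    cov = np.cov(X, rowvar=False)
    evals, V = np.linalg.eigh(cov)      # cov = V diag(evals) V'
    # Component scores rescaled so each has unit sample variance
    Z = ((X - X.mean(axis=0)) @ V) / np.sqrt(evals)

    dz = Z[0] - Z[1]
    euclid_in_components = np.sqrt(dz @ dz)

    d = (X[0] - X[1])[:, None]
    mahalanobis = np.sqrt(d.T @ np.linalg.inv(cov) @ d).item()
    print(euclid_in_components, mahalanobis)   # identical

Dropping the small-eigenvalue components before computing the
distances amounts to the truncation suggested above: you lose the
exact equality, but you also stop amplifying what is likely mostly
error variance.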