[EMAIL PROTECTED] (Sangdon Lee) wrote in message news:<[EMAIL PROTECTED]>...
> [...]
> I remember but could not find references showing the relationship
> between the Mahalanobis distance and principal component analysis. I
> would appreciate it if anybody could explain or give references.
>
> Also, I'm wondering what is the right way of clustering observations
> when variables are highly collinear?
> 1) Run PCA and use all of the principal components for cluster analysis.
> 2) Use the Mahalanobis distance.
> [...]
If d is a column vector of differences between two cases on several
variables, then the Euclidean distance between the cases is sqrt(d'd),
and the Mahalanobis distance between the cases is sqrt(d'Ad), where '
denotes transposition and A is the inverse of the covariance matrix of
the variables.

If you keep all the components from a PCA -- I'm referring to
unit-variance components, not the kind whose variance is an eigenvalue
-- then the Euclidean distances between cases in the component space
will equal their Mahalanobis distances in the observed-variable space.
However, you may want to drop components whose eigenvalues are small,
because variability on those components is likely to be mostly error.
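As a concrete illustration of that equivalence (my own sketch, not part
of the original exchange; the data are simulated and the variable names
are made up), the following Python/numpy snippet computes the
Mahalanobis distance sqrt(d'Ad) directly and then recovers the same
number as a Euclidean distance between scores on unit-variance
components:

import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 200 cases on 4 correlated variables
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

S = np.cov(X, rowvar=False)   # covariance matrix of the variables
A = np.linalg.inv(S)          # its inverse

# Mahalanobis distance between cases 0 and 1: sqrt(d'Ad)
d = X[0] - X[1]
maha = np.sqrt(d @ A @ d)

# PCA via the eigendecomposition of S; dividing each column of raw
# component scores by sqrt(eigenvalue) gives unit-variance components
eigvals, eigvecs = np.linalg.eigh(S)
scores = (X - X.mean(axis=0)) @ eigvecs / np.sqrt(eigvals)

# Euclidean distance between the same two cases in component space
eucl = np.linalg.norm(scores[0] - scores[1])

print(maha, eucl)   # the two agree up to floating-point rounding

Dropping the columns of scores that correspond to the smallest
eigenvalues before clustering is the truncation suggested above.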
