Hi.

I have been looking at the PCAproj function in package pcaPP (R 2.4.1) for robust principal components, and I'm trying to interpret the results. I started with a data matrix of dimensions RxC (R is the number of rows / observations, C the number of columns / variables). PCAproj returns a list of class princomp, similar to the output of the function princomp.

In a case where I can run princomp, I would get the following from executing

    dmpca = princomp(datamatrix)

- the vector, sdev, of length C, contains the standard deviations of the components in descending order; their squares are the eigenvalues of the covariance matrix

- the matrix, loadings, has dimension CxC, and its columns are the eigenvectors of the covariance matrix, in the same order as the sdev vector; the columns are orthonormal:

    sum(dmpca$loadings[,i]*dmpca$loadings[,j]) = 1 if i == j, ~ 0 if i != j

- the vector, center, of length C, contains the means of the variable columns in the original data matrix, in the same order as the original columns

- the vector, scale, of length C, contains the scalings applied to each variable, in the same order as the original columns

- n.obs contains the number of observations used in the computation; this number equals R when there is no missing data

- the matrix, scores, has dimension RxC, and it can be thought of as the projection of the (centered) original data onto the eigenvectors in loadings; the columns of scores are the principal components. princomp typically removes the mean, so the formula is:

    dmpca$scores = t(t(datamatrix) - dmpca$center)%*%dmpca$loadings

  and apply(dmpca$scores,2,mean) returns a length C vector of (effectively) zeroes; also, the principal components (columns of scores) are orthogonal (but not orthonormal):

    sum(dmpca$scores[,i]*dmpca$scores[,j]) ~ 0 if i != j, > 0 if i == j

- call contains the function call, in this case princomp(x = datamatrix)
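To make the properties listed above concrete, here is a minimal sketch that checks them numerically with base R only. The matrix X and the sizes n_obs and n_var are simulated stand-ins for datamatrix, R, and C (they are not my real data), and this covers the standard case with more observations than variables.

```r
# Simulated data; X, n_obs, and n_var are illustrative stand-ins
# for datamatrix, R, and C in the text.
set.seed(1)
n_obs <- 50
n_var <- 4
X <- matrix(rnorm(n_obs * n_var), nrow = n_obs, ncol = n_var)

pc <- princomp(X)
L <- unclass(pc$loadings)   # plain n_var x n_var matrix of eigenvectors

# Loadings are orthonormal: t(L) %*% L should be the identity matrix.
orthonormal_err <- max(abs(crossprod(L) - diag(n_var)))

# Scores are the centered data projected onto the loadings.
scores_manual <- sweep(X, 2, pc$center) %*% L
scores_err <- max(abs(scores_manual - pc$scores))

# Score columns have mean zero and are mutually orthogonal.
max_score_mean <- max(abs(colMeans(pc$scores)))
G <- crossprod(pc$scores)
max_offdiag <- max(abs(G[upper.tri(G)]))

# sdev^2 matches the eigenvalues of the covariance matrix
# (princomp uses divisor n_obs rather than n_obs - 1).
eig <- eigen(cov(X) * (n_obs - 1) / n_obs, symmetric = TRUE)$values
eig_err <- max(abs(pc$sdev^2 - eig))
```

All five error quantities should be zero up to floating-point rounding.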
That is all as it should be.

In my case R < C, which produces singular results for standard PCA, but robust methods, like PCAproj, are designed to handle this. Also, I had "de-meaned" the data beforehand, so apply(datamatrix,2,mean) produces a length C vector of (effectively) zeroes. I ran the following to get the first four robust components:

    dmpcaprj=PCAproj(datamatrix,k=4,CalcMethod="sphere",update=TRUE)

When I look at the princomp object returned as dmpcaprj, some of the results are just what I expect. For example,

- dmpcaprj$loadings has dimensions Cx4, as expected, and the first four eigenvectors of the (robust) covariance matrix are orthonormal:

    sum(dmpcaprj$loadings[,i]*dmpcaprj$loadings[,j]) = 1 if i == j, ~ 0 if i != j

- dmpcaprj$sdev contains the square roots of the four corresponding eigenvalues.

- dmpcaprj$n.obs equals R.

- dmpcaprj$scores has dimensions Rx4, as it should.

HOWEVER, the columns of dmpcaprj$scores are neither de-meaned nor orthogonal. So, apply(dmpcaprj$scores,2,mean) is a non-zero vector, and

    sum(dmpcaprj$scores[,i]*dmpcaprj$scores[,j]) != 0 if i != j, > 0 if i == j

ALSO,

- dmpcaprj$scale is in this case a vector of all 1's, as expected. But the length is C, not R.

- dmpcaprj$center is a vector of length C, not R, and the entries are not equal to either apply(datamatrix,1,mean) or apply(datamatrix,2,mean); I can't figure out where they came from.

One interesting thing is that the columns of the Rx4 matrix

    dmpcaprj$scores - datamatrix%*%dmpcaprj$loadings

are all constant vectors, so that every row of this matrix is the same. Each row equals apply(dmpcaprj$scores,2,mean), since apply(datamatrix%*%dmpcaprj$loadings,2,mean) is a length four vector of (effectively) zeroes. But I can't interpret the values of these means of dmpcaprj$scores.

Can anyone please explain to me what is happening with the scores, scale, and center parts of the PCAproj results?

Thanks!
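The constant-row observation is what simple arithmetic would predict if the scores were formed as (datamatrix - center) %*% loadings with a center other than the column means. The sketch below illustrates only that arithmetic in base R; it does not call PCAproj, and everything in it is a made-up stand-in: X is simulated, L is an arbitrary orthonormal matrix in place of dmpcaprj$loadings, and ctr (column medians) is a hypothetical center in place of dmpcaprj$center. Whether PCAproj actually builds its scores this way is an assumption I have not confirmed.

```r
# Simulated de-meaned data with fewer observations than variables.
set.seed(2)
n_obs <- 10
n_var <- 30
k <- 4
X <- matrix(rnorm(n_obs * n_var), nrow = n_obs)
X <- sweep(X, 2, colMeans(X))   # de-mean the columns, as described in the post

# Hypothetical stand-ins (NOT PCAproj output):
# L   : an arbitrary n_var x k matrix with orthonormal columns
# ctr : a length-n_var center that differs from the column means
L <- qr.Q(qr(matrix(rnorm(n_var * k), nrow = n_var)))
ctr <- apply(X, 2, median)

# Scores under the assumed formula: (X - center) %*% loadings.
scores <- sweep(X, 2, ctr) %*% L

# The difference from the uncentered projection has identical rows,
# each equal to -ctr %*% L.
offset <- scores - X %*% L
expected_row <- drop(-ctr %*% L)
row_err <- max(abs(sweep(offset, 2, expected_row)))

# Because X has zero column means, the score means equal that same row.
mean_err <- max(abs(colMeans(scores) - expected_row))
```

If the same relation held for the real objects, apply(dmpcaprj$scores,2,mean) would agree with -dmpcaprj$center %*% unclass(dmpcaprj$loadings), which would at least tie the unexplained score means to the unexplained center vector.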
--
TMK