Following from the R-help thread of March 22 on "Memory usage in prcomp",
I've started looking into adding an optional 'rank.' argument
to prcomp, so that one can more efficiently get only a few PCs instead of
the full p PCs, say when p = 1000 and you know you only want 5 PCs.

(https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html)

As was mentioned there, we already have an optional 'tol' argument
which allows *not* to choose all PCs.

When I do that, say

     C <- chol(S <- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
     all.equal(S, crossprod(C))
     set.seed(17)
     X <- matrix(rnorm(32000), 1000, 32)
     Z <- X %*% C  ## ==>  cov(Z) ~=  C'C = S
     all.equal(cov(Z), S, tol = 0.08)
     pZ <- prcomp(Z, tol = 0.1)
     summary(pZ) # only ~14 PCs (out of 32)

I get for the last line, the summary.prcomp(.) call:

> summary(pZ) # only ~14 PCs (out of 32)
Importance of components:
                          PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8
Standard deviation     3.6415 2.7178 1.8447 1.3943 1.10207 0.90922 0.76951 0.67490
Proportion of Variance 0.4352 0.2424 0.1117 0.0638 0.03986 0.02713 0.01943 0.01495
Cumulative Proportion  0.4352 0.6775 0.7892 0.8530 0.89288 0.92001 0.93944 0.95439
                           PC9    PC10    PC11    PC12    PC13   PC14
Standard deviation     0.60833 0.51638 0.49048 0.44452 0.40326 0.3904
Proportion of Variance 0.01214 0.00875 0.00789 0.00648 0.00534 0.0050
Cumulative Proportion  0.96653 0.97528 0.98318 0.98966 0.99500 1.0000

which computes the *proportions* as if there were only 14 PCs in
total (but there were 32 originally).

I would think that the summary should or could in addition show
the usual "proportion of variance explained" kind of result, which
does involve all 32 variances or std.dev.s. These are returned
from the svd() anyway, even in the case when I use my new 'rank.'
argument, which only returns a "few" PCs instead of all.

Would you think the current summary() output is good enough, or
rather misleading?

I think I would want to see (possibly in addition) proportions
with respect to the full variance and not just to the variance of
those few components selected.

Opinions?
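
To make the alternative concrete, here is a minimal sketch of proportions taken with respect to the full variance. It is not a proposed patch to summary.prcomp(): the names 'total_var', 'kept_var' and 'prop_full' are made up for illustration, and it assumes the prcomp() defaults (center = TRUE, scale. = FALSE), so that the total variance is simply the sum of the 32 column variances of Z. It recomputes that total from the data rather than relying on how many elements 'sdev' contains.

```r
# Rebuild the example data from above
C <- chol(S <- toeplitz(.9 ^ (0:31)))
set.seed(17)
X <- matrix(rnorm(32000), 1000, 32)
Z <- X %*% C

pZ <- prcomp(Z, tol = 0.1)
k  <- ncol(pZ$rotation)             # number of PCs actually kept

# Total variance over all 32 columns (assumes center = TRUE, scale. = FALSE)
total_var <- sum(apply(Z, 2, var))

# Variances of the kept PCs, relative to the *full* variance
kept_var  <- pZ$sdev[seq_len(k)]^2
prop_full <- kept_var / total_var

round(rbind("Proportion of Variance (full)" = prop_full,
            "Cumulative Proportion (full)"  = cumsum(prop_full)), 4)
```

With this denominator the cumulative proportion of the kept PCs stays below 1, which makes visible how much variance the dropped components carry.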

Martin Maechler
ETH Zurich