Dear R-sig-ecology Listers -

I am interested in doing a partial singular value decomposition (svd) of a
covariance matrix using the package "irlba". Unlike the svd function, the
irlba method can be set up to calculate only a small number of singular
values and vectors (e.g. principal components, "PC"s). I am interested in
using this method on a very large covariance matrix where a full svd
takes an agonizingly long time to compute. Furthermore, the "significant"
PCs are usually only the leading ones, and thus a lot of computational
effort is lost on calculating PCs of the remaining "noise". My problem
is that the full svd gives information about the importance
of each singular value - e.g. the explained variance of each principal
component is easily calculated as its singular value divided by the sum of
all singular values.

So, using the irlba method, I would not have the full vector of singular
values needed to calculate their sum, but rather a vector truncated to a
chosen length. For a covariance matrix (C) based on a single matrix, I can
calculate the total variance of C beforehand using several statistics (see
example below). However, for a covariance matrix calculated from two
different matrices, I am unable to use the same procedure. Thus, my
question is - is anyone aware of a solution to this? Is it even possible to
estimate what the sum of the singular values would be on such a matrix
without doing a full svd?

Many thanks in advance for your help. I have attached below a brief example
of my solution to the single matrix version.

Cheers,
Marc

###########################
set.seed(1)
m <- 50
n <- 20
X <- matrix(rnorm(m*n),m,n)
Y <- matrix(rnorm(m*n),m,n)
dim(X); dim(Y)

S1 <- svd(cov(X))
#a plot of the explained variance of each singular value
plot(S1$d/sum(S1$d)*100,
    ylab="% explained variance of each singular value",
    xlab="singular value #"
)
#The variance of the matrix X can be calculated as
#the sum of the column variances or the diagonal of
#the covariance matrix. Both agree with the sum of
#the singular values in the svd decomposition (S1$d).
sum(S1$d); sum(apply(X,2,var)); sum(diag(cov(X))) # all give the total variance


S2 <- svd(cov(X,Y))
#The same calculation using the diagonal of the
#covariance matrix cov(X,Y) does not agree (i.e.
#since the matrices are random, their mean covariance
#is zero)
sum(S2$d); sum(diag(cov(X,Y))) # no longer agree
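
A possible partial workaround, sketched here under stated assumptions rather
than offered as a confirmed answer: the sum of the singular values of cov(X,Y)
does not seem to be available cheaply, but the sum of the *squared* singular
values of any matrix equals its squared Frobenius norm, i.e. sum(C^2), which
needs no svd. This gives a "squared covariance fraction" for each leading
mode instead of the plain fraction used above. The helper lead_sv() and the
choice k = 5 are hypothetical names introduced for illustration; the helper
uses irlba::irlba() when the package is installed and otherwise truncates a
full svd, which only keeps the example runnable and saves no computation.

```r
set.seed(1)
m <- 50
n <- 20
X <- matrix(rnorm(m * n), m, n)
Y <- matrix(rnorm(m * n), m, n)
k <- 5  # number of leading singular values to keep (arbitrary, for illustration)

# Hypothetical helper: leading k singular values of A.
# Uses irlba if it is installed; otherwise falls back to a truncated full svd.
lead_sv <- function(A, k) {
  if (requireNamespace("irlba", quietly = TRUE)) {
    irlba::irlba(A, nv = k)$d
  } else {
    svd(A, nu = 0, nv = 0)$d[seq_len(k)]
  }
}

# Single matrix: the total variance is the trace of cov(X),
# so the truncated singular values can be scaled without a full svd.
Cx <- cov(X)
expl_var <- lead_sv(Cx, k) / sum(diag(Cx)) * 100

# Two matrices: the sum of squared singular values equals sum(Cxy^2),
# so the squared covariance fraction of each leading mode is available.
Cxy <- cov(X, Y)
scf <- lead_sv(Cxy, k)^2 / sum(Cxy^2) * 100

# Check the identity against a full svd (only feasible on this small example)
all.equal(sum(svd(Cxy)$d^2), sum(Cxy^2))
```

Note that scf weights the modes by squared singular values, so its
percentages are not directly comparable with the unsquared ones in expl_var.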

