On 05/23/2012 10:55 AM, Marc Taylor wrote:
Hi Jari - one more question if you don't mind. Since the weights of the PCs
are related to the amount of variance that they explain in the original
data - is it problematic to predict the PC scores with a second data set
that has a different amount of variance (e.g. due to differing number of
samples)? In both the 1st and 2nd data sets I have been using scaled values
for the variables (mean=0 and sd=1 for each sample).
Cheers,
Marc
I'll pretend to be Jari for a moment. :-)
PCA just scales and rotates the data in cunning ways, so you need to scale
and rotate the new data in exactly the same way. If you standardise the new
data with its own means and standard deviations first, then you've already
applied a different scaling from the one used for the old data.
What you need to do is either do the PCA on the raw data, or scale the new
data using the means and standard deviations of the old data.
library(MASS)
NVar=5; NObs=50
Sigma=matrix(c(
10,0.2, 0, 0,0.4,
0.2, 5,0.1, 0,0.6,
0,0.1,1.0, 0.2,0,
0, 0,0.2, 5, 0,
0.4,0.6, 0, 0,1), nrow=5)
# simulate data
Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
# Do PCA on scaled data
Data.Sc=scale(Data)
PC=princomp(Data.Sc)
# Simulate new data
NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
# Get PC scores for the new data. First do it wrong: scale() here
# standardises the new data with its own means and sds
PC.wrong=predict(PC, newdata=scale(NewData))
# Now scale correctly
NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"),
  scale=attr(Data.Sc, "scaled:scale"))
PC.right=predict(PC, newdata=NewData.Sc)
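To see what "scale and rotate in the same way" amounts to, here is a small self-contained sketch. It is only an illustration: the matrices Old and New, the column names V1..V5, the seed and the column scalings are all made up for the example and are not from the data above. It checks that predict() gives the same scores as centring by PC$center and multiplying by the loadings by hand, and shows how far the scores drift if the new data are rescaled by their own means and sds.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-ins for the old and new data sets (5 variables)
Old <- matrix(rnorm(50 * 5), ncol = 5) %*% diag(c(3, 2, 1, 2, 1))
New <- matrix(rnorm(10 * 5), ncol = 5) %*% diag(c(3, 2, 1, 2, 1))
colnames(Old) <- colnames(New) <- paste0("V", 1:5)

# PCA on the standardised old data
Old.Sc <- scale(Old)
PC <- princomp(Old.Sc)

# Scale the new data with the OLD means and standard deviations
New.Sc <- scale(New, center = attr(Old.Sc, "scaled:center"),
                scale = attr(Old.Sc, "scaled:scale"))

# The same projection done by hand: subtract PC$center, rotate by the loadings
by.hand <- sweep(New.Sc, 2, PC$center) %*% PC$loadings

right <- predict(PC, newdata = New.Sc)      # old scaling, old rotation
wrong <- predict(PC, newdata = scale(New))  # new data rescaled by itself

all.equal(unname(by.hand), unname(right))   # by-hand projection vs predict()
max(abs(right - wrong))                     # discrepancy caused by rescaling
```

The rotation (the loadings) is the same in both calls; only the centring and scaling applied beforehand differ, and that is where the discrepancy comes from.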
HTH
Bob
--
Bob O'Hara
Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany
Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology