On 05/23/2012 10:55 AM, Marc Taylor wrote:
Hi Jari - one more question if you don't mind. Since the weights of the PCs
are related to the amount of variance that they explain in the original
data - is it problematic to predict the PC scores with a second data set
that has a different amount of variance (e.g. due to differing number of
samples)? In both the 1st and 2nd data sets I have been using scaled values
for the variables (mean=0 and sd=1 for each sample).
Cheers,
Marc
I'll pretend to be Jari for a moment. :-)

PCA just scales and rotates the data in cunning ways, so with the new data you need to scale and rotate it in the same way. If you standardise the new data by its own means and standard deviations first, then you've already applied a different scaling from the one the PCA was fitted with.

What you need to do is either do the PCA on the raw data, or scale the new data using the means and standard deviations of the old data.

library(MASS)

NVar=5; NObs=50
Sigma=matrix(c(
 10,0.2,   0, 0,0.4,
0.2,   5,0.1, 0,0.6,
   0,0.1,1.0, 0.2,0,
   0,   0,0.2, 5, 0,
0.4,0.6,  0, 0,1), nrow=5)

# simulate data
Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
# Do PCA on scaled data
Data.Sc=scale(Data)
PC=princomp(Data.Sc)

# Simulate new data
NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
# Predict PC scores for the new data. First do it wrong
# (the new data are scaled by their own means and sds)...
PC.wrong=predict(PC, newdata=scale(NewData))

# Now scale correctly, reusing the means and sds of the old data
NewData.Sc=scale(NewData,
                 center=attr(Data.Sc, "scaled:center"),
                 scale=attr(Data.Sc, "scaled:scale"))
PC.right=predict(PC, newdata=NewData.Sc)
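
A possible shortcut, as a rough sketch rather than part of the example above: prcomp() with scale.=TRUE stores the means and sds of the old data, and predict() then applies them to the new data, so the manual scaling step is not needed. The objects Old, New, PC2 and the Scores.* matrices are made-up names with simulated values, used only for illustration.

```r
# Sketch: let prcomp() remember the centring and scaling of the old data.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical old and new data: 5 variables with unequal spread;
# the new data are shifted so that their own means differ from the old ones.
Old <- matrix(rnorm(50 * 5), ncol = 5) %*% diag(c(3, 2, 1, 2, 1))
New <- matrix(rnorm(10 * 5, mean = 1), ncol = 5) %*% diag(c(3, 2, 1, 2, 1))
colnames(Old) <- colnames(New) <- paste0("V", 1:5)

# scale. = TRUE standardises the variables and keeps the old means and sds
# in PC2$center and PC2$scale
PC2 <- prcomp(Old, center = TRUE, scale. = TRUE)

# predict() reuses the stored centring and scaling on the new data
Scores.auto <- predict(PC2, newdata = New)

# The same by hand: scale with the OLD means and sds, then rotate
New.Sc <- scale(New, center = PC2$center, scale = PC2$scale)
Scores.manual <- New.Sc %*% PC2$rotation

# Scaling the new data by its own means and sds gives different scores:
# each column of scores is forced to have mean zero
Scores.wrong <- scale(New) %*% PC2$rotation

all.equal(Scores.auto, Scores.manual, check.attributes = FALSE)
```

Comparing Scores.auto with Scores.wrong shows how much the self-scaled version shifts the new samples in PC space.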

HTH

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
