Re: [R-sig-eco] PCA as a predictive model

Marc Taylor Wed, 23 May 2012 03:43:51 -0700

Thank you, Bob. You and Jari seem to be in agreement here :-)

I will have to double-check that what I am doing really gives the same
result as predict.prcomp. My problem is that I have set up my PCA in a
slightly different way from you, and for that matter from many of the
ordination examples in R.

My matrix columns are the "samples" and the rows are the "variables", just
the transpose of the usual layout. So I have been calculating the PC
loadings from the covariances among samples rather than among variables.
If you don't mind my excursion from ecology, my data consist of measured
light spectra (350-800 nm) with a value at each nm, so my matrix has 451
rows and each column is a sample. I have been applying my scaling to the
samples rather than the variables, so my vector of centers has one entry
per sample. I therefore run into problems when I want to predict for a
newdata set that contains a different number of samples. In the end, I
think I am getting reasonable predictions with the approach I described here:
http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model/28986#28986
It would, however, be comforting to reproduce this prediction in the
standard way... I'll keep experimenting.
Cheers, Marc
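A minimal sketch of the "standard way", assuming it is acceptable to compute the loadings from covariances among wavelengths (i.e. PCA on the transposed matrix) rather than among samples; note that this is a different decomposition from the one described above. With samples as rows, prcomp() stores one centre and one standard deviation per wavelength, so predict() accepts a newdata matrix with any number of samples. prcomp() is used instead of princomp() because princomp() stops with an error when there are fewer units than variables (here 20 samples versus 451 wavelengths). The spectra are simulated and the simulate_spectra() helper is hypothetical; nothing here comes from the data in this thread.

```r
# Sketch only: simulated spectra, hypothetical helper simulate_spectra().
set.seed(1)
wavelengths <- 350:800                 # 451 wavelengths, one value per nm
n_train <- 20
n_new <- 7

# Hypothetical spectra: a smooth shape scaled per sample, plus noise.
# Returns a wavelengths x samples matrix (the layout described above).
simulate_spectra <- function(n) {
  shape <- sin(wavelengths / 60)
  sapply(seq_len(n), function(i)
    shape * runif(1, 0.5, 1.5) + rnorm(length(wavelengths), sd = 0.05))
}
X_train <- simulate_spectra(n_train)   # 451 x 20
X_new   <- simulate_spectra(n_new)     # 451 x 7

# Transpose so that samples are rows and wavelengths are the variables;
# prcomp() then stores one centre and one sd per wavelength.
pc <- prcomp(t(X_train), center = TRUE, scale. = TRUE)
length(pc$center)                      # 451, independent of the sample count

# predict() centres and scales the new samples with the *training* values
# (pc$center, pc$scale) and then multiplies by pc$rotation.
scores_new <- predict(pc, newdata = t(X_new))
dim(scores_new)                        # 7 x 20: new samples x PCs
```

Because the stored centre and scale have one entry per wavelength, the number of new samples no longer matters; only the number of wavelengths has to match.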

On Wed, May 23, 2012 at 11:15 AM, Bob O'Hara <boh...@senckenberg.de> wrote:

> On 05/23/2012 10:55 AM, Marc Taylor wrote:
>
>> Hi Jari - one more question, if you don't mind. Since the weights of the
>> PCs are related to the amount of variance that they explain in the
>> original data, is it problematic to predict the PC scores with a second
>> data set that has a different amount of variance (e.g. due to a differing
>> number of samples)? In both the 1st and 2nd data sets I have been using
>> scaled values for the variables (mean=0 and sd=1 for each sample).
>> Cheers,
>> Marc
>>
> I'll pretend to be Jari for a moment. :-)
>
> PCA just scales and rotates the data in cunning ways, so with the new data
> you need to scale and rotate it in exactly the same way. If you scale the
> new values by their own means and sds first, you have already changed the
> scaling.
>
> What you need to do is either do the PCA on the raw data (letting the PCA
> function do the scaling, so that predict() reuses it), or scale the new data
> using the means and standard deviations of the old data.
>
> library(MASS)  # for mvrnorm()
>
> NVar=5; NObs=50
> Sigma=matrix(c(
>   10, 0.2,   0,   0, 0.4,
>  0.2,   5, 0.1,   0, 0.6,
>    0, 0.1, 1.0, 0.2,   0,
>    0,   0, 0.2,   5,   0,
>  0.4, 0.6,   0,   0,   1), nrow=5)
>
> # Simulate data
> Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
> # Do PCA on the scaled data
> Data.Sc=scale(Data)
> PC=princomp(Data.Sc)
>
> # Simulate new data (drawn around a different random mean)
> NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
> # Predict PC scores for the new data. First do it wrong, by scaling the
> # new data with its own means and sds...
> PC.wrong=predict(PC, newdata=scale(NewData))
>
> # Now scale correctly, using the means and sds of the original data
>
> NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"),
>                  scale=attr(Data.Sc, "scaled:scale"))
> PC.right=predict(PC, newdata=NewData.Sc)
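Bob's two options should give identical scores, and the number of new observations plays no role: only the training means, standard deviations and loadings enter the calculation, each new row x being mapped to ((x - training mean) / training sd) times the loadings. A minimal base-R check of this, with two stated substitutions: prcomp() stands in for princomp() (prcomp() scales with the n - 1 divisor, exactly like scale(), whereas princomp() divides by n, so mixing the two would rescale the scores slightly), and MASS::mvrnorm() is replaced by a Cholesky factor of the same Sigma. The sim() helper and the seed are illustrative assumptions.

```r
# Check that the two options agree. Base R only: prcomp() replaces
# princomp(), and a Cholesky factor replaces MASS::mvrnorm().
set.seed(42)
NVar <- 5; NObs <- 50
Sigma <- matrix(c(
   10, 0.2,   0,   0, 0.4,
  0.2,   5, 0.1,   0, 0.6,
    0, 0.1, 1.0, 0.2,   0,
    0,   0, 0.2,   5,   0,
  0.4, 0.6,   0,   0,   1), nrow = 5)

# Hypothetical helper: n draws from N(mu, Sigma); rows of Z %*% chol(Sigma)
# have covariance Sigma, and sweep() adds mu to every row.
sim <- function(n, mu) {
  sweep(matrix(rnorm(n * NVar), n) %*% chol(Sigma), 2, mu, "+")
}
Data    <- sim(NObs, rnorm(NVar))
NewData <- sim(10, rnorm(NVar))         # only 10 new observations

# Option 1: PCA on the raw data; prcomp() stores the training mean and sd
pc.raw <- prcomp(Data, center = TRUE, scale. = TRUE)
scores.opt1 <- predict(pc.raw, newdata = NewData)

# Option 2: scale by hand, PCA on the scaled data, then scale the new data
# with the *training* mean and sd before predicting.
Data.Sc <- scale(Data)
# as.vector() drops the "scaled:*" attributes so that prcomp() cannot
# mistake them for its own centring and scaling.
pc.sc <- prcomp(matrix(as.vector(Data.Sc), nrow = NObs),
                center = FALSE, scale. = FALSE)
NewData.Sc <- scale(NewData, center = attr(Data.Sc, "scaled:center"),
                    scale = attr(Data.Sc, "scaled:scale"))
scores.opt2 <- predict(pc.sc, newdata = NewData.Sc)

# The wrong way: rescaling the new data by its own mean and sd
scores.wrong <- predict(pc.sc, newdata = scale(NewData))

isTRUE(all.equal(scores.opt1, scores.opt2, check.attributes = FALSE))   # TRUE
isTRUE(all.equal(scores.opt1, scores.wrong, check.attributes = FALSE))  # FALSE
```

Option 1 is arguably the safer habit, because the centre and scale are stored in the fitted object and predict() applies them automatically.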
>
> HTH
>
> Bob
>
> --
>
> Bob O'Hara
>
> Biodiversity and Climate Research Centre
> Senckenberganlage 25
> D-60325 Frankfurt am Main,
> Germany
>
> Tel: +49 69 798 40226
> Mobile: +49 1515 888 5440
> WWW: http://www.bik-f.de/root/index.php?page_id=219
> Blog: http://blogs.nature.com/boboh
> Journal of Negative Results - EEB: www.jnr-eeb.org
>
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
