Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Marc Taylor
Thank you Bob - you and Jari seem to be in agreement here :-)

I will have to double check that what I am doing really gives the same
result as predict.prcomp. My problem is that I have set up my PCA in a
slightly different way than you have - or for that matter, different from
many of the ordination examples in R -

My matrix columns are the "samples" and the rows are the "variables" - just
the transpose of what is typical. So I have been calculating the PC
loadings based on the sample covariances and not the variable covariances.
If you don't mind my excursion from ecology, I'll explain that my data
consists of measured light spectra (350-800nm) with a value at each nm.
Thus my matrix consists of 451 rows and each column is a sample. I have
been applying my scaling to the samples and not the variables, so my vector
of centers has one entry per sample. I thus run into problems when I want to
predict from newdata that contains a different number of samples. In the
end, I think I am getting reasonable
predictions by doing this in the way that I described here:
http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model/28986#28986
It would, however, be comforting to reproduce this prediction in the
standard way... I'll keep experimenting.
Cheers, Marc
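
For comparison, here is a minimal sketch of the "standard" orientation in R,
assuming the spectra are simply transposed so that samples are rows and
wavelengths are columns. The spectra are simulated, and the names
make_spectra, spec_train and spec_new are hypothetical, not taken from the
thread. With this layout the centers have one entry per wavelength, so
newdata may contain any number of samples.

```r
set.seed(7)
wl <- 350:800                      # 451 wavelengths, one value per nm

# Hypothetical simulated spectra: rows = wavelengths, columns = samples,
# i.e. the same layout as described in the message above
make_spectra <- function(n) {
  sapply(seq_len(n), function(i) {
    rnorm(1, mean = 1, sd = 0.2) * exp(-((wl - 550) / 80)^2) +
      rnorm(length(wl), sd = 0.01)
  })
}
spec_train <- make_spectra(30)     # 451 x 30
spec_new   <- make_spectra(8)      # 451 x 8, a different number of samples

# Transpose so that samples are rows and wavelengths are the variables
pca <- prcomp(t(spec_train), center = TRUE, scale. = FALSE)

# The centers belong to the wavelengths, not to the samples
length(pca$center)                 # 451

# Scores of the new samples on the original PCs
scores_new <- predict(pca, newdata = t(spec_new))
dim(scores_new)
```

This is only one way to set the problem up; whether centering over samples
or over wavelengths is appropriate depends on the question being asked.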


On Wed, May 23, 2012 at 11:15 AM, Bob O'Hara  wrote:

> On 05/23/2012 10:55 AM, Marc Taylor wrote:
>
>> Hi Jari - one more question if you don't mind. Since the weights of the
>> PCs are related to the amount of variance that they explain in the
>> original data - is it problematic to predict the PC scores with a second
>> data set that has a different amount of variance (e.g. due to differing
>> number of samples)? In both the 1st and 2nd data sets I have been using
>> scaled values for the variables (mean=0 and sd=1 for each sample).
>> Cheers,
>> Marc
>>
> I'll pretend to be Jari for a moment. :-)
>
> PCA just scales and rotates the data in cunning ways, so with the new data
> you need to scale and rotate it in the same way. If you scale the values
> first then you've already changed the scaling.
>
> What you need to do is either do PCA on the raw data or scale the new data
> using the means and variances of the old data.
>
> library(MASS)
>
> NVar=5; NObs=50
> Sigma=matrix(c(
>  10,0.2,   0, 0,0.4,
> 0.2,   5,0.1, 0,0.6,
>   0,0.1,1.0, 0.2,0,
>   0,   0,0.2, 5, 0,
> 0.4,0.6,  0, 0,1), nrow=5)
>
> # simulate data
> Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
> # Do PCA on scaled data
> Data.Sc=scale(Data)
> PC=princomp(Data.Sc)
>
> # Simulate new data
> NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
> # Project the new data onto the PCs. First do it wrong...
> PC.wrong=predict(PC, newdata=scale(NewData))
>
> # Now scale correctly, using the centers and scales of the original data
> NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"),
>                  scale=attr(Data.Sc, "scaled:scale"))
> PC.right=predict(PC, newdata=NewData.Sc)
>
> HTH
>
> Bob
>
> --
>
> Bob O'Hara
>
> Biodiversity and Climate Research Centre
> Senckenberganlage 25
> D-60325 Frankfurt am Main,
> Germany
>
> Tel: +49 69 798 40226
> Mobile: +49 1515 888 5440
> WWW:   http://www.bik-f.de/root/index.php?page_id=219
> Blog: http://blogs.nature.com/boboh
> Journal of Negative Results - EEB: www.jnr-eeb.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Bob O'Hara

On 05/23/2012 10:55 AM, Marc Taylor wrote:

> Hi Jari - one more question if you don't mind. Since the weights of the PCs
> are related to the amount of variance that they explain in the original
> data - is it problematic to predict the PC scores with a second data set
> that has a different amount of variance (e.g. due to differing number of
> samples)? In both the 1st and 2nd data sets I have been using scaled values
> for the variables (mean=0 and sd=1 for each sample).
> Cheers,
> Marc

I'll pretend to be Jari for a moment. :-)

PCA just scales and rotates the data in cunning ways, so with the new 
data you need to scale and rotate it in the same way. If you scale the 
values first then you've already changed the scaling.


What you need to do is either do PCA on the raw data or scale the new
data using the means and variances of the old data.


library(MASS)

NVar=5; NObs=50
Sigma=matrix(c(
 10,0.2,   0, 0,0.4,
0.2,   5,0.1, 0,0.6,
   0,0.1,1.0, 0.2,0,
   0,   0,0.2, 5, 0,
0.4,0.6,  0, 0,1), nrow=5)

# simulate data
Data=mvrnorm(NObs, rnorm(NVar), Sigma=Sigma)
# Do PCA on scaled data
Data.Sc=scale(Data)
PC=princomp(Data.Sc)

# Simulate new data
NewData=mvrnorm(10, rnorm(NVar), Sigma=Sigma)
# Project the new data onto the PCs. First do it wrong...
PC.wrong=predict(PC, newdata=scale(NewData))

# Now scale correctly, using the centers and scales of the original data
NewData.Sc=scale(NewData, center=attr(Data.Sc, "scaled:center"),
                 scale=attr(Data.Sc, "scaled:scale"))

PC.right=predict(PC, newdata=NewData.Sc)
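
The "PCA on the raw data" option can be sketched as follows. This is only an
illustration under assumptions that are not in the code above: it uses
prcomp rather than princomp, lets prcomp do the standardisation itself, and
simulates new data with a deliberately shifted mean so that the effect of
the wrong scaling is easy to see.

```r
set.seed(42)
Data    <- matrix(rnorm(50 * 5, mean = 0, sd = 2), ncol = 5)
NewData <- matrix(rnorm(10 * 5, mean = 3, sd = 2), ncol = 5)  # shifted on purpose

# Standardisation happens inside prcomp, so the original centers and SDs
# are stored in PC$center and PC$scale
PC <- prcomp(Data, center = TRUE, scale. = TRUE)

# Right: predict() re-uses the centers and SDs of the original data
PC.right <- predict(PC, newdata = NewData)

# Wrong: standardise the new data by its own means and SDs, then rotate
PC.wrong <- scale(NewData) %*% PC$rotation

round(colMeans(PC.wrong), 10)  # all zero: the shift in NewData is scaled away
colMeans(PC.right)             # generally non-zero: the shift is retained
```

Because the wrongly scaled scores are always centred on zero, any real
difference between the old and the new data is lost before the rotation.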

HTH

Bob

--

Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org



Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Marc Taylor
Excellent - thanks Jari.
Cheers,
Marc

On Wed, May 23, 2012 at 11:05 AM, Jari Oksanen  wrote:

>  Marc,
>
> I see no danger. When you predict with 'newdata', that 'newdata' should
> have no influence on the original PCA. You can use predict(..., newdata)
> for cross validation.  If you look at the code of stats:::predict.prcomp,
> you see that the prediction is only (i) scaling and centring your new data
> with the means and scale of the original data, and (ii) rotation of these
> scaled values to the original PCs. This is the last line of
> stats:::predict.prcomp:
>
> scale(newdata, object$center, object$scale) %*% object$rotation
>
> The PCA result 'object' saves the original 'center' and 'scale' (which can
> be FALSE or SD of variables) of your original variables, and applies those
> to your 'newdata' before rotating to PCs.
>
> Cheers, Jari Oksanen
>  --
> *From:* Marc Taylor [marchtay...@gmail.com]
> *Sent:* 23 May 2012 11:55
> *To:* Jari Oksanen
> *Cc:* r-sig-ecology@r-project.org
> *Subject:* Re: [R-sig-eco] PCA as a predictive model
>
>  Hi Jari - one more question if you don't mind. Since the weights of the
> PCs are related to the amount of variance that they explain in the
> original data - is it problematic to predict the PC scores with a second
> data set that has a different amount of variance (e.g. due to differing
> number of samples)? In both the 1st and 2nd data sets I have been using
> scaled values for the variables (mean=0 and sd=1 for each sample).
> Cheers,
> Marc
>
>
> On Wed, May 23, 2012 at 9:59 AM, Marc Taylor wrote:
>
>> Hi Jari,
>>
>> That's good to hear - I hadn't made the connection to cca/rda. This will
>> help me find pertinent literature as well.
>>
>> Many thanks,
>> Marc
>>
>>
>> On Wed, May 23, 2012 at 9:33 AM, Jari Oksanen wrote:
>>
>>> Marc,
>>>
>>> Basic R stats functions like prcomp have a predict method that can be
>>> used to "predict" (calculate) scores with 'newdata'. This is standard, and
>>> has been in R for ever. Most textbooks of multivariate analysis should
>>> handle this issue.
>>>
>>> In community ecological context, see functions predict.cca and
>>> predict.rda in vegan. These take argument 'newdata' which can be new
>>> community data -- depending on what you want to predict (argument 'type').
>>> Function calibrate.cca documented in the same help page can be used to
>>> predict values of constraining variables in constrained ordination (CCA,
>>> RDA) from community composition, also with 'newdata' communities.
>>>
>>> Cheers, Jari Oksanen
>>>
>>> 
>>> From: r-sig-ecology-boun...@r-project.org [
>>> r-sig-ecology-boun...@r-project.org] on behalf of Marc Taylor [
>>> marchtay...@gmail.com]
>>> Sent: 23 May 2012 10:19
>>> To: r-sig-ecology@r-project.org
>>> Subject: [R-sig-eco] PCA as a predictive model
>>>
>>> Hello R-sig-ecology group,
>>>
>>> I was wondering if anyone is aware of an example where PCA is used as a
>>> predictive model? A community analysis example might be to predict the PC
>>> values of a sample given its community composition. I had thrown this
>>> question up on a statistics forum (
>>>
>>> http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model
>>> )
>>> but have gotten hardly any response. I imagined that there are some folks
>>> here that would have some insight into this problem.
>>>
>>> Many thanks,
>>> Marc
>>>
>>>
>>
>>
>



Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Jari Oksanen
Marc,

I see no danger. When you predict with 'newdata', that 'newdata' should have no 
influence on the original PCA. You can use predict(..., newdata) for cross 
validation.  If you look at the code of stats:::predict.prcomp, you see that 
the prediction is only (i) scaling and centring your new data with the means 
and scale of the original data, and (ii) rotation of these scaled values to the 
original PCs. This is the last line of stats:::predict.prcomp:

scale(newdata, object$center, object$scale) %*% object$rotation

The PCA result 'object' saves the original 'center' and 'scale' (which can be 
FALSE or SD of variables) of your original variables, and applies those to your 
'newdata' before rotating to PCs.
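
The two steps described above can be checked by hand. This is a minimal
sketch on simulated data; the object names train, new, manual and auto are
made up for the illustration.

```r
set.seed(1)
train <- matrix(rnorm(50 * 5), nrow = 50)   # 50 samples x 5 variables
new   <- matrix(rnorm(10 * 5), nrow = 10)   # 10 new samples, same variables
colnames(train) <- colnames(new) <- paste0("V", 1:5)

pca <- prcomp(train, center = TRUE, scale. = TRUE)

# (i) centre and scale with the means and SDs of the original data,
# (ii) rotate the scaled values onto the original PCs
manual <- scale(new, center = pca$center, scale = pca$scale) %*% pca$rotation

# The same thing via the predict method
auto <- predict(pca, newdata = new)

all.equal(manual, auto, check.attributes = FALSE)
```

Note that 'new' never enters the fit: only pca$center, pca$scale and
pca$rotation, all derived from 'train', are used.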

Cheers, Jari Oksanen

From: Marc Taylor [marchtay...@gmail.com]
Sent: 23 May 2012 11:55
To: Jari Oksanen
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] PCA as a predictive model

Hi Jari - one more question if you don't mind. Since the weights of the PCs are 
related to the amount of variance that they explain in the original data - 
is it problematic to predict the PC scores with a second data set that has a 
different amount of variance (e.g. due to differing number of samples)? In both 
the 1st and 2nd data sets I have been using scaled values for the variables 
(mean=0 and sd=1 for each sample).
Cheers,
Marc


On Wed, May 23, 2012 at 9:59 AM, Marc Taylor <marchtay...@gmail.com> wrote:
Hi Jari,

That's good to hear - I hadn't made the connection to cca/rda. This will help 
me find pertinent literature as well.

Many thanks,
Marc


On Wed, May 23, 2012 at 9:33 AM, Jari Oksanen <jari.oksa...@oulu.fi> wrote:
Marc,

Basic R stats functions like prcomp have a predict method that can be used to 
"predict" (calculate) scores with 'newdata'. This is standard, and has been in 
R for ever. Most textbooks of multivariate analysis should handle this issue.

In community ecological context, see functions predict.cca and predict.rda in 
vegan. These take argument 'newdata' which can be new community data -- 
depending on what you want to predict (argument 'type'). Function calibrate.cca 
documented in the same help page can be used to predict values of constraining 
variables in constrained ordination (CCA, RDA) from community composition, also 
with 'newdata' communities.

Cheers, Jari Oksanen


From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org]
on behalf of Marc Taylor [marchtay...@gmail.com]
Sent: 23 May 2012 10:19
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] PCA as a predictive model

Hello R-sig-ecology group,

I was wondering if anyone is aware of an example where PCA is used as a
predictive model? A community analysis example might be to predict the PC
values of a sample given its community composition. I had thrown this
question up on a statistics forum (
http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model)
but have gotten hardly any response. I imagined that there are some folks
here that would have some insight into this problem.

Many thanks,
Marc



Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Marc Taylor
Hi Jari - one more question if you don't mind. Since the weights of the PCs
are related to the amount of variance that they explain in the original
data - is it problematic to predict the PC scores with a second data set
that has a different amount of variance (e.g. due to differing number of
samples)? In both the 1st and 2nd data sets I have been using scaled values
for the variables (mean=0 and sd=1 for each sample).
Cheers,
Marc


On Wed, May 23, 2012 at 9:59 AM, Marc Taylor  wrote:

> Hi Jari,
>
> That's good to hear - I hadn't made the connection to cca/rda. This will
> help me find pertinent literature as well.
>
> Many thanks,
> Marc
>
>
> On Wed, May 23, 2012 at 9:33 AM, Jari Oksanen wrote:
>
>> Marc,
>>
>> Basic R stats functions like prcomp have a predict method that can be
>> used to "predict" (calculate) scores with 'newdata'. This is standard, and
>> has been in R for ever. Most textbooks of multivariate analysis should
>> handle this issue.
>>
>> In community ecological context, see functions predict.cca and
>> predict.rda in vegan. These take argument 'newdata' which can be new
>> community data -- depending on what you want to predict (argument 'type').
>> Function calibrate.cca documented in the same help page can be used to
>> predict values of constraining variables in constrained ordination (CCA,
>> RDA) from community composition, also with 'newdata' communities.
>>
>> Cheers, Jari Oksanen
>>
>> 
>> From: r-sig-ecology-boun...@r-project.org [
>> r-sig-ecology-boun...@r-project.org] on behalf of Marc Taylor [
>> marchtay...@gmail.com]
>> Sent: 23 May 2012 10:19
>> To: r-sig-ecology@r-project.org
>> Subject: [R-sig-eco] PCA as a predictive model
>>
>> Hello R-sig-ecology group,
>>
>> I was wondering if anyone is aware of an example where PCA is used as a
>> predictive model? A community analysis example might be to predict the PC
>> values of a sample given its community composition. I had thrown this
>> question up on a statistics forum (
>>
>> http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model
>> )
>> but have gotten hardly any response. I imagined that there are some folks
>> here that would have some insight into this problem.
>>
>> Many thanks,
>> Marc
>>
>>
>
>



Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Marc Taylor
Hi Jari,

That's good to hear - I hadn't made the connection to cca/rda. This will
help me find pertinent literature as well.

Many thanks,
Marc

On Wed, May 23, 2012 at 9:33 AM, Jari Oksanen  wrote:

> Marc,
>
> Basic R stats functions like prcomp have a predict method that can be used
> to "predict" (calculate) scores with 'newdata'. This is standard, and has
> been in R for ever. Most textbooks of multivariate analysis should handle
> this issue.
>
> In community ecological context, see functions predict.cca and predict.rda
> in vegan. These take argument 'newdata' which can be new community data --
> depending on what you want to predict (argument 'type'). Function
> calibrate.cca documented in the same help page can be used to predict
> values of constraining variables in constrained ordination (CCA, RDA) from
> community composition, also with 'newdata' communities.
>
> Cheers, Jari Oksanen
>
> 
> From: r-sig-ecology-boun...@r-project.org [
> r-sig-ecology-boun...@r-project.org] on behalf of Marc Taylor [
> marchtay...@gmail.com]
> Sent: 23 May 2012 10:19
> To: r-sig-ecology@r-project.org
> Subject: [R-sig-eco] PCA as a predictive model
>
> Hello R-sig-ecology group,
>
> I was wondering if anyone is aware of an example where PCA is used as a
> predictive model? A community analysis example might be to predict the PC
> values of a sample given its community composition. I had thrown this
> question up on a statistics forum (
>
> http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model
> )
> but have gotten hardly any response. I imagined that there are some folks
> here that would have some insight into this problem.
>
> Many thanks,
> Marc
>
>



Re: [R-sig-eco] PCA as a predictive model

2012-05-23 Thread Jari Oksanen
Marc,

Basic R stats functions like prcomp have a predict method that can be used to 
"predict" (calculate) scores with 'newdata'. This is standard, and has been in 
R for ever. Most textbooks of multivariate analysis should handle this issue. 

In community ecological context, see functions predict.cca and predict.rda in 
vegan. These take argument 'newdata' which can be new community data -- 
depending on what you want to predict (argument 'type'). Function calibrate.cca 
documented in the same help page can be used to predict values of constraining 
variables in constrained ordination (CCA, RDA) from community composition, also 
with 'newdata' communities.

Cheers, Jari Oksanen


From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Marc Taylor [marchtay...@gmail.com]
Sent: 23 May 2012 10:19
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] PCA as a predictive model

Hello R-sig-ecology group,

I was wondering if anyone is aware of an example where PCA is used as a
predictive model? A community analysis example might be to predict the PC
values of a sample given its community composition. I had thrown this
question up on a statistics forum (
http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model)
but have gotten hardly any response. I imagined that there are some folks
here that would have some insight into this problem.

Many thanks,
Marc



[R-sig-eco] PCA as a predictive model

2012-05-23 Thread Marc Taylor
Hello R-sig-ecology group,

I was wondering if anyone is aware of an example where PCA is used as a
predictive model? A community analysis example might be to predict the PC
values of a sample given its community composition. I had thrown this
question up on a statistics forum (
http://stats.stackexchange.com/questions/28916/can-empirical-orthogonal-function-eof-analysis-be-used-as-a-predictive-model)
but have gotten hardly any response. I imagined that there are some folks
here that would have some insight into this problem.

Many thanks,
Marc
