Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Tom Schoenemann
Thanks Liam,

A couple of questions: 

How does one do a hypothesis test on a regression, controlling for phylogeny, 
if not using PGLS as I am doing?  I realize one could use independent 
contrasts, though I was led to believe that is equivalent to a PGLS with lambda 
= 1.  
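
(A quick check of that equivalence, as an editorial sketch with simulated data rather than anything from the thread: a contrasts regression forced through the origin gives the same slope as GLS under a Brownian-motion correlation structure, i.e. PGLS with lambda fixed at 1.)

library(ape)
library(nlme)
set.seed(2)                       # editorial sketch: tree and traits are simulated
tree <- rtree(30)
x <- rTraitCont(tree)             # simulated predictor, named by tip label
y <- 0.5*x + rTraitCont(tree)     # simulated response
## contrasts regression, forced through the origin
coef(lm(pic(y, tree) ~ pic(x, tree) - 1))
## GLS with Brownian motion (lambda = 1); data rows are in tip-label order
coef(gls(y ~ x, data = data.frame(x, y),
    correlation = corBrownian(1, tree)))["x"]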

I take it from what you wrote that the PGLS in caper estimates lambda by ML only 
for y when doing the regression? Isn't this patently wrong, biologically 
speaking? Phylogenetic effects could have been operating on both x and y - we 
can't assume they would be relevant only to y. Shouldn't phylogenetic methods 
account for both?
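
(An editorial sketch, not from the original exchange: one way to look at signal in x and y separately is phytools::phylosig() with method = "lambda", which estimates lambda for each trait on its own; this is a different quantity from the lambda that caper's pgls() fits to the model residuals. Here tree, x, and y stand in for the tree and the two named trait vectors.)

library(phytools)
## editorial sketch: x and y are named numeric vectors matching tree$tip.label
phylosig(tree, x, method = "lambda", test = TRUE)   # lambda for x alone
phylosig(tree, y, method = "lambda", test = TRUE)   # lambda for y alone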

You say you aren't sure it is a good idea to jointly optimize lambda for x & y. 
 Can you expand on this?  What would be a better solution (if there is one)?

Am I wrong that it makes no evolutionary biological sense to use a method that 
gives different estimates of the probability of a relationship based on the 
direction in which one looks at the relationship? Doesn't the fact that the 
method gives different answers in this way invalidate the method for taking 
phylogeny into account when assessing relationships among biological taxa?  How 
could it be biologically meaningful for phylogeny to have a greater influence 
when x is predicting y, than when y is predicting x?  Maybe I'm missing 
something here.

-Tom 


On Jul 21, 2013, at 8:59 PM, Liam J. Revell  wrote:

> Hi Tom.
> 
> Joe pointed out that if we assume that our variables are multivariate normal, 
> then a hypothesis test on the regression is the same as a test that cov(x,y) 
> is different from zero.
> 
> If you insist on using lambda, one logical extension to this might be to 
> jointly optimize lambda for x & y (following Freckleton et al. 2002) and then 
> fix the value of lambda at its joint MLE during GLS. This would at least have 
> the property of guaranteeing that the P-values for y~x and x~y are the 
> same.
> 
> I previously posted code for joint estimation of lambda on my blog here: 
> http://blog.phytools.org/2012/09/joint-estimation-of-pagels-for-multiple.html.
> 
> With this code to fit joint lambda, our analysis would then look something 
> like this:
> 
> require(phytools)
> require(nlme)
> lambda<-joint.lambda(tree,cbind(x,y))$lambda
> fit1<-gls(y~x,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
> fit2<-gls(x~y,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
> 
> I'm not sure that this is a good idea - but it is possible.
> 
> - Liam
> 
> Liam J. Revell, Assistant Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
> 
> On 7/21/2013 6:15 PM, Tom Schoenemann wrote:
>> Hi all,
>> 
>> I'm still unsure of how I should interpret results given that using PGLS
>> to predict group size from brain size gives different significance
>> levels and lambda estimates than when I do the reverse (i.e., predict
>> brain size from group size).  Biologically, I don't think this makes any
>> sense.  If lambda is an estimate of the phylogenetic signal, what
>> possible evolutionary and biological sense are we to make if the
>> estimates of lambda are significantly different depending on which way
>> the association is assessed? I understand the mathematics may allow
>> this, but if I can't make sense of this biologically, then doesn't it
>> call into question the use of this method for these kinds of questions
>> in the first place?  What am I missing here?
>> 
>> Here are some results from data I have that illustrate this (notice that
>> the lambda values are significantly different from each other):
>> 
>> Group size predicted by brain size:
>> 
>>> model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
>>> primate_tom, lambda='ML')
>>> summary(model.group.by.brain)
>> 
>> Call:
>> pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom,
>> lambda = "ML")
>> 
>> Residuals:
>>  Min   1Q   Median   3Q  Max
>> -0.27196 -0.07638  0.00399  0.10107  0.43852
>> 
>> Branch length transformations:
>> 
>> kappa  [Fix]  : 1.000
>> lambda [ ML]  : 0.759
>>lower bound : 0.000, p = 4.6524e-08
>>upper bound : 1.000, p = 2.5566e-10
>>95.0% CI   : (0.485, 0.904)
>> delta  [Fix]  : 1.000
>> 
>> Coefficients:
>>  Estimate Std. Error t value Pr(>|t|)
>> (Intercept) -0.080099   0.610151 -0.1313 0.895825
>> log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> Residual standard error: 0.1433 on 98 degrees of freedom
>> Multiple R-squared: 0.1132, Adjusted R-squared: 0.1041
>> F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05
>> 
>> 
>> Brain size predicted by group size:
>> 
>>> model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
>>> primate_tom, lambda='ML')
>>> summary(model.brain.by.group)
>> 
>> Call:

Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Liam J. Revell

Hi Tom.

Joe pointed out that if we assume that our variables are multivariate 
normal, then a hypothesis test on the regression is the same as a test 
that cov(x,y) is different from zero.
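
(A small editorial illustration of that point with ordinary, non-phylogenetic data: for a bivariate normal sample, the t-test of the slope in y ~ x, the t-test of the slope in x ~ y, and the test of cor(x, y) != 0 all give the same p-value. The data below are simulated.)

set.seed(1)                                        # editorial sketch, simulated data
x <- rnorm(50)
y <- 0.5*x + rnorm(50)
summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]   # slope test, y on x
summary(lm(x ~ y))$coefficients["y", "Pr(>|t|)"]   # slope test, x on y
cor.test(x, y)$p.value                             # test of cor(x, y) != 0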


If you insist on using lambda, one logical extension to this might be to 
jointly optimize lambda for x & y (following Freckleton et al. 2002) and 
then fix the value of lambda at its joint MLE during GLS. This would at 
least have the property of guaranteeing that the P-values for y~x and 
x~y are the same.


I previously posted code for joint estimation of lambda on my blog here: 
http://blog.phytools.org/2012/09/joint-estimation-of-pagels-for-multiple.html.


With this code to fit joint lambda, our analysis would then look 
something like this:


require(phytools)  # loads ape, which provides corPagel()
require(nlme)      # provides gls()
## joint.lambda() is from the blog post linked above; x and y should be in the
## same order as tree$tip.label (corPagel assumes this by default)
lambda<-joint.lambda(tree,cbind(x,y))$lambda
fit1<-gls(y~x,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
fit2<-gls(x~y,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
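
(An editorial follow-up, not in the original post: since lambda is fixed at the same joint MLE in both fits, the two slope tests should return the same p-value, which can be checked directly.)

## editorial check: both p-values should agree under the shared fixed lambda
summary(fit1)$tTable["x", "p-value"]
summary(fit2)$tTable["y", "p-value"]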

I'm not sure that this is a good idea - but it is possible.

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 7/21/2013 6:15 PM, Tom Schoenemann wrote:

Hi all,

I'm still unsure of how I should interpret results given that using PGLS
to predict group size from brain size gives different significance
levels and lambda estimates than when I do the reverse (i.e., predict
brain size from group size).  Biologically, I don't think this makes any
sense.  If lambda is an estimate of the phylogenetic signal, what
possible evolutionary and biological sense are we to make if the
estimates of lambda are significantly different depending on which way
the association is assessed? I understand the mathematics may allow
this, but if I can't make sense of this biologically, then doesn't it
call into question the use of this method for these kinds of questions
in the first place?  What am I missing here?

Here are some results from data I have that illustrate this (notice that
the lambda values are significantly different from each other):

Group size predicted by brain size:


model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
primate_tom, lambda='ML')
summary(model.group.by.brain)


Call:
pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom,
 lambda = "ML")

Residuals:
  Min   1Q   Median   3Q  Max
-0.27196 -0.07638  0.00399  0.10107  0.43852

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.759
lower bound : 0.000, p = 4.6524e-08
upper bound : 1.000, p = 2.5566e-10
95.0% CI   : (0.485, 0.904)
delta  [Fix]  : 1.000

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.080099   0.610151 -0.1313 0.895825
log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1433 on 98 degrees of freedom
Multiple R-squared: 0.1132, Adjusted R-squared: 0.1041
F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05


Brain size predicted by group size:


model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
primate_tom, lambda='ML')
summary(model.brain.by.group)


Call:
pgls(formula = log(AvgBrainWt) ~ log(GroupSize), data = primate_tom,
 lambda = "ML")

Residuals:
  Min   1Q   Median   3Q  Max
-0.38359 -0.08216  0.00902  0.05609  0.27443

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 1.000
lower bound : 0.000, p = < 2.22e-16
upper bound : 1.000, p = 1
95.0% CI   : (0.992, NA)
delta  [Fix]  : 1.000

Coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept)    2.740932   0.446943  6.1326 1.824e-08 ***
log(GroupSize) 0.050780   0.043363  1.1710    0.2444
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.122 on 98 degrees of freedom
Multiple R-squared: 0.0138, Adjusted R-squared: 0.003737
F-statistic: 1.371 on 2 and 98 DF,  p-value: 0.2586


On Jul 14, 2013, at 6:18 AM, Emmanuel Paradis wrote:


Hi all,

I would like to react a bit on this issue.

Probably one problem is that the distinction "correlation vs. regression" is 
not the same for independent data and for phylogenetic data.

Consider the case of independent observations first. Suppose we are interested 
in the relationship y = b x + a, where x is an environmental variable, say 
latitude. We can get estimates of b and a by moving to 10 well-chosen 
locations, sampling 10 observations of y (they are independent), and analysing 
the 100 data points with OLS.

Here we cannot say anything about the correlation between x and y
because we controlled the distribution of x. In practice, even if x is
not controlled, this approach is still valid as long as the observations
are independent.


With phylogenetic data, x is n

Re: [R-sig-phylo] PGLS vs lm

2013-07-21 Thread Tom Schoenemann
Hi all,

I'm still unsure of how I should interpret results given that using PGLS to 
predict group size from brain size gives different significance levels and 
lambda estimates than when I do the reverse (i.e., predict brain size from 
group size).  Biologically, I don't think this makes any sense.  If lambda is 
an estimate of the phylogenetic signal, what possible evolutionary and 
biological sense are we to make if the estimates of lambda are significantly 
different depending on which way the association is assessed? I understand the 
mathematics may allow this, but if I can't make sense of this biologically, 
then doesn't it call into question the use of this method for these kinds of 
questions in the first place?  What am I missing here?

Here are some results from data I have that illustrate this (notice that the 
lambda values are significantly different from each other):

Group size predicted by brain size:

> model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
> primate_tom, lambda='ML')
> summary(model.group.by.brain)

Call:
pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom, 
lambda = "ML")

Residuals:
 Min   1Q   Median   3Q  Max 
-0.27196 -0.07638  0.00399  0.10107  0.43852 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.759
   lower bound : 0.000, p = 4.6524e-08
   upper bound : 1.000, p = 2.5566e-10
   95.0% CI   : (0.485, 0.904)
delta  [Fix]  : 1.000

Coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.080099   0.610151 -0.1313 0.895825
log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1433 on 98 degrees of freedom
Multiple R-squared: 0.1132, Adjusted R-squared: 0.1041 
F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05 


Brain size predicted by group size:

> model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
> primate_tom, lambda='ML')
> summary(model.brain.by.group)

Call:
pgls(formula = log(AvgBrainWt) ~ log(GroupSize), data = primate_tom, 
lambda = "ML")

Residuals:
 Min   1Q   Median   3Q  Max 
-0.38359 -0.08216  0.00902  0.05609  0.27443 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 1.000
   lower bound : 0.000, p = < 2.22e-16
   upper bound : 1.000, p = 1
   95.0% CI   : (0.992, NA)
delta  [Fix]  : 1.000

Coefficients:
               Estimate Std. Error t value  Pr(>|t|)
(Intercept)    2.740932   0.446943  6.1326 1.824e-08 ***
log(GroupSize) 0.050780   0.043363  1.1710    0.2444
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.122 on 98 degrees of freedom
Multiple R-squared: 0.0138, Adjusted R-squared: 0.003737 
F-statistic: 1.371 on 2 and 98 DF,  p-value: 0.2586


On Jul 14, 2013, at 6:18 AM, Emmanuel Paradis  wrote:

> Hi all,
> 
> I would like to react a bit on this issue.
> 
> Probably one problem is that the distinction "correlation vs. regression" is 
> not the same for independent data and for phylogenetic data.
> 
> Consider the case of independent observations first. Suppose we are 
> interested in the relationship y = b x + a, where x is an environmental 
> variable, say latitude. We can get estimates of b and a by moving to 10 
> well-chosen locations, sampling 10 observations of y (they are independent), 
> and analysing the 100 data points with OLS. Here we cannot say anything about 
> the correlation between x and y because we controlled the distribution of x. 
> In practice, even if x is not controlled, this approach is still valid as 
> long as the observations are independent.
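
(An editorial sketch of the design Emmanuel describes, with made-up numbers: x is fixed at 10 chosen values, 10 independent observations of y are taken at each, and OLS recovers a and b.)

set.seed(1)                                        # editorial sketch, invented values
a <- 1; b <- 2                                     # true intercept and slope
x <- rep(seq(0, 60, length.out = 10), each = 10)   # 10 controlled latitudes
y <- a + b*x + rnorm(length(x), sd = 5)            # 10 independent obs per site
coef(summary(lm(y ~ x)))                           # OLS estimates of a and b
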
> 
> With phylogenetic data, x is not controlled if it is measured "on the 
> species" -- in other words it's an evolving trait (or intrinsic variable). x 
> may be controlled if it is measured "outside the species" (extrinsic 
> variable) such as latitude. So the case of using regression or correlation is 
> not the same as above. Combining intrinsic and extrinsic variables has 
> generated a lot of debate in the literature.
> 
> I don't think it's a problem of using one method rather than another, but rather of 
> using a method while keeping in mind what it does (and its assumptions). Apparently, 
> Hansen and Bartoszek consider a range of models including regression models 
> where, by contrast to GLS, the evolution of the predictors is modelled 
> explicitly.
> 
> If we want to progress in our knowledge of how evolution works, I think we 
> should not limit ourselves to assessing whether there is a relationship, but 
> should test more complex models. The case presented by Tom is particularly relevant 
> here (at least to me): testing whether group size affects brain size or the 
> opposite (or both) is an important question. There has also been a lot of 
> debate about whether comparative data can answer this question. Maybe what we need 
> here is an approach