Re: [R-sig-phylo] PGLS vs lm

Tom Schoenemann Fri, 26 Jul 2013 15:23:21 -0700

OK, so I haven't gotten any responses that convince me that PGLS isn't 
biologically suspect. At the risk of thinking out loud to myself here, I wonder 
if my finding might have to do with the method detecting phylogenetic signal in 
the error (residuals?):


From:
Revell, L. J. (2010). Phylogenetic signal and linear regression on species 
data. Methods in Ecology and Evolution, 1(4), 319-329.

I note the following: "...the suitability of a phylogenetic regression should 
actually be diagnosed by estimating phylogenetic signal in the residual 
deviations of Y given our predictors (X1, X2, etc.)."

Let's say one variable, "A", has a strong evolutionary signal, but the other, 
variable "B", does not. Would we expect this to affect a PGLS differently if we 
use A to predict B, vs. using B to predict A?  

If so, it would explain my findings. However, given the difference, I can have 
no confidence that there is, or is not, a significant covariance between A and 
B independent of phylogeny. Doesn't this finding call into question the method 
itself?

More directly, how is one to interpret such a finding? Is there, or is there 
not, a significant biological association?

-Tom


On Jul 21, 2013, at 11:47 PM, Tom Schoenemann <t...@indiana.edu> wrote:

> Thanks Liam,
> 
> A couple of questions: 
> 
> How does one do a hypothesis test on a regression, controlling for phylogeny, 
> if not using PGLS as I am doing?  I realize one could use independent 
> contrasts, though I was led to believe that is equivalent to a PGLS with 
> lambda = 1.  
> 
> I take it from what you wrote that the PGLS in caper does a ML of lambda only 
> on y, when doing the regression? Isn't this patently wrong, biologically 
> speaking? Phylogenetic effects could have been operating on both x and y - we 
> can't assume that it would only be relevant to y. Shouldn't phylogenetic 
> methods account for both?
> 
> You say you aren't sure it is a good idea to jointly optimize lambda for x & 
> y.  Can you expand on this?  What would be a better solution (if there is 
> one)?
> 
> Am I wrong that it makes no evolutionary biological sense to use a method 
> that gives different estimates of the probability of a relationship based on 
> the direction in which one looks at the relationship? Doesn't the fact that 
> the method gives different answers in this way invalidate the method for 
> taking phylogeny into account when assessing relationships among biological 
> taxa?  How could it be biologically meaningful for phylogeny to have a 
> greater influence when x is predicting y, than when y is predicting x?  Maybe 
> I'm missing something here.
> 
> -Tom 
> 
> 
> On Jul 21, 2013, at 8:59 PM, Liam J. Revell <liam.rev...@umb.edu> wrote:
> 
>> Hi Tom.
>> 
>> Joe pointed out that if we assume that our variables are multivariate 
>> normal, then a hypothesis test on the regression is the same as a test that 
>> cov(x,y) is different from zero.
>> 
>> If you insist on using lambda, one logical extension to this might be to 
>> jointly optimize lambda for x & y (following Freckleton et al. 2002) and 
>> then fix the value of lambda at its joint MLE during GLS. This would at 
>> least have the property of guaranteeing that the P-values for y~x and x~y 
>> are the same....
>> 
>> I previously posted code for joint estimation of lambda on my blog here: 
>> http://blog.phytools.org/2012/09/joint-estimation-of-pagels-for-multiple.html.
>> 
>> With this code to fit joint lambda, our analysis would then look something 
>> like this:
>> 
>> require(phytools)
>> require(nlme)
>> lambda<-joint.lambda(tree,cbind(x,y))$lambda
>> fit1<-gls(y~x,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
>> fit2<-gls(x~y,data=data.frame(x,y),correlation=corPagel(lambda,tree,fixed=TRUE))
>> 
>> I'm not sure that this is a good idea - but it is possible....
>> 
>> - Liam
>> 
>> Liam J. Revell, Assistant Professor of Biology
>> University of Massachusetts Boston
>> web: http://faculty.umb.edu/liam.revell/
>> email: liam.rev...@umb.edu
>> blog: http://blog.phytools.org
>> 
>> On 7/21/2013 6:15 PM, Tom Schoenemann wrote:
>>> Hi all,
>>> 
>>> I'm still unsure of how I should interpret results given that using PGLS
>>> to predict group size from brain size gives different significance
>>> levels and lambda estimates than when I do the reverse (i.e., predict
>>> brain size from group size).  Biologically, I don't think this makes any
>>> sense.  If lambda is an estimate of the phylogenetic signal, what
>>> possible evolutionary and biological sense are we to make if the
>>> estimates of lambda are significantly different depending on which way
>>> the association is assessed? I understand the mathematics may allow
>>> this, but if I can't make sense of this biologically, then doesn't it
>>> call into question the use of this method for these kinds of questions
>>> in the first place?  What am I missing here?
>>> 
>>> Here is some results from data I have that illustrate this (notice that
>>> the lambda values are significantly different from each other):
>>> 
>>> Group size predicted by brain size:
>>> 
>>>> model.group.by.brain<-pgls(log(GroupSize) ~ log(AvgBrainWt), data = 
>>>> primate_tom, lambda='ML')
>>>> summary(model.group.by.brain)
>>> 
>>> Call:
>>> pgls(formula = log(GroupSize) ~ log(AvgBrainWt), data = primate_tom,
>>>    lambda = "ML")
>>> 
>>> Residuals:
>>>     Min       1Q   Median       3Q      Max
>>> -0.27196 -0.07638  0.00399  0.10107  0.43852
>>> 
>>> Branch length transformations:
>>> 
>>> kappa  [Fix]  : 1.000
>>> lambda [ ML]  : 0.759
>>>   lower bound : 0.000, p = 4.6524e-08
>>>   upper bound : 1.000, p = 2.5566e-10
>>>   95.0% CI   : (0.485, 0.904)
>>> delta  [Fix]  : 1.000
>>> 
>>> Coefficients:
>>>                 Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)     -0.080099   0.610151 -0.1313 0.895825
>>> log(AvgBrainWt)  0.483366   0.136694  3.5361 0.000622 ***
>>> ---
>>> Signif. codes:  0 ï¿½***ï¿½ 0.001 ï¿½**ï¿½ 0.01 ï¿½*ï¿½ 0.05 ï¿½.ï¿½ 0.1 
>>> ï¿½ ï¿½ 1
>>> 
>>> Residual standard error: 0.1433 on 98 degrees of freedom
>>> Multiple R-squared: 0.1132,     Adjusted R-squared: 0.1041
>>> F-statistic:  12.5 on 2 and 98 DF,  p-value: 1.457e-05
>>> 
>>> 
>>> Brain size predicted by group size:
>>> 
>>>> model.brain.by.group<-pgls(log(AvgBrainWt) ~ log(GroupSize), data = 
>>>> primate_tom, lambda='ML')
>>>> summary(model.brain.by.group)
>>> 
>>> Call:
>>> pgls(formula = log(AvgBrainWt) ~ log(GroupSize), data = primate_tom,
>>>    lambda = "ML")
>>> 
>>> Residuals:
>>>     Min       1Q   Median       3Q      Max
>>> -0.38359 -0.08216  0.00902  0.05609  0.27443
>>> 
>>> Branch length transformations:
>>> 
>>> kappa  [Fix]  : 1.000
>>> lambda [ ML]  : 1.000
>>>   lower bound : 0.000, p = < 2.22e-16
>>>   upper bound : 1.000, p = 1
>>>   95.0% CI   : (0.992, NA)
>>> delta  [Fix]  : 1.000
>>> 
>>> Coefficients:
>>>               Estimate Std. Error t value  Pr(>|t|)
>>> (Intercept)    2.740932   0.446943  6.1326 1.824e-08 ***
>>> log(GroupSize) 0.050780   0.043363  1.1710    0.2444
>>> ---
>>> Signif. codes:  0 ï¿½***ï¿½ 0.001 ï¿½**ï¿½ 0.01 ï¿½*ï¿½ 0.05 ï¿½.ï¿½ 0.1 
>>> ï¿½ ï¿½ 1
>>> 
>>> Residual standard error: 0.122 on 98 degrees of freedom
>>> Multiple R-squared: 0.0138,     Adjusted R-squared: 0.003737
>>> F-statistic: 1.371 on 2 and 98 DF,  p-value: 0.2586
>>> 
>>> 
>>> On Jul 14, 2013, at 6:18 AM, Emmanuel Paradis <emmanuel.para...@ird.fr>
>>> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I would like to react a bit on this issue.
>>>> 
>>>> Probably one problem is that the distinction "correlation vs. regression" 
>>>> is not the same for independent data and for phylogenetic data.
>>>> 
>>>> Consider the case of independent observations first. Suppose we are 
>>>> interested in the relationship y = b x + a, where x is an environmental 
>>>> variable, say latitude. We can get estimates of b and a by moving to 10 
>>>> well-chosen locations, sampling 10 observations  of y (they are 
>>>> independent) and analyse the 100 data points with OLS.
>>> Here we cannot say anything about the correlation between x and y
>>> because we controlled the distribution of x. In practice, even if x is
>>> not controlled, this approach is still valid as long as the observations
>>> are independent.
>>>> 
>>>> With phylogenetic data, x is not controlled if it is measured "on the 
>>>> species" -- in other words it's an evolving trait (or intrinsic variable). 
>>>> x may be controlled if it is measured "outside the species" (extrinsic 
>>>> variable) such as latitude. So the case  of using regression or 
>>>> correlation is not the same than above.
>>> Combining intrinsic and extinsic variables has generated a lot of debate
>>> in the literature.
>>>> 
>>>> I don't think it's a problem of using a method and not another, but rather 
>>>> to use a method keeping in mind what it does (and its assumptions). 
>>>> Apparently, Hansen and Bartoszek consider a range of models including 
>>>> regression models where, by contrast to GLS,  the evolution of the 
>>>> predictors is modelled explicitly.
>>>> 
>>>> If we want to progress in our knowledge on how evolution works, I think we 
>>>> have to not limit ourselves to assess whether there is a relationship, but 
>>>> to test more complex models. The case presented by Tom is particularly 
>>>> relevant here (at least to me): testing  whether group size affects brain 
>>>> size or the opposite (or both) is an
>>> important question. There's been also a lot of debate whether
>>> comparative data can answer this question. Maybe what we need here is an
>>> approach based on simultaneous equations (aka structural equation
>>> models), but I'm not aware whether this exists in a phylogenetic
>>> framework. The approach by Hansen and Bartoszek could be a step in this
>>> direction.
>>>> 
>>>> Best,
>>>> 
>>>> Emmanuel
>>>> 
>>>> Le 13/07/2013 02:59, Joe Felsenstein a ï¿½crit :
>>>>> 
>>>>> Tom Schoenemann asked me:
>>>>> 
>>>>>> With respect to your crankiness, is this the paper by Hansen that you 
>>>>>> are referring to?:
>>>>>> 
>>>>>> Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S., & Hansen, T. F. 
>>>>>> (2012). A phylogenetic comparative method for studying multivariate 
>>>>>> adaptation. Journal of Theoretical Biology, 314(0), 204-215.
>>>>>> 
>>>>>> I wrote Bartoszek to see if I could get his R code to try the method 
>>>>>> mentioned in there. If I can figure out how to apply it to my data, that 
>>>>>> will be great. I agree that it is clearly a mistake to assume one 
>>>>>> variable is responding evolutionarily only to  the current value of the 
>>>>>> other (predictor variables).
>>>>> 
>>>>> I'm glad to hear that *somebody* here thinks it is a mistake (because it 
>>>>> really is).  I keep mentioning it here, and Hansen has published 
>>>>> extensively on it, but everyone keeps saying "Well, my friend used it, 
>>>>> and he got tenure, so it must be OK".
>>>>> 
>>>>> The paper I saw was this one:
>>>>> 
>>>>> Hansen, Thomas F & Bartoszek, Krzysztof (2012). Interpreting the 
>>>>> evolutionary regression: The interplay between observational and 
>>>>> biological errors in phylogenetic comparative studies. Systematic Biology 
>>>>>  61 (3): 413-425.  ISSN 1063-5157.
>>>>> 
>>>>> J.F.
>>>>> ----
>>>>> Joe Felsenstein         j...@gs.washington.edu
>>>>> Department of Genome Sciences and Department of Biology,
>>>>> University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>>>>> 
>>>>> _______________________________________________
>>>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>> Searchable archive 
>>>>> athttp://www.mail-archive.com/r-sig-phylo@r-project.org/
>>>>> 
>>> 
>>> _________________________________________________
>>> P. Thomas Schoenemann
>>> 
>>> Associate Professor
>>> Department of Anthropology
>>> Cognitive Science Program
>>> Indiana University
>>> Bloomington, IN  47405
>>> Phone: 812-855-8800
>>> E-mail: t...@indiana.edu
>>> 
>>> Open Research Scan Archive (ORSA) Co-Director
>>> Consulting Scholar
>>> Museum of Archaeology and Anthropology
>>> University of Pennsylvania
>>> 
>>> http://www.indiana.edu/~brainevo
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>        [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
>> 
> 
> _________________________________________________
> P. Thomas Schoenemann
> 
> Associate Professor
> Department of Anthropology
> Cognitive Science Program
> Indiana University
> Bloomington, IN  47405
> Phone: 812-855-8800
> E-mail: t...@indiana.edu
> 
> Open Research Scan Archive (ORSA) Co-Director
> Consulting Scholar
> Museum of Archaeology and Anthropology
> University of Pennsylvania
> 
> http://www.indiana.edu/~brainevo
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>       [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

_________________________________________________
P. Thomas Schoenemann

Associate Professor
Department of Anthropology
Cognitive Science Program
Indiana University
Bloomington, IN  47405
Phone: 812-855-8800
E-mail: t...@indiana.edu

Open Research Scan Archive (ORSA) Co-Director
Consulting Scholar
Museum of Archaeology and Anthropology
University of Pennsylvania

http://www.indiana.edu/~brainevo











        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] PGLS vs lm

Reply via email to