Re: [R-sig-phylo] PGLS vs lm
Hi Tom, In an OLS regression, the residuals from both regressions (varA ~ varB and varB ~ varA) are different but their distributions are (more or less) symmetric. So, because the residuals are independent (ie, their covariance is null), the residual standard error will be the same (or very close in practice). In GLS, the residuals are not independent, so this difference in the distribution of the residuals affects the estimation of the residual standard errors (because we need to estimate the covaraince of the residuals), and consequently the associated tests. Best, Emmanuel Le 11/07/2013 11:03, Tom Schoenemann a écrit : Hi all, I ran a PGLS with two variables, call them VarA and VarB, using a phylogenetic tree and corPagel. When I try to predict VarA from VarB, I get a significant coefficient for VarB. However, if I invert this and try to predict VarB from VarA, I do NOT get a significant coefficient for VarA. Shouldn't these be both significant, or both insignificant (the actual outputs and calls are pasted below)? If I do a simple lm for these, I get the same significance level for the coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA), though the values of the coefficients of course differ. Can someone help me understand why the PGLS would not necessarily be symmetric in this same way? Thanks, -Tom outTree_group_by_brain_LambdaEst_redo1 - gls(log_group_size_data ~ log_brain_weight_data, correlation = bm.t.100species_lamEst_redo1,data = DF.brain.repertoire.group, method= ML) summary(outTree_group_by_brain_LambdaEst_redo1) Generalized least squares fit by maximum likelihood Model: log_group_size_data ~ log_brain_weight_data Data: DF.brain.repertoire.group AIC BIClogLik 89.45152 99.8722 -40.72576 Correlation Structure: corPagel Formula: ~1 Parameter estimate(s): lambda 0.7522738 Coefficients: Value Std.Error t-value p-value (Intercept) -0.0077276 0.2628264 -0.029402 0.9766 log_brain_weight_data 0.4636859 0.1355499 3.420778 0.0009 Correlation: (Intr) log_brain_weight_data -0.637 Standardized residuals: Min Q1Med Q3Max -1.7225003 -0.1696079 0.5753531 1.0705308 3.0685637 Residual standard error: 0.5250319 Degrees of freedom: 100 total; 98 residual Here is the inverse: outTree_brain_by_group_LambdaEst_redo1 - gls(log_brain_weight_data ~ log_group_size_data, correlation = bm.t.100species_lamEst_redo1,data = DF.brain.repertoire.group, method= ML) summary(outTree_brain_by_group_LambdaEst_redo1) Generalized least squares fit by maximum likelihood Model: log_brain_weight_data ~ log_group_size_data Data: DF.brain.repertoire.group AIC BIC logLik -39.45804 -29.03736 23.72902 Correlation Structure: corPagel Formula: ~1 Parameter estimate(s): lambda 1.010277 Coefficients: Value Std.Error t-value p-value (Intercept) 1.2244133 0.20948634 5.844836 0. log_group_size_data -0.0234525 0.03723828 -0.629796 0.5303 Correlation: (Intr) log_group_size_data -0.095 Standardized residuals: Min Q1Med Q3Max -2.0682836 -0.3859688 1.1515176 1.5908565 3.1163377 Residual standard error: 0.4830596 Degrees of freedom: 100 total; 98 residual _ P. Thomas Schoenemann Associate Professor Department of Anthropology Cognitive Science Program Indiana University Bloomington, IN 47405 Phone: 812-855-8800 E-mail: t...@indiana.edu Open Research Scan Archive (ORSA) Co-Director Consulting Scholar Museum of Archaeology and Anthropology University of Pennsylvania http://www.indiana.edu/~brainevo [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] AICc model averaging for PGLS models with optimised lambda
Hi All, I am running a pgls regression analysis in the caper package in R. My dataset has a continuous response variable and five explanatory variables (three continuous, two discrete and with multiple categories). Each datapoint in the response variables corresponds to a single species of which there are 67 in total. I plan to run multiple regression models, each model containing a different combination of the explanatory variables (fixed effects only, no interactions; total number of models in full set = 32), and then use AICc values to calculate AICc model weights as per Burnham Anderson (2002). These weights will then be used to calculate weighted-average regression coefficients for each explanatory variable across all models (also as per Burnham Anderson 2002). The weighted-average regression coefficients will then be used for prediction. I am not necessarily interested in the extent of the phylogenetic effect - the aim is to compare results from the PGLS models to the same! set of models run using simple OLS regression and (hopefully) show that the results are broadly similar. I have a phylogeny (phyl) of the 67 species taken from the literature. This phylogeny has no branch lengths, so I have set all branch lengths as equal to 1 using: phylC - compute.brlen(phyl, 1) My questions are: Firstly, should I be optimising the value of the lambda transformation in each of the 32 models to control for the different levels of phylogenetic effect when including different combinations of explanatory variables? e.g. model - pgls(y ~ x1 + x2 + x3, data=phylC, lambda = ML) Or would it be sufficient/robust to set lambda = 1 (the default) for all models, thereby simply assuming that there is a phylogenetic effect without necessarily being interested in it's extent. Optimising lambda for all the 32 models returns maximum likelihood values of lambda that range from 0.478 (significantly 0 and 1 using likelihood ratio tests) to 0. Secondly, if I should be optimising lambda in each model, is it meaningful to then compare and average across models using AICc values? Surely I could not be sure whether the difference in AICc (i.e. model fit) between models was due to the inclusion of different explanatory variables (which I am really interested in) or the different value of the lambda transformation? In a previous post, David Orme wrote: ...it isn't necessarily safe to use AIC or deletion tests to compare models with different covariance structures _and_ different fixed effects, since changing the branch lengths affects the AIC of the model...Basically, it isn't easy to do model simplification of your fixed effects whilst also finding a ML evolutionary model for that changing set of fixed effects. Another argument for keeping the branch length transforms simple. http://www.mail-archive.com/r-sig-phylo@r-project.org/msg02060.html Though this implies that optimising lambda and then comparing models with AIC is not a good idea, I have found various references where this appears to have been done, so am confused as to what to do. Many thanks in advance for your help, Lucy Malpas Conservation Scientist Species Monitoring and Research, Conservation Science The Royal Society for the Protection of Birds [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] PGLS vs lm
I think the issue is largely one of conceptualizing the problem. People often view body size as an independent variable when analyzing brain size, but obviously this is a serious oversimplificaiton -- usually done for statistical convenience -- that does not reflect the biology (yes, I have also done this!). Moreover, brain mass is part of body mass, so if you use body mass per se as an independent variable then you have potential part-whole correlation statistical issues. I would think carefully about what you are really wanting to do (e.g., regression vs. correlation vs. ANCOVA), and check over this paper: Ives, A. R., P. E. Midford, and T. Garland, Jr. 2007. Within-species variation and measurement error in phylogenetic comparative methods. Systematic Biology 56:252-270. And maybe this one: Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265-292. Cheers, Ted Theodore Garland, Jr., Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Wet Lab Phone: (951) 827-5724 Dry Lab Phone: (951) 827-4026 Home Phone: (951) 328-0820 Skype: theodoregarland Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html http://scholar.google.com/citations?hl=enuser=iSSbrhwJ Inquiry-based Middle School Lesson Plan: Born to Run: Artificial Selection Lab http://www.indiana.edu/~ensiweb/lessons/BornToRun.html From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Tom Schoenemann [t...@indiana.edu] Sent: Thursday, July 11, 2013 11:19 AM To: Emmanuel Paradis Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] PGLS vs lm Thanks Emmanuel, OK, so this makes sense in terms of the math involved. However, from a practical, interpretive perspective, shouldn't I assume this to mean that we actually cannot say (from this data) whether VarA and VarB ARE actually associated with each other? In the real world, if VarA is causally related to VarB, then by definition they will be associated. Doesn't this type of situation - where the associations are judged to be statistically significant in one direction but not in the other - suggest that we actually DON'T have confidence that - independent of phylogeny - VarA is associated with VarB? Putting this in the context of the actual variables involved, doesn't this mean that we actually can't be sure brain size is associated with social group size (in this dataset) independent of phylogeny? I notice that the maximum likelihood lambda estimates are different (though I'm not sure they are significantly so). I understand this could mathematically be so, but I'm concerned with how to interpret this. In the real world, how could phylogenetic relatedness affect group size predicting brain size, more than brain size predicting group size? Isn't this a logical problem (for interpretation - not for the math)? In other words, in evolutionary history, shouldn't phylogeny affect the relationship between two variables in only one way, which would show up whichever way we approached the association? Again, I understand the math may allow it, I just don't understand how it could actually be true over evolutionary time. Thanks in advance for helping me understand this better, -Tom On Jul 11, 2013, at 5:12 AM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: Hi Tom, In an OLS regression, the residuals from both regressions (varA ~ varB and varB ~ varA) are different but their distributions are (more or less) symmetric. So, because the residuals are independent (ie, their covariance is null), the residual standard error will be the same (or very close in practice). In GLS, the residuals are not independent, so this difference in the distribution of the residuals affects the estimation of the residual standard errors (because we need to estimate the covaraince of the residuals), and consequently the associated tests. Best, Emmanuel Le 11/07/2013 11:03, Tom Schoenemann a �crit : Hi all, I ran a PGLS with two variables, call them VarA and VarB, using a phylogenetic tree and corPagel. When I try to predict VarA from VarB, I get a significant coefficient for VarB. However, if I invert this and try to predict VarB from VarA, I do NOT get a significant coefficient for VarA. Shouldn't these be both significant, or both insignificant (the actual outputs and calls are pasted below)? If I do a simple lm for these, I get the same significance level for the coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA), though the values of the coefficients of course differ. Can someone help me understand why the PGLS would not necessarily be symmetric in this same way? Thanks, -Tom