Re: [R-sig-phylo] pic() vs gls()
Hi all, Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? Hi Nick. For your first (simple) problem, I believe you want to do: read.csv(file=Mason_data.csv,row.names=1)-smdata Regarding the more complicated issue, the problem of non-independence in linear regression comes in the residual error of the model. Thus, you should fit a correlation structure for the error in Y given X (not for X Y separately as you have done with PICs here). This indeed can be done using gls() - so, for instance: pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) fits a linear model in which the residual error of Pause_Rate given Comp.1 is distributed according to the correlation structure corMartins() with alpha estimated using REML. The way to reproduce this result using contrasts is the following: alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha smdata-as.matrix(smdata) # important pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha)) summary(lm(pr_contrast~pc1_contrast-1)) Which if you compare to: summary(pglsModel1a) you should find yields the same results and P-value. (Note that smdata-as.matrix(smdata) seems to be important here as if smdata is a data frame, then smdata[,1] does not inherit the rownames of smdata and pic() will return an incorrect set of contrasts.) I hope this helps. No doubt other subscribers to the list may have something to add. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2011 11:07 PM, Nicholas Mason wrote: Dear R-sig-phylo users, I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function). From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts. The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits. These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment). Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model? Thanks in advance for any advice or comments offered!! Cheers, Nick Mason Below I have the code I have been using (also attached as Mason.R): require(ape) require(nlme) require(geiger) read.csv(file=Mason_data.csv)-smdata rownames(smdata)-smdata[,1] smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me. read.tree(file=Mason_tree.tre)-spbm name.check(smdata,spbm) fitContinuous(spbm,smdata,model=OU)-ou2 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha)) summary(lm(pr_contrast~pc1_contrast-1)) pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1a) pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1b) - Nicholas Albert Mason MS Candidate, Burns Lab Department of Biology - EB Program San Diego State University 5500 Campanile Drive San Diego, CA 92182-4614 845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org
Re: [R-sig-phylo] pic() vs gls()
Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
The independent contrasts algebra for what Joe is talking about can be found in these papers: Garland, T., Jr., P. E. Midford, and A. R. Ives. 1999. An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values. American Zoologist 39:374-388. Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346-364. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments Edited by Theodore Garland, Jr. and Michael R. Rose http://www.ucpress.edu/book.php?isbn=9780520261808 (PDFs of chapters are available from me or from the individual authors) From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Joe Felsenstein [j...@gs.washington.edu] Sent: Thursday, July 14, 2011 6:35 AM To: ppi...@uniroma3.it Cc: r-sig-phylo Subject: Re: [R-sig-phylo] pic() vs gls() Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
[R-sig-phylo] pic() vs gls()
Dear R-sig-phylo users,I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function).From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts.The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits.These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model?Thanks in advance for any advice or comments offered!!Cheers,Nick MasonBelow I have the code I have been using (also attached as Mason.R):require(ape)require(nlme)require(geiger)read.csv(file="Mason_data.csv")-smdatarownames(smdata)-smdata[,1]smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me.read.tree(file="Mason_tree.tre")-spbmname.check(smdata,spbm)fitContinuous(spbm,smdata,model="OU")-ou2pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))summary(lm(pr_contrast~pc1_contrast-1))pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1a)pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1b) Mason_tree.tre Description: Binary data ,Pause_Rate,Comp.1,Comp.2 Sporophila_schistacea,0.97572108797618,1.3796099841333,-1.12587956726316 Sporophila_intermedia,0.314864247370089,2.36946494397968,1.16313695174602 Sporophila_plumbea,0.525348288841447,2.71609091582214,-0.0201502845760047 Sporophila_torqueola,0.090114188687708,3.55768691291833,0.0588440560485339 Sporophila_collaris,0.258598106973663,-0.866576621272912,-0.375459372210033 Sporophila_lineola,1.91924407069217,5.38973476610418,-0.719604956651293 Sporophila_luctuosa,0.316867010852651,3.51591652591355,-1.01122149539569 Sporophila_nigricollis,0.646766139216069,4.60967961843748,0.403572095059875 Sporophila_caerulescens,1.19967606461856,4.42942981620688,0.266281245585206 Sporophila_albogularis,-0.747926431242122,2.14986795813319,0.453292241973436 Sporophila_peruviana,-0.662055276873553,-5.75668334764164,0.194910239070253 Sporophila_simplex,-0.0159932866483984,0.50911606352083,-0.961689651320771 Sporophila_bouvreuil,-0.113798179556032,4.96419876880559,-0.0444962146993295 Sporophila_minuta,-0.653039978630658,4.51909264308507,0.271433436605055 Sporophila_hypoxantha,-1.33520075061216,3.6424747461571,-0.143885556798834 Sporophila_ruficollis,-1.10188388209151,3.69620160094318,-1.122292891 Sporophila_castaneiventris,-0.957617880681526,5.13264024147439,0.71993032372333 Sporophila_hypochroma,-0.860526127835558,3.70604616959997,-0.317101374278479 Sporophila_cinnamomea,-0.859253764452833,4.92903311573274,-0.397590156130398 Sporophila_melanogaster,0.246280339669374,3.57517426978125,0.492140024454993 Sporophila_telasco,0.512066210743585,4.57687350229544,0.151380525693064 Oryzoborus_nuttingi,-0.803111018941645,-16.0737467067366,-1.53209487276288 Oryzoborus_crassirostris,-0.57221657664358,-10.5639148825931,1.92861328894315 Oryzoborus_atrirostris,-0.871304566495871,-18.0081579920927,-0.767118053337893 Oryzoborus_maximiliani,-0.326024013387329,-14.0150572642919,-0.0204930609520138 Oryzoborus_angolensis,-0.142855044536879,-6.48632361126553,0.96227143885814 Dolospingus_fringilloides,1.26961240149379,-2.93723705653849,1.74362516139737 Mason.R Description: Binary data -Nicholas Albert MasonMS Candidate, Burns LabDepartment of Biology - EB ProgramSan Diego State University5500 Campanile DriveSan Diego, CA 92182-4614845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Hi Nick. For your first (simple) problem, I believe you want to do: read.csv(file=Mason_data.csv,row.names=1)-smdata Regarding the more complicated issue, the problem of non-independence in linear regression comes in the residual error of the model. Thus, you should fit a correlation structure for the error in Y given X (not for X Y separately as you have done with PICs here). This indeed can be done using gls() - so, for instance: pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) fits a linear model in which the residual error of Pause_Rate given Comp.1 is distributed according to the correlation structure corMartins() with alpha estimated using REML. The way to reproduce this result using contrasts is the following: alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha smdata-as.matrix(smdata) # important pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha)) summary(lm(pr_contrast~pc1_contrast-1)) Which if you compare to: summary(pglsModel1a) you should find yields the same results and P-value. (Note that smdata-as.matrix(smdata) seems to be important here as if smdata is a data frame, then smdata[,1] does not inherit the rownames of smdata and pic() will return an incorrect set of contrasts.) I hope this helps. No doubt other subscribers to the list may have something to add. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2011 11:07 PM, Nicholas Mason wrote: Dear R-sig-phylo users, I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function). From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts. The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits. These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment). Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model? Thanks in advance for any advice or comments offered!! Cheers, Nick Mason Below I have the code I have been using (also attached as Mason.R): require(ape) require(nlme) require(geiger) read.csv(file=Mason_data.csv)-smdata rownames(smdata)-smdata[,1] smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me. read.tree(file=Mason_tree.tre)-spbm name.check(smdata,spbm) fitContinuous(spbm,smdata,model=OU)-ou2 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha)) summary(lm(pr_contrast~pc1_contrast-1)) pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1a) pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1b) - Nicholas Albert Mason MS Candidate, Burns Lab Department of Biology - EB Program San Diego State University 5500 Campanile Drive San Diego, CA 92182-4614 845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo