Re: [R-sig-phylo] pic() vs gls()
Hi all, Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? Hi Nick. For your first (simple) problem, I believe you want to do: read.csv(file=Mason_data.csv,row.names=1)-smdata Regarding the more complicated issue, the problem of non-independence in linear regression comes in the residual error of the model. Thus, you should fit a correlation structure for the error in Y given X (not for X Y separately as you have done with PICs here). This indeed can be done using gls() - so, for instance: pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) fits a linear model in which the residual error of Pause_Rate given Comp.1 is distributed according to the correlation structure corMartins() with alpha estimated using REML. The way to reproduce this result using contrasts is the following: alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha smdata-as.matrix(smdata) # important pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha)) summary(lm(pr_contrast~pc1_contrast-1)) Which if you compare to: summary(pglsModel1a) you should find yields the same results and P-value. (Note that smdata-as.matrix(smdata) seems to be important here as if smdata is a data frame, then smdata[,1] does not inherit the rownames of smdata and pic() will return an incorrect set of contrasts.) I hope this helps. No doubt other subscribers to the list may have something to add. Best, Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://phytools.blogspot.com On 7/13/2011 11:07 PM, Nicholas Mason wrote: Dear R-sig-phylo users, I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function). From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts. The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits. These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment). Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model? Thanks in advance for any advice or comments offered!! Cheers, Nick Mason Below I have the code I have been using (also attached as Mason.R): require(ape) require(nlme) require(geiger) read.csv(file=Mason_data.csv)-smdata rownames(smdata)-smdata[,1] smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me. read.tree(file=Mason_tree.tre)-spbm name.check(smdata,spbm) fitContinuous(spbm,smdata,model=OU)-ou2 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha)) pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha)) summary(lm(pr_contrast~pc1_contrast-1)) pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1a) pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata) summary(pglsModel1b) - Nicholas Albert Mason MS Candidate, Burns Lab Department of Biology - EB Program San Diego State University 5500 Campanile Drive San Diego, CA 92182-4614 845-240-0649 (cell) ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org
Re: [R-sig-phylo] Assigning node ages to a tree, revisited
Roger, If you like, go ahead and send me off-list your code and tree file. The function does not do a lot of error checking, so it is probably hanging on tip or node names; an incorrect tree structure can flummox it, too. If it is working ,it shouldn't take more than a few seconds. Best, Gene On 7/14/11 12:15 AM, Scott Chamberlain myrmecocys...@gmail.com wrote: Hi Roger, Can you provide a reproducible example (perhaps a subset of your tree and their tip and node ages if the tree is very large)? I don't know what the problem could be without the data, but perhaps Gene knows? Scott On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote: Hello all, I wish to transform branch lengths on a tree according to ages of terminal taxa and internal node ages; to this end I have tried to implement the script written by Gene Hunt mentioned in a recent post to this list (http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html; script available at https://gist.github.com/938313). However, when attempting to run the command scalePhylo(tr, tip.ages, node.mins), it crunches away for hours on my computer (a 2010 MacBook Pro Core i7, which is quite fast). Clearly there's something wrong. I'm afraid I don't know enough about R to debug a script. I thought I'd send this to the list in case someone other than Gene Hunt or Scott Chamberlain has had a go at using this script. Thanks in advance, Roger -- Gene Hunt Curator, Department of Paleobiology National Museum of Natural History Smithsonian Institution [NHB, MRC 121] P.O. Box 37012 Washington DC 20013-7012 Phone: 202-633-1331 Fax: 202-786-2832 http://paleobiology.si.edu/staff/individuals/hunt.cfm ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] pic() vs gls()
The independent contrasts algebra for what Joe is talking about can be found in these papers: Garland, T., Jr., P. E. Midford, and A. R. Ives. 1999. An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values. American Zoologist 39:374-388. Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346-364. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments Edited by Theodore Garland, Jr. and Michael R. Rose http://www.ucpress.edu/book.php?isbn=9780520261808 (PDFs of chapters are available from me or from the individual authors) From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of Joe Felsenstein [j...@gs.washington.edu] Sent: Thursday, July 14, 2011 6:35 AM To: ppi...@uniroma3.it Cc: r-sig-phylo Subject: Re: [R-sig-phylo] pic() vs gls() Paolo Piras wrote -- Citing http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction: Using Felsenstein's (1985) phylogenetic independent contrasts (pic); This is also a Brownian-motion based estimator, but it only takes descendants of each node into account in reconstructing the state at that node. More basal nodes are ignored. I THINK that, on the opposite, more basal nodes are NOT ignored in gls and for this reason results can differ slightly I'm wrong? The contrast algorithm if continued to the root, makes the correct ancestral reconstruction for the root. You are correct that values for higher nodes in the tree are not the correct ML reconstruction for those nodes. If the tree is rerooted at any interior node and the algorithm used for that, then that node's reconstruction will be correct. There are ways of re-using information so that the total effort of doing this for all interior nodes will be no worse than about twice that of a single pass through the tree. However people may prefer to use PGLS, which if properly done should give the proper estimates for all nodes. There is some discussion of this in Rohlf's 2001 paper in Evolution. Joe Joe Felsenstein, j...@gs.washington.edu Dept. of Genome Sciences, Univ. of Washington Box 355065, Seattle, WA 98195-5065 USA ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] cut a tree in a given time interval
On 7/14/11 2:45 AM, ppi...@uniroma3.it wrote: Thankyou NIck, now...I have an error when running it: Oops. Apologies, this stuff is in-house code, I haven't organized it. Add these to the text file... === get_daughters - function(nodenum, t) { daughter_edgenums = findall(nodenum, t$edge[,1]) daughter_nodenums = t$edge[,2][daughter_edgenums] return(daughter_nodenums) } # Get indices of all matches to a list findall - function(what, inlist) { TFmatches = inlist == what indices = 1:length(inlist) matching_indices = indices[TFmatches] return(matching_indices) } === object get_daughters not found It does not seem I miss something some help? thanks again best paolo Oops, left that out. Here it is as text and an attached file. You will have to do library(ape) and maybe library(phangorn) ...ahead of time (hopefully not others...) chainsaw- function(tr, timepoint=10) { # Take a tree and saw it off evenly across a certain timepoint. # This removes any tips above the timepoint, and replaces them # with a single tip representing the lineage crossing # the timepoint (with a new tip name). # Get the tree in a table tr_table = prt(tr, printflag=FALSE) # Find the tips that are less than 10 my old and drop them TF_exists_more_recently_than_10mya = tr_table$time_bp timepoint # Get the corresponding labels labels_for_tips_existing_more_recently_than_10mya = tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ] ### # Draft chainsaw function ### # loop through the branches that cross 10 mya # get a list of the edge start/stops in the phylogeny's edges edge_times_bp = get_edge_times_before_present(tr) # which of these branches cross 10 mya? edges_start_earlier_than_10mya = edge_times_bp[, 1] timepoint edges_end_later_than_10mya = edge_times_bp[, 2] = timepoint edges_to_chainsaw = edges_start_earlier_than_10mya + edges_end_later_than_10mya == 2 # then, for each of these edges, figure out how many tips exist descending from it nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw] nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw length(tr$tip.label)] # create a copy of the tree to chainsaw tree_to_chainsaw = tr for (i in 1:length(nodes_to_chainsaw)) { tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i]) #print(tmp_subtree$tip.label) tmp_number_of_tips = length(tmp_subtree$tip.label) #print(tmp_number_of_tips) # number of tips to drop = (numtips -1) numtips_to_drop = tmp_number_of_tips - 1 # tips_to_drop tmp_labels = tmp_subtree$tip.label labels_to_drop = tmp_labels[1:numtips_to_drop] # new label label_kept_num = length(tmp_labels) label_kept = tmp_labels[label_kept_num] new_label = paste(CA_, label_kept, +, numtips_to_drop, _tips, sep=) tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == label_kept] = new_label # chop off e.g. 2 of the 3 tips tree_to_chainsaw = drop.tip(tree_to_chainsaw, labels_to_drop) } #plot(tree_to_chainsaw) #axisPhylo() tree_to_chainsaw_table = prt(tree_to_chainsaw, printflag=FALSE) tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = tree_to_chainsaw_table$time_bp timepoint tmp_edge_lengths = tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] times_bp_for_edges_to_chainsaw = tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] adjustment = times_bp_for_edges_to_chainsaw - timepoint revised_tmp_edge_lengths = tmp_edge_lengths + adjustment tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] = revised_tmp_edge_lengths # revised ordered_nodenames = get_nodenums(tree_to_chainsaw) parent_branches = get_indices_where_list1_occurs_in_list2(ordered_nodenames, tree_to_chainsaw$edge[,2]) NA_false = is.not.na(tree_to_chainsaw_table$edge.length) tree_to_chainsaw$edge.length[parent_branches[NA_false]] = tree_to_chainsaw_table$edge.length[NA_false] return(tree_to_chainsaw) } # print tree in hierarchical format prt- function(t, printflag=TRUE, relabel_nodes = FALSE, time_bp_digits=7) { # assemble beginning table # check if internal node labels exist if (node.label %in% attributes(t)$names == FALSE) { rootnum = get_nodenum_structural_root(t) new_node_labels = paste(inNode, rootnum:(rootnum+t$Nnode-1), sep=) t$node.label = new_node_labels } # or manually relabel
Re: [R-sig-phylo] Assigning node ages to a tree, revisited
Hi all, As a side note to this discussion: ape has now the function compute.brtime (though it considers only node ages), available on ape's SVN in R/compute.brtime.R. Cheers, Emmanuel Scott Chamberlain wrote on 14/07/2011 11:15: Hi Roger, Can you provide a reproducible example (perhaps a subset of your tree and their tip and node ages if the tree is very large)? I don't know what the problem could be without the data, but perhaps Gene knows? Scott On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote: Hello all, I wish to transform branch lengths on a tree according to ages of terminal taxa and internal node ages; to this end I have tried to implement the script written by Gene Hunt mentioned in a recent post to this list (http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html; script available at https://gist.github.com/938313). However, when attempting to run the command scalePhylo(tr, tip.ages, node.mins), it crunches away for hours on my computer (a 2010 MacBook Pro Core i7, which is quite fast). Clearly there's something wrong. I'm afraid I don't know enough about R to debug a script. I thought I'd send this to the list in case someone other than Gene Hunt or Scott Chamberlain has had a go at using this script. Thanks in advance, Roger [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo -- Emmanuel Paradis IRD, Jakarta, Indonesia http://ape.mpl.ird.fr/ ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo