Re: [R-sig-phylo] Extracting sister groups
Hi François,

Thank you kindly for your offer of help. The code below simulates a phylogeny (tree) and a data frame (trait) with one binary trait for 100 species. The format is representative of the data I am using for my analyses, so it should serve as a test case. Hopefully this helps; let me know if there's any other information I can provide.

    library(ape)
    library(phytools)
    # random 100-tip tree
    tree <- rtree(100)
    # symmetric transition matrix between states 0 and 1
    tran <- matrix(c(-1,1,1,-1),2,2)
    rownames(tran) <- c(0,1)
    colnames(tran) <- c(0,1)
    # simulate the trait history and keep the tip states
    phy <- sim.history(tree,tran)
    trait <- data.frame(sp=tree$tip.label,bt=getStates(phy,type="tips"))
    rownames(trait) <- tree$tip.label

Cheers,
Kev

From: François Michonneau [francois.michonn...@gmail.com]
Sent: 23 October 2014 14:54
To: Arbuckle, Kevin
Subject: Re: [R-sig-phylo] Extracting sister groups

Hi Kevin,

We should be able to help you, but it would be much easier if you provided us with a small data set that illustrates the format of your current dataset. How is your trait currently stored, and how is it associated with the tips in your tree?

Cheers,
-- François

On Thu, Oct 23, 2014 at 6:23 AM, Arbuckle, Kevin k.arbuc...@liverpool.ac.uk wrote:

Hi everyone,

I am attempting to run sister group analyses as one way to look at the effect of a binary trait on diversification. Two of the functions from ape that I'm looking at are diversity.contrast.test and richness.yule.test, but both have the same limitation. They require the data to be input as a data frame of two columns, one with the number of species in clades that have the trait of interest, and the other with the number of species in the respective sister clades that don't have the trait. The issue is that I am working with a very large tree, so extracting and entering such information by hand is not really feasible.

I am therefore looking for a function that extracts all sister clades that differ in the presence vs absence of the trait, and ideally generates a data frame of the appropriate format for the above functions automatically. It seems that such a function should already exist, but as I can't find anything I would appreciate some help (hopefully someone will know of one).

Thanks,
Kevin Arbuckle

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
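To make the input format described in the original question concrete, here is a minimal sketch of the two-column table that diversity.contrast.test takes. The species counts are made up purely for illustration, and the column names are arbitrary; the function reads the columns by position (first column: clades with the trait, second column: their sister clades without it).

```r
library(ape)

# Made-up species counts, for illustration only.
# Each row is one pair of sister clades.
sister <- data.frame(
  with_trait    = c(12, 5, 30, 8, 21),  # clades that have the trait
  without_trait = c(3, 7, 10, 2, 9)     # their sister clades without it
)

# Returns a P-value (from a Wilcoxon test on the signed contrasts when nrep = 0)
p_value <- diversity.contrast.test(sister, method = "ratiolog",
                                   alternative = "two.sided")
p_value
```

richness.yule.test takes the same two-column table plus a vector of divergence times for each pair, which is why the node ages would also need to be extracted from the tree for that test.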
Re: [R-sig-phylo] Extracting sister groups
Hi Kevin,

If I understand correctly what you're trying to do, you'll first need to collapse some of your tips to create clades, a proportion of which will have the trait. You'll then be able to use this new tree to generate the data.frame needed by the functions you mentioned in your original post.

Collapsing tips discards the phylogenetic information within each clade, and depending on what you're trying to do, you may not want to lose it. Maybe a different approach, such as using BiSSE in the diversitree package, would be more appropriate?

Cheers,
-- François
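For readers unfamiliar with the BiSSE route mentioned above, the following is a rough sketch based on the standard diversitree workflow, not on anything posted in this thread. The tree is simulated, and all rate values are arbitrary assumptions chosen only so the example runs. Note that BiSSE expects an ultrametric tree, so a tree from rtree() as in the test case would not be suitable as-is.

```r
library(diversitree)

# Arbitrary illustrative rates: lambda0, lambda1, mu0, mu1, q01, q10
pars <- c(0.1, 0.2, 0.03, 0.03, 0.05, 0.05)

set.seed(4)
phy <- NULL
# Re-simulate if the lineage went extinct or only one state is present
while (is.null(phy) || length(unique(phy$tip.state)) < 2) {
  phy <- tree.bisse(pars, max.taxa = 50, x0 = 0)
}
states <- phy$tip.state                 # named 0/1 vector, one entry per tip

lik <- make.bisse(phy, states)          # full six-parameter likelihood
p   <- starting.point.bisse(phy)        # heuristic starting values
fit <- find.mle(lik, p)

# Constrained model: equal speciation rates in both states
lik_equal <- constrain(lik, lambda1 ~ lambda0)
fit_equal <- find.mle(lik_equal, p[argnames(lik_equal)])

# Likelihood-ratio comparison of the two fits
anova(fit, equal.lambda = fit_equal)
```

With real data, `phy` and `states` would be replaced by the empirical tree and a 0/1 vector named by tip label.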
Re: [R-sig-phylo] midpoint rooting? how to get the outgroup?
Dear Klaus,

Following up on the thread below, I was wondering if there is a function to get the name of the outgroup after rooting a tree with the midpoint function. I can plot the tree and check it by eye. However, I have 1000 trees, and I want to count how many times one particular species is placed as the outgroup.

Thank you for your help.

Romain Blanc-Mathieu
PhD
INRA-CNRS-UNS Agrobiotech
Sophia Antipolis
06160 Antibes
FRANCE

Dear Robin,

here is a function that does midpoint rooting; you have to attach the phangorn package. It is likely to appear in one of the next ape versions.

Cheers,
Klaus

    midpoint <- function(tree){
        dm = cophenetic(tree)                  # tip-to-tip path lengths
        tree = unroot(tree)
        rn = max(tree$edge)+1                  # number for the new root node
        maxdm = max(dm)
        ind = which(dm==maxdm,arr.ind=TRUE)[1,]  # the two most distant tips
        tmproot = Ancestors(tree, ind[1], "parent")
        tree = phangorn:::reroot(tree, tmproot)
        edge = tree$edge
        el = tree$edge.length
        children = tree$edge[,2]
        left = match(ind[1], children)         # edge leading to the first tip
        tmp = Ancestors(tree, ind[2], "all")
        tmp= c(ind[2], tmp[-length(tmp)])      # path from the second tip towards the root
        right = match(tmp, children)
        if(el[left]>= (maxdm/2)){
            # the midpoint lies on the edge leading to the first tip
            edge = rbind(edge, c(rn, ind[1]))
            edge[left,2] = rn
            el[left] = el[left] - (maxdm/2)
            el = c(el, maxdm/2)
        }
        else{
            # walk up from the second tip until half the maximum distance is passed
            sel = cumsum(el[right])
            i = which(sel>(maxdm/2))[1]
            edge = rbind(edge, c(rn, tmp[i]))
            edge[right[i],2] = rn
            eltmp = sel[i] - (maxdm/2)
            #el = c(el, sel[i] - (maxdm/2))
            el = c(el, el[right[i]] - eltmp)
            el[right[i]] = eltmp
        }
        tree$edge.length = el
        tree$edge=edge
        tree$Nnode = tree$Nnode+1
        phangorn:::reorderPruning(phangorn:::reroot(tree, rn))
    }

On 9/2/10, Velzen, Robin van Robin.vanVelzen at wur.nl wrote:

> Dear Ape authors and mailing list members,
>
> I am new to R and to Ape and very much impressed by the functionality the
> platform offers. I have used Ape to construct distance-based trees using the
> 'read.dna', 'dist.dna' and 'nj' functions. Now I wish to root the resulting
> Neighbor Joining tree using midpoint rooting, but have not been able to find
> the right function.
>
> Does anyone know if there is a function for midpoint rooting in Ape or a
> similar R package, or if there is a smart way to define a midpoint root
> outgroup that can be used with the 'root' function in Ape?
>
> Any help or suggestion will be much appreciated!
>
> Thanks,
>
> Robin
>
> Robin van Velzen
> PhD student
> Biosystematics Group
> Wageningen University
>
> Wageningen Campus, Radix building 107, Room W4.Aa.095
> Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
> PO Box 647, 6700 AP Wageningen, The Netherlands
> Tel. +31 (0)317 483425
> http://www.bis.wur.nl

--
Klaus Schliep
Université Paris 6 (Pierre et Marie Curie)
9, Quai Saint-Bernard, 75005 Paris
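The workflow asked about in the quoted 2010 message (read.dna, dist.dna, nj, then midpoint rooting) can be sketched as follows. This assumes a phangorn version that exports midpoint(); the woodmouse alignment that ships with ape stands in for a user's own read.dna() result.

```r
library(ape)
library(phangorn)

data(woodmouse)                        # example DNA alignment shipped with ape
nj_tree <- nj(dist.dna(woodmouse))     # unrooted neighbour-joining tree
rooted  <- midpoint(nj_tree)           # midpoint rooting via phangorn

is.rooted(nj_tree)
is.rooted(rooted)
```

Using the exported function avoids relying on phangorn's internal (`:::`) helpers, which can change between package versions.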
Re: [R-sig-phylo] Extracting sister groups
Hi Kevin.

It sounds like what you want to do is perform a pre-order tree traversal and, for each node visited, ask if all the taxa to one side (e.g., descendants of the right daughter node) are in state 0 and all the taxa to the other side (e.g., descendants of the left daughter) are in state 1. If this evaluates to true, then you record the number of tips in each category.

To do this most efficiently, you should not visit any descendants of a node that has already satisfied this criterion. If you do visit them, you will find that none of them satisfies it, because all the tips on either side of such a node are in the same state (all 0 or all 1).

This should be straightforward to code, but I do not have time to demonstrate right now. I will try to do it this evening. Let us know if you first figure it out yourself.

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 10/23/2014 11:13 AM, Arbuckle, Kevin wrote:

Hi François,

Thanks again for your response. Wouldn't that lose the information of how many species were in each clade? And how would I specify that the 'clades' to keep consist of those sharing either state? My original tree consists of almost 3000 species, so going through such clades manually would be difficult at best, hence the need to automate it somehow. (I apologise, my R coding skills are improving but still leave a lot to be desired in many cases.)

I completely agree that BiSSE is far more appropriate for my aims, and indeed this was the approach I used. However, reviewers have asked if I get the same basic result using other methods, which is the only reason I am attempting such analyses now.

Thank you kindly once again for your time,
Kev
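The node-by-node check described in Liam's reply can be sketched as below. This is an illustrative sketch, not code posted to the list: `sister_contrasts` is a hypothetical helper, and instead of a pre-order traversal with early stopping it makes one postorder pass to count tips below every node and then tests every internal node. The result is the same, because nodes inside a uniform clade can never satisfy the criterion. It assumes a rooted, fully bifurcating tree and states coded so that `"1"` means the trait is present.

```r
library(ape)

# Hypothetical helper, not an ape or phytools function.
# tree:    rooted, fully bifurcating "phylo" object
# states:  vector of binary states named by tip label
# present: value that codes "has the trait" (assumed "1" here)
sister_contrasts <- function(tree, states, present = "1") {
  stopifnot(is.rooted(tree), is.binary(tree),
            all(tree$tip.label %in% names(states)))
  tree  <- reorder(tree, "postorder")        # children come before parents
  n     <- Ntip(tree)
  total <- n + tree$Nnode
  # tips below each node, and tips with the trait below each node
  ntip  <- c(rep(1, n), rep(0, tree$Nnode))
  nwith <- c(as.numeric(as.character(states[tree$tip.label]) == present),
             rep(0, tree$Nnode))
  for (i in seq_len(nrow(tree$edge))) {
    parent <- tree$edge[i, 1]
    child  <- tree$edge[i, 2]
    ntip[parent]  <- ntip[parent]  + ntip[child]
    nwith[parent] <- nwith[parent] + nwith[child]
  }
  uniform <- nwith == 0 | nwith == ntip      # clade is all 0 or all 1
  out <- data.frame(node = integer(0), with_trait = numeric(0),
                    without_trait = numeric(0))
  for (node in (n + 1):total) {
    kids <- tree$edge[tree$edge[, 1] == node, 2]
    has  <- nwith[kids] > 0
    # both daughters uniform, and exactly one of them has the trait
    if (all(uniform[kids]) && sum(has) == 1) {
      out[nrow(out) + 1, ] <- list(node, ntip[kids[has]], ntip[kids[!has]])
    }
  }
  out
}

# Small worked example with hand-assigned states
tree   <- read.tree(text = "(((a,b),(c,d)),(e,f));")
states <- c(a = 1, b = 1, c = 0, d = 0, e = 0, f = 1)
contrasts <- sister_contrasts(tree, states)
contrasts
```

With the objects from Kevin's test case, the call would be along the lines of `sister_contrasts(tree, setNames(trait$bt, trait$sp))`, and the `with_trait` and `without_trait` columns are in the two-column layout that diversity.contrast.test expects.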
Re: [R-sig-phylo] midpoint rooting? how to get the outgroup?
Hi Romain,

I think I've done something similar to this by searching the subtrees for the name of the taxon in question. The code is probably pretty slow on really big trees (since the subtrees function takes a while).

    library(phybase)
    library(phytools)
    set.seed(5)
    #make 100 20 tip trees randomly placing xx as outgroup
    trees <- rmtree(100,19)
    outTre <- lapply(trees,function(x) multi2di(bind.tip(x,'xx')))
    #check to see if the tip xx is within any subgroups
    detect <- sapply(outTre,function(x){
        st <- subtrees(x)
        #the first subtree is the whole tree, so it is dropped with [-1]
        ! 1 %in% sapply(st,function(x) length(grep('xx',x$tip.label)))[-1]
    })
    ogTres <- outTre[detect]
    #just for final visualization/ validation (will make many new windows)
    sapply(ogTres,function(x){
        dev.new()
        plot(x)
    })

Hope that helps!

Cheers,
-Dan
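As a lighter-weight variant of the subtree search in Dan's reply, the same question (is one tip, on its own, sister to everything else?) can be answered by looking only at the daughters of the root. This is a sketch using ape alone; `is_single_outgroup` is a hypothetical helper name, and it assumes the trees are already rooted.

```r
library(ape)

# Hypothetical helper: TRUE when `tip` alone forms one of the daughter
# lineages of the root, i.e. it is sister to all other taxa.
is_single_outgroup <- function(tree, tip) {
  stopifnot(is.rooted(tree), tip %in% tree$tip.label)
  root <- Ntip(tree) + 1L                        # ape numbers the root Ntip + 1
  kids <- tree$edge[tree$edge[, 1] == root, 2]   # daughters of the root
  match(tip, tree$tip.label) %in% kids
}

# Two toy trees: xx is the outgroup in the first but not in the second
trees <- lapply(c("(xx,((a,b),c));", "((xx,a),(b,c));"),
                function(s) read.tree(text = s))
hits <- sapply(trees, is_single_outgroup, tip = "xx")
hits        # TRUE FALSE
sum(hits)   # number of trees in which xx is the outgroup
```

Because it never builds subtrees, this check only touches the edge matrix once per tree, which should matter when looping over a large set of trees.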
Re: [R-sig-phylo] midpoint rooting? how to get the outgroup?
Dear Romain,

the small function below returns the outgroup. I define the outgroup as the smaller of the two clades descending from the root, i.e. the one with fewer taxa. If both clades contain the same number of taxa, one clade is chosen arbitrarily (which.min() returns the first of the tied clades).

    library(phangorn)
    getOutgroup <- function(tree){
        tree = midpoint(tree) # you probably have done this beforehand
        tmp = Descendants(tree)                # tips below every node
        kids = Children(tree, getRoot(tree))   # daughters of the root
        tmp = tmp[kids]
        x = sapply(tmp, length)
        tree$tip.label[tmp[[which.min(x)]]]    # labels of the smaller clade
    }

    # example
    trees = rmtree(10, 5)
    outGroups <- lapply(trees, getOutgroup)
    res = sapply(outGroups , function(x, y)any(y %in% x), y="t1")
    sum(res) # number of outgroups which contain t1
    which(res) # which trees have t1 in the outgroup
    which(res & (sapply(outGroups, length)==1)) # which trees have only t1 in the outgroup
    trees = lapply(trees, midpoint)
    class(trees) = "multiPhylo"
    plot(trees)

Cheers,
Klaus

PS: midpoint is now part of the phangorn package and much faster for larger trees.

--
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston