Re: [R-sig-phylo] identifying phylogenetically equivalent nodes

Emmanuel Paradis Thu, 18 Feb 2021 05:56:44 -0800

Hi Jacob,

ape::makeNodeLabel(phy, method = "md5sum") returns 'phy' with node labels computed
from the tips descending from each node, so nodes that subtend the same set of
tips get the same label, even in different trees. For instance:


library(ape)
tr3 <- makeNodeLabel(rtree(3), method = "md5sum")
tr4 <- makeNodeLabel(rtree(4), method = "md5sum")
any(tr3$node.label %in% tr4$node.label)

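If you repeat the last three commands several times, about 20% of the runs
should return TRUE. In your case, match() makes more sense than %in%: it returns
the position of each matching node label in the other tree (NA if there is
none) rather than just TRUE/FALSE.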
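A minimal sketch of that suggestion, assuming both trees are rooted: prune each tree to the shared tips, label the nodes with method = "md5sum", then use match() to find, for every node of the first pruned tree, the node of the second pruned tree carrying the same label. The helper name md5_node_match is hypothetical, and the node numbers it returns refer to the pruned trees, not the original ones.

```r
library(ape)

## Hypothetical helper: match nodes of two rooted trees with different
## taxon samples via md5sum node labels computed on the shared tips
md5_node_match <- function(t1, t2) {
  common <- intersect(t1$tip.label, t2$tip.label)
  ## prune both trees to the tips they share
  p1 <- drop.tip(t1, setdiff(t1$tip.label, common))
  p2 <- drop.tip(t2, setdiff(t2$tip.label, common))
  ## each node label now encodes its set of descendant tips
  p1 <- makeNodeLabel(p1, method = "md5sum")
  p2 <- makeNodeLabel(p2, method = "md5sum")
  ## position of each p1 node label among the p2 node labels (NA if absent)
  hit <- match(p1$node.label, p2$node.label)
  ## node.label[k] belongs to node Ntip + k, so convert back to node numbers
  cbind(node1 = Ntip(p1) + seq_len(Nnode(p1)),
        node2 = Ntip(p2) + hit)
}

set.seed(1)
res <- md5_node_match(rtree(8), rtree(6))
res
```

Because both pruned trees contain exactly the shared tips, their roots always match; clades that conflict between the two trees come back as NA.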

Also, I assume your trees are rooted. If they are unrooted, you should
consider comparing splits instead (or root them).
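For the "root them" route, one option is to root every tree on the same shared tip before comparing: each split that is not trivial then appears as a clade on the side away from that tip, so the md5sum labels can be matched as for rooted trees. A sketch on toy unrooted trees, assuming the first shared tip label is an acceptable outgroup (an arbitrary, hypothetical choice):

```r
library(ape)

## Toy unrooted trees; in practice these would be the gene trees
set.seed(2)
u1 <- rtree(7, rooted = FALSE)
u2 <- rtree(5, rooted = FALSE)

## Prune both trees to the tips they share
common <- intersect(u1$tip.label, u2$tip.label)
p1 <- drop.tip(u1, setdiff(u1$tip.label, common))
p2 <- drop.tip(u2, setdiff(u2$tip.label, common))

## Root both on the same shared tip: every split then shows up
## as the clade on the side away from that tip
out <- common[1]
r1 <- makeNodeLabel(root(p1, outgroup = out, resolve.root = TRUE), method = "md5sum")
r2 <- makeNodeLabel(root(p2, outgroup = out, resolve.root = TRUE), method = "md5sum")

## Node of r2 matching each node of r1 (NA where the split is not shared)
hit <- match(r1$node.label, r2$node.label)
cbind(node1 = Ntip(r1) + seq_len(Nnode(r1)), node2 = Ntip(r2) + hit)
```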

Best,

Emmanuel

----- On 18 Feb 21, at 0:59, Jacob Berv jakeberv.r.sig.ph...@gmail.com wrote:
> Dear R-sig-phylo,
> 
> Over the weekend, I asked Liam Revell if he had a solution to use matchNodes
> for a particular problem I’m trying to solve: finding all phylogenetically
> equivalent nodes when comparing trees that have uneven taxon samples and
> different topologies. Liam was kind enough to take some time to write a blog
> post about this, and got me started with some code:
> 
> http://blog.phytools.org/2021/02/on-matching-nodes-between-trees-using.html
> 
> On its face this seems like a simple problem, but I’m running into some issues
> and thought I would reach out to the broader group. The code linked above seems
> to work, but only for trees that start out topologically identical. For my
> purposes, I’m trying to match nodes in a given reference tree to nodes across
> several hundred gene trees that differ from the reference in topology and
> taxon sample.
> 
> Here is a function definition based on Liam’s example:
> 
> #function to match nodes from consensus
> #to individual gene trees with uneven sampling
> #derived from Liam Revell's example -- need to test
> library(phytools)  # matchNodes(); also loads ape for drop.tip()
> match_phylo_nodes <- function(t1, t2){
>   ## step one: drop tips not shared by both trees
>   t1p <- drop.tip(t1, setdiff(t1$tip.label, t2$tip.label))
>   t2p <- drop.tip(t2, setdiff(t2$tip.label, t1$tip.label))
>
>   ## step two: match nodes of the pruned trees by "descendants"
>   M <- matchNodes(t1p, t2p)
>
>   ## step three: match each full tree to its pruned version by "distances"
>   M1 <- matchNodes(t1, t1p, "distances")
>   M2 <- matchNodes(t2, t2p, "distances")
>
>   ## final step: reconcile
>   MM <- matrix(NA, t1$Nnode, 2, dimnames = list(NULL, c("left", "right")))
>
>   for(i in 1:nrow(MM)){
>     MM[i, 1] <- M1[i, 1]
>     nn <- M[which(M[, 1] == M1[i, 2]), 2]
>     if(length(nn) > 0){
>       MM[i, 2] <- M2[which(M2[, 2] == nn), 1]
>     }
>   }
>   return(MM)
> }
> 
> 
> When t1 and t2 are trees that have topological conflicts, this function returns
> an error:
>
> Error in MM[i, 2] <- M2[which(M2[, 2] == nn), 1] :
>  replacement has length zero
>
> I think(?) this happens because a particular node doesn’t exist in one or the
> other tree, so which() returns integer(0) at that line, but I’m not sure I
> really understand what is going on here.
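That reading is consistent with how R behaves: which() returns integer(0) when nothing matches, and assigning a zero-length vector into a single matrix cell fails with exactly this message. A toy reproduction, where MM, M2 and nn are made-up illustrative values rather than real matchNodes() output:

```r
## Toy values standing in for the objects inside the loop (illustrative only)
MM <- matrix(NA, 2, 2, dimnames = list(NULL, c("left", "right")))
M2 <- cbind(c(6, 7), c(4, 5))   # column 2: nodes of the pruned tree
nn <- 9                         # a node number absent from M2[, 2]

idx <- which(M2[, 2] == nn)     # integer(0): no row matches
msg <- tryCatch({
  MM[1, 2] <- M2[idx, 1]        # zero-length replacement fails here
  "no error"
}, error = function(e) conditionMessage(e))
msg

## Guarding on length(idx) avoids the error and leaves the cell NA
if (length(idx) > 0) MM[1, 2] <- M2[idx, 1]
MM[1, 2]
```

The guard only hides the symptom: it means some node of the pruned second tree was not paired with any node of the full second tree in M2.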
> 
> 
> I modified Liam’s code slightly to get it to run without error in the
> described case, by making that particular line conditional:
>
>
> Modified version:
> 
> #function to match nodes from consensus
> #to individual gene trees with uneven sampling
> #derived from Liam Revell's example -- need to test
> library(phytools)  # matchNodes(); also loads ape for drop.tip()
> match_phylo_nodes <- function(t1, t2){
>   ## step one: drop tips not shared by both trees
>   t1p <- drop.tip(t1, setdiff(t1$tip.label, t2$tip.label))
>   t2p <- drop.tip(t2, setdiff(t2$tip.label, t1$tip.label))
>
>   ## step two: match nodes of the pruned trees by "descendants"
>   M <- matchNodes(t1p, t2p)
>
>   ## step three: match each full tree to its pruned version by "distances"
>   M1 <- matchNodes(t1, t1p, "distances")
>   M2 <- matchNodes(t2, t2p, "distances")
>
>   ## final step: reconcile
>   MM <- matrix(NA, t1$Nnode, 2, dimnames = list(NULL, c("left", "right")))
>
>   for(i in 1:nrow(MM)){
>     MM[i, 1] <- M1[i, 1]
>     nn <- M[which(M[, 1] == M1[i, 2]), 2]
>     ## fill the right-hand column only when both lookups succeed
>     if(length(nn) > 0){
>       hit <- which(M2[, 2] == nn)
>       if(length(hit) > 0){
>         MM[i, 2] <- M2[hit, 1]
>       }
>     }
>   }
>   return(MM)
> }
> 
> 
> I’ve been experimenting with this and some downstream code for the last few
> days, but I’ve run into some weird, inconsistent results (not easily
> summarized) that make me think this function is not working as intended.
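One way to track down inconsistent results is to check each returned pair directly: two nodes are equivalent in the sense discussed here when they subtend the same set of shared tips. A sketch, assuming the two-column matrix holds node numbers of the original (unpruned) trees, t1 nodes in column 1 and t2 nodes in column 2, as the "left"/"right" columns above appear to; the helper names shared_desc and check_node_pairs are hypothetical:

```r
library(ape)

## Hypothetical helper: sorted shared tips descending from internal node `node`
shared_desc <- function(phy, node, shared) {
  sort(intersect(extract.clade(phy, node)$tip.label, shared))
}

## Hypothetical helper: TRUE/FALSE per row of a two-column matrix of node
## numbers (t1 nodes in column 1, t2 nodes in column 2); rows with NA give NA
check_node_pairs <- function(t1, t2, pairs) {
  shared <- intersect(t1$tip.label, t2$tip.label)
  apply(pairs, 1, function(p) {
    if (anyNA(p)) return(NA)
    identical(shared_desc(t1, p[1], shared), shared_desc(t2, p[2], shared))
  })
}

## Toy trees sharing tips A-D
tr1 <- read.tree(text = "((A,B),(C,(D,E)));")
tr2 <- read.tree(text = "((A,B),(D,(C,F)));")
pairs <- rbind(c(getMRCA(tr1, c("A", "B")), getMRCA(tr2, c("A", "B"))),
               c(getMRCA(tr1, c("C", "D")), getMRCA(tr2, c("C", "F"))),
               c(getMRCA(tr1, c("D", "E")), NA))
ok <- check_node_pairs(tr1, tr2, pairs)
ok
```

Here the second pair is flagged FALSE because the node in tr1 covers the shared tips C and D, while the node in tr2 covers only C.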
> 
> I was wondering: have any of you dealt with a similar problem? In principle
> this seems similar to concordance analysis, but I care less about the
> proportion of reference nodes that exist in the gene trees; instead I need the
> actual node numbers in a given gene tree that are phylogenetically equivalent
> to particular nodes in the reference. Happy to try to hack away at something…
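To get node numbers of the original (unpruned) trees without going through matchNodes(), one possibility is to key every internal node by the sorted set of shared tips descending from it and match the keys between the reference and a gene tree. A sketch under these assumptions: nodes with fewer than two shared descendants get no key (NA), and when several gene-tree nodes share a key (a node and its parent can differ only by unshared tips), match() returns the first one in node-number order. The helper names node_keys and match_to_reference are hypothetical.

```r
library(ape)

## Hypothetical helper: key for each internal node listing the shared tips
## that descend from it (NA when fewer than two shared tips descend)
node_keys <- function(phy, shared) {
  nodes <- Ntip(phy) + seq_len(Nnode(phy))
  vapply(nodes, function(n) {
    tips <- sort(intersect(extract.clade(phy, n)$tip.label, shared))
    if (length(tips) < 2) NA_character_ else paste(tips, collapse = ",")
  }, character(1))
}

## Hypothetical helper: gene-tree node equivalent to each reference node
match_to_reference <- function(ref, gene) {
  shared <- intersect(ref$tip.label, gene$tip.label)
  kr <- node_keys(ref, shared)
  kg <- node_keys(gene, shared)
  ## incomparables = NA keeps unkeyed nodes from matching each other
  cbind(ref = Ntip(ref) + seq_len(Nnode(ref)),
        gene = Ntip(gene) + match(kr, kg, incomparables = NA))
}

ref  <- read.tree(text = "((A,B),(C,(D,E)));")
gene <- read.tree(text = "((B,A),(D,(C,F)));")
res <- match_to_reference(ref, gene)
res
```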
> 
> 
> Best,
> Jake Berv
> 
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
