Dear R-sig-phylo,

Over the weekend, I asked Liam Revell whether matchNodes could be used for a particular problem I’m trying to solve: finding all phylogenetically equivalent nodes when comparing trees that have uneven taxon samples and different topologies. Liam was kind enough to take some time to write a blog post about this, and got me started with some code:
http://blog.phytools.org/2021/02/on-matching-nodes-between-trees-using.html

On its face this seems like a simple problem, but I’m running into some issues and thought I would reach out to the broader group. The code linked above seems to work, but only for comparing trees that start out as topologically identical. For my purposes, I’m trying to match nodes from a given reference to nodes in and across several hundred gene trees that differ in topology and taxon sample relative to the reference.

Here is a function definition based on Liam’s example:

library(phytools)  # provides matchNodes; also loads ape for drop.tip

# function to match nodes from consensus
# to individual gene trees with uneven sampling
# derived from Liam Revell's example -- need to test
match_phylo_nodes<-function(t1, t2){
  ## step one drop tips
  t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
  t2p<-drop.tip(t2,setdiff(t2$tip.label, t1$tip.label))
  ## step two match nodes "descendants"
  M<-matchNodes(t1p,t2p)
  ## step three match nodes "distances"
  M1<-matchNodes(t1,t1p,"distances")
  M2<-matchNodes(t2,t2p,"distances")
  ## final step, reconcile
  MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))
  for(i in 1:nrow(MM)){
    MM[i,1]<-M1[i,1]
    nn<-M[which(M[,1]==M1[i,2]),2]
    if(length(nn)>0){
      MM[i,2]<-M2[which(M2[,2]==nn),1]
    }
  }
  return(MM)
}

When t1 and t2 are trees that have topological conflicts, this function returns an error:

Error in MM[i, 2] <- M2[which(M2[, 2] == nn), 1] :
  replacement has length zero

I think(?) this happens because a particular node doesn’t exist in one or the other of the trees, and the indexing on that line returns integer(0), but I’m not sure I really understand what is going on here.
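A minimal base-R sketch of how that error message can arise, using toy matrices in place of real trees. It assumes that an unmatched node comes through as NA in nn; that is an assumption for illustration, not something confirmed above, and the values in M2 are made up.

```r
# Toy stand-in for the result matrix MM (no trees involved).
MM <- matrix(NA, 2, 2, dimnames = list(NULL, c("left", "right")))

# Toy stand-in for M2 (made-up node numbers):
# column 1 = node in the full tree, column 2 = node in the pruned tree.
M2 <- cbind(tr1 = c(6, 7), tr2 = c(4, 5))

# Assumption: a node with no match is carried along as NA.
nn <- NA

# Comparing against NA gives NA for every row, so which() selects nothing.
idx <- which(M2[, 2] == nn)
length(idx)

# Assigning the resulting zero-length value into one cell fails;
# tryCatch captures the error message instead of stopping.
res <- tryCatch({
  MM[1, 2] <- M2[idx, 1]
  "ok"
}, error = function(e) conditionMessage(e))
res
```

Note that length(nn) is 1 here, so a check of the form length(nn)>0 would not skip this case on its own.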
I modified Liam’s code slightly so that it runs without error in the case described, by making that particular line conditional:

Modified version

library(phytools)  # provides matchNodes; also loads ape for drop.tip

# function to match nodes from consensus
# to individual gene trees with uneven sampling
# derived from Liam Revell's example -- need to test
match_phylo_nodes<-function(t1, t2){
  ## step one drop tips
  t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
  t2p<-drop.tip(t2,setdiff(t2$tip.label, t1$tip.label))
  ## step two match nodes "descendants"
  M<-matchNodes(t1p,t2p)
  ## step three match nodes "distances"
  M1<-matchNodes(t1,t1p,"distances")
  M2<-matchNodes(t2,t2p,"distances")
  ## final step, reconcile
  MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))
  for(i in 1:nrow(MM)){
    MM[i,1]<-M1[i,1]
    nn<-M[which(M[,1]==M1[i,2]),2]
    if(length(nn)>0){
      ## only assign when nn has a counterpart in M2;
      ## otherwise MM[i,2] stays NA
      idx<-which(M2[,2]==nn)
      if(length(idx)>0){
        MM[i,2]<-M2[idx,1]
      }
    }
  }
  return(MM)
}

I’ve been experimenting with this and some downstream code for the last few days, but I’ve run into some weird, inconsistent results (not easily summarized) that make me think this function is not working as intended.

I was wondering: have any of you dealt with a similar problem? In principle this seems like it should be similar to concordance analysis, but I care less about identifying the proportion of nodes that exist in gene trees given a reference; instead, I need the actual node numbers in a given gene tree that are phylogenetically equivalent to particular nodes in a reference.

Happy to try to hack away at something…

Best,
Jake Berv

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/