Hello all, Following up on discussion from Monday, I've been trying to figure out how the tree I shared broke ape's rules for 'phylo' objects. It turns out, it doesn't, really (other than having single nodes), at least not as defined by checkValidPhylo or as described in the formatTree PDF.
############################################### library(ape) source(paste0( "https://raw.githubusercontent.com/emmanuelparadis/checkValidPhylo", "/master/checkValidPhylo.R") load(url("http://webpages.sdsmt.edu/~dbapst/weirdTree_06-12-15.txt")) tree1<-collapse.singles(tree) # with the original tree checkValidPhylo(tree) # singletons and nodes of degree 2... because 'singletons' # we can check directly: tabulate(tree$edge)[Ntip(tree) + 2:Nnode(tree)] # with the collapse.singles() processed tree checkValidPhylo(tree1) # this because the root has been mis-numbered ######################################## So what is broken? Why is collapse.singles returning the root ID in the wrong spot? I decided to investigate with some toy examples. Was it just having a single node at the root? ###################################### #is it that the root is a 'singleton node'? tabulate(tree$edge)[Ntip(tree) + 1] #let's test with a simple example phy<-rtree(10) phy$Nnode<-phy$Nnode+1L phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10] phy$edge<-rbind(phy$edge,c(11,12)) storage.mode(phy$edge)<-"integer" checkValidPhylo(phy) #let's try collapsing nodes phy1<-collapse.singles(phy) phy1$edge tabulate(phy1$edge) checkValidPhylo(phy1) ########################################## Nope, that wasn't it. ########################################## # It's not that the tree has a singleton node # let's check out the root edge tree$edge[tree$edge[,1]==Ntip(tree)+1,] #the root is connected to a node with a *non-consecutive* node ID #so let's test THAT with a simple example phy<-rtree(10) phy$Nnode<-phy$Nnode+1L phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10] #let's jumble the node IDs a little phy$edge[phy$edge==12]<-0 phy$edge[phy$edge==13]<-12 phy$edge[phy$edge==0]<-13 phy$edge<-rbind(phy$edge,c(11,13)) storage.mode(phy$edge)<-"integer" checkValidPhylo(phy) #let's try collapsing nodes phy1<-collapse.singles(phy) phy1$edge tabulate(phy1$edge) #we can see the root is now not at Ntip+1 ! checkValidPhylo(phy1) ############################################### So, the problem with the tree file I've provided is that the single root node wasn't attached ultimate to a node numbered Ntip+2. As far as I can tell, this is not a required aspect of the phylo format. collapse.singles() apparently collapses nodes but doesn't bother to check if the new root is going to be Ntip+1; it just assumes the root will be connected to sequentially numbered nodes. I got the source from the cran mirror repos on github (I'm not certain how to access the raw source scripts from http://ape-package.ird.fr/) and added a short segment at the end that switches the root node back to Ntip+1 if it isn't numbered that already. ############################################# #okay, we figured out the problem #can we fix it? collapse.singles <- function(tree){ elen <- tree$edge.length xmat <- tree$edge ## added by Elizabeth Purdom (2008-06-19): node.lab <- tree$node.label nnode <- tree$Nnode ntip <- length(tree$tip.label) ## end singles <- NA while (length(singles) > 0) { ## changed by EP to make it slightly more efficient: ## tx <- table(xmat[xmat < 0]) ## singles <- as.numeric(names(tx)[tx < 3]) tx <- tabulate(xmat[, 1]) singles <- which(tx == 1) ## END if (length(singles) > 0) { i <- singles[1] prev.node <- which(xmat[, 2] == i) next.node <- which(xmat[, 1] == i) xmat[prev.node, 2] <- xmat[next.node, 2] xmat <- xmat[xmat[, 1] != i, ] # drop ## changed by EP for the new coding of "phylo" (2006-10-05): ## xmat[xmat < i] <- xmat[xmat < i] + 1 ## adjust indices xmat[xmat > i] <- xmat[xmat > i] - 1L ## adjust indices # changed '1' by '1L' (2010-07-23) ## END elen[prev.node] <- elen[prev.node] + elen[next.node] ## added by Elizabeth Purdom (2008-06-19): if (!is.null(node.lab)) node.lab <- node.lab[-c(i - ntip)] nnode <- nnode - 1L ## end elen <- elen[-next.node] } } #*# #check xmat that root is Ntip+1 (DW Bapst, 06-17-15) rootID<-unique(xmat[,1])[ sapply(unique(xmat[,1]), function(x) (sum(x==xmat[,2])==0))] if(length(rootID)>1){ stop("More than one apparent root in $edge matrix")} expRootID<-length(tree$tip.label)+1 if(rootID!=expRootID){ xmat[xmat==rootID]<-0 xmat[xmat==expRootID]<-rootID xmat[xmat==0]<-expRootID storage.mode(xmat)<-"integer" } #*# tree$edge <- xmat tree$edge.length <- elen ## added by Elizabeth Purdom (2008-06-19): tree$node.label <- node.lab tree$Nnode <- nnode ## end tree } ############################## I've marked my code addition with #*#. Let's see if it fixes our issue: ############################# #test simulated example #so let's test THAT with a simple example phy<-rtree(10) phy$Nnode<-phy$Nnode+1L phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10] #let's jumble the node IDs a little phy$edge[phy$edge==12]<-0 phy$edge[phy$edge==13]<-12 phy$edge[phy$edge==0]<-13 phy$edge<-rbind(phy$edge,c(11,13)) storage.mode(phy$edge)<-"integer" checkValidPhylo(phy) #let's try collapsing nodes phy1<-collapse.singles(phy) phy1$edge tabulate(phy1$edge) checkValidPhylo(phy1) #it worked! #let's try empirical example tree1<-collapse.singles(tree) # with the collapse.singles() processed tree checkValidPhylo(tree1) ################################## Emmanuel, is there any chance you could implement this fix for collapse.singles (or a similar fix; my code probably isn't very efficient) in ape? -Dave On Mon, Jun 15, 2015 at 9:29 AM, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote: > Hi David, > > collapse.singles() seems to work correctly if the "phylo" object is > correctly conformed. Some features of this class may seem annoying (and > Klaus and I alredy discussed about this), but these help for other things. > As a reminder the class "phylo" is defined here: > > http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf > > Recently, I put a function on github to help code writters: > > https://github.com/emmanuelparadis/checkValidPhylo > > This could help you to detect problems that would be tough to find > otherwise. > > Cheers, > > Emmanuel > > > Le 12/06/2015 22:41, David Bapst a écrit : >> >> Hi Klaus (and others), >> >> Ah, I see! The real bug then appears to be in collapse.singles, as it >> does not reorder the ID numbers in $edge. Here's a quick function to >> return the root ID: >> >> getRootID<-function(tree){ >> uniqueNode<-unique(tree$edge[,1]) >> whichRoot<-sapply(uniqueNode,function(x) >> (sum(x==tree$edge[,2])==0)) >> rootID<-uniqueNode[whichRoot] >> return(rootID) >> } >> >> And if we run it... >> >>> getRootID(tree) #original tree >> >> [1] 151 >>> >>> getRootID(tree1) #after collapse.singles >> >> [1] 151 >>> >>> getRootID(tree2) #after reorder.phylo >> >> [1] 131 >> >> And we can see that although the number of nodes certainly changed >> when collapse.singles was run, it didn't change the root node's ID. >> >> So I guess this is really a collapse.singles bug report. Thanks for >> the insight, Klaus! >> >> -Dave >> >> On Fri, Jun 12, 2015 at 2:27 PM, Klaus Schliep <klaus.schl...@gmail.com> >> wrote: >>> >>> Hi David, >>> >>> I found the bug. Somehow ape assumes on the one hand that the root is >>> number >>> of tips + 1. >>> On the other hand the root of an phylo object is the node which is in >>> tree$edge[,1], but not in tree$edge[,2], >>> this is the case even in an unrooted tree. This annoys me for a long >>> time. >>> >>> The root of tree1 with the definition above is actually node 151 and not >>> 131. Changing these solves your problem: >>> Here is a small function to do this for you: >>> >>> switch.nodes<- function(tree, a, b){ >>> ntips = length(tree$tip.label) >>> tree$edge[tree$edge==a] = 0L >>> tree$edge[tree$edge==b] = as.integer(a) >>> tree$edge[tree$edge==0L] = as.integer(b) >>> if(!is.null(tree$node.label)){ >>> tmp<-tree$node.label[a-ntips] >>> tree$node.label[a-ntips] = tree$node.label[b-ntips] >>> tree$node.label[b-ntips] = tmp >>> } >>> tree >>> } >>> >>> library(phangorn) >>> attr(tree1, "order") = NULL >>> getRoot(tree1) >>> >>> tree2 = switch.nodes(tree1, 131, 151) >>> plot(tree2) # now works for me >>> >>> Cheers and have a nice weekend, >>> Klaus >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jun 12, 2015 at 2:53 PM, David Bapst <dwba...@gmail.com> wrote: >>>> >>>> >>>> Hello all, >>>> >>>> As those of you who directly manipulate the guts of phylo objects in >>>> your code (or construct new phylo objects whole cloth from >>>> un-phylo-like data structures) have probably experienced, it is >>>> sometimes easy to create $edge matrices that aren't accepted by ape >>>> functions (I often use plot.phylo as my litmus for this). >>>> >>>> When this occurs, various things can be done; I usually do the >>>> following sequence: >>>> >>>> collapse.singles() >>>> reorder.phylo(,"cladewise") >>>> read.tree(text=write.tree(,file=NULL)) >>>> >>>> ...or something along those lines. >>>> >>>> However, I've run into an issue where reorder.phylo() returns what is >>>> essentially a scrap of the original $edge matrix, without warning. >>>> Very worrisome! I'm using R version 3.2.0 and ape 3.3. Here's, my >>>> script, including a call to a saved object on my website, so that the >>>> error can be reproduced: >>>> >>>> (Why do I have an .Rdata object listed with a .txt extension, you >>>> might wonder? Because my school apparently sanitizes file extensions >>>> on our hosted websites that it thinks are executables, archives or >>>> doesn't recognize, but doesn't bother to check file innards when it >>>> does recognize the extensions. Its easily hacked, at least.) >>>> >>>> ################# >>>> >>>> library(ape) >>>> >>>> load(url("http://webpages.sdsmt.edu/~dbapst/weirdTree_06-12-15.txt")) >>>> >>>> #can I plot it? >>>> plot(tree) >>>> #nope >>>> >>>> tree1<-collapse.singles(tree) >>>> >>>> #any single-child nodes left? >>>> sum(sapply(unique(tree1$edge[,1]),function(x) >>>> sum(x==tree1$edge[,1])==1)) >>>> #nope >>>> >>>> #can I plot it? >>>> plot(tree1) >>>> #nope >>>> >>>> #now reorder >>>> tree2<-reorder.phylo(tree1,"cladewise") >>>> tree2$edge >>>> >>>> #now reorder with postorder >>>> tree2<-reorder.phylo(tree1,"postorder") >>>> tree2$edge >>>> >>>> ################# >>>> >>>> As we can see, reorder.phylo with various ordering methods returns an >>>> edge matrix with just six rows, from a tree that originally had >>>> hundreds of edges. Most worrisome, it does this without any error >>>> message. In this particular case, I retain the tree correctly if I >>>> skip reorder.phylo and use the read.tree(write.tree) trick, but this >>>> behavior of reorder.phylo is worrying. >>>> >>>> Anyone have a clue what's going on here? If I am perhaps misusing >>>> reorder.phylo(), is there an alternative approach for use in cleaning >>>> phylo objects? >>>> >>>> Cheers, >>>> -Dave >>>> >>>> -- >>>> David W. Bapst, PhD >>>> Adjunct Asst. Professor, Geology and Geol. Eng. >>>> South Dakota School of Mines and Technology >>>> 501 E. St. Joseph >>>> Rapid City, SD 57701 >>>> >>>> http://webpages.sdsmt.edu/~dbapst/ >>>> http://cran.r-project.org/web/packages/paleotree/index.html >>>> >>>> _______________________________________________ >>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >>>> Searchable archive at >>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >>> >>> >>> >>> >>> >>> -- >>> Klaus Schliep >>> Postdoctoral Fellow >>> Revell Lab, University of Massachusetts Boston >>> >> >> >> > -- David W. Bapst, PhD Adjunct Asst. Professor, Geology and Geol. Eng. South Dakota School of Mines and Technology 501 E. St. Joseph Rapid City, SD 57701 http://webpages.sdsmt.edu/~dbapst/ http://cran.r-project.org/web/packages/paleotree/index.html _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/