Hello all,

Following up on discussion from Monday, I've been trying to figure out
how the tree I shared broke ape's rules for 'phylo' objects. It turns
out, it doesn't, really (other than having single nodes), at least not
as defined by checkValidPhylo or as described in the formatTree PDF.

###############################################

library(ape)
source(paste0(
    "https://raw.githubusercontent.com/emmanuelparadis/checkValidPhylo";,
    "/master/checkValidPhylo.R")

load(url("http://webpages.sdsmt.edu/~dbapst/weirdTree_06-12-15.txt";))

tree1<-collapse.singles(tree)

# with the original tree
checkValidPhylo(tree)
# singletons and nodes of degree 2... because 'singletons'

# we can check directly:
tabulate(tree$edge)[Ntip(tree) + 2:Nnode(tree)]

# with the collapse.singles() processed tree
checkValidPhylo(tree1)
# this because the root has been mis-numbered

########################################

So what is broken? Why is collapse.singles returning the root ID in
the wrong spot? I decided to investigate with some toy examples. Was
it just having a single node at the root?

######################################

#is it that the root is a 'singleton node'?
tabulate(tree$edge)[Ntip(tree) + 1]

#let's test with a simple example
phy<-rtree(10)
phy$Nnode<-phy$Nnode+1L
phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10]
phy$edge<-rbind(phy$edge,c(11,12))
storage.mode(phy$edge)<-"integer"
checkValidPhylo(phy)
#let's try collapsing nodes
phy1<-collapse.singles(phy)
phy1$edge
tabulate(phy1$edge)
checkValidPhylo(phy1)

##########################################

Nope, that wasn't it.

##########################################

# It's not that the tree has a singleton node
    # let's check out the root edge
tree$edge[tree$edge[,1]==Ntip(tree)+1,]
#the root is connected to a node with a *non-consecutive* node ID

#so let's test THAT with a simple example
phy<-rtree(10)
phy$Nnode<-phy$Nnode+1L
phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10]
#let's jumble the node IDs a little
phy$edge[phy$edge==12]<-0
phy$edge[phy$edge==13]<-12
phy$edge[phy$edge==0]<-13
phy$edge<-rbind(phy$edge,c(11,13))
storage.mode(phy$edge)<-"integer"
checkValidPhylo(phy)
#let's try collapsing nodes
phy1<-collapse.singles(phy)
phy1$edge
tabulate(phy1$edge)
#we can see the root is now not at Ntip+1 !
checkValidPhylo(phy1)

###############################################

So, the problem with the tree file I've provided is that the single
root node wasn't attached ultimate to a node numbered Ntip+2. As far
as I can tell, this is not a required aspect of the phylo format.
collapse.singles() apparently collapses nodes but doesn't bother to
check if the new root is going to be Ntip+1; it just assumes the root
will be connected to sequentially numbered nodes.

I got the source from the cran mirror repos on github (I'm not certain
how to access the raw source scripts from http://ape-package.ird.fr/)
and added a short segment at the end that switches the root node back
to Ntip+1 if it isn't numbered that already.

#############################################

#okay, we figured out the problem
    #can we fix it?

collapse.singles <- function(tree){
    elen <- tree$edge.length
    xmat <- tree$edge
    ## added by Elizabeth Purdom (2008-06-19):
    node.lab <- tree$node.label
    nnode <- tree$Nnode
    ntip <- length(tree$tip.label)
    ## end
    singles <- NA
    while (length(singles) > 0) {
        ## changed by EP to make it slightly more efficient:
        ## tx <- table(xmat[xmat < 0])
        ## singles <- as.numeric(names(tx)[tx < 3])
        tx <- tabulate(xmat[, 1])
        singles <- which(tx == 1)
        ## END
        if (length(singles) > 0) {
            i <- singles[1]
            prev.node <- which(xmat[, 2] == i)
            next.node <- which(xmat[, 1] == i)
            xmat[prev.node, 2] <- xmat[next.node, 2]
            xmat <- xmat[xmat[, 1] != i, ] # drop
            ## changed by EP for the new coding of "phylo" (2006-10-05):
            ## xmat[xmat < i] <- xmat[xmat < i] + 1 ## adjust indices
            xmat[xmat > i] <- xmat[xmat > i] - 1L ## adjust indices #
changed '1' by '1L' (2010-07-23)
            ## END
            elen[prev.node] <- elen[prev.node] + elen[next.node]
            ## added by Elizabeth Purdom (2008-06-19):
            if (!is.null(node.lab)) node.lab <- node.lab[-c(i - ntip)]
            nnode <- nnode - 1L
            ## end
            elen <- elen[-next.node]
        }
    }

#*#

    #check xmat that root is Ntip+1 (DW Bapst, 06-17-15)
    rootID<-unique(xmat[,1])[
        sapply(unique(xmat[,1]), function(x)
            (sum(x==xmat[,2])==0))]
    if(length(rootID)>1){
        stop("More than one apparent root in $edge matrix")}
    expRootID<-length(tree$tip.label)+1
    if(rootID!=expRootID){
        xmat[xmat==rootID]<-0
        xmat[xmat==expRootID]<-rootID
        xmat[xmat==0]<-expRootID
        storage.mode(xmat)<-"integer"
    }

#*#

    tree$edge <- xmat
    tree$edge.length <- elen
    ## added by Elizabeth Purdom (2008-06-19):
    tree$node.label <- node.lab
    tree$Nnode <- nnode
    ## end
    tree
}

##############################

I've marked my code addition with #*#. Let's see if it fixes our issue:

#############################

#test simulated example
#so let's test THAT with a simple example
phy<-rtree(10)
phy$Nnode<-phy$Nnode+1L
phy$edge[phy$edge>10]<-1+phy$edge[phy$edge>10]
#let's jumble the node IDs a little
phy$edge[phy$edge==12]<-0
phy$edge[phy$edge==13]<-12
phy$edge[phy$edge==0]<-13
phy$edge<-rbind(phy$edge,c(11,13))
storage.mode(phy$edge)<-"integer"
checkValidPhylo(phy)
#let's try collapsing nodes
phy1<-collapse.singles(phy)
phy1$edge
tabulate(phy1$edge)
checkValidPhylo(phy1)
#it worked!

#let's try empirical example
tree1<-collapse.singles(tree)
# with the collapse.singles() processed tree
checkValidPhylo(tree1)

##################################

Emmanuel, is there any chance you could implement this fix for
collapse.singles (or a similar fix; my code probably isn't very
efficient) in ape?

-Dave



On Mon, Jun 15, 2015 at 9:29 AM, Emmanuel Paradis
<emmanuel.para...@ird.fr> wrote:
> Hi David,
>
> collapse.singles() seems to work correctly if the "phylo" object is
> correctly conformed. Some features of this class may seem annoying (and
> Klaus and I alredy discussed about this), but these help for other things.
> As a reminder the class "phylo" is defined here:
>
> http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf
>
> Recently, I put a function on github to help code writters:
>
> https://github.com/emmanuelparadis/checkValidPhylo
>
> This could help you to detect problems that would be tough to find
> otherwise.
>
> Cheers,
>
> Emmanuel
>
>
> Le 12/06/2015 22:41, David Bapst a écrit :
>>
>> Hi Klaus (and others),
>>
>> Ah, I see! The real bug then appears to be in collapse.singles, as it
>> does not reorder the ID numbers in $edge. Here's a quick function to
>> return the root ID:
>>
>> getRootID<-function(tree){
>>      uniqueNode<-unique(tree$edge[,1])
>>      whichRoot<-sapply(uniqueNode,function(x)
>>          (sum(x==tree$edge[,2])==0))
>>      rootID<-uniqueNode[whichRoot]
>>      return(rootID)
>>      }
>>
>> And if we run it...
>>
>>> getRootID(tree)     #original tree
>>
>> [1] 151
>>>
>>> getRootID(tree1)   #after collapse.singles
>>
>> [1] 151
>>>
>>> getRootID(tree2)   #after reorder.phylo
>>
>> [1] 131
>>
>> And we can see that although the number of nodes certainly changed
>> when collapse.singles was run, it didn't change the root node's ID.
>>
>> So I guess this is really a collapse.singles bug report. Thanks for
>> the insight, Klaus!
>>
>> -Dave
>>
>> On Fri, Jun 12, 2015 at 2:27 PM, Klaus Schliep <klaus.schl...@gmail.com>
>> wrote:
>>>
>>> Hi David,
>>>
>>> I found the bug. Somehow ape assumes on the one hand that the root is
>>> number
>>> of tips + 1.
>>> On the other hand the root of an phylo object is the node which is in
>>> tree$edge[,1], but not in tree$edge[,2],
>>> this is the case even in an unrooted tree. This annoys me for a long
>>> time.
>>>
>>> The root of tree1 with the definition above is actually node 151 and not
>>> 131. Changing these solves your problem:
>>> Here is a small function to do this for you:
>>>
>>> switch.nodes<- function(tree, a, b){
>>>      ntips = length(tree$tip.label)
>>>      tree$edge[tree$edge==a] = 0L
>>>      tree$edge[tree$edge==b] = as.integer(a)
>>>      tree$edge[tree$edge==0L] = as.integer(b)
>>>      if(!is.null(tree$node.label)){
>>>          tmp<-tree$node.label[a-ntips]
>>>          tree$node.label[a-ntips] = tree$node.label[b-ntips]
>>>          tree$node.label[b-ntips] = tmp
>>>      }
>>>      tree
>>> }
>>>
>>> library(phangorn)
>>> attr(tree1, "order") = NULL
>>> getRoot(tree1)
>>>
>>> tree2 = switch.nodes(tree1, 131, 151)
>>> plot(tree2) # now works for me
>>>
>>> Cheers and have a nice weekend,
>>> Klaus
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jun 12, 2015 at 2:53 PM, David Bapst <dwba...@gmail.com> wrote:
>>>>
>>>>
>>>> Hello all,
>>>>
>>>> As those of you who directly manipulate the guts of phylo objects in
>>>> your code (or construct new phylo objects whole cloth from
>>>> un-phylo-like data structures) have probably experienced, it is
>>>> sometimes easy to create $edge matrices that aren't accepted by ape
>>>> functions (I often use plot.phylo as my litmus for this).
>>>>
>>>> When this occurs, various things can be done; I usually do the
>>>> following sequence:
>>>>
>>>> collapse.singles()
>>>> reorder.phylo(,"cladewise")
>>>> read.tree(text=write.tree(,file=NULL))
>>>>
>>>> ...or something along those lines.
>>>>
>>>> However, I've run into an issue where reorder.phylo() returns what is
>>>> essentially a scrap of the original $edge matrix, without warning.
>>>> Very worrisome! I'm using R version 3.2.0 and ape 3.3. Here's, my
>>>> script, including a call to a saved object on my website, so that the
>>>> error can be reproduced:
>>>>
>>>> (Why do I have an .Rdata object listed with a .txt extension, you
>>>> might wonder? Because my school apparently sanitizes file extensions
>>>> on our hosted websites that it thinks are executables, archives or
>>>> doesn't recognize, but doesn't bother to check file innards when it
>>>> does recognize the extensions. Its easily hacked, at least.)
>>>>
>>>> #################
>>>>
>>>> library(ape)
>>>>
>>>> load(url("http://webpages.sdsmt.edu/~dbapst/weirdTree_06-12-15.txt";))
>>>>
>>>> #can I plot it?
>>>> plot(tree)
>>>> #nope
>>>>
>>>> tree1<-collapse.singles(tree)
>>>>
>>>> #any single-child nodes left?
>>>> sum(sapply(unique(tree1$edge[,1]),function(x)
>>>> sum(x==tree1$edge[,1])==1))
>>>> #nope
>>>>
>>>> #can I plot it?
>>>> plot(tree1)
>>>> #nope
>>>>
>>>> #now reorder
>>>> tree2<-reorder.phylo(tree1,"cladewise")
>>>> tree2$edge
>>>>
>>>> #now reorder with postorder
>>>> tree2<-reorder.phylo(tree1,"postorder")
>>>> tree2$edge
>>>>
>>>> #################
>>>>
>>>> As we can see, reorder.phylo with various ordering methods returns an
>>>> edge matrix with just six rows, from a tree that originally had
>>>> hundreds of edges. Most worrisome, it does this without any error
>>>> message. In this particular case, I retain the tree correctly if I
>>>> skip reorder.phylo and use the read.tree(write.tree) trick, but this
>>>> behavior of reorder.phylo is worrying.
>>>>
>>>> Anyone have a clue what's going on here? If I am perhaps misusing
>>>> reorder.phylo(), is there an alternative approach for use in cleaning
>>>> phylo objects?
>>>>
>>>> Cheers,
>>>> -Dave
>>>>
>>>> --
>>>> David W. Bapst, PhD
>>>> Adjunct Asst. Professor, Geology and Geol. Eng.
>>>> South Dakota School of Mines and Technology
>>>> 501 E. St. Joseph
>>>> Rapid City, SD 57701
>>>>
>>>> http://webpages.sdsmt.edu/~dbapst/
>>>> http://cran.r-project.org/web/packages/paleotree/index.html
>>>>
>>>> _______________________________________________
>>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>> Searchable archive at
>>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Klaus Schliep
>>> Postdoctoral Fellow
>>> Revell Lab, University of Massachusetts Boston
>>>
>>
>>
>>
>



-- 
David W. Bapst, PhD
Adjunct Asst. Professor, Geology and Geol. Eng.
South Dakota School of Mines and Technology
501 E. St. Joseph
Rapid City, SD 57701

http://webpages.sdsmt.edu/~dbapst/
http://cran.r-project.org/web/packages/paleotree/index.html

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to