Hi Gustavo & David,
I attached a file that contains a function timeSliceTree2, which is a
replacement for timeSliceTree.
I replaced
dist.nodes(tree)[, Ntip(tree) + 1]
with
node.depth.edgelength(tree)
dist.nodes computes a matrix (roughly nTips^2 * 4 * 8 bytes which can get
very large for many taxa)
whereas node.depth.edgelength just the vector the function needs. This
caused the memory problem.
I also simplified a unnecessary sapply and put prop.part out of a loop.
So it needs less memory and is also much faster for large trees. The
results should be exactly the same.
Cheers,
Klaus
> source("timeSliceTree.R")
> library(paleotree)
> set.seed(123)
> tree = rtree(2000)
system.time(tree1 <-
timeSliceTree(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
user system elapsed
7.416 0.033 7.454
> system.time(tree2 <-
timeSliceTree2(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
user system elapsed
0.147 0.003 0.151
On Tue, Oct 20, 2015 at 11:49 AM, Gustavo Burin Ferreira <
[email protected]> wrote:
> Hey David and Nick,
>
> thanks a lot for the quick responses! I think I wasn't very clear in the
> first e-mail. What I get is actually an error from within dist.nodes, not
> when calling it.
>
> I've tried to use chainsaw2 and in the beginning it appeared to be working
> quite well. However after some running time, I get the same (original)
> error that motivated me writing to the list:
>
>
> > *Error in double(nm * nm) : vector size cannot be NA*
> > *In addition: Warning message:**In nm * nm : NAs produced by integer
> > overflow*
>
>
> Digging into the functions called within chainsaw2, I found that at some
> point it uses the function get_max_height_tree, that calls dist.nodes and
> that's where I think the problem lies. The error I got now is almost
> exactly the same as I got from timeSliceTree (because both cases use
> dist.nodes):
>
>
> > *dist.nodes*
> > *function (x) *
> > *{*
> > * x <- reorder(x)*
> > * n <- Ntip(x)*
> > * m <- x$Nnode*
> > * nm <- n + m*
> > * d <- .C(dist_nodes, as.integer(n), as.integer(m),
> > as.integer(x$edge[, *
> > * 1] - 1L), as.integer(x$edge[, 2] - 1L),
> > as.double(x$edge.length), *
> > * as.integer(Nedge(x)), double(nm * nm), NAOK = TRUE)[[7]]*
> > * dim(d) <- c(nm, nm)*
> > * dimnames(d) <- list(1:nm, 1:nm)*
> > * d**}*
>
>
> I tried changing the highlighted part to something like
> double(as.numeric(nm) * as.numeric(nm)), and when I try running it, I get
> the error I wrote on the first e-mail:
>
>
> > *Error in dist.nodes(tree) (from #7) : ** long vectors (argument 7) are
> > not supported in .Fortran*
>
>
> Thus, I think that to solve this problem some tweak in the C/Fortran code
> that is called within dist.nodes (from ape) might be required, but I have
> no expertise on that. So if someone can help me with that, I'll appreciate
> it!
>
> Thanks again for the help so far!
>
> Best,
>
>
> *Gustavo Burin Ferreira, **Msc.*
> Instituto de Biociências
> Universidade de São Paulo
> Tel: (11) 98525-8948
>
> On Fri, Oct 16, 2015 at 5:06 PM, Nick Matzke <[email protected]> wrote:
>
> > Hi! I re-did chainsaw at some point, now there is chainsaw2. However,
> > googling that gets you horror movies, so here is a link with example
> code:
> >
> > https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ
> >
> > (the discussion there points out a rare case where this crashes, but for
> > most trees it should work fine)
> >
> > Cheers, Nick
> >
> > On Fri, Oct 16, 2015 at 2:17 PM, David Bapst <[email protected]> wrote:
> >
> > > Hi Gustavo,
> > >
> > > I'm paleotree's author and maintainer. Just to be clear that I
> > > understand your problem, I believe you are saying that when you use
> > > timeSliceTree, you are getting an error that the internal call to
> > > dist.nodes is failing? Is that right?
> > >
> > > The first thought I have is that maybe the solution here is to avoid
> > > dist.nodes, as it is somewhat overkill. I use dist.nodes in that code,
> > > which I wrote in 2011, to get the distance of tips and nodes from the
> > > root. A better solution may now exist in another R package. I'd have
> > > to investigate (although maybe someone on the list can suggest one).
> > >
> > > The second thought I have is that there might be alternative functions
> > > that do something lie timeSliceTree in another R package. Off the top
> > > of my head, I recall that Nick Matzke had a similar, 'chainsaw'
> > > function, which you can find here and appears not to call dist.nodes:
> > >
> > > https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001483.html
> > >
> > > Again, maybe someone on the list knows of a good alternative function.
> > >
> > > I'll try to give this more thought, but for now, maybe see if you can
> > > use Nick's function succesfully. Overall though, I've discovered the
> > > use of truly gigantic trees can often run into unexpected problems.
> > >
> > > Cheers,
> > > -Dave
> > >
> > >
> > >
> > > On Fri, Oct 16, 2015 at 12:47 PM, Gustavo Burin Ferreira
> > > <[email protected]> wrote:
> > > > Dear list,
> > > >
> > > > I'm trying to perform a time travel in simulated phylogenies with
> both
> > > > extant and extinct species using the timeSliceTree function form the
> > > > paleotree package. My aim is to have the molecular phylogenies
> derived
> > > from
> > > > the complete phylogeny (attached) in different points in time.
> > > >
> > > > However, when I try that with big trees (bigger than 20000 tips
> > total), I
> > > > get an error of integer overflow coming from the dist.nodes function.
> > > After
> > > > slightly tweaking the dist.nodes function (changing nm from integer
> to
> > > > numeric/double), I get the following message:
> > > >
> > > > Error in dist.nodes(tree) (from #7) :
> > > > long vectors (argument 7) are not supported in .Fortran
> > > >
> > > > Since I don't know much about C or Fortran, I couldn't find a way of
> > > solving
> > > > this by myself, so any help will be greatly appreciated.
> > > >
> > > > I'm sending one tree attached for example.
> > > >
> > > > Thank you very much in advance!
> > > >
> > > > Best,
> > > >
> > > > Gustavo Burin Ferreira, Msc.
> > > > Instituto de Biociências
> > > > Universidade de São Paulo
> > > > Tel: (11) 98525-8948
> > > >
> > > > _______________________________________________
> > > > R-sig-phylo mailing list - [email protected]
> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > > > Searchable archive at
> > > http://www.mail-archive.com/[email protected]/
> > >
> > >
> > >
> > > --
> > > David W. Bapst, PhD
> > > Adjunct Asst. Professor, Geology and Geol. Eng.
> > > South Dakota School of Mines and Technology
> > > 501 E. St. Joseph
> > > Rapid City, SD 57701
> > >
> > > http://webpages.sdsmt.edu/~dbapst/
> > > http://cran.r-project.org/web/packages/paleotree/index.html
> > >
> > > _______________________________________________
> > > R-sig-phylo mailing list - [email protected]
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > > Searchable archive at
> > > http://www.mail-archive.com/[email protected]/
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-phylo mailing list - [email protected]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > Searchable archive at
> > http://www.mail-archive.com/[email protected]/
> >
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-phylo mailing list - [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/[email protected]/
>
--
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston
timeSliceTree2 <- function (ttree, sliceTime, drop.extinct = FALSE, plot = TRUE)
{
if (!inherits(ttree, "phylo")) {
stop("ttree is not of class phylo")
}
if (is.null(ttree$root.time)) {
ttree$root.time <- max(node.depth.edgelength(ttree)[1:Ntip(ttree)])
message("Warning: no ttree$root.time! Assuming latest tip is at present (time=0)")
}
tslice <- ttree$root.time - sliceTime
dnode <- node.depth.edgelength(ttree)
cedge <- which((dnode[ ttree$edge[, 1] ] < tslice) & (dnode[ttree$edge[, 2] ] >= tslice))
droppers <- numeric()
pp <- prop.part(ttree)
for (i in 1:length(cedge)) {
desc <- ttree$edge[cedge[i], 2]
if (desc > Ntip(ttree)) {
desctip <- pp[[desc - Ntip(ttree)]]
droppers <- c(droppers, desctip[-1])
}
}
stree <- drop.tip(ttree, droppers)
dnode <- node.depth.edgelength(stree)
cedge <- sapply(1:Nedge(stree), function(x) any(stree$edge[x,
2] == which(dnode >= tslice)))
cnode_depth <- dnode[stree$edge[cedge, 1]]
stree$edge.length[cedge] <- tslice - cnode_depth
stree$root.time <- ttree$root.time
if (drop.extinct) {
stree1 <- dropExtinct(stree, ignore.root.time = TRUE)
}
else {
stree1 <- stree
}
if (plot) {
layout(1:2)
plot(ladderize(ttree), show.tip.label = FALSE)
axisPhylo()
plot(ladderize(stree1), show.tip.label = FALSE)
axisPhylo()
layout(1)
}
return(stree1)
}
_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/