Gustavo, all,
I just wanted to let you know that I've integrated Klaus's suggestions
into timeSliceTree, as well as a growing number of additional
functions in paleotree, and these changes are in the current build on
github, which appears to pass all checks as of the moment.
You can install the in-development version of paleotree on github
using the R package devtools, like so:
library(devtools)
install_github("dwbapst/paleotree")
Gustavo, if you could check and see if you encounter any issues with
the timeSliceTree in this new version, I would appreciate it.
Cheers,
-Dave
On Tue, Oct 20, 2015 at 1:39 PM, David Bapst <[email protected]> wrote:
> Thanks, Klaus. I was unaware of node.depth.edgelength; I believe I use
> dist.nodes to calculate root-to-tip distances in a number of functions
> in paleotree, so this could mean improvement potentially to a large
> number of functions. I'll have to do a global search for dist.nodes.
>
> Cheers,
> -Dave
>
> On Tue, Oct 20, 2015 at 1:32 PM, Klaus Schliep <[email protected]>
> wrote:
>> Hi Gustavo & David,
>> a tiny improvement I missed another sapply.
>> You can do the same trick replacing dist.nodes in dropExtinct if needed.
>> Klaus
>>
>> On Tue, Oct 20, 2015 at 3:21 PM, Gustavo Burin Ferreira
>> <[email protected]> wrote:
>>>
>>> Hi Klaus,
>>>
>>> the function works perfectly! Thank you very much!
>>>
>>> Best,
>>>
>>> Gustavo Burin Ferreira, Msc.
>>> Instituto de Biociências
>>> Universidade de São Paulo
>>> Tel: (11) 98525-8948
>>>
>>> On Tue, Oct 20, 2015 at 5:15 PM, Klaus Schliep <[email protected]>
>>> wrote:
>>>>
>>>> Hi Gustavo & David,
>>>>
>>>> I attached a file that contains a function timeSliceTree2, which is a
>>>> replacement for timeSliceTree.
>>>> I replaced
>>>> dist.nodes(tree)[, Ntip(tree) + 1]
>>>> with
>>>> node.depth.edgelength(tree)
>>>> dist.nodes computes a matrix (roughly nTips^2 * 4 * 8 bytes which can get
>>>> very large for many taxa)
>>>> whereas node.depth.edgelength just the vector the function needs. This
>>>> caused the memory problem.
>>>> I also simplified a unnecessary sapply and put prop.part out of a loop.
>>>> So it needs less memory and is also much faster for large trees. The
>>>> results should be exactly the same.
>>>>
>>>> Cheers,
>>>> Klaus
>>>>
>>>>
>>>>
>>>> > source("timeSliceTree.R")
>>>> > library(paleotree)
>>>> > set.seed(123)
>>>> > tree = rtree(2000)
>>>> system.time(tree1 <-
>>>> timeSliceTree(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
>>>> Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
>>>> user system elapsed
>>>> 7.416 0.033 7.454
>>>> > system.time(tree2 <-
>>>> > timeSliceTree2(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
>>>> Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
>>>> user system elapsed
>>>> 0.147 0.003 0.151
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 20, 2015 at 11:49 AM, Gustavo Burin Ferreira
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hey David and Nick,
>>>>>
>>>>> thanks a lot for the quick responses! I think I wasn't very clear in the
>>>>> first e-mail. What I get is actually an error from within dist.nodes,
>>>>> not
>>>>> when calling it.
>>>>>
>>>>> I've tried to use chainsaw2 and in the beginning it appeared to be
>>>>> working
>>>>> quite well. However after some running time, I get the same (original)
>>>>> error that motivated me writing to the list:
>>>>>
>>>>>
>>>>> > *Error in double(nm * nm) : vector size cannot be NA*
>>>>> > *In addition: Warning message:**In nm * nm : NAs produced by integer
>>>>> > overflow*
>>>>>
>>>>>
>>>>> Digging into the functions called within chainsaw2, I found that at some
>>>>> point it uses the function get_max_height_tree, that calls dist.nodes
>>>>> and
>>>>> that's where I think the problem lies. The error I got now is almost
>>>>> exactly the same as I got from timeSliceTree (because both cases use
>>>>> dist.nodes):
>>>>>
>>>>>
>>>>> > *dist.nodes*
>>>>> > *function (x) *
>>>>> > *{*
>>>>> > * x <- reorder(x)*
>>>>> > * n <- Ntip(x)*
>>>>> > * m <- x$Nnode*
>>>>> > * nm <- n + m*
>>>>> > * d <- .C(dist_nodes, as.integer(n), as.integer(m),
>>>>> > as.integer(x$edge[, *
>>>>> > * 1] - 1L), as.integer(x$edge[, 2] - 1L),
>>>>> > as.double(x$edge.length), *
>>>>> > * as.integer(Nedge(x)), double(nm * nm), NAOK = TRUE)[[7]]*
>>>>> > * dim(d) <- c(nm, nm)*
>>>>> > * dimnames(d) <- list(1:nm, 1:nm)*
>>>>> > * d**}*
>>>>>
>>>>>
>>>>> I tried changing the highlighted part to something like
>>>>> double(as.numeric(nm) * as.numeric(nm)), and when I try running it, I
>>>>> get
>>>>> the error I wrote on the first e-mail:
>>>>>
>>>>>
>>>>> > *Error in dist.nodes(tree) (from #7) : ** long vectors (argument 7)
>>>>> > are
>>>>> > not supported in .Fortran*
>>>>>
>>>>>
>>>>> Thus, I think that to solve this problem some tweak in the C/Fortran
>>>>> code
>>>>> that is called within dist.nodes (from ape) might be required, but I
>>>>> have
>>>>> no expertise on that. So if someone can help me with that, I'll
>>>>> appreciate
>>>>> it!
>>>>>
>>>>> Thanks again for the help so far!
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> *Gustavo Burin Ferreira, **Msc.*
>>>>>
>>>>> Instituto de Biociências
>>>>> Universidade de São Paulo
>>>>> Tel: (11) 98525-8948
>>>>>
>>>>> On Fri, Oct 16, 2015 at 5:06 PM, Nick Matzke <[email protected]> wrote:
>>>>>
>>>>> > Hi! I re-did chainsaw at some point, now there is chainsaw2.
>>>>> > However,
>>>>> > googling that gets you horror movies, so here is a link with example
>>>>> > code:
>>>>> >
>>>>> > https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ
>>>>> >
>>>>> > (the discussion there points out a rare case where this crashes, but
>>>>> > for
>>>>> > most trees it should work fine)
>>>>> >
>>>>> > Cheers, Nick
>>>>> >
>>>>> > On Fri, Oct 16, 2015 at 2:17 PM, David Bapst <[email protected]>
>>>>> > wrote:
>>>>> >
>>>>> > > Hi Gustavo,
>>>>> > >
>>>>> > > I'm paleotree's author and maintainer. Just to be clear that I
>>>>> > > understand your problem, I believe you are saying that when you use
>>>>> > > timeSliceTree, you are getting an error that the internal call to
>>>>> > > dist.nodes is failing? Is that right?
>>>>> > >
>>>>> > > The first thought I have is that maybe the solution here is to avoid
>>>>> > > dist.nodes, as it is somewhat overkill. I use dist.nodes in that
>>>>> > > code,
>>>>> > > which I wrote in 2011, to get the distance of tips and nodes from
>>>>> > > the
>>>>> > > root. A better solution may now exist in another R package. I'd have
>>>>> > > to investigate (although maybe someone on the list can suggest one).
>>>>> > >
>>>>> > > The second thought I have is that there might be alternative
>>>>> > > functions
>>>>> > > that do something lie timeSliceTree in another R package. Off the
>>>>> > > top
>>>>> > > of my head, I recall that Nick Matzke had a similar, 'chainsaw'
>>>>> > > function, which you can find here and appears not to call
>>>>> > > dist.nodes:
>>>>> > >
>>>>> > > https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001483.html
>>>>> > >
>>>>> > > Again, maybe someone on the list knows of a good alternative
>>>>> > > function.
>>>>> > >
>>>>> > > I'll try to give this more thought, but for now, maybe see if you
>>>>> > > can
>>>>> > > use Nick's function succesfully. Overall though, I've discovered the
>>>>> > > use of truly gigantic trees can often run into unexpected problems.
>>>>> > >
>>>>> > > Cheers,
>>>>> > > -Dave
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Fri, Oct 16, 2015 at 12:47 PM, Gustavo Burin Ferreira
>>>>> > > <[email protected]> wrote:
>>>>> > > > Dear list,
>>>>> > > >
>>>>> > > > I'm trying to perform a time travel in simulated phylogenies with
>>>>> > > > both
>>>>> > > > extant and extinct species using the timeSliceTree function form
>>>>> > > > the
>>>>> > > > paleotree package. My aim is to have the molecular phylogenies
>>>>> > > > derived
>>>>> > > from
>>>>> > > > the complete phylogeny (attached) in different points in time.
>>>>> > > >
>>>>> > > > However, when I try that with big trees (bigger than 20000 tips
>>>>> > total), I
>>>>> > > > get an error of integer overflow coming from the dist.nodes
>>>>> > > > function.
>>>>> > > After
>>>>> > > > slightly tweaking the dist.nodes function (changing nm from
>>>>> > > > integer to
>>>>> > > > numeric/double), I get the following message:
>>>>> > > >
>>>>> > > > Error in dist.nodes(tree) (from #7) :
>>>>> > > > long vectors (argument 7) are not supported in .Fortran
>>>>> > > >
>>>>> > > > Since I don't know much about C or Fortran, I couldn't find a way
>>>>> > > > of
>>>>> > > solving
>>>>> > > > this by myself, so any help will be greatly appreciated.
>>>>> > > >
>>>>> > > > I'm sending one tree attached for example.
>>>>> > > >
>>>>> > > > Thank you very much in advance!
>>>>> > > >
>>>>> > > > Best,
>>>>> > > >
>>>>> > > > Gustavo Burin Ferreira, Msc.
>>>>> > > > Instituto de Biociências
>>>>> > > > Universidade de São Paulo
>>>>> > > > Tel: (11) 98525-8948
>>>>> > > >
>>>>> > > > _______________________________________________
>>>>> > > > R-sig-phylo mailing list - [email protected]
>>>>> > > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>> > > > Searchable archive at
>>>>> > > http://www.mail-archive.com/[email protected]/
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > David W. Bapst, PhD
>>>>> > > Adjunct Asst. Professor, Geology and Geol. Eng.
>>>>> > > South Dakota School of Mines and Technology
>>>>> > > 501 E. St. Joseph
>>>>> > > Rapid City, SD 57701
>>>>> > >
>>>>> > > http://webpages.sdsmt.edu/~dbapst/
>>>>> > > http://cran.r-project.org/web/packages/paleotree/index.html
>>>>> > >
>>>>> > > _______________________________________________
>>>>> > > R-sig-phylo mailing list - [email protected]
>>>>> > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>> > > Searchable archive at
>>>>> > > http://www.mail-archive.com/[email protected]/
>>>>> >
>>>>> > [[alternative HTML version deleted]]
>>>>> >
>>>>> > _______________________________________________
>>>>> > R-sig-phylo mailing list - [email protected]
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>> > Searchable archive at
>>>>> > http://www.mail-archive.com/[email protected]/
>>>>> >
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-phylo mailing list - [email protected]
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>>>> Searchable archive at
>>>>> http://www.mail-archive.com/[email protected]/
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Klaus Schliep
>>>> Postdoctoral Fellow
>>>> Revell Lab, University of Massachusetts Boston
>>>>
>>>
>>
>>
>>
>> --
>> Klaus Schliep
>> Postdoctoral Fellow
>> Revell Lab, University of Massachusetts Boston
>>
>
>
>
> --
> David W. Bapst, PhD
> Adjunct Asst. Professor, Geology and Geol. Eng.
> South Dakota School of Mines and Technology
> 501 E. St. Joseph
> Rapid City, SD 57701
>
> http://webpages.sdsmt.edu/~dbapst/
> http://cran.r-project.org/web/packages/paleotree/index.html
--
David W. Bapst, PhD
Adjunct Asst. Professor, Geology and Geol. Eng.
South Dakota School of Mines and Technology
501 E. St. Joseph
Rapid City, SD 57701
http://webpages.sdsmt.edu/~dbapst/
http://cran.r-project.org/web/packages/paleotree/index.html
_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/