Re: [R-sig-phylo] dist.nodes

2021-09-07 Thread Klaus Schliep
Hi Nick,

it might be useful to plot the node labels on the tree:

plot(tree, label.offset = .25)
tiplabels()
nodelabels()
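
A minimal sketch of why the plot helps, assuming a small random tree from `rtree()` rather than the poster's data: called without arguments, `tiplabels()` and `nodelabels()` print the tip and node numbers, and those numbers should be the same indices that `dist.nodes()` uses for its rows and columns.

```r
library(ape)

set.seed(1)
tree <- rtree(5)                 # hypothetical example tree
d <- dist.nodes(tree)            # square matrix over all tips and nodes

plot(tree, label.offset = 0.25)
tiplabels()                      # prints tip numbers 1..n
nodelabels()                     # prints node numbers (n+1)..(n+m)

rownames(d)                      # the same numbers, stored as character strings
d["1", as.character(Ntip(tree) + 1)]   # distance from tip 1 to the root
```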

Regards,
Klaus




On Tue, Sep 7, 2021 at 4:56 AM Emmanuel Paradis 
wrote:

> Hi,
>
> A tree has n terminal nodes (aka tips) and m internal nodes (or simply
> nodes). In the edge matrix, the tips are numbered 1:n and the nodes are
> numbered (n+1):(n+m), which is the same as n+1:m because `:` binds more
> tightly than `+` in R. n and m can be found with:
>
> n <- Ntip(tree) # or length(tree$tip.label)
> m <- Nnode(tree) # or tree$Nnode
>
> Details can be found in:
>
> http://ape-package.ird.fr/misc/FormatTreeR.pdf
>
> Best,
>
> Emmanuel


-- 
Klaus Schliep

Senior Scientist
Institute of Computational Biotechnology
TU Graz
https://icbt.tugraz.at
https://www.phangorn.org/ 

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] dist.nodes

2021-09-06 Thread Emmanuel Paradis
Hi,

A tree has n terminal nodes (aka tips) and m internal nodes (or simply nodes).
In the edge matrix, the tips are numbered 1:n and the nodes are numbered
(n+1):(n+m), which is the same as n+1:m because `:` binds more tightly than
`+` in R. n and m can be found with:

n <- Ntip(tree) # or length(tree$tip.label)
m <- Nnode(tree) # or tree$Nnode
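
As a hedged illustration of this numbering (a sketch on a random tree, not the original poster's data), the rows and columns of the `dist.nodes()` matrix can be relabelled with the tip labels followed by node names. The `node<number>` names used for unlabelled nodes are made up for this example.

```r
library(ape)

set.seed(42)
tree <- rtree(6)            # hypothetical example tree
n <- Ntip(tree)
m <- Nnode(tree)

d <- dist.nodes(tree)       # (n + m) x (n + m) matrix, dimnames "1".."n+m"

# Rows/columns 1..n are the tips, in the order of tree$tip.label;
# rows/columns (n+1)..(n+m) are the internal nodes.
node_names <- if (is.null(tree$node.label)) {
  paste0("node", (n + 1):(n + m))   # made-up names for unlabelled nodes
} else {
  tree$node.label
}
labs <- c(tree$tip.label, node_names)
dimnames(d) <- list(labs, labs)

# In tree$edge, an edge ends in a tip when its child index is <= n.
tip_edges <- tree$edge[, 2] <= n

# The tip-by-tip block should agree with cophenetic().
tip_d <- d[seq_len(n), seq_len(n)]
isTRUE(all.equal(tip_d, cophenetic(tree)))
```

Keeping the full matrix but subsetting with `seq_len(n)` or `(n + 1):(n + m)` gives tip-to-tip, tip-to-node, or node-to-node distances as needed.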

Details can be found in:

http://ape-package.ird.fr/misc/FormatTreeR.pdf

Best,

Emmanuel

On 5 Sep 21, at 21:07, Nick Youngblut nyoun...@gmail.com wrote:
> For `ape::dist.nodes()`, how can one match the output matrix rows/columns with
> the node IDs in the tree (e.g., the tip.labels)? I cannot just use
> `ape::cophenetic()` in my particular situation. The docs for
> `ape::dist.nodes()` state:
> 
> ```
> … in the case of dist.nodes, the numbers of the tips and the nodes (as given by
> the element edge).
> ```
> 
> … but tree$edge doesn’t provide any direct info about which nodes are internal
> vs external and how the external tip labels map to the tree$edge matrix.
> 
> Thanks,
> Nick


[R-sig-phylo] dist.nodes

2021-09-06 Thread Nick Youngblut
For `ape::dist.nodes()`, how can one match the output matrix rows/columns with
the node IDs in the tree (e.g., the tip.labels)? I cannot just use
`ape::cophenetic()` in my particular situation. The docs for 
`ape::dist.nodes()` state:

```
… in the case of dist.nodes, the numbers of the tips and the nodes (as given by 
the element edge).
```

… but tree$edge doesn’t provide any direct info about which nodes are internal 
vs external and how the external tip labels map to the tree$edge matrix. 

Thanks,
Nick


Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-22 Thread Gustavo Burin Ferreira
Hey David,

I installed it and it is working perfectly! I tried it with a couple of big
trees and it ran well and fast!

Thank you all for the help!

Best,

Gustavo Burin Ferreira, Msc.
Instituto de Biociências
Universidade de São Paulo
Tel: (11) 98525-8948

On Wed, Oct 21, 2015 at 2:43 AM, David Bapst  wrote:

> Gustavo, all,
>
> I just wanted to let you know that I've integrated Klaus's suggestions
> into timeSliceTree, as well as a growing number of additional
> functions in paleotree, and these changes are in the current build on
> github, which appears to pass all checks as of the moment.
>
> You can install the in-development version of paleotree on github
> using the R package devtools, like so:
>
> library(devtools)
> install_github("dwbapst/paleotree")
>
> Gustavo, if you could check and see if you encounter any issues with
> the timeSliceTree in this new version, I would appreciate it.
>
> Cheers,
> -Dave

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread David Bapst
Gustavo, all,

I just wanted to let you know that I've integrated Klaus's suggestions
into timeSliceTree, as well as a growing number of additional
functions in paleotree. These changes are in the current build on
GitHub, which appears to pass all checks at the moment.

You can install the in-development version of paleotree from GitHub
using the R package devtools, like so:

library(devtools)
install_github("dwbapst/paleotree")

Gustavo, if you could check and see if you encounter any issues with
the timeSliceTree in this new version, I would appreciate it.

Cheers,
-Dave



Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread David Bapst
Thanks, Klaus. I was unaware of node.depth.edgelength; I believe I use
dist.nodes to calculate root-to-tip distances in a number of functions
in paleotree, so this could potentially improve a large number of
functions. I'll have to do a global search for dist.nodes.

Cheers,
-Dave

On Tue, Oct 20, 2015 at 1:32 PM, Klaus Schliep  wrote:
> Hi Gustavo & David,
> a tiny improvement: I had missed another sapply.
> You can do the same trick replacing dist.nodes in dropExtinct if needed.
> Klaus

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread Klaus Schliep
That's fine with me.

On Tue, Oct 20, 2015 at 3:29 PM, David Bapst  wrote:

> Ah, thanks, Klaus! Is it alright with you if I merge these edits
> into paleotree?
> -Dave

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread Klaus Schliep
Hi Gustavo & David,
a tiny improvement: I had missed another sapply.
You can do the same trick replacing dist.nodes in dropExtinct if needed.
Klaus

On Tue, Oct 20, 2015 at 3:21 PM, Gustavo Burin Ferreira  wrote:

> Hi Klaus,
>
> the function works perfectly! Thank you very much!
>
> Best,
>
> Gustavo Burin Ferreira, Msc.
> Instituto de Biociências
> Universidade de São Paulo
> Tel: (11) 98525-8948

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread David Bapst
Ah, thanks, Klaus! Is it alright with you if I merge these edits
into paleotree?
-Dave

On Tue, Oct 20, 2015 at 1:15 PM, Klaus Schliep  wrote:
> Hi Gustavo & David,
>
> I attached a file that contains a function timeSliceTree2, which is a
> replacement for timeSliceTree.
> I replaced
> dist.nodes(tree)[, Ntip(tree) + 1]
> with
> node.depth.edgelength(tree)
> dist.nodes computes a full matrix (roughly nTips^2 * 4 * 8 bytes, which can
> get very large for many taxa), whereas node.depth.edgelength computes just
> the vector the function needs. The full matrix caused the memory problem.
> I also simplified an unnecessary sapply and moved prop.part out of a loop.
> So it needs less memory and is also much faster for large trees. The results
> should be exactly the same.
>
> Cheers,
> Klaus
>
>
>
> > source("timeSliceTree.R")
> > library(paleotree)
> > set.seed(123)
> > tree = rtree(2000)
> > system.time(tree1 <-
> +   timeSliceTree(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
> Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
>    user  system elapsed
>   7.416   0.033   7.454
> > system.time(tree2 <-
> +   timeSliceTree2(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
> Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
>    user  system elapsed
>   0.147   0.003   0.151
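
The replacement Klaus describes can be checked on a small scale. The following is only a sketch on a random tree (it is not the attached timeSliceTree2 code): it compares the root column of the full `dist.nodes()` matrix with the vector from `node.depth.edgelength()`, and estimates the matrix size for a rooted binary tree, where n tips imply 2n - 1 tips and nodes in total. The helper `dist_nodes_bytes()` is made up for this illustration.

```r
library(ape)

set.seed(123)
tree <- rtree(200)                      # hypothetical example tree
n <- Ntip(tree)

# Old approach: build the full matrix, keep only the root's column
# (the root is node n + 1 in ape's numbering).
from_matrix <- dist.nodes(tree)[, n + 1]

# Replacement: compute only the root-to-node distances.
from_vector <- node.depth.edgelength(tree)

same <- isTRUE(all.equal(unname(from_matrix), from_vector))
same

# Rough size of the dist.nodes() result for a rooted binary tree:
# (2n - 1)^2 doubles of 8 bytes each, i.e. about nTips^2 * 4 * 8 bytes.
dist_nodes_bytes <- function(n_tips) 8 * (2 * n_tips - 1)^2
dist_nodes_bytes(c(2000, 20000)) / 2^20   # approximate size in MiB
```

The vector grows linearly with the number of tips while the matrix grows quadratically, which is consistent with the memory problem reported in this thread.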
>
>
>
>
> On Tue, Oct 20, 2015 at 11:49 AM, Gustavo Burin Ferreira
>  wrote:
>>
>> Hey David and Nick,
>>
>> thanks a lot for the quick responses! I think I wasn't very clear in the
>> first e-mail. What I get is actually an error from within dist.nodes, not
>> when calling it.
>>
>> I've tried to use chainsaw2 and in the beginning it appeared to be working
>> quite well. However after some running time, I get the same (original)
>> error that motivated me writing to the list:
>>
>>
>> > Error in double(nm * nm) : vector size cannot be NA
>> > In addition: Warning message:
>> > In nm * nm : NAs produced by integer overflow
>>
>>
>> Digging into the functions called within chainsaw2, I found that at some
>> point it uses the function get_max_height_tree, that calls dist.nodes and
>> that's where I think the problem lies. The error I got now is almost
>> exactly the same as I got from timeSliceTree (because both cases use
>> dist.nodes):
>>
>>
>> > dist.nodes
>> > function (x)
>> > {
>> >     x <- reorder(x)
>> >     n <- Ntip(x)
>> >     m <- x$Nnode
>> >     nm <- n + m
>> >     d <- .C(dist_nodes, as.integer(n), as.integer(m),
>> >         as.integer(x$edge[, 1] - 1L), as.integer(x$edge[, 2] - 1L),
>> >         as.double(x$edge.length), as.integer(Nedge(x)),
>> >         double(nm * nm), NAOK = TRUE)[[7]]
>> >     dim(d) <- c(nm, nm)
>> >     dimnames(d) <- list(1:nm, 1:nm)
>> >     d
>> > }
>>
>>
>> I tried changing the highlighted part to something like
>> double(as.numeric(nm) * as.numeric(nm)), and when I try running it, I get
>> the error I wrote on the first e-mail:
>>
>>
>> > Error in dist.nodes(tree) (from #7) :
>> >   long vectors (argument 7) are not supported in .Fortran
>>
>>
>> Thus, I think that to solve this problem some tweak in the C/Fortran code
>> that is called within dist.nodes (from ape) might be required, but I have
>> no expertise on that. So if someone can help me with that, I'll appreciate
>> it!
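
The two errors above can be reproduced in base R without any tree. This is a sketch with an illustrative value of `nm` (not taken from Gustavo's data): multiplying two integers overflows once the product exceeds `.Machine$integer.max`, and converting to double avoids the NA but yields a length beyond that limit, which is the long-vector case that the error message says is not supported.

```r
nm <- 46341L                                  # illustrative n + m for a very large tree
int_product <- suppressWarnings(nm * nm)      # integer overflow gives NA
dbl_product <- as.numeric(nm) * as.numeric(nm)

# Largest nm whose square still fits in a 32-bit integer.
max_nm <- floor(sqrt(.Machine$integer.max))

# Number of tips in a rooted binary tree (nm = 2n - 1) at which
# nm * nm first overflows.
min_tips_overflow <- ceiling((max_nm + 2) / 2)

int_product
dbl_product > .Machine$integer.max            # would need a long vector
c(max_nm = max_nm, min_tips_overflow = min_tips_overflow)
```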
>>
>> Thanks again for the help so far!
>>
>> Best,
>>
>>
>> Gustavo Burin Ferreira, Msc.
>> Instituto de Biociências
>> Universidade de São Paulo
>> Tel: (11) 98525-8948
>>
>> On Fri, Oct 16, 2015 at 5:06 PM, Nick Matzke  wrote:
>>
>> > Hi!  I re-did chainsaw at some point, now there is chainsaw2.  However,
>> > googling that gets you horror movies, so here is a link with example
>> > code:
>> >
>> > https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ
>> >
>> > (the discussion there points out a rare case where this crashes, but for
>> > most trees it should work fine)
>> >
>> > Cheers, Nick
>> >
>> > On Fri, Oct 16, 2015 at 2:17 PM, David Bapst  wrote:
>> >
>> > > Hi Gustavo,
>> > >
>> > > I'm paleotree's author and maintainer. Just to be clear that I
>> > > understand your problem, I believe you are saying that when you use
>> > > timeSliceTree, you are getting an error that the internal call to
>> > > dist.nodes is failing? Is that right?
>> > >
>> > > The first thought I have is that maybe the solution here is to avoid
>> > > dist.nodes, as it is somewhat overkill. I use dist.nodes in that code,
>> > > which I wrote in 2011, to get the distance of tips and nodes from the
>> > > root. A better solution may now exist in another R package. I'd have
>> > > to investigate (although maybe someone on the list can suggest one).
>> > >
>> > > The second thought I have is that there might be alternative functions
>> > > that do something lie timeSliceTree in another R package. Off the top
>> > > of my head, I recall that Nick Matzke had a similar, 'chainsaw'
>> > > function, which you can find here and appears not to call dist.nodes:
>> > >
>> > > https://stat.ethz.

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread Gustavo Burin Ferreira
Hi Klaus,

the function works perfectly! Thank you very much!

Best,

Gustavo Burin Ferreira, Msc.
Instituto de Biociências
Universidade de São Paulo
Tel: (11) 98525-8948

Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread Klaus Schliep
Hi Gustavo & David,

I attached a file that contains a function timeSliceTree2, which is a
replacement for timeSliceTree.
I replaced
dist.nodes(tree)[, Ntip(tree) + 1]
with
node.depth.edgelength(tree)
dist.nodes computes the full matrix (roughly nTips^2 * 4 * 8 bytes, which can
get very large for many taxa),
whereas node.depth.edgelength computes just the vector the function needs. The
full matrix is what caused the memory problem.
I also simplified an unnecessary sapply and moved prop.part out of a loop.
So it needs less memory and is also much faster for large trees. The
results should be exactly the same.

Cheers,
Klaus
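
A back-of-the-envelope version of that size estimate, as a minimal sketch. It assumes a fully bifurcating rooted tree (so n tips come with n - 1 internal nodes) and 8 bytes per double. The helper names dist_nodes_bytes and depth_vector_bytes are made up for illustration and are not part of ape or paleotree.

```r
# Approximate size of the dist.nodes() result for a tree with n_tips tips.
# Assumption: fully bifurcating rooted tree, so Nnode = n_tips - 1.
dist_nodes_bytes <- function(n_tips) {
  nm <- 2 * n_tips - 1          # tips + internal nodes, kept as a double
  nm^2 * 8                      # one 8-byte double per matrix cell
}

# Approximate size of the single vector returned by node.depth.edgelength()
depth_vector_bytes <- function(n_tips) {
  (2 * n_tips - 1) * 8
}

dist_nodes_bytes(2000) / 2^20      # matrix for a 2000-tip tree, in MiB
depth_vector_bytes(2000) / 2^10    # vector for the same tree, in KiB
dist_nodes_bytes(25000) / 2^30     # hypothetical 25000-tip tree, in GiB
```

Under these assumptions the matrix for a 2000-tip tree is about 122 MiB, while the vector is about 31 KiB. The quadratic growth is why the matrix becomes impractical for trees with tens of thousands of tips.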



> source("timeSliceTree.R")
> library(paleotree)
> set.seed(123)
> tree = rtree(2000)
> system.time(tree1 <-
timeSliceTree(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
   user  system elapsed
  7.416   0.033   7.454
> system.time(tree2 <-
timeSliceTree2(tree,sliceTime=5,plot=FALSE,drop.extinct=FALSE))
Warning: no ttree$root.time! Assuming latest tip is at present (time=0)
   user  system elapsed
  0.147   0.003   0.151
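
The replacement described above can be checked on a small tree. This is only a sketch, not the attached timeSliceTree2 code: the function name compare_root_distances, the tree size and the seed are arbitrary choices for illustration, and it assumes the ape package is installed.

```r
# Compare the two ways of getting root-to-node distances on a small tree.
# Requires the ape package; tree size and seed are arbitrary.
compare_root_distances <- function(n_tips = 50, seed = 1) {
  set.seed(seed)
  tree <- ape::rtree(n_tips)          # random rooted tree with branch lengths
  root <- ape::Ntip(tree) + 1L        # ape numbers the root as Ntip + 1

  from_matrix <- ape::dist.nodes(tree)[, root]     # one column of the full matrix
  from_vector <- ape::node.depth.edgelength(tree)  # the same distances, directly

  # dist.nodes() labels its result with node numbers; drop the names to compare
  isTRUE(all.equal(unname(from_matrix), from_vector))
}

if (requireNamespace("ape", quietly = TRUE)) {
  print(compare_root_distances())
}
```

The first call builds the whole node-by-node matrix only to keep one column of it, which is where the extra memory and time go.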





Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-20 Thread Gustavo Burin Ferreira
Hey David and Nick,

thanks a lot for the quick responses! I think I wasn't very clear in the
first e-mail. What I get is actually an error from within dist.nodes, not
when calling it.

I tried chainsaw2, and at first it appeared to work quite well. However,
after running for a while, I got the same (original) error that motivated
me to write to the list:


> Error in double(nm * nm) : vector size cannot be NA
> In addition: Warning message:
> In nm * nm : NAs produced by integer overflow
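
For readers hitting the same message, here is a minimal base-R sketch of where the NA comes from. R integers are 32-bit, so the product nm * nm overflows once nm exceeds 46340. For a fully bifurcating rooted tree (an assumption here, where nm = 2 * n - 1) that corresponds to roughly 23,000 tips or more.

```r
# Largest value an R integer can hold
n_max <- .Machine$integer.max            # 2147483647, i.e. 2^31 - 1

# Largest nm whose square still fits in an integer
largest_ok <- floor(sqrt(n_max))         # 46340

ok  <- 46340L * 46340L                   # fits, stays an integer
bad <- suppressWarnings(46341L * 46341L) # overflows: NA (normally with a warning)

# Converting to double before multiplying avoids the overflow itself
as_double <- as.numeric(46341L) * as.numeric(46341L)
```

Here `bad` is `NA_integer_`, which is what `double()` then rejects with "vector size cannot be NA".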


Digging into the functions called within chainsaw2, I found that at some
point it uses the function get_max_height_tree, which calls dist.nodes, and
that's where I think the problem lies. The error I get now is almost
exactly the same as the one I got from timeSliceTree (because both cases use
dist.nodes):


> dist.nodes
> function (x)
> {
>     x <- reorder(x)
>     n <- Ntip(x)
>     m <- x$Nnode
>     nm <- n + m
>     d <- .C(dist_nodes, as.integer(n), as.integer(m),
>         as.integer(x$edge[, 1] - 1L), as.integer(x$edge[, 2] - 1L),
>         as.double(x$edge.length), as.integer(Nedge(x)),
>         double(nm * nm), NAOK = TRUE)[[7]]
>     dim(d) <- c(nm, nm)
>     dimnames(d) <- list(1:nm, 1:nm)
>     d
> }


I tried changing the double(nm * nm) call to something like
double(as.numeric(nm) * as.numeric(nm)), and when I try running it, I get
the error I reported in my first e-mail:


> Error in dist.nodes(tree) (from #7) :
>   long vectors (argument 7) are not supported in .Fortran
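
A short sketch of why the as.numeric() change swaps one error for another. As far as I understand, .C() and .Fortran() only accept ordinary vectors, that is, vectors with at most 2^31 - 1 elements. The helper name needs_long_vector is hypothetical and used only for illustration.

```r
# Largest length an ordinary (non-long) R vector can have
max_ordinary_length <- .Machine$integer.max     # 2^31 - 1

# Hypothetical helper: would an nm x nm matrix of doubles need a long vector?
needs_long_vector <- function(nm) {
  as.numeric(nm)^2 > max_ordinary_length
}

needs_long_vector(46340)   # FALSE: still an ordinary vector
needs_long_vector(46341)   # TRUE: too long to pass through .C()/.Fortran()
```

So with the multiplication done in doubles, the allocation itself can succeed, but the result is a long vector that the foreign-function interface refuses. The threshold is the same as for the integer overflow.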


Thus, I think that to solve this problem some tweak in the C/Fortran code
that is called within dist.nodes (from ape) might be required, but I have
no expertise on that. So if someone can help me with that, I'll appreciate
it!

Thanks again for the help so far!

Best,


Gustavo Burin Ferreira, Msc.
Instituto de Biociências
Universidade de São Paulo
Tel: (11) 98525-8948


Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-16 Thread Nick Matzke
Hi!  I re-did chainsaw at some point, now there is chainsaw2.  However,
googling that gets you horror movies, so here is a link with example code:

https://groups.google.com/d/msg/biogeobears/Jy9uYckOL7s/XuNZ0B3jAwAJ

(the discussion there points out a rare case where this crashes, but for
most trees it should work fine)

Cheers, Nick



Re: [R-sig-phylo] dist.nodes crashing with big trees

2015-10-16 Thread David Bapst
Hi Gustavo,

I'm paleotree's author and maintainer. Just to be clear that I
understand your problem, I believe you are saying that when you use
timeSliceTree, you are getting an error that the internal call to
dist.nodes is failing? Is that right?

The first thought I have is that maybe the solution here is to avoid
dist.nodes, as it is somewhat overkill. I use dist.nodes in that code,
which I wrote in 2011, to get the distance of tips and nodes from the
root. A better solution may now exist in another R package. I'd have
to investigate (although maybe someone on the list can suggest one).

The second thought I have is that there might be alternative functions
that do something like timeSliceTree in another R package. Off the top
of my head, I recall that Nick Matzke had a similar, 'chainsaw'
function, which you can find here and appears not to call dist.nodes:

https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001483.html

Again, maybe someone on the list knows of a good alternative function.

I'll try to give this more thought, but for now, maybe see if you can
use Nick's function successfully. Overall though, I've discovered the
use of truly gigantic trees can often run into unexpected problems.

Cheers,
-Dave



On Fri, Oct 16, 2015 at 12:47 PM, Gustavo Burin Ferreira
 wrote:
> Dear list,
>
> I'm trying to perform a time travel in simulated phylogenies with both
> extant and extinct species using the timeSliceTree function from the
> paleotree package. My aim is to have the molecular phylogenies derived from
> the complete phylogeny (attached) in different points in time.
>
> However, when I try that with big trees (bigger than 2 tips total), I
> get an error of integer overflow coming from the dist.nodes function. After
> slightly tweaking the dist.nodes function (changing nm from integer to
> numeric/double), I get the following message:
>
> Error in dist.nodes(tree) (from #7) :
>   long vectors (argument 7) are not supported in .Fortran
>
> Since I don't know much about C or Fortran, I couldn't find a way of solving
> this by myself, so any help will be greatly appreciated.
>
> I'm sending one tree attached for example.
>
> Thank you very much in advance!
>
> Best,
>
> Gustavo Burin Ferreira, Msc.
> Instituto de Biociências
> Universidade de São Paulo
> Tel: (11) 98525-8948
>



-- 
David W. Bapst, PhD
Adjunct Asst. Professor, Geology and Geol. Eng.
South Dakota School of Mines and Technology
501 E. St. Joseph
Rapid City, SD 57701

http://webpages.sdsmt.edu/~dbapst/
http://cran.r-project.org/web/packages/paleotree/index.html
