Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Klaus Schliep
Hi Yan,
Joseph was right. In read.nexus you need a TRANSLATE block; a TAXLABELS
block alone is not enough. Then read.nexus returns the compressed object and
is 10x faster to read in (for 1000 trees with 1000 taxa on my machine).
There is also the package rncl (Nexus Class Library). It is faster to read
in, but even the pure R implementation with the TRANSLATE block is almost as
fast. However, the objects it returns are actually quite a bit larger. It also
stores the edge matrix as doubles, which I find dangerous.
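
As a rough illustration of why a double-valued edge matrix can cause trouble, the sketch below uses only ape and does not call rncl. It coerces the edge matrix of a toy tree by hand; the tree, the variable names, and the choice of identical() as the failing check are all illustrative assumptions, not something reported in this thread.

```r
library(ape)

tr <- read.tree(text = "((a,b),(c,d));")
storage.mode(tr$edge) <- "integer"     # integer storage, as assumed here for ape's own readers

tr_dbl <- tr
storage.mode(tr_dbl$edge) <- "double"  # mimic an edge matrix stored as doubles

# The node numbers are numerically equal ...
all(tr$edge == tr_dbl$edge)
# ... but the two matrices no longer compare as identical objects.
identical(tr$edge, tr_dbl$edge)

# Coercing back to integer restores an exact match of the whole tree.
storage.mode(tr_dbl$edge) <- "integer"
identical(tr, tr_dbl)
```

Code that relies on identical(), or on compiled routines expecting integer input, would be the places to look first if such an object misbehaves.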
Cheers,
Klaus

-- 
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston
http://www.phangorn.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong

On 14 Dec 2016, at 21:06, Emmanuel Paradis  wrote:

> If the trees are in a NEXUS file with a TRANSLATE block, then the output is a 
> compressed list. So applying .compressTipLabel returns the list unmodified 
> (which should be almost instantaneous).

Ah, I see what I was doing wrong. I used a BEGIN TAXA;TAXLABELS ... END; block, 
rather than a TRANSLATE block within the TREES block. The read.nexus() function 
now works as Joseph Brown surmised. So the easiest way for me to do this is 
simply to use a nexus format trees file.
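
A minimal sketch of the NEXUS route described above, assuming ape is installed. The toy tree set and the temporary file name are made up for illustration; write.nexus() is used here only as a convenient way to produce a file with a TRANSLATE block inside the TREES block.

```r
library(ape)

set.seed(1)
trees <- rmtree(5, 20)   # 5 random trees with 20 tips each (toy size)

nex_file <- tempfile(fileext = ".nex")
write.nexus(trees, file = nex_file, translate = TRUE)  # emits a TRANSLATE block

trees_nex <- read.nexus(nex_file)

# A compressed multiPhylo keeps one shared label vector in the
# "TipLabel" attribute rather than a separate copy per tree.
class(trees_nex)
length(attr(trees_nex, "TipLabel"))
```

If the "TipLabel" attribute is present, a later call to .compressTipLabel() should have nothing left to do.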

Thanks

Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong

On 14 Dec 2016, at 20:57, Emmanuel Paradis  wrote:

> What is the size of your problem?

Erm, quite large. I am looking at tree comparison metrics for roughly 10,000 
trees with perhaps 10,000 tips on each, replicated several times. The Newick 
files themselves take up gigabytes uncompressed. For a problem of this size I’m 
likely to implement my own comparison metrics, but I want to trial this out 
with a tested library before rolling my own.

> Do you use a recent version of ape? This function was improved one or two 
> years ago.

Yes, 4.0.

But I’m happy for the moment to just leave this stuff running for days on a 
server, so it was just a quick suggestion really.

Thanks for the quick reply

Yan


Re: [R-sig-phylo] Comparing support values on different trees

2016-12-14 Thread Emmanuel Paradis

Hi Jake,

What you describe looks very much like the Lento method implemented in 
the function lento() in phangorn. consensusNet(), also in phangorn, 
implements something similar: the consensus network.


prop.part(), in ape, is the function behind the two previous ones. 
bitsplits() is more efficient with a different output format. There are 
functions as.prop.part() and as.bitsplits() to convert among these classes.
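
A short sketch of the functions named above, assuming ape is installed. The random trees are toy data, and they are generated unrooted on the assumption that bitsplits() is meant for unrooted trees; variable names are illustrative.

```r
library(ape)

set.seed(42)
trees <- rmtree(10, 8, rooted = FALSE)  # 10 random unrooted trees, 8 tips each

pp <- prop.part(trees)   # class "prop.part": tip-index vectors plus counts
bs <- bitsplits(trees)   # class "bitsplits": splits packed into raw bytes plus frequencies

class(pp)
class(bs)

# Convert between the two representations.
pp_from_bs <- as.prop.part(bs)
bs_from_pp <- as.bitsplits(pp)
```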


Best,

Emmanuel


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Emmanuel Paradis
If the trees are in a NEXUS file with a TRANSLATE block, then the output 
is a compressed list. So applying .compressTipLabel returns the list 
unmodified (which should be almost instantaneous).
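
The behaviour described above can be checked on a small example. This is a sketch on toy data (50 random trees with 30 tips), assuming ape is installed; the object names are illustrative.

```r
library(ape)

set.seed(7)
tr <- rmtree(50, 30)         # uncompressed: each tree carries its own tip labels
a  <- .compressTipLabel(tr)  # compressed: labels stored once in attr(a, "TipLabel")
b  <- .compressTipLabel(a)   # already compressed, so expected to come back unchanged

identical(a, b)
object.size(a) < object.size(tr)
```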


Best,

Emmanuel



Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Emmanuel Paradis

Hi Yan,

I tried with 10,000 trees each with 1000 tips and it took a bit more 
than 1 sec:


R> tr <- rmtree(10000, 1000)
R> system.time(a <- .compressTipLabel(tr))
   user  system elapsed
  1.124   0.036   1.161

And yes the memory footprint is substantially decreased:

R> print(object.size(tr), unit="Mb")
850.6 Mb
R> print(object.size(a), unit="Mb")
315.7 Mb

What is the size of your problem?

Do you use a recent version of ape? This function was improved one or 
two years ago.


Best,

Emmanuel



Re: [R-sig-phylo] Comparing support values on different trees

2016-12-14 Thread Jacob Berv
To clarify - the idea here is that you are asking which clades appear in 
’subordinate’ trees relative to clades that exist in a consensus tree, and then 
interrogating the support values of the shared clades which exist in the 
’subordinate’ trees? So for example, clade A appears in consensus tree X, and 
also appears in gene trees 1-5 - so this will give me the summary of the 
support values for clade A in gene trees 1-5?

Seems like this would be a useful function to interrogate support values among 
clades recapitulated in gene trees relative to a species tree. Would there be 
an analogue for clades that exist in your consensus that don’t exist in 
’subordinate’ trees?

Jake



Re: [R-sig-phylo] Comparing support values on different trees

2016-12-14 Thread Keith Barker

Frank:

You can import all of the trees into one or more multiPhylo objects, 
then use the ape functions prop.part or prop.clades (depending on what 
you want to do) to summarize different subsets (e.g., from different 
analyses). Here is an example with simulated trees:


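library(ape)
x <- rmtree(50, 100)                      # 50 random trees, 100 tips each
plot(x[[1]])
nodelabels(prop.clades(x[[1]], x))        # trees in x containing each clade of tree 1
y <- rep(x[1], 50)                        # tree 1 repeated 50 times
plot(x[[1]])
nodelabels(prop.clades(x[[1]], c(x, y)))  # same count over x and y combined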

The first part just creates a bunch of random trees, so most nodes will 
only be supported by around 1-4 or so trees. The second part just 
repeats tree one 50 times, and when you label the nodes using trees x + 
trees y, you get 50 plus the number of trees from part 1. That should 
give you the idea.


You can find the shared clades from the "best" trees (if you are doing 
ML) by first calculating the strict consensus using the consensus 
function in ape.
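
The two steps in the original question (identify the shared clades, then compare the support at each) can also be sketched directly. This is a hedged illustration, not an ape function: clade_support() is a hypothetical helper, the two Newick strings are invented, and it assumes rooted trees on identical taxa whose node labels hold numeric support values.

```r
library(ape)

# Hypothetical helper: one row per internal node, keyed by its sorted tip set.
clade_support <- function(tree) {
  n_tip <- Ntip(tree)
  nodes <- n_tip + seq_len(tree$Nnode)
  tips_below <- function(node) {
    if (node <= n_tip) return(tree$tip.label[node])
    children <- tree$edge[tree$edge[, 1] == node, 2]
    unlist(lapply(children, tips_below))
  }
  keys <- vapply(nodes,
                 function(nd) paste(sort(tips_below(nd)), collapse = "|"),
                 character(1))
  data.frame(clade = keys,
             support = as.numeric(tree$node.label),
             stringsAsFactors = FALSE)
}

t1 <- read.tree(text = "(((a,b)90,c)80,(d,e)70)100;")
t2 <- read.tree(text = "(((a,b)60,(c,d)50)40,e)100;")

s1 <- clade_support(t1)
s2 <- clade_support(t2)

# Shared clades with the support from each tree side by side.
shared <- merge(s1, s2, by = "clade", suffixes = c(".t1", ".t2"))
shared

# Clades found in t1 but not in t2.
only_t1 <- setdiff(s1$clade, s2$clade)
only_t1
```

Keying each clade by its tip set avoids relying on node numbers, which generally differ between trees. For unrooted comparisons the key would need to represent a bipartition rather than a clade.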


Hope that helps,
Keith

--
F. Keith Barker, Ph.D.
Associate Professor, Department of Ecology, Evolution and Behavior
Curator of Genetic Resources, and
Interim Curator of Birds, Bell Museum of Natural History
University of Minnesota
140 Gortner Laboratory
1479 Gortner Ave
Saint Paul, MN 55108
612.624.2737 (phone)
612.624.6777 (fax)
barke...@umn.edu
http://www.tc.umn.edu/~barke042



[R-sig-phylo] Comparing support values on different trees

2016-12-14 Thread Frank T Burbrink
Hello,

I have one question

Is there a method to compare the support values (either bootstraps or Pp) 
across all shared clades between two or more different trees having identical 
taxa? I believe this method would have to first identify the shared clades and 
then determine the measure of support at each shared node.

Thank you,

Frank

Frank T. Burbrink, Ph.D.
Associate Curator
Department of Herpetology
American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024-5192

fburbr...@amnh.org



Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong

On 14 Dec 2016, at 15:33, Joseph W. Brown  wrote:

> I wonder if reading in a Nexus file with a translation table bypasses this 
> problem?

Cheers,

If I try read.nexus with a TAXLABELS entry, it still (oddly) results in a 
multiPhylo structure of the same size as before running .compressTipLabel. 
However, when I then do .compressTipLabel() it only takes a moment. My guess is 
this is something to do with skipping the renumbering process. It would be nice 
to have the option in both read.nexus and read.tree, so that I don’t have to 
allocate memory (many GB in my case) for the intermediate step.

Thanks

Yan


Re: [R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Joseph W. Brown
I wonder if reading in a Nexus file with a translation table bypasses this 
problem?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu




[R-sig-phylo] compressTipLabel as an option to read.trees()

2016-12-14 Thread Yan Wong
Hi,

I’m reading in a large number of Newick trees with the same tips, all from a 
single file. If I do trees <- read.tree() followed by trees <- 
.compressTipLabel(trees), it reduces the memory footprint well, but takes an 
age to run. I can’t help thinking this could be sped up during the reading 
process by passing an option to read.tree() to specify that the tip labels are 
the same in each tree in the multiPhylo object. Has anyone implemented such an 
option?

Cheers

Yan