[R-sig-phylo] Random sampling of branch lengths

2014-09-25 Thread Arbuckle, Kevin
Hi,



I have a problem that I am not entirely sure how to code appropriately, and any 
help here would be gratefully appreciated.



Imagine we have a tree consisting only of a topology, for example this newick 
format tree: '(A,(((B,C),D),E));'. Now, consider that we wish to explore how 
variation in branch lengths on this topology influences the results of a 
particular comparative method. One way to do this would be to generate lots of 
trees with randomly sampled branch lengths and look to see whether the 
conclusions drawn are strongly influenced by the branch lengths chosen.



In a simple case where ultrametric trees are unnecessary or at least not 
considered important then we could just randomly sample edge lengths from a 
given distribution, but this poses some problems when ultrametric trees are 
desired. I've attached a figure of the example tree given above for ease of 
explanation. If the total tree height is some arbitrary value (let's say 1), 
then the edge length of the branch leading to A is always going to be 1. If 
there was a sister species to A (let's call it A') along this branch then it 
would also be easy to do - simply randomly sample a value between 0 and 1 (call 
this x), assign that to both branches leading the common ancestor (A,A') to A 
and A', and then assign the edge length from the root to (A,A') as 1-x.



But my problem is thinking how to code a general case for the whole tree, 
incorporating two constraints. Firstly, that the sum of all branches leading 
from each species to root is 1. Secondly, that branch lengths are dependent on 
those of their sister taxon. To illustrate the last point consider that the 
edges of the branches leading from common ancestor (B,C) to each of B and C is 
0.2. This automatically constrains the branch leading to D (call it y) to be 
greater than 0.2 and also that the length of the branch leading from (B,C,D) to 
(B,C) must equal y-0.2. These dependencies make me unsure how to code such a 
function to 'randomly' sample branch lengths to generate an ultrametric tree.



As an additional complication, though not an absolutely necessary one, consider 
that we have information about some node ages. For instance, say we know that 
the age of node (B,C,D,E) is 0.7. How could I incorporate these additional 
(user-specified) constraints into such a function as I'm looking for?



I am currently using the compute.brtime function in ape to obtain branch 
lengths from a coalescent model, but these are often biased towards 'tippy' 
trees and so may not be a representative sample of the possible tree space.



Am I missing something in the form of an existing function, or does anyone have 
any code that could do what I'm asking?



I hope I have explained myself clearly enough.



Cheers,



Kev


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Random sampling of branch lengths

2014-09-25 Thread Emmanuel Paradis

Hi,

compute.brtime() can use a set of pre-calculated branching times (the 
default is indeed a set of coalescent times), for instance:


tr - read.tree(text = (A,(((B,C),D),E));)
compute.brtime(tr, method = 4:1/4)

will produce a tree with root time = 1 and regular inter-node times.

Best,

Emmanuel

Le 25/09/2014 13:43, Arbuckle, Kevin a écrit :

Hi,

I have a problem that I am not entirely sure how to code appropriately,
and any help here would be gratefully appreciated.

Imagine we have a tree consisting only of a topology, for example this
newick format tree: '(A,(((B,C),D),E));'. Now, consider that we wish to
explore how variation in branch lengths on this topology influences the
results of a particular comparative method. One way to do this would be
to generate lots of trees with randomly sampled branch lengths and look
to see whether the conclusions drawn are strongly influenced by the
branch lengths chosen.

In a simple case where ultrametric trees are unnecessary or at least not
considered important then we could just randomly sample edge lengths
from a given distribution, but this poses some problems when ultrametric
trees are desired. I've attached a figure of the example tree given
above for ease of explanation. If the total tree height is some
arbitrary value (let's say 1), then the edge length of the branch
leading to A is always going to be 1. If there was a sister species to A
(let's call it A') along this branch then it would also be easy to do -
simply randomly sample a value between 0 and 1 (call this x), assign
that to both branches leading the common ancestor (A,A') to A and A',
and then assign the edge length from the root to (A,A') as 1-x.

But my problem is thinking how to code a general case for the whole
tree, incorporating two constraints. Firstly, that the sum of all
branches leading from each species to root is 1. Secondly, that branch
lengths are dependent on those of their sister taxon. To illustrate the
last point consider that the edges of the branches leading from common
ancestor (B,C) to each of B and C is 0.2. This automatically constrains
the branch leading to D (call it y) to be greater than 0.2 and also that
the length of the branch leading from (B,C,D) to (B,C) must equal y-0.2.
These dependencies make me unsure how to code such a function to
'randomly' sample branch lengths to generate an ultrametric tree.

As an additional complication, though not an absolutely necessary one,
consider that we have information about some node ages. For instance,
say we know that the age of node (B,C,D,E) is 0.7. How could I
incorporate these additional (user-specified) constraints into such a
function as I'm looking for?

I am currently using the compute.brtime function in ape to obtain branch
lengths from a coalescent model, but these are often biased towards
'tippy' trees and so may not be a representative sample of the possible
tree space.

Am I missing something in the form of an existing function, or does
anyone have any code that could do what I'm asking?

I hope I have explained myself clearly enough.

Cheers,

Kev







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Random sampling of branch lengths

2014-09-25 Thread Liam J. Revell

Just to follow up on Emmanuel's point, something like this:

tr - read.tree(text = (A,(((B,C),D),E));)
bt-rexp(n=tr$Nnode,rate=Ntip(tr):2)
tree-compute.brtime(tr,method=cumsum(bt)[length(bt):1])
plotTree(tree)

(I believe) would give you internode times as if the tree arose under a 
pure-birth (i.e., Yule) process


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 9/26/2014 12:51 AM, Emmanuel Paradis wrote:

Hi,

compute.brtime() can use a set of pre-calculated branching times (the
default is indeed a set of coalescent times), for instance:

tr - read.tree(text = (A,(((B,C),D),E));)
compute.brtime(tr, method = 4:1/4)

will produce a tree with root time = 1 and regular inter-node times.

Best,

Emmanuel

Le 25/09/2014 13:43, Arbuckle, Kevin a écrit :

Hi,

I have a problem that I am not entirely sure how to code appropriately,
and any help here would be gratefully appreciated.

Imagine we have a tree consisting only of a topology, for example this
newick format tree: '(A,(((B,C),D),E));'. Now, consider that we wish to
explore how variation in branch lengths on this topology influences the
results of a particular comparative method. One way to do this would be
to generate lots of trees with randomly sampled branch lengths and look
to see whether the conclusions drawn are strongly influenced by the
branch lengths chosen.

In a simple case where ultrametric trees are unnecessary or at least not
considered important then we could just randomly sample edge lengths
from a given distribution, but this poses some problems when ultrametric
trees are desired. I've attached a figure of the example tree given
above for ease of explanation. If the total tree height is some
arbitrary value (let's say 1), then the edge length of the branch
leading to A is always going to be 1. If there was a sister species to A
(let's call it A') along this branch then it would also be easy to do -
simply randomly sample a value between 0 and 1 (call this x), assign
that to both branches leading the common ancestor (A,A') to A and A',
and then assign the edge length from the root to (A,A') as 1-x.

But my problem is thinking how to code a general case for the whole
tree, incorporating two constraints. Firstly, that the sum of all
branches leading from each species to root is 1. Secondly, that branch
lengths are dependent on those of their sister taxon. To illustrate the
last point consider that the edges of the branches leading from common
ancestor (B,C) to each of B and C is 0.2. This automatically constrains
the branch leading to D (call it y) to be greater than 0.2 and also that
the length of the branch leading from (B,C,D) to (B,C) must equal y-0.2.
These dependencies make me unsure how to code such a function to
'randomly' sample branch lengths to generate an ultrametric tree.

As an additional complication, though not an absolutely necessary one,
consider that we have information about some node ages. For instance,
say we know that the age of node (B,C,D,E) is 0.7. How could I
incorporate these additional (user-specified) constraints into such a
function as I'm looking for?

I am currently using the compute.brtime function in ape to obtain branch
lengths from a coalescent model, but these are often biased towards
'tippy' trees and so may not be a representative sample of the possible
tree space.

Am I missing something in the form of an existing function, or does
anyone have any code that could do what I'm asking?

I hope I have explained myself clearly enough.

Cheers,

Kev







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/