----- Le 27 Juin 20, à 16:53, Yan Wong y...@pixie.org.uk a écrit :
> This is extremely helpful, thanks Emmanuel. I think it might be useful to note
> this in the documentation for multi2di (or give a pointer to the description),
> as it wasn’t obvious to me how to find out this information from within R. 
> Even
> better would be an option that allowed equiprobable topologies to be generated
> if need be, although that’s obviously more work.

I'll add an option to both rtree() and multi2di().

> In my use case I could easily have polytomies with thousands or tens of
> thousands of edges, so the algorithmic approach is much more suitable than the
> enumerative one, and thanks for the tips on making it work. But on the topic 
> of
> enumeration, an extremely capable student of ours has been working out ways of
> using integer partitions to enumerate topologies, even for very large trees,
> and sampling from a list of tree ranks, then unranking the result, would
> probably be a lot more scalable than actually creating the trees using 
> allTrees

allTrees() does not accept n > 10 tips to prevent you filling your computer's 
memory.

> (our treatment is in Python, however, and we haven’t published it yet: details
> in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be
> interested in knowing what prior art there is in this area, and if what our
> student has done is of any use to APE? The concepts scale quite well to
> mid-sized trees, and there are also separate applications to trees with
> millions of tips (which we occasionally have to deal with).

ape has howmanytrees() which computes the number of possible topologies for 
most cases: (un)labelled, (un)rooted, binary or not.

Ranking topologies seems related to previous works on matchings by Diaconis & 
Holmes (see the reference cited in ?as.matching). A related work is the 
geodesic distance implemented in the package distory (which I maintain now). 
The latter is more general since it considers branch lengths.

Maybe there is also a connection between tree ranking and topological distances 
(see ?nni in phangorn).

Best,

Emmanuel

> Cheers
> 
> Yan
> 
>> On 27 Jun 2020, at 10:33, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:
>> 
>> Hi Yan,
>> 
>> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
>> algorithm
>> used by this latter function is (described in APER2, p. 313):
>> 
>> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
>> 2. If a > 1, apply (recursively) step 1 after substituting n by a.
>> 3. Repeat step 2 with b in place of a.
>> 4. Assign randomly the n tip labels to the tips.
>> 
>> This is indeed results in unequal frequencies of the possible topologies. If
>> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated
>> with equal probability.
>> 
>> Practically, I see two possibilities for you. If pratical, you generate a 
>> list
>> with all possible topologies, for instance with phangorn's function allTrees:
>> 
>> R> library(phangorn)
>> R> TR <- allTrees(4, rooted = TRUE)
>> R> TR
>> 15 phylogenetic trees
>> 
>> Then you just draw randomly some integers between 1 and 15 and use them as
>> indices:
>> 
>> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
>> R> TR[s]
>> 100000 phylogenetic trees
>> 
>> If you want, you can play with the option 'prob' of sample.int().
>> 
>> The other solution is to make copies of rtree() and multi2di() in your
>> workspace: only rtree() needs to be modified and only where sample.int() is
>> called in the recursive function foo. You also need to copy multi2di() even
>> unchanged because of the package's namespace.
>> 
>> HTH
>> 
> > Best,

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to