----- Le 27 Juin 20, à 16:53, Yan Wong y...@pixie.org.uk a écrit : > This is extremely helpful, thanks Emmanuel. I think it might be useful to note > this in the documentation for multi2di (or give a pointer to the description), > as it wasn’t obvious to me how to find out this information from within R. > Even > better would be an option that allowed equiprobable topologies to be generated > if need be, although that’s obviously more work.
I'll add an option to both rtree() and multi2di(). > In my use case I could easily have polytomies with thousands or tens of > thousands of edges, so the algorithmic approach is much more suitable than the > enumerative one, and thanks for the tips on making it work. But on the topic > of > enumeration, an extremely capable student of ours has been working out ways of > using integer partitions to enumerate topologies, even for very large trees, > and sampling from a list of tree ranks, then unranking the result, would > probably be a lot more scalable than actually creating the trees using > allTrees allTrees() does not accept n > 10 tips to prevent you filling your computer's memory. > (our treatment is in Python, however, and we haven’t published it yet: details > in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be > interested in knowing what prior art there is in this area, and if what our > student has done is of any use to APE? The concepts scale quite well to > mid-sized trees, and there are also separate applications to trees with > millions of tips (which we occasionally have to deal with). ape has howmanytrees() which computes the number of possible topologies for most cases: (un)labelled, (un)rooted, binary or not. Ranking topologies seems related to previous works on matchings by Diaconis & Holmes (see the reference cited in ?as.matching). A related work is the geodesic distance implemented in the package distory (which I maintain now). The latter is more general since it considers branch lengths. Maybe there is also a connection between tree ranking and topological distances (see ?nni in phangorn). Best, Emmanuel > Cheers > > Yan > >> On 27 Jun 2020, at 10:33, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote: >> >> Hi Yan, >> >> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the >> algorithm >> used by this latter function is (described in APER2, p. 313): >> >> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a. >> 2. If a > 1, apply (recursively) step 1 after substituting n by a. >> 3. Repeat step 2 with b in place of a. >> 4. Assign randomly the n tip labels to the tips. >> >> This is indeed results in unequal frequencies of the possible topologies. If >> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated >> with equal probability. >> >> Practically, I see two possibilities for you. If pratical, you generate a >> list >> with all possible topologies, for instance with phangorn's function allTrees: >> >> R> library(phangorn) >> R> TR <- allTrees(4, rooted = TRUE) >> R> TR >> 15 phylogenetic trees >> >> Then you just draw randomly some integers between 1 and 15 and use them as >> indices: >> >> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE) >> R> TR[s] >> 100000 phylogenetic trees >> >> If you want, you can play with the option 'prob' of sample.int(). >> >> The other solution is to make copies of rtree() and multi2di() in your >> workspace: only rtree() needs to be modified and only where sample.int() is >> called in the recursive function foo. You also need to copy multi2di() even >> unchanged because of the package's namespace. >> >> HTH >> > > Best, _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/