Re: [R-sig-phylo] cleaning code in ape

Emmanuel Paradis Fri, 30 Jan 2026 03:54:06 -0800

Dear all,

Thanks for the suggestions so far! Here are two things I have had in mind for 
some time:


1) Compression of data objects (on the same model as the sparse matrices in
package Matrix and others). For instance, if you do:

library(ape)
data(woodmouse)
alview(woodmouse[, seg.sites(woodmouse, strict=TRUE)])

you can see that it is possible to store only the sites that differ from the
first sequence. That would compress the data by a factor of more than 3, and
the object could be analysed without decompressing it (base frequencies,
distances, ...). There may be a "smarter" way to do it (compressing the
sequences sequentially depending on their similarity).

Something similar might be feasible for trees.
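For illustration, a minimal base-R sketch of such a "reference + differences"
encoding. It assumes a plain character matrix with one row per sequence (not
ape's DNAbin class) and treats gaps or ambiguity codes as ordinary states; the
functions compress_aln(), base_freq_compressed() and ndiff_compressed() are
hypothetical and not part of ape. Base frequencies and pairwise numbers of
differences are computed from the stored differences without rebuilding the
matrix:

```r
## Hypothetical helpers (not part of ape): encode an alignment, stored as a
## character matrix with one row per sequence, as the first sequence plus,
## for every other sequence, the positions and states where it differs.
compress_aln <- function(x) {
    ref <- x[1, ]
    diffs <- lapply(seq_len(nrow(x))[-1], function(i) {
        pos <- which(x[i, ] != ref)
        list(pos = unname(pos), state = unname(x[i, pos]))
    })
    list(ref = unname(ref), diffs = diffs, labels = rownames(x))
}

## Relative base frequencies over all sequences: start from n copies of the
## reference counts, then correct them at the differing sites only.
base_freq_compressed <- function(z) {
    states <- sort(unique(c(z$ref, unlist(lapply(z$diffs, `[[`, "state")))))
    count <- function(s) tabulate(match(s, states), nbins = length(states))
    counts <- count(z$ref) * (length(z$diffs) + 1L)
    for (d in z$diffs)
        counts <- counts - count(z$ref[d$pos]) + count(d$state)
    setNames(counts / sum(counts), states)
}

## Number of sites at which sequences i and j (rows of the original matrix)
## differ, computed from the stored differences only.
ndiff_compressed <- function(z, i, j) {
    get_diff <- function(k) {
        if (k == 1L) return(list(pos = integer(0), state = character(0)))
        z$diffs[[k - 1L]]
    }
    a <- get_diff(i)
    b <- get_diff(j)
    pos <- union(a$pos, b$pos)
    ## at any other site both sequences carry the reference state
    sa <- z$ref[pos]
    sa[match(a$pos, pos)] <- a$state
    sb <- z$ref[pos]
    sb[match(b$pos, pos)] <- b$state
    sum(sa != sb)
}

## Usage on a toy alignment
aln <- rbind(s1 = c("a", "c", "g", "t", "a"),
             s2 = c("a", "c", "g", "t", "t"),
             s3 = c("a", "g", "g", "t", "t"))
z <- compress_aln(aln)
base_freq_compressed(z)    # a = 4/15, c = 2/15, g = 4/15, t = 5/15
ndiff_compressed(z, 2, 3)  # 1
```

Storing only positions and states means that memory grows with the number of
differences rather than with the length of the alignment.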

2) Sequential access to large files (with caching). In some situations, it
might be useful to screen the sequences (and possibly drop some of them)
before alignment (I'm thinking of users working with viral sequences).
Biostrings (in Bioconductor) has this kind of functionality, but maybe it'd be
nice to have it in ape too (and now that we have reintroduced the function
mafft() in ape, this makes sense since MAFFT performs well with big
alignments).

The same could be useful for tree files too (e.g., if someone has done a very
long MCMC run).
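A minimal base-R sketch of sequential screening, assuming a plain FASTA file.
The function filter_fasta() and its arguments are hypothetical (not an existing
ape or Biostrings function). It reads the file in chunks of lines from an open
connection, so only one chunk and the current record are held in memory, and
writes the sequences passing a user-supplied test to a new file. It does no
caching:

```r
## Hypothetical helper (not in ape): stream a FASTA file record by record,
## keeping only the sequences for which keep(label, seq) returns TRUE.
## Returns the number of sequences written to 'outfile'.
filter_fasta <- function(infile, outfile, keep, chunk_size = 10000L) {
    con_in <- file(infile, open = "r")
    con_out <- file(outfile, open = "w")
    on.exit({ close(con_in); close(con_out) })
    label <- NULL
    parts <- character(0)
    n_kept <- 0L
    ## write the current record if it passes the test
    flush_record <- function() {
        if (is.null(label)) return(invisible())
        s <- paste(parts, collapse = "")
        if (keep(label, s)) {
            writeLines(c(paste0(">", label), s), con_out)
            n_kept <<- n_kept + 1L
        }
    }
    repeat {
        ## readLines() on an open connection continues where it stopped
        lines <- readLines(con_in, n = chunk_size)
        if (length(lines) == 0L) break
        for (ln in lines) {
            if (startsWith(ln, ">")) {
                flush_record()
                label <- substring(ln, 2L)
                parts <- character(0)
            } else {
                parts <- c(parts, ln)
            }
        }
    }
    flush_record()
    n_kept
}

## Usage: drop sequences with 10% or more of 'N'
tmp_in <- tempfile(fileext = ".fas")
tmp_out <- tempfile(fileext = ".fas")
writeLines(c(">seq1", "ACGTACGT", "ACGT", ">seq2", "NNNNNNNA",
             ">seq3", "ACGTTT"), tmp_in)
prop_N <- function(s) mean(strsplit(s, "")[[1]] == "N")
n <- filter_fasta(tmp_in, tmp_out, function(lab, s) prop_N(s) < 0.1,
                  chunk_size = 2L)
n                  # 2
readLines(tmp_out) # ">seq1" "ACGTACGTACGT" ">seq3" "ACGTTT"
```

Accumulating a record with c() is quadratic in the number of lines per
sequence, which is fine for a sketch but not for very long sequences.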

Best,

Emmanuel

----- On 29 Jan 26, at 10:24, Vojtěch Zeisek [email protected] wrote:

> Hello
> 
> On Tuesday, 27 January 2026 17:14:22 Central European Standard Time, you
> wrote:
>> Dear all,
>> here are a few explanations of the suggested changes and some
>> subjective comments.
> 
> Thank you for the comments.
> 
>> > this is a perfect idea. :-) I'd love to see two things:
>> > 1) Support for parallelization whenever possible (various distances,
>> > working with multi.whatever objects, ...) to speed things up.
>> 
>> This is generally a good idea. The problem is that parallelization depends
>> on the user's hardware (cluster, multicore machine), the operating
>> system (usually easy on Linux, tricky on OSX and Windows), and
>> whether you run R inside a GUI or from the console.
>> Moreover, some matrix algebra code might already be parallelized,
>> depending on the BLAS library you use, so adding parallelization
>> on top might slow down your computer.
>> This is why I started using the future and future.apply packages
>> in phangorn instead of mclapply(). This puts the user in control of
>> choosing the parallelization framework, and I don't need to
>> check the operating system, number of cores, GUI, etc.
>> Some low-level OpenMP code in C/C++ might still be nice.
> 
> Yeah, I know parallelization is a difficult topic, and I don't know much
> about macOS and Windows. I have generally had a good experience with
> future.apply. In any case, I think we agree that we should add
> parallelization whenever possible. :-)
> IMHO all functions handling multiPhylo objects, or producing any sort of
> matrix, indices, etc., should support parallelization.
> 
>> > 2) Removal of all the *.multiPhylo functions, i.e., IMHO the
>> > best would be if every relevant function supported
>> > both phylo and multiPhylo objects. Currently it's a bit confusing
>> > when to use which function...
>> 
>> I think there might be a misconception here. We introduced a lot of
>> generic functions to ape, e.g., root(), is.rooted(), is.ultrametric(), etc.
>> While is.rooted.phylo() and is.rooted.multiPhylo() both exist, users
>> should only need to call is.rooted(x) and need not care about the phylo
>> or multiPhylo methods. Maybe it is more a problem with the documentation?
> 
> That's a good point. Yeah, it might rather be a confusing feature of the
> documentation. In my experience, what is even more confusing is that you
> can just run plot() to plot a phylo object, but to get the corresponding
> help you must use ?plot.phylo...
> Sincerely,
> V.
> 
>> > On Mon, Jan 26, 2026 at 9:31 AM Emmanuel Paradis wrote:
>> > > Dear all,
>> > > We are in the process of reviewing the "old" code in ape (some of it
>> > > written in 2001). Here are a few things that came up recently:
>> > > 1) During a recent discussion, we wondered if the option "..." of
>> > > read.tree() is useful; it is passed internally to scan(). A review of
>> > > the CRAN packages suggests that this option is useless, so it could
>> > > be removed, at least without breaking those packages. There
>> > > may be other bits of code that can be removed safely in other functions.
>> > > 2) Printing of objects could be improved.
>> > > 3) I've (re)introduced a function mafft() in ape. A function with the
>> > > same name was formerly in ips, which is now orphaned on CRAN.
>> > > 4) A review of the man pages (help) would be useful. For instance,
>> > > in ?read.tree one can read: "If there are two root edges (e.g.,
>> > > "(((A:1,B:1):10):10);"), then the tree is not read and an error message
>> > > is issued." [1] which is wrong since all types of Newick tree can be
>> > > read. There are certainly similar outdated statements in the 300
>> > > pages of the manual.
>> > > 5) Klaus suggests having more functions return their "return
>> > > value" invisibly to make it easier to use pipe operators (|> or %>%).
>> > > Any thoughts, ideas, or comments are welcome.
>> > > Best,
>> > > Emmanuel
>> > > [1] In version 5.8-1 currently on CRAN; now fixed on GitHub.
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
> 
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://botany.natur.cuni.cz/
> 
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start
> 
> _______________________________________________
> R-sig-phylo mailing list - [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/[email protected]/
