Sweet! I considered preallocation, but figured it was not worth it. Guess I was wrong!
JWB ________________________________________ Joseph W. Brown Post-doctoral Researcher, Smith Laboratory University of Michigan Department of Ecology & Evolutionary Biology Room 2071, Kraus Natural Sciences Building Ann Arbor MI 48109-1079 josep...@umich.edu > On 9 Jun, 2017, at 18:28, Klaus Schliep <klaus.schl...@gmail.com> wrote: > > Hi Joseph > > I guess I figured out where some room for improvement was. > You grow the vector pars <- c(pars, nd) > I preallocated pars, but you could even just grow pars using pars[k] <- nd > without pre-allocation. This operation is now (with R >= 3.4) far less costly > as mentioned in the NEWS: > "Assigning to an element of a vector beyond the current length now > over-allocates by a small fraction. The new vector is marked internally as > growable, and the true length of the new vector is stored in the truelength > field. This makes building up a vector result by assigning to the next > element beyond the current length more efficient, though pre-allocating is > still preferred. The implementation is subject to change and not intended to > be used in packages at this time." > > One of the many little improvements recently making R quite a bit faster and > easier. > > So the worst case behavior was pretty bad > > tree <- stree(100000, "right") > system.time(get_mrca_fast(tree, c(1,2))) > user system elapsed > 13.424 0.012 13.440 > mrca > [1] 199999 > > with the changes it is much better: > > system.time(mrca <- get_mrca_fast(tree, c(1,2))) > user system elapsed > 0.012 0.000 0.012 > > mrca > [1] 199999 > > And here is the code: > > get_mrca_fast <- function (phy, tips) { > rootnd <- length(phy$tip.label) + 1; > pars <- integer(phy$Nnode); # worst case assignment, usually far too > long > mrcaind <- 1; # index in pars of the mrca node. will return highest > one traversed by other lineages > tnd <- NULL; # node ids of tips > if ("character" %in% class(tips)) { > tnd <- match(tips, phy$tip.label); > } else { > tnd <- tips; > } > > # build a lookup table to get parents faster > pvec <- integer(max(phy$edge)); > pvec[phy$edge[,2]] <- phy$edge[,1]; > > # get entire lineage for first tip > nd <- tnd[1]; > for(k in 1:phy$Nnode){ > nd <- pvec[nd] > pars[k] <- nd > if (nd == rootnd) break > } > pars <- pars[1:k] #delete the rest > > # traverse lineages for remaining tips, stop if hit common ancestor > for (i in 2:length(tnd)) { > done <- FALSE; > cnd <- tnd[i]; > while (!done) { > cpar <- pvec[cnd]; # get immediate parent > if (cpar %in% pars) { > if (cpar == rootnd) { > # early exit! if the mrca of any set of taxa is the root > then we are done > return(rootnd); > } > idx <- which(pars == cpar); > if (idx > mrcaind) { > mrcaind <- idx; > } > done <- TRUE; > } else { > # keep going! > cnd <- cpar; > } > } > } > return(pars[mrcaind]); > } > > > Have a nice weekend, > Klaus > > On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu > <mailto:josep...@umich.edu>> wrote: > Liam, > > I put the (updated) code up as a gist > <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813 > <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>>. Feel > free to use/improve/whatever. Emmanuel sees room for improvement but frankly > taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is > good enough for my purposes. > > JWB > ________________________________________ > Joseph W. Brown > Post-doctoral Researcher, Smith Laboratory > University of Michigan > Department of Ecology & Evolutionary Biology > Room 2071, Kraus Natural Sciences Building > Ann Arbor MI 48109-1079 > josep...@umich.edu <mailto:josep...@umich.edu> > > > > > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu > > <mailto:liam.rev...@umb.edu>> wrote: > > > > On the other hand, phytools does have a function - the somewhat imprecisely > > named fastMRCA - which can find the MRCA of just a pair of species much > > faster than getMRCA (however still slower than or only about as fast as > > Joseph & Klaus's solutions). > > > > Liam J. Revell, Associate Professor of Biology > > University of Massachusetts Boston > > web: http://faculty.umb.edu/liam.revell/ > > <http://faculty.umb.edu/liam.revell/> > > email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu> > > blog: http://blog.phytools.org <http://blog.phytools.org/> > > > > On 6/9/2017 5:22 AM, Liam J. Revell wrote: > >> > >> Juan. findMRCA was written before getMRCA existed, but the latter was > >> faster so now it just calls getMRCA internally. All the best, Liam > >> > >> Liam J. Revell, Associate Professor of Biology > >> University of Massachusetts Boston > >> web: http://faculty.umb.edu/liam.revell/ > >> <http://faculty.umb.edu/liam.revell/> > >> email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu> > >> blog: http://blog.phytools.org <http://blog.phytools.org/> > >> > >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote: > >>> Package phytools includes a function, findMRCA, that is supposed to work > >>> very efficiently with large trees. you may wish to compare it with your > >>> function. > >>> > >>> Cheers > >>> > >>> Juan > >>> > >>> -- > >>> > >>> Dr. Juan A. Balbuena > >>> Cavanilles Institute of Biodiversity and Evolutionary Biology > >>> University of Valencia > >>> http://www.uv.es/~balbuena <http://www.uv.es/~balbuena> > >>> <http://www.uv.es/%7Ebalbuena <http://www.uv.es/%7Ebalbuena>> > >>> P.O. Box 22085 > >>> http://www.uv.es/cophylpaco <http://www.uv.es/cophylpaco> > >>> <http://www.uv.es/cavanilles/zoomarin/index.htm > >>> <http://www.uv.es/cavanilles/zoomarin/index.htm>> > >>> 46071 Valencia, Spain > >>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es> > >>> <mailto:j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es>> tel. +34 963 > >>> 543 658 fax +34 963 543 733 <tel:%2B34%20963%20543%20733> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> *NOTE!*For shipments by EXPRESS COURIER use the following street address: > >>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain. > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> > >>> > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient > >>> > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>> > >>> > >>> Libre de virus. www.avast.com <http://www.avast.com/> > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient > >>> > >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>> > >>> > >>> > >>> > >>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > >>> > >>> > >>> _______________________________________________ > >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >>> <mailto:R-sig-phylo@r-project.org> > >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > >>> Searchable archive at > >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ > >>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > >>> > >> > >> _______________________________________________ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >> <mailto:R-sig-phylo@r-project.org> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > >> Searchable archive at > >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ > >> <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > [[alternative HTML version deleted]] > > _______________________________________________ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > <mailto:R-sig-phylo@r-project.org> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > -- > Klaus Schliep > Postdoctoral Fellow > Revell Lab, University of Massachusetts Boston > http://www.phangorn.org/ <http://www.phangorn.org/> [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/