Re: [R-sig-phylo] Analysis with Multiple cores on Mac Workstation

Brian O'Meara Sat, 08 Dec 2012 22:16:24 -0800

I agree with Daniel that going in parallel is probably overkill in this
case. However, if you do want to get into parallelization with R, a good
place to start is the CRAN task view on high performance and parallel
computing: http://cran.r-project.org/web/views/HighPerformanceComputing.html.
Like all task views, it provides an overview of the packages in that
domain and an easy way to install all of them at once (see
http://cran.r-project.org/web/views/ ). The built-in parallel package
(since R 2.14) makes it easy to use all your cores using mclapply(). If you
tend to think in for loops (as most of us do) rather than apply functions,
the foreach package combined with doMC is another easy way: there's a good
vignette for this at
http://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf.
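
As a minimal sketch of the mclapply() route (not specific to
consensusBrlen()): summarize.one() below is a hypothetical stand-in for
whatever per-tree or per-sample calculation is slow, and the task count of 20
is arbitrary. It uses only the built-in parallel package.

```r
library(parallel)  # ships with R since 2.14

# Hypothetical stand-in for one slow, independent calculation
# (e.g. summarizing one sample of trees). Seeding inside the function
# keeps each task reproducible regardless of which core runs it.
summarize.one <- function(i) {
  set.seed(i)
  mean(runif(1000))
}

# mclapply() works by forking, so it runs in parallel on OS X and Linux;
# on Windows mc.cores must be 1. Leave one core free for the rest of the
# system, and fall back to 1 if the core count cannot be detected.
n.cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, detectCores() - 1L, na.rm = TRUE)
}

# Same call shape as lapply(), plus the number of cores to use.
serial.result   <- lapply(1:20, summarize.one)
parallel.result <- mclapply(1:20, summarize.one, mc.cores = n.cores)
```

Because the call has the same shape as lapply(), one way to try this is to
get the serial version working first and then swap in mclapply(). The same
loop body could also be written with foreach and doMC, as described in the
vignette above.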


Best,
Brian


_______________________________________
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara


On Sat, Dec 8, 2012 at 7:01 AM, Daniel Barker <d...@st-andrews.ac.uk> wrote:

> Dear Jose,
>
> Is this a problem in practice? The branch-length calculations Brian
> describes do not sound very time-consuming to run. If you're dealing with
> an enormous tree or enormous sample, they would take some time - but
> presumably, the greater bottleneck would be obtaining the sample of trees
> in the first place.
>
> Anything can become time-consuming if repeated, e.g. if you're dealing
> with thousands of separate samples of trees. If things are taking too long
> in that situation, the best solution would be to run serial processes, not
> multithreaded - but many of them, submitted to a batch queuing system
> running on the computer. Examples include SLURM, GridEngine, LSF, Unix
> 'batch' (comes as part of OS X but I've never managed to set up the load
> threshold appropriately on OS X), or perhaps Xgrid (I don't know it).
>
> Except in very unusual situations, this kind of 'task farming', if
> properly set up, will always be at least as efficient as 'true'
> parallelisation like multithreading - often faster. It also doesn't
> require any special programming, and parallel programming can be fairly
> painful.
>
> An even simpler approach is to divide the task into n approximately
> equally time-consuming scripts (if you have n CPU cores) and launch them
> all at the same time - which uses all your cores, and doesn't even require
> a batch queuing system.
>
>
> Best wishes,
>
> Daniel
>
> On 08/12/2012 02:43, "José Hidasi" <hidasin...@gmail.com> wrote:
>
> >Dear all,
> >
> >I want to create a consensus tree with branch lengths (using Brian O'Meara's
> >function from the post "[R-sig-phylo] Why no branch lengths on consensus
> >trees?") on a Mac workstation. However, if I simply run the function, R will
> >not use all cores for the analysis. I would like to know if there is a
> >function or any way to divide the analysis among the cores, or to use all
> >cores for running the program.
> >
> >I know this is not the best forum to ask something like that (using
> >multiple cores), but I imagined that someone might have the solution for
> >that as some of you work with large databases.
> >
> >Best,
> >José Hidasi
> >
> >
> >
> >This is Brian O'Meara's function, which I am using to create the consensus
> >tree, from the post "[R-sig-phylo] Why no branch lengths on consensus
> >trees?":
> >
> >"I have a function to create a consensus tree with branch lengths. You
> >feed it a given topology (often a consensus topology, made with ape), then
> >a list of trees, and tell it what you want the branch lengths to
> >represent. It could be the proportion of input trees with that edge (good
> >for summarizing bootstrap or Bayes proportions) or the mean, median, or sd
> >of branch lengths for those trees that have that edge. Consensus branch
> >lengths in units of proportion of matching trees have obvious utility.
> >As Daniel says, it is harder to see a use case for the average branch
> >lengths across a set of trees, but you could imagine doing something like
> >taking the ratogram output from r8s on a set of trees and summarizing the
> >rate average and rate sd on a given, "best", tree as two sets of branch
> >lengths on that tree.
> >
> >I've put the function source at
> >
> >https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrlen.R?revision=110&root=omearalab
> >
> >You can source the file for the function (consensusBrlen()) and other
> >functions it needs. It also uses phylobase. Note that this is alpha-quality
> >code -- it's been checked a bit, but verify it's doing what you want.
> >
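> >Here's an example of how to use it:
> >
> >library(ape)
> >library(phylobase)
> >
> ># consensusBrlen() must already be loaded by sourcing the file above
> >phy.a <- rcoal(15)   # reference topology: a random coalescent tree
> >phy.b <- phy.a       # same topology, branch lengths perturbed
> >phy.b$edge.length <- phy.b$edge.length + runif(length(phy.b$edge.length), 0, 0.1)
> >phy.c <- rcoal(15)   # an independent random tree
> >phy.list <- list(phy.a, phy.b, phy.c)
> >
> ># mean branch length per edge of phy.a, over the trees that contain that edge
> >phy.consensus <- consensusBrlen(phy.a, phy.list, type="mean_brlen")"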
> >
> >--
> >José Hidasi Neto
> >Graduated in Biological Sciences - Universidade Federal de Goiás (UFG)
> >Master's candidate in Ecology and Evolution - Community Ecology and
> >Functioning Lab - UFG
> >Lattes:
> >http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0
> >
> >
> >
>
>
> --
> Daniel Barker
> http://bio.st-andrews.ac.uk/staff/db60.htm
> The University of St Andrews is a charity registered in Scotland : No
> SC013532
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


