Looks like great progress has been made. Here are some thoughts:

The *Params objects seem to have two roles: specifying the desired
resources and indicating the scheduler (the thing that actually executes
the jobs). Maybe it would be beneficial to have separate abstractions for
those two things. For example, one could model a resource abstraction based
on the JSDL (Job Submission Description Language). Then backends would
communicate those resource requirements to the scheduler. Overengineering?
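
A rough sketch of the split I have in mind, using plain S4. Every name here
(ResourceSpec, LocalScheduler, QueueScheduler, describeRequest) is made up
for illustration and is not part of BiocParallel or JSDL; the point is only
that one resource description can be handed to different backends:

```r
library(methods)

# Hypothetical: what the job needs, independent of where it runs
setClass("ResourceSpec",
         representation(workers = "integer", memoryMB = "numeric"))

# Hypothetical: two backends that execute jobs in different ways
setClass("LocalScheduler", representation(host = "character"))
setClass("QueueScheduler", representation(queue = "character"))

# Each backend translates the same resource spec into its own terms
setGeneric("describeRequest", function(scheduler, resources)
    standardGeneric("describeRequest"))

setMethod("describeRequest", signature("LocalScheduler", "ResourceSpec"),
          function(scheduler, resources)
              sprintf("start %d workers on %s",
                      resources@workers, scheduler@host))

setMethod("describeRequest", signature("QueueScheduler", "ResourceSpec"),
          function(scheduler, resources)
              sprintf("request %d slots with %.0f MB each from queue %s",
                      resources@workers, resources@memoryMB,
                      scheduler@queue))

# The same requirements, handed to two different schedulers
res <- new("ResourceSpec", workers = 4L, memoryMB = 2048)
describeRequest(new("LocalScheduler", host = "localhost"), res)
describeRequest(new("QueueScheduler", queue = "all.q"), res)
```

A real backend would submit the request rather than return a string, but
the dispatch on (scheduler, resources) is the part I am suggesting.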

The name "pvec" is not very intuitive. What about "bpchunk"? And since the
function passed to bpvectorize is already vectorized, maybe bpvectorize
should be bparallelize? I know everyone has different
intuitions/preferences when it comes to names, so feel free to disregard.
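
To show why "chunk" reads more naturally to me: as I understand it, bpvec
splits X into chunks, applies the already-vectorized FUN to each chunk, and
combines the pieces with c(). A base-R sketch of that pattern follows;
chunk_apply is a made-up name, and lapply stands in for the parallel step,
so this illustrates the idea and is not BiocParallel's implementation:

```r
# Hypothetical helper: relies only on length(), '[' and c() on X,
# which is the contract described for bpvec.
chunk_apply <- function(X, FUN, ..., nchunks = 2L) {
    n <- length(X)
    if (n == 0L) return(FUN(X, ...))
    nchunks <- min(nchunks, n)
    # Assign each index to one of nchunks contiguous groups
    grp <- ceiling(seq_len(n) * nchunks / n)
    idx <- split(seq_len(n), grp)
    # One call to the vectorized FUN per chunk (serial here)
    pieces <- lapply(idx, function(i) FUN(X[i], ...))
    # Concatenate the per-chunk results in their original order
    do.call(c, unname(pieces))
}

x <- c(1, 4, 9, 16, 25)
chunk_apply(x, sqrt, nchunks = 2L)  # same values as sqrt(x)
```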

Michael

On Mon, Dec 3, 2012 at 5:32 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> Bioc Developers --
>
> BiocParallel generated quite a bit of discussion, so I'm providing a brief
> update. Version 0.0.5 is available to R-devel users via biocLite; it's in
> svn
>
>   https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/BiocParallel
>
> and github
>
>   https://github.com/Bioconductor/BiocParallel
>
> We have tried to incorporate some key ideas, though things are far from
> complete.
>
>
> The basic idea is that one creates a 'param'
>
>   p = MulticoreParam(workers=8)
>
> and uses that in computations
>
>   bplapply(1:8, function(i) Sys.sleep(1), param=p)
>
>
> There is a simple registry, populated at start-up with a 'greedy' param
> instance (e.g., MulticoreParam(workers=parallel::detectCores())) or
> populated explicitly
>
>   register(p)
>
> The 'default' param (most recently registered, with the default=TRUE
> argument) is used if param is missing
>
>   bplapply(1:8, function(i) Sys.sleep(1))
>
>
> There are MulticoreParam, SnowParam, and DoparParam params so far;
> SnowParam is 'lazy', and bpstart / bpstop can be used to start and stop the
> implied cluster
>
> > p = SnowParam(workers=2)
> > p = bpstart(p)
> Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
> Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
> > p = bpstop(p)
>
> DoparParam (currently) indicates that a foreach-style back-end has been
> registered (via standard foreach approaches), and bplapply(1:8, ...,
> param=DoparParam()) uses foreach for evaluation. *Param classes are S4
> classes (should probably be reference classes) that extend
> BiocParallelParam, so anyone can implement a new *Param; eventually
> BiocParallelParam will define 'required' fields (like 'workers' and
> 'setSeed') that all *Param objects are expected to support.
>
>
> bplapply has signature bplapply(X, FUN, ..., param) and is a generic in
> all three arguments, so again package developers can implement versions
> tailored to their clusters (Florian has sent me some code for an SGE
> scheduler, which I have not yet incorporated).
>
>
> Only bplapply and bpvec are currently implemented as 'algorithms'. They
> have a common signature and have been implemented to rely only on length,
> '[', '[[' (for bplapply) and 'c' (for bpvec); this is the 'contract' that
> we'll try to maintain. We'd like to implement other algorithms, and to make
> current algorithms more useful by including better error handling,
> scheduling, and reduction.
>
>
> bpvectorize is a simple way to convert a 'vectorized' function into a
> parallel, vectorized version, e.g., pcountOverlaps =
> bpvectorize(countOverlaps).
>
>
> I'm happy to hear of major mis-steps, and areas in pressing need of
> development, either on or off list or via the github interface.
>
>
> Ryan Thompson has made valuable contributions, especially DoparParam and
> cleaning up bpvec and bplapply; I haven't always managed to wrangle git and
> svn (thanks Laurent for the --add-author-name tip, which works when I do
> other things right) in a way that fully credits his contribution, for which
> I apologize.
>
> Martin
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


