On Fri, Nov 16, 2012 at 8:42 PM, Ryan C. Thompson <r...@thompsonclan.org> wrote:
> The difference is that in the parallel package, you use mclapply for
> multicore and parLapply for multi-machine parallelism. If you want to
> switch from one to the other, you have to change all your code that
> uses either function to the other one. If you use
> llply(..., .parallel=TRUE), then all you have to do is register a
> different backend, which is one line of code to load the new backend
> and a second one to register it, and the rest of your code stays the
> same.

You can get mclapply's behavior via parLapply with a fork backend;
parLapply is the most general interface. I favor calling mclapply
directly, rather than parLapply plus the fork backend, when the code
relies on the implicit sharing of the workspace, because that makes the
constraint explicit. But yes, we can easily abstract all of this
through the *apply methods on Bioc data structures. For example,
IRanges::List has lapply, mapply, etc. methods. This gives us the
freedom to experiment.

Michael
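A minimal sketch of that equivalence (fork clusters are Unix-only, and
the worker count of 4 is arbitrary):

    library(parallel)

    x <- 1:8
    f <- function(i) i^2

    ## Multicore interface: forked workers implicitly share the
    ## calling workspace.
    res1 <- mclapply(x, f, mc.cores = 4)

    ## The same computation through the general interface, using a
    ## fork cluster so the workspace is shared as with mclapply.
    cl <- makeCluster(4, type = "FORK")
    res2 <- parLapply(cl, x, f)
    stopCluster(cl)

    identical(res1, res2)  # TRUE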
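And a sketch of the backend-switching point from Ryan's message,
assuming doParallel as one example backend (doMC, doSNOW, etc. plug in
the same way); only the registration lines change when moving between
local cores and a cluster:

    library(plyr)
    library(doParallel)

    ## Load and register a backend: these are the only lines that
    ## change when switching, e.g., from local cores to a snow-style
    ## cluster.
    cl <- makeCluster(4)
    registerDoParallel(cl)

    ## The analysis code itself stays the same.
    res <- llply(1:8, function(i) i^2, .parallel = TRUE)
    stopCluster(cl)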
> On Fri 16 Nov 2012 03:24:56 PM PST, Michael Lawrence wrote:
>>
>> On Fri, Nov 16, 2012 at 11:44 AM, Ryan C. Thompson
>> <r...@thompsonclan.org> wrote:
>>
>> You don't have to use foreach directly. I use foreach almost
>> exclusively through the plyr package, which uses foreach internally
>> to implement parallelism. Like you, I'm not particularly fond of the
>> foreach syntax (though it has some nice features that come in handy
>> sometimes).
>>
>> The appeal of foreach is that it supports pluggable parallelizing
>> backends, so you can (in theory) write the same code and parallelize
>> it across multiple cores, or across an entire cluster, just by
>> plugging in different backends.
>>
>> But isn't this also possible with the parallel package? It was
>> inherited from snow. I'd be more in favor of extending the parallel
>> package, simply because it's part of base R.
>>
>> On Fri 16 Nov 2012 10:17:24 AM PST, Michael Lawrence wrote:
>>
>> I'm not sure I understand the appeal of foreach. Why not do this
>> within the functional paradigm, i.e., parLapply?
>>
>> Michael
>>
>> On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson
>> <r...@thompsonclan.org> wrote:
>>
>> You could write a %dopar% backend for the foreach package, which
>> would allow any code using foreach (or plyr, which uses foreach) to
>> parallelize using your code.
>>
>> On a related note, it might be nice to add Bioconductor-compatible
>> versions of foreach and the plyr functions to BiocParallel, if
>> they're not already compatible.
>>
>> On 11/16/2012 12:18 AM, Hahne, Florian wrote:
>>
>> I've hacked up some code that uses BatchJobs but makes it look like
>> a normal parLapply operation. Currently the main R process checks
>> the state of the queue at regular intervals and fetches results once
>> a job has finished. It seems to work quite nicely, although there
>> certainly are more elaborate ways to deal with the
>> synchronous/asynchronous issue. Is that something that could be
>> interesting for the broader audience? I could add the code to
>> BiocParallel for folks to try it out.
>>
>> The whole thing may be a dumb idea, but I find it kind of useful to
>> be able to start parallel jobs directly from R on our huge SGE
>> cluster, have the calling script wait for all jobs to finish, and
>> then continue with some downstream computations, rather than having
>> to manually check the job status and start another script once the
>> results are there.
>>
>> Florian
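The wrapper Florian describes might look roughly like the sketch below.
This is only a guess at the shape of his hack, not his actual code:
batchLapply is a hypothetical name, and it assumes BatchJobs has been
configured for the local scheduler (e.g. via
cluster.functions = makeClusterFunctionsSGE(...) in .BatchJobs.R):

    library(BatchJobs)

    ## Hypothetical parLapply-lookalike: submit one job per element,
    ## block until the queue has drained, then collect the results.
    batchLapply <- function(X, FUN, reg.id = "batchLapply") {
      reg <- makeRegistry(id = reg.id, file.dir = tempfile())
      batchMap(reg, FUN, X)
      submitJobs(reg)
      waitForJobs(reg)   # poll the scheduler until all jobs finish
      loadResults(reg)   # list of results, one per input element
    }

    ## Downstream computations can then follow in the same script:
    res <- batchLapply(1:10, function(i) i^2)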