On Fri, Nov 16, 2012 at 8:42 PM, Ryan C. Thompson <r...@thompsonclan.org> wrote:
> The difference is that in the parallel package, you use mclapply for
> multicore and parLapply for multi-machine parallelism. If you want to
> switch from one to the other, you have to change all your code that
> uses either function to the other one. If you use
> llply(..., .parallel=TRUE), then all you have to do is register a
> different backend, which is one line of code to load the new backend
> and a second one to register it, and the rest of your code stays the
> same.

You can get mclapply's behavior via parLapply with a fork backend;
parLapply is the most general interface. I favor calling mclapply
directly, rather than parLapply plus the fork backend, when the code
relies on the implicit sharing of the workspace, because that makes the
constraint explicit. But yes, we can easily abstract all of this
through the *apply methods on Bioc data structures. For example,
IRanges::List has lapply, mapply, etc. methods. This gives us the
freedom to experiment.

Michael
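A minimal sketch of that equivalence (fork clusters are Unix-only, and
the worker count of 4 is arbitrary):

    library(parallel)

    x <- 1:8
    f <- function(i) i^2

    ## Multicore interface: forked workers implicitly share the
    ## calling workspace.
    res1 <- mclapply(x, f, mc.cores = 4)

    ## The same computation through the general interface, using a
    ## fork cluster so the workspace is shared as with mclapply.
    cl <- makeCluster(4, type = "FORK")
    res2 <- parLapply(cl, x, f)
    stopCluster(cl)

    identical(res1, res2)  # TRUE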
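And a sketch of the backend-switching point from Ryan's message,
assuming doParallel as one example backend (doMC, doSNOW, etc. plug in
the same way); only the registration lines change when moving between
local cores and a cluster:

    library(plyr)
    library(doParallel)

    ## Load and register a backend: these are the only lines that
    ## change when switching, e.g., from local cores to a snow-style
    ## cluster.
    cl <- makeCluster(4)
    registerDoParallel(cl)

    ## The analysis code itself stays the same.
    res <- llply(1:8, function(i) i^2, .parallel = TRUE)
    stopCluster(cl)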
> On Fri 16 Nov 2012 03:24:56 PM PST, Michael Lawrence wrote:
>>
>> On Fri, Nov 16, 2012 at 11:44 AM, Ryan C. Thompson
>> <r...@thompsonclan.org> wrote:
>>
>> You don't have to use foreach directly. I use foreach almost
>> exclusively through the plyr package, which uses foreach internally
>> to implement parallelism. Like you, I'm not particularly fond of the
>> foreach syntax (though it has some nice features that come in handy
>> sometimes).
>>
>> The appeal of foreach is that it supports pluggable parallelizing
>> backends, so you can (in theory) write the same code and parallelize
>> it across multiple cores, or across an entire cluster, just by
>> plugging in different backends.
>>
>> But isn't this also possible with the parallel package? It was
>> inherited from snow. I'd be more in favor of extending the parallel
>> package, simply because it's part of base R.
>>
>> On Fri 16 Nov 2012 10:17:24 AM PST, Michael Lawrence wrote:
>>
>> I'm not sure I understand the appeal of foreach. Why not do this
>> within the functional paradigm, i.e., parLapply?
>>
>> Michael
>>
>> On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson
>> <r...@thompsonclan.org> wrote:
>>
>> You could write a %dopar% backend for the foreach package, which
>> would allow any code using foreach (or plyr, which uses foreach) to
>> parallelize using your code.
>>
>> On a related note, it might be nice to add Bioconductor-compatible
>> versions of foreach and the plyr functions to BiocParallel, if
>> they're not already compatible.
>>
>> On 11/16/2012 12:18 AM, Hahne, Florian wrote:
>>
>> I've hacked up some code that uses BatchJobs but makes it look like
>> a normal parLapply operation. Currently the main R process checks
>> the state of the queue at regular intervals and fetches results once
>> a job has finished. It seems to work quite nicely, although there
>> certainly are more elaborate ways to deal with the
>> synchronous/asynchronous issue. Is that something that could be
>> interesting for the broader audience? I could add the code to
>> BiocParallel for folks to try it out.
>>
>> The whole thing may be a dumb idea, but I find it kind of useful to
>> be able to start parallel jobs directly from R on our huge SGE
>> cluster, have the calling script wait for all jobs to finish, and
>> then continue with some downstream computations, rather than having
>> to manually check the job status and start another script once the
>> results are there.
>>
>> Florian
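The wrapper Florian describes might look roughly like the sketch below.
This is only a guess at the shape of his hack, not his actual code:
batchLapply is a hypothetical name, and it assumes BatchJobs has been
configured for the local scheduler (e.g. via
cluster.functions = makeClusterFunctionsSGE(...) in .BatchJobs.R):

    library(BatchJobs)

    ## Hypothetical parLapply-lookalike: submit one job per element,
    ## block until the queue has drained, then collect the results.
    batchLapply <- function(X, FUN, reg.id = "batchLapply") {
      reg <- makeRegistry(id = reg.id, file.dir = tempfile())
      batchMap(reg, FUN, X)
      submitJobs(reg)
      waitForJobs(reg)   # poll the scheduler until all jobs finish
      loadResults(reg)   # list of results, one per input element
    }

    ## Downstream computations can then follow in the same script:
    res <- batchLapply(1:10, function(i) i^2)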