Re: [galaxy-dev] Splitting large jobs over multiple nodes/CPUs?

Bram Slabbinck Mon, 20 Feb 2012 00:08:10 -0800

Hi Dannon,

If I may further elaborate on this issue, I would like to mention that this kind of functionality is also supported by the Sun Grid Engine in the form of 'array jobs'. With this functionality you can execute a job multiple times in an independent way, differing only, for instance, in the parameter settings. From your description below, it seems similar to the Galaxy parallelism tag. Is there, or do you foresee, any implementation of this SGE functionality through the drmaa interface in Galaxy? If not, is there anybody who has achieved this through some custom coding? We would be highly interested in this.
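
To make the array-job idea concrete, here is a minimal sketch of what each task in such a job might do. It assumes the job was submitted with something like `qsub -t 1-3`, so that SGE sets the `SGE_TASK_ID` environment variable for each task. The helper names `slice_for_task` and `current_task_id` are hypothetical and not part of Galaxy or SGE.

```python
import os


def slice_for_task(task_id, n_tasks, n_items):
    """Return the (start, stop) range of items for a 1-based array task id."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    base, extra = divmod(n_items, n_tasks)
    # The first `extra` tasks each take one additional item.
    start = (task_id - 1) * base + min(task_id - 1, extra)
    stop = start + base + (1 if task_id <= extra else 0)
    return start, stop


def current_task_id(environ=os.environ):
    """Read the array task index, falling back to 1 outside an array job."""
    # Assumption: SGE_TASK_ID is a number inside an array job and is unset
    # or a non-numeric placeholder otherwise.
    value = environ.get("SGE_TASK_ID", "1")
    return int(value) if value.isdigit() else 1


if __name__ == "__main__":
    # Hypothetical workload: 10 input records shared between 3 array tasks.
    start, stop = slice_for_task(current_task_id(), n_tasks=3, n_items=10)
    print("this task handles items %d to %d" % (start, stop - 1))
```

Every task runs the same script and differs only in its task index, which is the "independent executions differing in parameter settings" pattern described above.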


thanks
Bram

On 15/02/2012 18:08, Dannon Baker wrote:

It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into
tasks does exist.  It needs a lot more work and can go in a few different directions to make it better, but check out
the wrappers with <parallelism> defined, and enable use_tasked_jobs in your universe_wsgi.ini and restart.
That's all it should take from a fresh galaxy install to get, iirc, at least BWA and a few other tools working.  If
you want a super trivial example to play with, change the tool .xml for a text tool like "change case" to
have <parallelism method="basic"></parallelism> and give that a shot.

If you decide to try this out, do keep in mind that this feature is not at all
complete, and while there's a long list of things we still want to experiment
with along these lines, suggestions (and especially contributions) are
absolutely welcome.

-Dannon

On Feb 15, 2012, at 11:36 AM, Peter Cock wrote:

Hi all,

The comments on this issue suggest that the Galaxy team is/were
working on splitting large jobs over multiple nodes/CPUs:

https://bitbucket.org/galaxy/galaxy-central/issue/79/split-large-jobs

Is there any relevant page on the wiki I should be aware of?

Specifically I am hoping for a general framework where one of the tool
inputs can be marked as "embarrassingly parallel" meaning it can be
subdivided easily (e.g. multiple sequences in FASTA or FASTQ format,
multiple annotations in BED format, multiple lines in tabular format) and
the outputs can all be easily combined (e.g. by concatenation in the
same order as the input was split).
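
The split-and-concatenate idea described above can be sketched as follows for FASTA input. This is only an illustration under stated assumptions, not Galaxy's implementation: `split_fasta` and `merge_outputs` are hypothetical helper names, and the per-chunk tool is assumed to emit output in the same record order as its input.

```python
def split_fasta(text, n_chunks):
    """Split multi-record FASTA text into at most n_chunks pieces, in order."""
    records = []
    for line in text.splitlines(keepends=True):
        # A '>' header starts a new record; any leading non-header lines
        # are kept together as a first pseudo-record.
        if line.startswith(">") or not records:
            records.append("")
        records[-1] += line
    n_chunks = max(1, min(n_chunks, len(records)))
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        stop = start + size + (1 if i < extra else 0)
        chunks.append("".join(records[start:stop]))
        start = stop
    return chunks


def merge_outputs(outputs):
    """Concatenate per-chunk outputs in the order the input was split."""
    return "".join(outputs)


if __name__ == "__main__":
    fasta = ">a\nAC\nGT\n>b\nTT\n>c\nGG\n"
    # Stand-in for running a real tool on each chunk on a separate node.
    results = [chunk.lower() for chunk in split_fasta(fasta, 2)]
    print(merge_outputs(results), end="")
```

Splitting on record boundaries rather than on raw line counts keeps multi-line sequences intact, which is why the format of the input has to be known before it can be subdivided.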

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

--
==========================================================
Bram Slabbinck, PhD

Bioinformatics & Systems Biology Division
VIB Department of Plant Systems Biology, UGent
Technologiepark 927, 9052 Gent, BELGIUM

Email: bram.slabbi...@psb.ugent.be
WWW: http://bioinformatics.psb.ugent.be
==========================================================
Please consider the environment before printing this email
