Hi Dannon,

If I may elaborate further on this issue: this kind of functionality is also supported by Sun Grid Engine (SGE) in the form of 'array jobs'. An array job executes the same job multiple times as independent tasks that differ only in, for instance, their parameter settings. From your description below, this seems similar to the Galaxy parallelism tag. Is there, or do you foresee, any implementation of this SGE functionality through the DRMAA interface in Galaxy? If not, has anybody achieved this through custom coding? We would be very interested in this.
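
For reference, here is a minimal sketch of what submitting an array job through DRMAA can look like from Python. It is not taken from Galaxy's DRMAA runner; it assumes the third-party drmaa-python bindings and a working DRMAA library on the cluster, and the script path, arguments, and function names are hypothetical placeholders.

```python
def submit_array_job(session, command, args, first, last, step=1):
    """Submit `command` as one DRMAA bulk (array) job.

    `session` is expected to behave like an initialized drmaa.Session.
    Returns the list of job ids, one per task index first, first+step, ..., last.
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command
        jt.args = list(args)
        # A single submission creates many independent tasks. Under SGE each
        # task can read its own index from the SGE_TASK_ID environment variable.
        return list(session.runBulkJobs(jt, first, last, step))
    finally:
        session.deleteJobTemplate(jt)


def run_on_cluster():
    """Example use; only works where drmaa-python and a DRMAA library exist."""
    import drmaa  # assumption: drmaa-python is installed

    with drmaa.Session() as session:
        # Hypothetical script and arguments, for illustration only.
        job_ids = submit_array_job(
            session, "/path/to/run_task.sh", ["--params", "params.txt"], 1, 10
        )
        # Block until every task of the array job has finished.
        session.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)
```

The submission helper takes the session as an argument so that it can be tried out without a cluster by passing in a stand-in object.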

Thanks,
Bram

On 15/02/2012 18:08, Dannon Baker wrote:
It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into 
tasks does exist.  It needs a lot more work and can go in a few different directions to make it better, but check out 
the wrappers with <parallelism> defined, and enable use_tasked_jobs in your universe_wsgi.ini and restart.  
That's all it should take from a fresh Galaxy install to get, iirc, at least BWA and a few other tools working.  If 
you want a super trivial example to play with, change the tool .xml for a text tool like "change case" to 
have <parallelism method="basic"></parallelism> and give that a shot.
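
The tool .xml change described above can be made by hand; purely as an illustration, a small Python sketch that adds the tag to a tool definition might look like this. The function name and the sample tool XML are hypothetical, and placing the tag just before <command> is an assumption about layout, not a documented requirement.

```python
import xml.etree.ElementTree as ET


def add_basic_parallelism(tool_xml: str) -> str:
    """Return tool XML with <parallelism method="basic"/> added if missing."""
    root = ET.fromstring(tool_xml)
    if root.find("parallelism") is None:
        elem = ET.Element("parallelism", {"method": "basic"})
        children = list(root)
        command = root.find("command")
        # Assumption: put the tag right before <command>, else at the end.
        index = children.index(command) if command is not None else len(children)
        root.insert(index, elem)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened tool definition for illustration.
sample_tool = (
    '<tool id="ChangeCase" name="Change Case">'
    "<description>of selected columns</description>"
    "<command>changeCase.pl</command>"
    "</tool>"
)
patched_tool = add_basic_parallelism(sample_tool)
```

Note that ElementTree drops XML comments when parsing, so for a real wrapper a manual edit is probably safer. The use_tasked_jobs setting in universe_wsgi.ini still has to be enabled separately.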

If you decide to try this out, do keep in mind that this feature is not at all 
complete. There's a long list of things we still want to experiment 
with along these lines, and suggestions (and especially contributions) are 
absolutely welcome.

-Dannon

On Feb 15, 2012, at 11:36 AM, Peter Cock wrote:

Hi all,

The comments on this issue suggest that the Galaxy team is/were
working on splitting large jobs over multiple nodes/CPUs:

https://bitbucket.org/galaxy/galaxy-central/issue/79/split-large-jobs

Is there any relevant page on the wiki I should be aware of?

Specifically I am hoping for a general framework where one of the tool
inputs can be marked as "embarrassingly parallel" meaning it can be
subdivided easily (e.g. multiple sequences in FASTA or FASTQ format,
multiple annotations in BED format, multiple lines in tabular format) and
the outputs can all be easily combined (e.g. by concatenation in the
same order as the input was split).
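
To make the split/combine idea above concrete, here is a minimal Python sketch for the FASTA case. It is not Galaxy's implementation; the function names and the chunk size are hypothetical, and the "tool" is any function that maps a chunk of FASTA text to output text.

```python
from typing import Callable, List


def split_fasta(text: str, records_per_chunk: int) -> List[str]:
    """Split FASTA text into chunks of at most `records_per_chunk` records."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        # A header line starts a new record (except for the very first one).
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return [
        "".join(records[i:i + records_per_chunk])
        for i in range(0, len(records), records_per_chunk)
    ]


def run_split(text: str, tool: Callable[[str], str], records_per_chunk: int = 2) -> str:
    """Run `tool` on each chunk independently and concatenate the outputs
    in the same order as the input was split."""
    chunks = split_fasta(text, records_per_chunk)
    outputs = [tool(chunk) for chunk in chunks]  # each call could run on its own node
    return "".join(outputs)


def upper_seq(chunk: str) -> str:
    """Toy per-record tool: upper-case sequence lines, keep headers as-is."""
    return "".join(
        line if line.startswith(">") else line.upper()
        for line in chunk.splitlines(keepends=True)
    )
```

For a tool that treats every record independently, such as the toy upper_seq above, run_split(text, upper_seq) gives the same result as running the tool once on the whole input.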

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
___________________________________________________________

--
==========================================================
Bram Slabbinck, PhD

Bioinformatics & Systems Biology Division
VIB Department of Plant Systems Biology, UGent
Technologiepark 927, 9052 Gent, BELGIUM

Email: bram.slabbi...@psb.ugent.be
WWW: http://bioinformatics.psb.ugent.be
==========================================================
Please consider the environment before printing this email