Hi Dannon,
To elaborate on this issue: this kind of functionality is also supported
by the Sun Grid Engine (SGE) in the form of 'array jobs'. With an array
job you can execute a job multiple times, each run independent of the
others and differing only in, for instance, the parameter settings. From
your description below, it seems similar to the Galaxy parallelism tag.
Is there, or do you foresee, any implementation of this SGE functionality
through the DRMAA interface in Galaxy? If not, has anybody achieved this
through some custom coding? We would be highly interested in this.
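
For reference, here is a minimal sketch of what such an array job could look like when submitted from Python. It assumes the drmaa Python bindings (Session.runBulkJobs) and a DRMAA-enabled SGE cluster. The parameter grid, the "kmer" setting and all function names are hypothetical, and this is not Galaxy code.

```python
import os

# Hypothetical parameter grid: one entry per array task ("kmer" is made up).
PARAMETER_SETS = [
    {"kmer": 21},
    {"kmer": 25},
    {"kmer": 31},
]


def current_task_id(environ=None):
    """Return the 1-based array task index that SGE exports as SGE_TASK_ID.

    Falls back to 1 when the variable is missing or non-numeric.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SGE_TASK_ID", "")
    return int(value) if value.isdigit() else 1


def parameters_for_task(task_id, parameter_sets=PARAMETER_SETS):
    """Pick the settings belonging to one array task (task ids start at 1)."""
    if not 1 <= task_id <= len(parameter_sets):
        raise IndexError("no parameter set for task %d" % task_id)
    return parameter_sets[task_id - 1]


def submit_array_job(script_path, n_tasks):
    """Submit script_path as a single array job with n_tasks tasks.

    Assumption: the 'drmaa' package is installed and the cluster exposes a
    DRMAA library. The import is kept local so the helpers above remain
    usable on machines without DRMAA.
    """
    import drmaa

    with drmaa.Session() as session:
        template = session.createJobTemplate()
        template.remoteCommand = script_path
        template.joinFiles = True  # merge stdout and stderr of each task
        # Task indices 1..n_tasks in steps of 1.
        job_ids = session.runBulkJobs(template, 1, n_tasks, 1)
        # Block until every task has finished, then release job information.
        session.synchronize(job_ids, drmaa.Session.TIMEOUT_WAIT_FOREVER, True)
        session.deleteJobTemplate(template)
    return job_ids
```

The submitted script would call parameters_for_task(current_task_id()) to find out which settings its own task should use.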
Thanks,
Bram
On 15/02/2012 18:08, Dannon Baker wrote:
It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into
tasks does exist. It needs a lot more work and can go in a few different directions to make it better, but check out
the wrappers with <parallelism> defined, and enable use_tasked_jobs in your universe_wsgi.ini and restart.
That's all it should take from a fresh Galaxy install to get, iirc, at least BWA and a few other tools working. If
you want a super trivial example to play with, change the tool .xml for a text tool like "change case" to
have <parallelism method="basic"></parallelism> and give that a shot.
If you decide to try this out, do keep in mind that this feature is not at all
complete, and while there's a long list of things we still want to experiment
with along these lines, suggestions (and especially contributions) are
absolutely welcome.
-Dannon
On Feb 15, 2012, at 11:36 AM, Peter Cock wrote:
Hi all,
The comments on this issue suggest that the Galaxy team is/were
working on splitting large jobs over multiple nodes/CPUs:
https://bitbucket.org/galaxy/galaxy-central/issue/79/split-large-jobs
Is there any relevant page on the wiki I should be aware of?
Specifically I am hoping for a general framework where one of the tool
inputs can be marked as "embarrassingly parallel" meaning it can be
subdivided easily (e.g. multiple sequences in FASTA or FASTQ format,
multiple annotations in BED format, multiple lines in tabular format) and
the outputs can all be easily combined (e.g. by concatenation in the
same order as the input was split).
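
To make the split/merge idea above concrete, here is a minimal Python sketch for the FASTA case. All function names are hypothetical and this is not how Galaxy implements it. It only shows that, for a record-wise tool, concatenating the chunk outputs in input order gives the same result as one unsplit run.

```python
def read_fasta_records(text):
    """Split FASTA text into records, each starting at a '>' header line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def split_into_chunks(records, chunk_size):
    """Group records into consecutive chunks of at most chunk_size records."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def uppercase_sequences(fasta_text):
    """Stand-in for a record-wise tool: upper-case sequence lines only."""
    return "".join(
        line if line.startswith(">") else line.upper()
        for line in fasta_text.splitlines(keepends=True)
    )


def run_split_job(fasta_text, tool, chunk_size=2):
    """Run tool on each chunk and concatenate the outputs in input order."""
    chunks = split_into_chunks(read_fasta_records(fasta_text), chunk_size)
    # In a real setup each chunk would be a separate cluster task; here the
    # chunks are simply processed one after another.
    outputs = [tool("".join(chunk)) for chunk in chunks]
    return "".join(outputs)


example = ">a\nacgt\n>b\nttga\nccaa\n>c\ngg\n"
assert run_split_job(example, uppercase_sequences) == uppercase_sequences(example)
```

The same pattern would apply to FASTQ, BED or tabular input. Only the record splitter changes.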
Thanks,
Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
--
==========================================================
Bram Slabbinck, PhD
Bioinformatics & Systems Biology Division
VIB Department of Plant Systems Biology, UGent
Technologiepark 927, 9052 Gent, BELGIUM
Email: bram.slabbi...@psb.ugent.be
WWW: http://bioinformatics.psb.ugent.be
==========================================================
Please consider the environment before printing this email