Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing

Alex.Khassapov Thu, 07 Feb 2013 16:48:23 -0800

Thanks Peter. I see, <parallelism> works on a single large file by splitting it 
and using multiple instances to process the bits in parallel.

In our case we use a 'composite' data type, which is simply an array of input
files, and we would like to process them in parallel instead of having a
'foreach' loop in the tool wrapper.
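
To make the idea concrete, here is a rough sketch of what replacing the loop
could look like inside a wrapper script. It is only an illustration: the names
process_all and worker are hypothetical, it is not Galaxy code, and it only
runs the work concurrently on a single machine rather than spreading it over
cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(paths, worker, max_workers=4):
    """Apply `worker` to every path concurrently.

    Hypothetical helper, not part of Galaxy. Results come back in the
    same order as `paths`, like a plain 'for' loop would produce them.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if tasks finish out of order.
        return list(pool.map(worker, paths))


if __name__ == "__main__":
    # Stand-in worker: a real one would process the file at `path`.
    print(process_all(["a.dat", "bb.dat"], len))
```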

Is it possible?

We are looking at CloudMan for creating a cluster in Galaxy now.

-Alex

-----Original Message-----
From: Peter Cock [mailto:[email protected]] 
Sent: Thursday, 7 February 2013 9:09 PM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Cc: [email protected]
Subject: Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for 
processing

On Wed, Feb 6, 2013 at 11:43 PM, <[email protected]> wrote:
>
> Hi All,
>
> Can anybody please add a few words on how we can use the "initial
> implementation" which "exists in the tasks framework"?
>
> -Alex
>

To enable this, set use_tasked_jobs = True in your universe_wsgi.ini file. The
tools must also be configured to allow this via the <parallelism> tag. Many of
my tools do this; for example, see the NCBI BLAST+ wrappers in the Tool Shed.
Additionally, the data file formats must support being split or merged, which
is done via Python code in the Galaxy datatype definition (see the split and
merge methods in lib/galaxy/datatypes/*.py). Some other relevant Python code
is in lib/galaxy/jobs/splitters/*.py
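
As a rough illustration of what "split" and "merge" mean for a simple
line-oriented format, here is a standalone sketch. It is not Galaxy's actual
datatype API; the function names split_lines and merge_chunks are made up for
this example, and the real methods in lib/galaxy/datatypes/*.py work on
datasets and files rather than in-memory lists.

```python
from itertools import islice


def split_lines(lines, chunk_size):
    """Yield successive lists of at most `chunk_size` lines.

    Hypothetical helper: each chunk could be handed to a separate task.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def merge_chunks(chunks):
    """Concatenate per-task outputs back into one list, keeping order."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


if __name__ == "__main__":
    records = ["r1", "r2", "r3", "r4", "r5"]
    parts = list(split_lines(records, 2))
    print(parts)
    print(merge_chunks(parts) == records)
```

A format for which this kind of concatenation is not valid (for example, one
with a header that must appear exactly once) would need its own merge logic.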

Peter

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
