Thanks Peter. I see, <parallelism> works on a single large file by splitting it and using multiple instances to process the bits in parallel.
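To check my understanding, here is a rough sketch of the split / process / merge idea in plain Python. It is not Galaxy's code: the function names (split_records, process_chunk, merge_results, run_split_process_merge) are made up for illustration, the "work" is a placeholder, and a thread pool stands in for the separate tool instances that Galaxy would launch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Hypothetical per-chunk work; stands in for one tool instance."""
    return [record.upper() for record in chunk]


def merge_results(results):
    """Concatenate the per-chunk outputs into one list, preserving order."""
    merged = []
    for part in results:
        merged.extend(part)
    return merged


def run_split_process_merge(records, chunk_size=2, workers=4):
    """Split the input, process the chunks concurrently, then merge."""
    chunks = split_records(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the merge is deterministic.
        results = list(pool.map(process_chunk, chunks))
    return merge_results(results)


if __name__ == "__main__":
    print(run_split_process_merge(["a", "b", "c", "d", "e"]))
```

The point is only that the input is one dataset cut into pieces and the outputs are stitched back together afterwards.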
In our case we use a 'composite' data type, simply an array of input files, and we would like to process them in parallel instead of having a 'foreach' loop in the tool wrapper. Is it possible?

We are looking at CloudMan for creating a cluster in Galaxy now.

-Alex

-----Original Message-----
From: Peter Cock [mailto:[email protected]]
Sent: Thursday, 7 February 2013 9:09 PM
To: Khassapov, Alex (CSIRO IM&T, Clayton)
Cc: [email protected]
Subject: Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing

On Wed, Feb 6, 2013 at 11:43 PM, <[email protected]> wrote:
>
> Hi All,
>
> Can anybody please add a few words on how can we use the "initial
> implementation" which "exists in the tasks framework"?
>
> -Alex
>

To enable this, set use_tasked_jobs = True in your universe_wsgi.ini file. The tools must also be configured to allow this via the <parallelism> tag. Many of my tools do this, for example see the NCBI BLAST+ wrappers in the tool shed.

Additionally the data file formats must support being split, or being merged - which is done via Python code in the Galaxy datatype definition (see the split and merge methods in lib/galaxy/datatypes/*.py). Some other relevant Python code is in lib/galaxy/jobs/splitters/*.py

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
