Re: [galaxy-dev] Defining Job Runners Dynamically

2012-03-05 Thread Tony Raymond
Hi Nate,

Has there been any progress on this? This enhancement would actually be very 
useful for our local Galaxy instance.

Cheers,
Tony

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Nate Coraor
Sent: Wednesday, January 25, 2012 10:56 AM
To: John Chilton
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Defining Job Runners Dynamically

Hey John,

This hasn't been forgotten.  I appreciate the code submission, and will review 
it as soon as possible.  I also created a relevant issue in Bitbucket:

https://bitbucket.org/galaxy/galaxy-central/issue/709/add-more-control-over-where-jobs-run

--nate

On Oct 15, 2011, at 10:42 PM, John Chilton wrote:


Re: [galaxy-dev] Defining Job Runners Dynamically

2012-01-25 Thread Nate Coraor
Hey John,

This hasn't been forgotten.  I appreciate the code submission, and will review 
it as soon as possible.  I also created a relevant issue in Bitbucket:

https://bitbucket.org/galaxy/galaxy-central/issue/709/add-more-control-over-where-jobs-run

--nate



___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


[galaxy-dev] Defining Job Runners Dynamically

2011-10-15 Thread John Chilton

Hello All,

  I just issued a pull request that augments Galaxy to allow defining
job runners dynamically at runtime
(https://bitbucket.org/galaxy/galaxy-central/pull-request/12/dynamic-job-runners).
Whether it makes the cut or not, I thought I would describe the
enhancements here in case anyone else finds them useful.


  There are a couple of use cases we hope this will help our institution
address. One is dynamically switching queues based on the user (we have
a very nice shared memory resource that can only be used by researchers
with NIH funding); the other is inspecting input sizes to give PBS more
accurate maximum walltimes (a small number of cufflinks jobs, for
instance, take over three days on our cluster, but setting maximum
walltimes in excess of that for all jobs could leave our queue sitting
idle around our monthly downtimes). You might also imagine using this to
switch queues entirely based on input sizes or parameters, or to alter
queue priorities based on the submitting user or input
sizes/parameters.


  There are two steps to using this: add a line to universe.ini, and
define a function that computes the actual job runner string in the new
file lib/galaxy/jobs/rules.py.


  The first step is similar to what you would do to statically assign
a tool to a particular job runner. To dynamically assign a job runner
for cufflinks, you would start by adding a line like one of the
following to universe.ini:


cufflinks = dynamic:///python
-or-
cufflinks = dynamic:///python/compute_runner

If you use the first form, a function called cufflinks must be defined
in rules.py. Adding an extra argument after python/ lets you specify a
particular function by name (compute_runner in this example), so this
second form lets you use the same function to assign job runners for
multiple tools.
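To make the two universe.ini forms concrete, here is a minimal sketch of how a dynamic:///python URL could be mapped to a rule function name. The helper rule_function_name is hypothetical and is not taken from the pull request; it only mirrors the behaviour described above.

```python
def rule_function_name(tool_id, runner_url):
    """Return the rules.py function name implied by a dynamic runner URL.

    Hypothetical helper, not taken from the pull request:
    "dynamic:///python" falls back to the tool id, while
    "dynamic:///python/<name>" selects <name> explicitly.
    """
    prefix = "dynamic:///python"
    if not runner_url.startswith(prefix):
        raise ValueError("not a dynamic python runner URL: %r" % runner_url)
    rest = runner_url[len(prefix):]
    if rest in ("", "/"):
        # First form: the rule function is named after the tool.
        return tool_id
    if not rest.startswith("/"):
        # Reject strings such as "dynamic:///pythonfoo".
        raise ValueError("malformed dynamic runner URL: %r" % runner_url)
    # Second form: the rule function is named explicitly.
    return rest.strip("/")


print(rule_function_name("cufflinks", "dynamic:///python"))                 # cufflinks
print(rule_function_name("cufflinks", "dynamic:///python/compute_runner"))  # compute_runner
```

With the second form, several tools can point at compute_runner and share one rule.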


The only other step is to define a Python function in rules.py that
returns a string corresponding to a valid job runner, such as
"local:///" or "pbs:///queue/-l walltime=48:00:00/".

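A minimal rules.py matching the two universe.ini lines might look like the sketch below. The runner strings are the examples from this message; "queue" is a placeholder queue name, and the bodies are illustrative only.

```python
# lib/galaxy/jobs/rules.py (illustrative sketch)
# Each rule simply returns a job runner URL string.


def cufflinks():
    # Used by "cufflinks = dynamic:///python" (rule named after the tool).
    return "pbs:///queue/-l walltime=48:00:00/"


def compute_runner():
    # Used by "<tool> = dynamic:///python/compute_runner"; can serve many tools.
    return "local:///"
```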

If the functions defined in this file take arguments, the argument
names should come from the following list: job_wrapper, user_email, app,
job, tool, tool_id, job_id, user. The plumbing maps each argument to the
corresponding Galaxy object: job_wrapper is the JobWrapper instance for
the job that is passed to the job runner; user_email is the user's email
address, or None; app is the main application configuration object used
throughout the code base, which can be used, for instance, to read values
defined in universe.ini; job, tool, and user are model objects; and
job_id and tool_id are the relevant ids.
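The name-based mapping can be pictured with the following sketch, which inspects a rule's signature and passes only the arguments it asks for. The function call_rule and the dict of available objects are hypothetical; the pull request may implement this differently. The sketch uses Python 3's inspect.signature.

```python
import inspect

# Argument names a rule function may declare, per the list above.
SUPPORTED_ARGS = ("job_wrapper", "user_email", "app", "job",
                  "tool", "tool_id", "job_id", "user")


def call_rule(rule, available):
    """Call `rule`, passing only the arguments named in its signature.

    `available` maps supported argument names to the objects for the
    current job. Hypothetical sketch of the name-based plumbing.
    """
    kwargs = {}
    for name in inspect.signature(rule).parameters:
        if name not in SUPPORTED_ARGS:
            raise TypeError("rule %r asks for unsupported argument %r"
                            % (rule.__name__, name))
        # Missing values default to None (e.g. user_email for anonymous users).
        kwargs[name] = available.get(name)
    return rule(**kwargs)


def example_rule(user_email, tool_id):
    # Anonymous users (user_email is None) run locally in this example.
    if user_email is None:
        return "local:///"
    return "pbs:///queue/"


print(call_rule(example_rule, {"user_email": None, "tool_id": "cufflinks"}))
```

Rules then declare only what they need, and a misspelled argument name fails loudly instead of silently receiving nothing.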


If you are writing a function that routes a certain list of users to a
particular queue or increases their priority, you will probably only
need one argument, user_email. However, if you are going to look at
input file sizes, you may want to take an argument called job and use
the following code to find the size of the input named "input1" in the
tool XML:
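For the user-based case, a rule might look like the sketch below. The function name route_by_user, the email addresses, and the queue names shared_memory and batch are hypothetical placeholders; it would be selected with a line such as cufflinks = dynamic:///python/route_by_user.

```python
# Hypothetical rules.py entry; addresses and queue names are placeholders.
SHARED_MEMORY_USERS = {
    "researcher1@example.edu",
    "researcher2@example.edu",
}


def route_by_user(user_email):
    # user_email is None for anonymous users, who get the default queue.
    if user_email in SHARED_MEMORY_USERS:
        return "pbs:///shared_memory/"
    return "pbs:///batch/"
```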


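import os

# Map each input name from the tool XML to its dataset, covering both
# job.input_datasets and job.input_library_datasets.
inp_data = dict((da.name, da.dataset) for da in job.input_datasets)
inp_data.update((da.name, da.dataset) for da in job.input_library_datasets)

# Path and size (in bytes) of the dataset bound to "input1".
input1_file = inp_data["input1"].file_name
input1_size = os.path.getsize(input1_file)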
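Building on that snippet, a complete size-based rule could look like the sketch below. The size thresholds, the walltimes, and the helper walltime_for_size are hypothetical; "queue" is the placeholder queue name from the runner example above.

```python
import os

# Hypothetical (max size in bytes, walltime) tiers, checked in order.
WALLTIME_TIERS = [
    (1 * 1024 ** 3, "12:00:00"),   # inputs up to 1 GiB
    (10 * 1024 ** 3, "48:00:00"),  # inputs up to 10 GiB
]
LONG_WALLTIME = "96:00:00"         # anything larger


def walltime_for_size(size_bytes):
    """Return the first tier's walltime whose size limit covers size_bytes."""
    for limit, walltime in WALLTIME_TIERS:
        if size_bytes <= limit:
            return walltime
    return LONG_WALLTIME


def cufflinks(job):
    # Same lookup as above: input name from the tool XML -> dataset.
    inp_data = dict((da.name, da.dataset) for da in job.input_datasets)
    inp_data.update((da.name, da.dataset) for da in job.input_library_datasets)
    input1_size = os.path.getsize(inp_data["input1"].file_name)
    return "pbs:///queue/-l walltime=%s/" % walltime_for_size(input1_size)
```

Keeping the tier table separate from the rule makes it easy to give only the rare, very large jobs a multi-day walltime.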

This whole concept works for a couple of small tests on my local
machine, but certain aspects of the job runner code make me suspect
there may be corner cases I am not seeing where this approach will not
work, so your mileage may vary.


-John


John Chilton
Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
E-Mail: chil...@msi.umn.edu