Re: [galaxy-dev] suggestion for multithreading

2011-08-15 Thread Nate Coraor
Andrew Warren wrote:
> So would the current correct method for setting up multi-threaded jobs on a
> cluster be to specify custom runners in the [galaxy:tool_runners] section of
> the universe config file for EVERY tool that uses multiple threads
> (assuming the default is set to one)?
> 
> For example, for the bowtie program and a queue named "galaxy":
> bowtie = pbs:///galaxy/-l ppn=4,mem=16gb/

Hi Andrew,

You'll need to use the tool id from the <tool> tag in the XML config
file.  The bowtie file is 'tools/sr_mapping/bowtie_wrapper.xml' and the
tool id is 'bowtie_wrapper'.

Unfortunately, you also need to set the number of threads in the same
XML file, although 4 happens to be the default:

--threads="4"

This value isn't currently read from the universe config.
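Because the runner key must be the tool id and the thread count lives only in the tool XML, the two settings have to be kept in step by hand. A minimal sketch of doing that mechanically, under stated assumptions: the trimmed XML is a hypothetical stand-in for `bowtie_wrapper.xml`, `tool_runner_line` is a hypothetical helper (not part of Galaxy), and the runner URL layout is assumed to be `pbs://<server>/<queue>/<qsub options>/` with an empty server meaning the default. It uses the `nodes=1:ppn=N` form that pbs.py is described as generating elsewhere in this thread, rather than the bare `-l ppn=4` from the question, since in Torque `ppn` is normally given as part of a `nodes=` request.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for tools/sr_mapping/bowtie_wrapper.xml;
# the real wrapper's <command> line is much longer.
TOOL_XML = """
<tool id="bowtie_wrapper" name="Bowtie">
  <command>bowtie_wrapper.py --threads="4" --input=$input1</command>
</tool>
"""


def tool_runner_line(tool_xml: str, queue: str, mem: str) -> str:
    """Build a [galaxy:tool_runners] entry whose ppn matches the tool's --threads."""
    root = ET.fromstring(tool_xml.strip())
    tool_id = root.get("id")  # the runner key is the tool id, not the file name
    command = root.findtext("command", default="")
    match = re.search(r'--threads="?(\d+)"?', command)
    threads = int(match.group(1)) if match else 1  # no flag: assume single-threaded
    # Assumed URL layout: pbs://<server>/<queue>/<qsub options>/ (empty server = default)
    return f"{tool_id} = pbs:///{queue}/-l nodes=1:ppn={threads},mem={mem}/"


print(tool_runner_line(TOOL_XML, queue="galaxy", mem="16gb"))
# bowtie_wrapper = pbs:///galaxy/-l nodes=1:ppn=4,mem=16gb/
```

Generating the line from the XML means a later change to `--threads` cannot silently drift away from the cores requested from PBS.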

--nate


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] suggestion for multithreading

2011-08-09 Thread Louise-Amélie Schmitt
Well, you can still use my method, which I described at the beginning of
the thread. But that means modifying some code.


If I'm not mistaken, Galaxy's built-in scheduler is a simple FIFO
scheduler with no way to specify the resources a job needs. So if you
configure multithreaded tools, yes, I guess the nodes can expect surprises.
The same could happen with PBS if you don't request the proper number of
CPUs per node and the necessary amount of memory.
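To make the oversubscription concern concrete, here is a toy model under loudly stated assumptions: every job takes the same wall time, there is one node with a fixed number of cores, and a FIFO runner starts up to a fixed number of jobs at once regardless of their thread counts. The function names and numbers are hypothetical; this is not Galaxy's runner code. The second function shows what changes when the scheduler is told each job's thread count: jobs wait instead of stacking up.

```python
def fifo_waves(job_threads: list[int], workers: int) -> list[list[int]]:
    """Batches that a FIFO runner with `workers` slots starts together.

    Assumes every job has the same wall time, so jobs run in lockstep waves;
    thread counts are ignored entirely, as in a runner with no resource hints.
    """
    return [job_threads[i:i + workers] for i in range(0, len(job_threads), workers)]


def slot_aware_waves(job_threads: list[int], cores: int) -> list[list[int]]:
    """Same FIFO order, but a job waits until its threads fit in the free cores.

    A job needing more threads than the node has still runs alone in its own
    wave, since it could never fit.
    """
    waves: list[list[int]] = []
    current: list[int] = []
    used = 0
    for threads in job_threads:
        if current and used + threads > cores:
            waves.append(current)  # node is full: close this wave
            current, used = [], 0
        current.append(threads)
        used += threads
    if current:
        waves.append(current)
    return waves


def peak_threads(waves: list[list[int]]) -> int:
    """Largest number of threads running at once on the node."""
    return max((sum(w) for w in waves), default=0)


# Hypothetical workload: three 4-thread jobs (e.g. bowtie with --threads=4)
# plus one single-threaded job, on one 4-core node with 4 runner workers.
jobs = [4, 4, 4, 1]
print(peak_threads(fifo_waves(jobs, workers=4)))      # 13 threads on 4 cores
print(peak_threads(slot_aware_waves(jobs, cores=4)))  # 4
```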


Or I missed something too.

Best,
L-A





Re: [galaxy-dev] suggestion for multithreading

2011-08-08 Thread Andrew Warren
So would the current correct method for setting up multi-threaded jobs on a
cluster be to specify custom runners in the [galaxy:tool_runners] section of
the universe config file for EVERY tool that uses multiple threads
(assuming the default is set to one)?

For example, for the bowtie program and a queue named "galaxy":
bowtie = pbs:///galaxy/-l ppn=4,mem=16gb/
Is this currently the only way for galaxy to inform the queuing system how
many threads a program will use?
And does this mean that without custom runners in the config file any
multi-threaded program that has multiple instances in an asynchronous workflow
has the opportunity to overload a cluster node since the queuing system
doesn't "know" how many threads the program will be using?

Just want to make sure I'm not missing out on the latest and greatest method
for process management. :)

Thanks,
Andrew
Louise-Amélie Schmitt wrote:

> >
> > default_cluster_job_runner will remain for backwards compatibility, but
> > we'll ship a sample job_conf.xml that runs everything locally by
> > default.
> >
> > --nate
>
> Haha, and I did that before realizing I could do just what I needed by
> writing tool-specific *pbs*:// URLs at the end of the config file... I'm such
> an idiot.

Haha, okay, I don't think I even noticed since I was distracted by your
implementation being a step in the way we want to go with it.

> But I really like what you did with it and I have a couple of questions.
>
> Concerning the single-threaded tools, what would happen if the number of
> threads set in the xml file was >1 ?

It'd consume extra slots, but the tool itself would just run as usual.

> Could it be possible to forbid a tool to run on a given node?

Hrm.  In *PBS* you could do it using node properties/neednodes or resource
requirements.  I'd have to think a bit about how to do this in a more
general way in the XML.

--nate

>
> Thanks,
> L-A
>
>
> >
> >>
> >> Peter
> >>

Re: [galaxy-dev] suggestion for multithreading

2011-06-02 Thread Nate Coraor
Assaf Gordon wrote:
> (moved to galaxy-dev)
> 
> Nate Coraor wrote, On 06/02/2011 01:31 PM:
> > Peter Cock wrote:
> >> On Thu, Jun 2, 2011 at 6:23 PM, Nate Coraor  wrote:
> >>>
> >>> pbs.py then knows to translate '8' to
> >>> '-l nodes=1:ppn=8'.
> >>>
> >>> Your tool can access that value a bunch, like $__resources__.cores.
> >>>
> >>> The same should be possible for other consumables.
> >>>
> 
> Just a thought here:
> 
> The actual parameters that are passed to the scheduler are not necessarily 
> hard-coded.
> For example, at least with SGE, specifying the number of cores can be:
>  qsub -pe threads 8
> or
>  qsub -pe cores 8
> or
>  qsub -pe jiffies 8
> 
> and the same goes for memory limits (e.g. "-l virtual_free=800M").
> 
> The reason is that those resources (e.g. "threads", "cores", "virtual_free")
> are just identifiers, created and configured by whoever installed SGE;
> they are not built-in or hard-coded.
> 
> So just be careful in your design/implementation when automatically 
> translating XML resources to hard-coded parameters.
> 
> If you do hard-code them, just make sure you specifically document it (i.e.
> Galaxy expects the SGE threads parameter to be "-pe threads 8" and nothing
> else).

Hrm, I didn't realize that SGE didn't have a standard resource name for
this.  It's probably something we can just add into the XML as "cores"
in Galaxy == "threads" in my SGE install.

Thanks for the heads up.
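A sketch of the translation layer discussed here: a generic core count (and optional memory) is rendered per scheduler, with the SGE names looked up in a site table instead of being hard-coded. The PBS form `-l nodes=1:ppn=N` is the one quoted above; the `mem=` suffix, `scheduler_args`, `SITE_RESOURCE_NAMES` and the table's contents are assumptions for illustration, not Galaxy code.

```python
from typing import Optional

# Hypothetical site table mapping Galaxy's generic resource names to whatever
# the local SGE admin created: here a parallel environment called "threads"
# and a memory complex called "virtual_free".
SITE_RESOURCE_NAMES = {"cores": "threads", "memory": "virtual_free"}


def scheduler_args(scheduler: str, cores: int, memory: Optional[str] = None,
                   site_names: Optional[dict] = None) -> list[str]:
    """Render a generic resource request as qsub arguments for one scheduler."""
    if scheduler == "pbs":
        # Torque/PBS: cores on a single node via nodes=1:ppn=N
        request = f"nodes=1:ppn={cores}"
        if memory:
            request += f",mem={memory}"
        return ["-l", request]
    if scheduler == "sge":
        if site_names is None:
            raise ValueError("SGE resource names are site-specific; pass site_names")
        # SGE: -pe <parallel environment name> <slots>
        args = ["-pe", site_names["cores"], str(cores)]
        if memory:
            args += ["-l", f"{site_names['memory']}={memory}"]
        return args
    raise ValueError(f"unsupported scheduler: {scheduler!r}")


print(scheduler_args("pbs", 8))
# ['-l', 'nodes=1:ppn=8']
print(scheduler_args("sge", 8, "800M", SITE_RESOURCE_NAMES))
# ['-pe', 'threads', '8', '-l', 'virtual_free=800M']
```

Refusing to build SGE arguments without a site table is a deliberate choice here: it forces the mapping to be stated explicitly, which is the documentation point raised in the quoted message.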

> 
> -gordon


Re: [galaxy-dev] suggestion for multithreading

2011-06-02 Thread Assaf Gordon
(moved to galaxy-dev)

Nate Coraor wrote, On 06/02/2011 01:31 PM:
> Peter Cock wrote:
>> On Thu, Jun 2, 2011 at 6:23 PM, Nate Coraor  wrote:
>>>
>>> pbs.py then knows to translate '8' to
>>> '-l nodes=1:ppn=8'.
>>>
>>> Your tool can access that value a bunch, like $__resources__.cores.
>>>
>>> The same should be possible for other consumables.
>>>

Just a thought here:

The actual parameters that are passed to the scheduler are not necessarily 
hard-coded.
For example, at least with SGE, specifying the number of cores can be:
 qsub -pe threads 8
or
 qsub -pe cores 8
or
 qsub -pe jiffies 8

and the same goes for memory limits (e.g. "-l virtual_free=800M").

The reason is that those resources (e.g. "threads", "cores", "virtual_free")
are just identifiers, created and configured by whoever installed SGE;
they are not built-in or hard-coded.

So just be careful in your design/implementation when automatically translating 
XML resources to hard-coded parameters.

If you do hard-code them, just make sure you specifically document it (i.e.
Galaxy expects the SGE threads parameter to be "-pe threads 8" and nothing else).
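One way to avoid silently hard-coding a name is to check the configured parallel environment against the ones the local SGE install actually defines; `qconf -spl` lists them, one per line. The sketch below is a hypothetical startup check, not Galaxy code: the sample output is made up, and `installed_pes` needs SGE's client tools on the PATH, so the example does not call it.

```python
import subprocess


def installed_pes() -> str:
    """Raw `qconf -spl` output (one parallel environment name per line).

    Requires SGE client tools on PATH; not called in the example below.
    """
    return subprocess.run(["qconf", "-spl"], capture_output=True,
                          text=True, check=True).stdout


def check_pe_name(configured: str, qconf_spl_output: str) -> str:
    """Return `configured` if this SGE install defines that parallel environment.

    Raising here catches a wrong guess (e.g. "cores" vs. "threads") when the
    configuration is loaded, rather than when the first job is submitted.
    """
    available = {line.strip() for line in qconf_spl_output.splitlines() if line.strip()}
    if configured not in available:
        raise ValueError(
            f"SGE parallel environment {configured!r} is not defined here; "
            f"available: {sorted(available)}"
        )
    return configured


sample_output = "make\nsmp\nthreads\n"  # made-up `qconf -spl` output
print(check_pe_name("threads", sample_output))  # threads
```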

-gordon