On Tue, Feb 2, 2016 at 9:18 AM, Nikolay Aleksandrov Vazov <n.a.va...@usit.uio.no> wrote:
> Many thanks to all of you!!
>
> Definitely Nate's approach is a better choice. We are running Slurm 14.03, but Nate's manual is exhaustive enough to recompile even the existing version. (I don't know how we can do this on a running cluster though :) I will most probably go for this solution.
>
> There is a sentence in Nate's answer I don't really understand:
>
> "... using `--clusters` means you have to have your controllers integrated using slurmdbd, ..."
>
> What do you mean by this, Nate?

You have to run slurmdbd (it's an optional Slurm component) and your Slurm controllers must connect to a single slurmdbd instance. This is Slurm's accounting server. Here's the documentation:

http://slurm.schedmd.com/accounting.html

The setup is relatively simple; you just need a MySQL (or derivative) server for it to store records in.

--nate

> Carrie, I don't actually get how you implemented the hack: did you reduplicate the class DRMAAJobRunner under a different name in drmaa.py? And where do you define every next cluster (the controller machines)? Can you give me some more details?
>
> Thank you
>
> Nikolay
>
> ===============
> Nikolay Vazov, PhD
> Department for Research Computing, University of Oslo
> ------------------------------
> *From:* Nate Coraor <n...@bx.psu.edu>
> *Sent:* 01 February 2016 17:28
> *To:* Ganote, Carrie L
> *Cc:* John Chilton; Nikolay Aleksandrov Vazov; dannon.ba...@gmail.com; galaxy-dev@lists.galaxyproject.org
> *Subject:* Re: Galaxy sending jobs to multiple clusters
>
> Hi Nikolay,
>
> It's worth noting that using `--clusters` means you have to have your controllers integrated using slurmdbd, and they must share munge keys. You can set up separate destinations as in Carrie's example without having to "integrate" your controllers at the Slurm level. The downside of this approach is that you can't have Slurm automatically "balance" across clusters, although Slurm's algorithm for doing this with `--clusters` is fairly primitive. If you don't use `--clusters`, you can attempt to do the balancing with a dynamic job destination.
>
> If you're not using slurmdbd, you may still need to share the same munge key across clusters to allow the Slurm client library on the Galaxy server to talk to both clusters. There could be ways around this if it's a problem, though.
>
> --nate
>
> On Mon, Feb 1, 2016 at 11:10 AM, Ganote, Carrie L <cgan...@iu.edu> wrote:
>
>> Hi Nikolay,
>>
>> The slurm branch that John mentioned sounds great! That might be your best bet. I didn't get drmaa to run with multiple clusters with flags, but I did 'assign' different job handlers to different destinations in the drmaa.py runner in Galaxy - but that is a bit of a hacky way to do it.
>>
>> -Carrie
>>
>> From: John Chilton <jmchil...@gmail.com>
>> Date: Monday, February 1, 2016 at 11:02 AM
>> To: Nikolay Aleksandrov Vazov <n.a.va...@usit.uio.no>
>> Cc: "dannon.ba...@gmail.com" <dannon.ba...@gmail.com>, "galaxy-dev@lists.galaxyproject.org" <galaxy-dev@lists.galaxyproject.org>, Carrie Ganote <cgan...@iu.edu>, Nate Coraor <n...@bx.psu.edu>
>> Subject: Re: Galaxy sending jobs to multiple clusters
>>
>> Nate has a branch of slurm-drmaa that allows specifying a --clusters argument in the native specification; this can be used to target multiple hosts.
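For concreteness, a job_conf.xml using that approach might look something like the sketch below. The cluster names clusterA/clusterB, the handler id, and the walltime are placeholders, and the `--clusters` flag in nativeSpecification only works once Nate's slurm-drmaa branch (linked just below) is installed; the rest follows the usual Galaxy job_conf.xml layout of that era.

    <?xml version="1.0"?>
    <job_conf>
        <plugins>
            <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        </plugins>
        <handlers>
            <handler id="handler0"/>
        </handlers>
        <destinations default="cluster_a">
            <!-- Each destination targets a different cluster known to slurmdbd. -->
            <destination id="cluster_a" runner="slurm">
                <param id="nativeSpecification">--clusters=clusterA --time=24:00:00</param>
            </destination>
            <destination id="cluster_b" runner="slurm">
                <param id="nativeSpecification">--clusters=clusterB --time=24:00:00</param>
            </destination>
        </destinations>
    </job_conf>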
>> More information can be found here:
>>
>> https://github.com/natefoo/slurm-drmaa
>>
>> Here is how Nate uses it to configure usegalaxy.org:
>>
>> https://github.com/galaxyproject/usegalaxy-playbook/blob/master/templates/galaxy/usegalaxy.org/config/job_conf.xml.j2
>>
>> I guess instead of installing slurm-drmaa from a package manager or the default source, you will just need to install Nate's version.
>>
>> -John
>>
>> On Wed, Jan 20, 2016 at 1:18 PM, Nikolay Aleksandrov Vazov <n.a.va...@usit.uio.no> wrote:
>>
>> Hi, John, Dan, Carrie and all others,
>>
>> I am considering a task of setting up a Galaxy instance which shall send jobs to more than one cluster at a time. In my case I am using drmaa-python, and I was wondering if it was possible to configure multiple drmaa runners, each "pointing" at a different (Slurm) control host, e.g.:
>>
>> local
>> drmaa1
>> drmaa2
>>
>> Thanks a lot for your advice
>>
>> Nikolay
>>
>> ===============
>> Nikolay Vazov, PhD
>> Department for Research Computing, University of Oslo
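Nate mentions above that, without `--clusters`, balancing can be attempted with a dynamic job destination instead. A minimal sketch of what that could look like follows; the module path, function name, and destination ids are hypothetical, and the round-robin rule is only a stand-in for a real load-based policy. The function lives under lib/galaxy/jobs/rules/ and is referenced from a `runner="dynamic"` destination in job_conf.xml.

    # lib/galaxy/jobs/rules/multicluster.py  (hypothetical module name)
    #
    # Referenced from job_conf.xml with a destination along these lines:
    #   <destination id="balanced" runner="dynamic">
    #       <param id="function">balance_clusters</param>
    #   </destination>

    # Ids of the static destinations defined in job_conf.xml (placeholders).
    DESTINATIONS = ["cluster_a", "cluster_b"]

    def balance_clusters(job):
        """Return the id of the destination this job should run on.

        Galaxy injects the Job object for the parameter named 'job'.
        """
        # Naive round-robin on the Galaxy job id -- replace with a real
        # balancing policy, e.g. counting queued jobs per destination.
        return DESTINATIONS[job.id % len(DESTINATIONS)]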
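For reference, the slurmdbd integration Nate describes at the top of this thread amounts to running a single slurmdbd backed by MySQL and pointing every cluster controller at it. A rough sketch is below; host names, credentials, and cluster names are placeholders, and the accounting documentation linked above covers the full set of options.

    # slurmdbd.conf on the accounting host (placeholder values)
    AuthType=auth/munge
    DbdHost=slurmdbd.example.org
    StorageType=accounting_storage/mysql
    StorageHost=localhost
    StorageUser=slurm
    StoragePass=secret
    StorageLoc=slurm_acct_db

    # slurm.conf on each cluster's controller
    ClusterName=clusterA            # clusterB on the other controller
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=slurmdbd.example.org

As Nate notes, the controllers (and the Galaxy server's Slurm client library) also need to share munge keys for this to work across clusters.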