[slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Manuel Rodríguez Pascual
Hi all,

Although this is not strictly related to Slurm, maybe you can recommend
some ways to deal with a particular user.

On our small cluster, there are currently no limits on running applications
on the frontend. This is sometimes really useful for some users, for example
to have scripts monitoring the execution of jobs and making decisions
depending on the partial results.

However, we have one user who keeps abusing this: when the job queue is
long and the wait is significant, he sometimes runs his jobs on the
frontend, resulting in a CPU load of 100% and delays in using it for the
things it is supposed to serve (user login, monitoring and so on).

Have you faced the same issue?  Is there any solution? I am thinking about
using ulimit to limit the execution time of these jobs on the frontend to 5
minutes or so. This does not look very elegant, however, since other users
could commit the same abuse in the future, and he should still be able to
run jobs that use little CPU for a longer period. I am not an experienced
sysadmin, so I am completely open to suggestions or different ways of
approaching this issue.
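
For what it is worth, `ulimit -t` caps CPU time rather than wall-clock time,
so a monitoring script that mostly sleeps would not be hit by a 5-minute
limit, while a compute-bound job would. A minimal sketch, assuming bash on
the frontend; the helper name run_with_cpu_limit is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: run a command with a cap on the CPU seconds it may use.
# The limit is set in a subshell, so the calling shell keeps its own limits.
run_with_cpu_limit() {
    local cpu_seconds="$1"
    shift
    ( ulimit -t "$cpu_seconds" && exec "$@" )
}

# Mostly idle: uses almost no CPU time, so it finishes normally.
run_with_cpu_limit 300 sleep 1
```

A system-wide variant could set the same limit from a login script. Note
that the limit is per process, so it does not stop a user from starting many
short processes.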

Any thoughts?

cheers,

Manuel


Re: [slurm-users] Creating priority quotas

2018-01-25 Thread Manuel Rodríguez Pascual
In case any of you are interested, my team integrated the DMTCP checkpoint
library with Slurm, thus allowing this preemption to be done without
losing any computation (and some other fancy stuff).  An important use
case for us is, in fact, these VIP queues :)

Here is a link to the documentation, including tests, configuration and
howtos: https://github.com/ciemat-tic/codec/wiki/Slurm-DMTCP
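
Regarding the QOS configuration asked about further down the thread, a
minimal sketch might look like the fragment below. It assumes accounting
through slurmdbd is already in place, that the default QOS is called
'normal', and that requeueing preempted jobs is acceptable; the QOS name
'vip' and the priority value are placeholders, not settings from our site.

```
# slurm.conf (fragment): let jobs in one QOS preempt jobs in another
PreemptType=preempt/qos
PreemptMode=REQUEUE

# The QOS itself lives in the accounting database and is created with
# sacctmgr, for example:
#   sacctmgr add qos vip
#   sacctmgr modify qos vip set Priority=1000 Preempt=normal
# A VIP job would then be submitted with:
#   sbatch --qos=vip job.sh
```

With PreemptMode=SUSPEND,GANG or CANCEL instead of REQUEUE the preempted
jobs are treated differently, so the mode should be chosen to fit the
low-priority workload.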

cheers,

Manuel


2018-01-25 17:18 GMT+01:00 Brian Novogradac :

> Thank you for your input!
>
>
> Would any of you have an example of how you set up the conf file for the
> queue using the QOS method?  I'm reading the QOS docs as I type.
>
>
> Brian
> --
> *From:* slurm-users  on behalf of
> Loris Bennett 
> *Sent:* Thursday, January 25, 2018 11:09 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] Creating priority quotas
>
> Hi Brian,
>
> QOS is probably the right way to go.  You can set up a QOS 'vip', which
> can preempt other QOS.  We don't preempt, but use multifactor priority
> with various QOS with different priority values.  In that case, VIP jobs
> won't start immediately, but just get pushed to the front of the queue.
> On the other hand, no CPU-time is lost due to low-status jobs being
> terminated early due to preemption (although if the low-status jobs are
> able to do some form of checkpointing that will be less of an issue).
> Depending on how pushy your VIPs are, they might go for a preemptionless
> solution, too.
>
> Cheers,
>
> Loris
>
> John Hearns  writes:
>
> > Brian, not my area of expertise. Do you want 'preemption', i.e. the VIP
> > user runs something and other jobs are pre-empted?
> > https://slurm.schedmd.com/preempt.html
>
>
> >
> > On 25 January 2018 at 16:27, Brian Novogradac <
> brian.novogra...@utoronto.ca> wrote:
> >
> >  I'm new to Slurm, and looking for some assistance.
> >
> >  I need to create various queues. The one I am having issues with is a
> >  "VIP" queue.
> >
> >  I want to create a queue for a specific node that overrides all jobs on
> >  that node when a "VIP" uses the queue.
> >
> >  We are using SSSD for our authentication system to the login node.
> >
> >  I'm looking at the QOS docs and am stumped.
> >
> >  Any help or direction much appreciated.
> >
> >  Brian Novogradac
> >
> >
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>
>


Re: [slurm-users] "command not found"

2017-12-15 Thread Manuel Rodríguez Pascual
Hi David,

The command to be executed must be present on the node where the script is
run.  When you submit a job in Slurm only the script is copied to the slave
node, so the data and binaries must be there prior to the script execution.

There are many alternatives to deal with this situation depending on the
system and software requirements. A common practice (as far as I know, I am
not an experienced sysadmin) is to install the software on shared storage
and mount it both on the master node and on the slaves. This is analogous to
what you are probably doing with the user $HOME.

Of course this cannot always be done, and sometimes the only solution is
to install a particular library or application on every node. In this case
there are many solutions to automate the process.
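
Whichever route you take, a batch script can check up front that its tools
are visible on the node it landed on. This is only a sketch: the directory
/shared/apps/blast/bin and the binary name blastn are assumptions about your
installation, and require_cmd is a helper invented for the example.

```shell
#!/bin/bash
#SBATCH --job-name=blast-check
#SBATCH --ntasks=1

# Hypothetical shared install location, mounted on every node.
export PATH="/shared/apps/blast/bin:$PATH"

# Fail early with a clear message if a program is missing on this node.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found on node $(uname -n)" >&2
        return 127
    fi
}

# Check the shell itself here; a real script would check its own tools,
# e.g.:  require_cmd blastn || exit 127
require_cmd sh
```

The error message names the node, which makes it easier to spot a machine
where the shared directory is not mounted.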

Cheers,

Manuel

2017-12-15 12:21 GMT+01:00 david :

> Hi,
>
> When running an sbatch script I get "command not found".
>
>
> The command is blast (a widely used bioinformatics tool).
>
>
> The problem comes from the fact that the blast binary is installed in the
> master node  but not on the other nodes. When the job runs on another node
> the binary is not found.
>
>
> What would be a way to deal with this situation? What is common practice?
>
>
> thanks,
>
> david
>
>
>
>


Re: [slurm-users] [slurm-dev] Re: Installing SLURM locally on Ubuntu 16.04

2017-11-08 Thread Manuel Rodríguez Pascual
It looks like munge is not correctly configured, or you have some kind of
permission problem. This guide explains how to configure and test it:
https://github.com/dun/munge/wiki/Installation-Guide
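
As a quick first check, you can encode a credential and decode it again on
the same machine (`munge -n | unmunge`). The wrapper below is just a sketch
around that one command; check_munge is a name made up for the example, and
it only tells you whether munge itself works, not whether Slurm can find its
munge plugin.

```shell
#!/bin/bash
# Hypothetical helper: encode a credential with munge and decode it locally.
check_munge() {
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge/unmunge not found in PATH"
        return 1
    fi
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge: local credential round trip OK"
    else
        echo "munge: round trip failed (is munged running? check the key file permissions)"
        return 1
    fi
}

check_munge || echo "fix munge before starting slurmctld"
```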

good luck!

2017-11-08 14:38 GMT+01:00 Will L :

> Benjamin,
>
>
> Thanks for following up. I just tried again as you said, with the
> following result.
>
> $ sudo slurmctld -D -f /etc/slurm-llnl/slurm.conf
> slurmctld: slurmctld version 17.02.9 started on cluster cluster
> slurmctld: error: Couldn't find the specified plugin name for crypto/munge
> looking at all files
> slurmctld: error: cannot find crypto plugin for crypto/munge
> slurmctld: error: cannot create crypto context for crypto/munge
> slurmctld: fatal: slurm_cred_creator_ctx_create((null)): Operation not
> permitted
>
> Will
>
> will-landau.com
> linkedin.com/in/wlandau
> github.com/wlandau
>
> On Sun, Nov 5, 2017 at 7:18 PM, Benjamin Redling <
> benjamin.ra...@uni-jena.de> wrote:
>
>>
>> Hi Will,
>>
>> looking at your stackoverflow postings there doesn't seem to be anything
>> helpful. Did you solve your problem in the meantime?
>>
>> Am 30.10.2017 um 03:12 schrieb Will L:
>> > I am trying to install SLURM 15.08.7 locally on an Ubuntu 16.04 machine.
>> > In my case, the master and worker nodes are the same.
>> [...]
>>
>> Have you tried starting both slurmctld and slurmd in the foreground (-D)?
>> When I have real trouble with a cluster I open two terminals
>> side-by-side and set debugging in the slurm.conf to something reasonably
>> high. Then I start...
>> ... one with: slurmctld -D -f 
>> ... another with: slurmd -D -f 
>>
>> (I only remember one case where that wasn't helpful: a seemingly random
>> "user unknown" file access problem)
>>
>> Regards,
>> Benjamin
>> --
>> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
>> ☎ +49 3641 9 44323
>>
>
>