[slurm-dev] How to apply changes in the slurm.conf

2014-06-10 Thread José Manuel Molero
Dear Slurm users,

Maybe these are basic questions, but I can't find the answers in the manual.

We recently installed Slurm 14.03 on a cluster in a Red Hat / Scientific Linux
environment. In order to tune the configuration, we want to test different
parameters in slurm.conf, but several users are running important jobs that
take several days.

How can I change the Slurm configuration and restart slurmctld without
affecting the users and their jobs? Is it also necessary to restart the slurmd
daemons? Is it possible to upgrade or change the Slurm version while jobs are
running?

Thanks in advance.

[slurm-dev] Re: How to apply changes in the slurm.conf

2014-06-10 Thread Barbara Krasovec

On 06/10/2014 08:24 AM, José Manuel Molero wrote:

Dear Slurm users,

Maybe these are basic questions, but I can't find the answers in the manual.

We recently installed Slurm 14.03 on a cluster in a Red Hat / Scientific
Linux environment. In order to tune the configuration, we want to test
different parameters in slurm.conf, but several users are running important
jobs that take several days.

How can I change the Slurm configuration and restart slurmctld without
affecting the users and their jobs? Is it also necessary to restart the
slurmd daemons? Is it possible to upgrade or change the Slurm version while
jobs are running?

Thanks in advance.


Hello!

We apply new configuration parameters with scontrol reconfigure (after first
distributing the new slurm.conf to all nodes).
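
A minimal sketch of that sequence (the node names and paths are placeholders;
use whatever file-distribution tooling you already have):

# push the edited config to every node, then ask the daemons to re-read it
for n in node01 node02 node03; do
    scp /etc/slurm/slurm.conf $n:/etc/slurm/slurm.conf
done
scontrol reconfigure   # slurmctld and the slurmd daemons re-read slurm.conf

Note that a few parameters only take effect after a daemon restart; the
slurm.conf man page says which ones.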


Upgrading Slurm: in my experience, upgrading to a minor release (e.g. from
2.6.4 to 2.6.x) is not a problem on a running cluster; jobs are preserved.
But when upgrading to a major release (e.g. from 2.5 to 2.6), the cluster has
to be drained first, otherwise jobs are killed.
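
Roughly, the order usually recommended for an in-place minor upgrade, as a
sketch only (init-script names assume a Red Hat-era SysV setup; check the
official upgrade notes for your version):

# after installing the new packages on each host:
/etc/init.d/slurmdbd restart   # accounting daemon first, if you run slurmdbd
/etc/init.d/slurm restart      # then the controller (slurmctld)
/etc/init.d/slurm restart      # then slurmd on each compute node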


Cheers,
Barbara


[slurm-dev] Enforce to use srun and application logger

2014-06-10 Thread Jordi Blasco
Hi,

we are using the Snoopy library (https://github.com/a2o/snoopy) to monitor
and collect statistics about the applications used on our HPC resources.
Since more than 30% of the jobs in our database have no such information, it
seems that Snoopy is not able to track everything.

Other tools such as PerfMiner or monitor (
http://web.eecs.utk.edu/~mucci/monitor/) are used at several sites, but since
they rely on PapiEx (http://icl.cs.utk.edu/~mucci/papiex/), a project that is
no longer supported, I would like to know if there is another approach to
collecting this data.

In addition, I would like to know whether it is possible to enforce the use
of srun in the submit script. I used an sbatch wrapper before, but maybe
there is now a better way to do it.

Thanks!

Regards,

Jordi


[slurm-dev] Re: How to apply changes in the slurm.conf

2014-06-10 Thread jette


Pending and running jobs should be preserved across major releases too.

Quoting Barbara Krasovec barba...@arnes.si:


On 06/10/2014 08:24 AM, José Manuel Molero wrote:

Dear Slurm users,

Maybe these are basic questions, but I can't find the answers in the manual.

We recently installed Slurm 14.03 on a cluster in a Red Hat / Scientific
Linux environment. In order to tune the configuration, we want to test
different parameters in slurm.conf, but several users are running important
jobs that take several days.

How can I change the Slurm configuration and restart slurmctld without
affecting the users and their jobs? Is it also necessary to restart the
slurmd daemons? Is it possible to upgrade or change the Slurm version while
jobs are running?

Thanks in advance.


Hello!

We apply new configuration parameters with scontrol reconfigure (after first
distributing the new slurm.conf to all nodes).


Upgrading Slurm: in my experience, upgrading to a minor release (e.g. from
2.6.4 to 2.6.x) is not a problem on a running cluster; jobs are preserved.
But when upgrading to a major release (e.g. from 2.5 to 2.6), the cluster
has to be drained first, otherwise jobs are killed.


Cheers,
Barbara




[slurm-dev] Re: More odd scheduler reservation behavior

2014-06-10 Thread Bill Barth

No thoughts on this from the list? I wouldn't have thought we were the
only ones encountering this issue.

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu|   Phone: (512) 232-7069
Office: ROC 1.435 |   Fax:   (512) 475-9445







On 6/5/14 3:09 PM, Bill Barth bba...@tacc.utexas.edu wrote:


All,

I'm experiencing the following unexpected behavior with SLURM reservations.
If I create a reservation on some nodes and forget to point it at a specific
partition, and then later update the reservation to point at the correct
partition, Slurm doesn't remove the nodes reserved from the wrong partition
and replace them with nodes from the specified partition.

Here are the details, beginning with some info about the relevant
partitions:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
SB2.7*   up 2-00:00:00  2  down* c3-[401,421]
SB2.7*   up 2-00:00:00 26   idle c3-[402-420,422-428]
IB2.2up 2-00:00:00 12   idle c3-[501-512]


Create the reservation:

-bash-4.2$ sudo scontrol create reservation StartTime=2014-06-06T08:00:00
Duration=1:00:00 NodeCnt=4 Users=bbarth
Reservation created: bbarth_3
-bash-4.2$ scontrol show res
ReservationName=bbarth_3 StartTime=2014-06-06T08:00:00
EndTime=2014-06-06T09:00:00 Duration=01:00:00
   Nodes=c3-[402-405] NodeCnt=4 CoreCnt=64 Features=(null)
PartitionName=SB2.7 Flags=
   Users=bbarth Accounts=(null) Licenses=(null) State=INACTIVE


Observe that the nodes happen to come from the SB2.7 partition. If I update
the reservation's partition to IB2.2, we see that the nodes from SB2.7 are
still the ones reserved:

-bash-4.2$ sudo scontrol update ReservationName=bbarth_3 Partition=IB2.2
Reservation updated.
-bash-4.2$ scontrol show res
ReservationName=bbarth_3 StartTime=2014-06-06T08:00:00
EndTime=2014-06-06T09:00:00 Duration=01:00:00
   Nodes=c3-[402-405] NodeCnt=4 CoreCnt=64 Features=(null)
PartitionName=IB2.2 Flags=
   Users=bbarth Accounts=(null) Licenses=(null) State=INACTIVE

Is this the expected behavior?

I also notice that if I drain a node it doesn't get replaced in the
reservation, and if I stop SLURM on the node (/etc/init.d/slurm stop) it
doesn't get replaced either. I would have sworn up and down that at least
the latter worked.

Can anyone provide some feedback?

Thanks,
Bill.

--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu|   Phone: (512) 232-7069
Office: ROC 1.435 |   Fax:   (512) 475-9445





[slurm-dev] jobs killed on controller restart

2014-06-10 Thread Michael Gutteridge
We've had some trouble with curious job failures - the jobs aren't even
assigned nodes:

      JobID       NodeList      State ExitCode
----------- -------------- ---------- --------
    7229124  None assigned     FAILED      0:1

We finally got some better log data (I'd had the log level turned way too
low), which suggests that restarting and/or reconfiguring the controller is
at the root of it. After some preliminaries (purging job records, recovering
active jobs) there will be these sorts of messages:

[2014-06-09T23:10:15.920] No nodes satisfy job 7228909 requirements in
partition full
[2014-06-09T23:10:15.920] sched: schedule: JobId=7228909 non-runnable:
Requested node configuration is not available

The indicated job specified --mem and --tmp, but the values are within the
capacities of all nodes in that "full" partition. Typically, if a user
requests resources exceeding those available on nodes in this partition, the
submission fails. It appears that this failure only occurs for jobs with
memory and/or disk constraints. Worse yet, it's not consistent - it only
seems to happen sometimes. I also cannot reproduce it in our test
environment.

A typical node configuration line looks thus:

NodeName=gizmod[51-60] Sockets=2 CoresPerSocket=6 RealMemory=48000
Weight=10 Feature=full,restart,rx200,ssd

though I've got FastSchedule=0.  Honestly it *feels* like there's a moment
where the node data isn't fully loaded from the slurmd and thus the
scheduler doesn't see any nodes that satisfy the requirements.
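
For reference, the controller's current view of a node and the job's request
can be compared with scontrol, e.g. (node name from the config line above,
job ID from the log excerpt):

scontrol show node gizmod51                # RealMemory/TmpDisk as slurmctld sees them
scontrol show job 7228909 | grep -i min    # the job's MinMemory/MinTmpDisk request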

Thanks all...

Michael


[slurm-dev] Re: Create preemptable QOS

2014-06-10 Thread Christopher B Coffey
Hi,

I did figure out the issue with my setup and thought I’d post the fix in
case anyone was curious.  I had neglected to add the newly created qos as
a possibility for the account association.

So for me I needed to do:

sacctmgr modify account name=normal set qos=normal,free


That way a normal account can request the free qos. Hope this helps
someone.
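
To double-check that the association now carries both QOS, something like
this should list them (format fields as used by sacctmgr's association
listing):

sacctmgr show assoc where account=normal format=Account,User,QOS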



On 6/3/14, 2:18 PM, Christopher B Coffey chris.cof...@nau.edu wrote:

Hi,

I’m trying to create a QOS that, when specified in a job script, makes a job
preemptable by jobs in the normal account. Another goal is to have this qos,
when in use, not subtract fairshare points from the user. I tried the
following:

Slurm.conf:

PreemptType = preempt/qos
PreemptMode = REQUEUE


And:

sacctmgr add qos free PreemptMode=cluster usagefactor=0 description="Preemptable QOS, no fairshare use"
sacctmgr modify qos name=normal set preempt=free

Jobs using the free qos are correctly preempted, but I get this in the
logs when jobs are submitted and are running:

Jun  3 11:33:47 head slurmctld[5078]: sched: JobId=275 has invalid QOS
Jun  3 11:33:47 head slurmctld[5078]: sched: JobId=276 has invalid QOS
Jun  3 11:33:47 head slurmctld[5078]: sched: JobId=277 has invalid QOS


Any ideas? Hopefully it’s not just a Monday detail, thank you!


Chris





[slurm-dev] Re: Enforce to use srun and application logger

2014-06-10 Thread jette



Quoting Jordi Blasco jbllis...@gmail.com:

In addition, I would like to know whether it is possible to enforce the use
of srun in the submit script. I used an sbatch wrapper before, but maybe
there is now a better way to do it.


A job submit plugin may be your best option for that. See:
http://slurm.schedmd.com/job_submit_plugins.html
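
For the Lua variant of that interface, enabling it is roughly this (parameter
name from the page above; verify against your Slurm version):

JobSubmitPlugins=lua   # in slurm.conf; the script itself goes in
                       # job_submit.lua next to slurm.conf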


[slurm-dev] Re: Enforce to use srun and application logger

2014-06-10 Thread Bill Barth

Jordi,

It's basically impossible to force people to call srun somewhere in their
batch script. If you only want to allow the very simplest of batch
scripts, then you can grep them at job submit time with a job submit
plugin, but if their script calls a script which calls a script (etc)
which calls srun, you'll never detect that they've done what you wanted.
Worse, you'll raise false positives all the time even though the users
have done what you wanted, just some levels down.
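
Purely as an illustration of the sbatch-wrapper approach Jordi mentioned (the
paths are hypothetical, and it has exactly the blind spots described above -
it only catches scripts that never mention srun directly):

#!/bin/bash
# Hypothetical wrapper installed ahead of the real sbatch in users' PATH.
# Naive assumption: the batch script is the last argument (no script args).
real_sbatch=/usr/bin/sbatch
script="${!#}"
if [ "$#" -gt 0 ] && [ -f "$script" ] && ! grep -qw srun "$script"; then
    echo "sbatch wrapper: warning - no srun call found in $script" >&2
fi
exec "$real_sbatch" "$@"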

We have a wrapper around the MPI job starters that we support (MVAPICH2
and Intel MPI) that calls the right startup mechanisms with the right
arguments. But we haven't tried to force our users to use this script. The
vast majority of them do what we want because a) we train them on it and
document it well, and b) our method is generally easier to use than the
other options. 

For monitoring, you might check out the project I work on, TACC Stats, which
provides accounting and performance monitoring for HPC jobs. Some parts of
the project are in a state of flux as we are adding new features, but things
should begin to stabilize this summer. TACC Stats will also be working with a
sister project called XALT, which will have its first release this summer and
will provide information about the executables and libraries used by HPC
jobs. More information and source code for TACC Stats can be found on GitHub,
and XALT should be available on GitHub later this summer.

git clone g...@github.com:rtevans/tacc_stats.git (this will eventually move
to the main TACC GitHub, but that's a work in progress)


Best,
Bill.



--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu|   Phone: (512) 232-7069
Office: ROC 1.435 |   Fax:   (512) 475-9445







On 6/10/14 6:19 AM, Jordi Blasco jbllis...@gmail.com wrote:

Hi,


we are using the Snoopy library (https://github.com/a2o/snoopy) to monitor
and collect statistics about the applications used on our HPC resources.
Since more than 30% of the jobs in our database have no such information, it
seems that Snoopy is not able to track everything.


Other tools such as PerfMiner or monitor
(http://web.eecs.utk.edu/~mucci/monitor/) are used at several sites, but
since they rely on PapiEx (http://icl.cs.utk.edu/~mucci/papiex/), a project
that is no longer supported, I would like to know if there is another
approach to collecting this data.


In addition, I would like to know whether it is possible to enforce the use
of srun in the submit script. I used an sbatch wrapper before, but maybe
there is now a better way to do it.


Thanks!


Regards,


Jordi 



[slurm-dev] Fairshare=parent on an account: What should it do?

2014-06-10 Thread Ryan Cox


We're trying to figure out what the intended behavior of 
Fairshare=parent is when set on an account 
(http://bugs.schedmd.com/show_bug.cgi?id=864).  We know what the actual 
behavior is but we're wondering if anyone actually likes the current 
behavior.  There could be some use case out there that we don't know about.


For example, you can end up with a scenario like the following:
                acctProf
               /   |    \
              /    |     \
   acctTA(parent) uD(5)  uE(5)
      /    |   \
     /     |    \
  uA(5)  uB(5)  uC(5)


The number in parentheses is the Fairshare value according to sacctmgr. We
incorrectly thought that Fairshare=parent would essentially collapse the
tree so that uA-uE would all be on the same level; thus, all five users
would each get 5/25 = 0.2 of the shares.


What actually happens is you get the following shares at the user level:
shares (uA) = 5 / 15 = .333
shares (uB) = 5 / 15 = .333
shares (uC) = 5 / 15 = .333
shares (uD) = 5 / 10 = .5
shares (uE) = 5 / 10 = .5

That's pretty far off from each other, but not as far as it would be if 
one account had two users and the other had forty.  Assuming this 
demonstration value of 5 shares, that would be:

user_in_small_account = 5 / (2*5) = .5
user_in_large_account = 5 / (40*5) = .025

Is that actually useful to someone?

We want to use subaccounts below a faculty account to hold, for example, 
a grad student or postdoc who teaches a class.  It would be nice for the 
grad student to have administrative control over the subaccount since he 
actually knows the students but not have it affect priority calculations.


Ryan

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University