Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-24 Thread Chris Samuel
On Thursday, 22 November 2018 2:21:45 AM AEDT Baker D. J.  wrote:

> Hi Chris,

Hi David,

> Our SchedulerParameters are...
> 
> SchedulerParameters = bf_window=3600,bf_resolution=180,bf_max_job_user=4
>
> I gather that the "bf_window" should be as high as the highest maximum time
> limit on the partitions (set at 2.5 days = 3600 minutes).

I've set ours to be twice the maximum walltime, so (if I'm understanding 
correctly) a job of that length can be planned to fit in at the point when a 
job of the same size which had only just started will finish.

So ours is (amongst others): bf_window=23040,bf_resolution=600
(23040 minutes being twice our 8 day maximum walltime).

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-24 Thread Chris Samuel
On Saturday, 24 November 2018 9:12:26 AM AEDT Mark Hahn wrote:

> I think it makes sense.  Traditionally, DISPLAY=:0 means "the X server on
> the machine where the client is running".  You can trivially
>   export DISPLAY=`hostname`$DISPLAY
> and Slurm will be happy, won't it?  IE, you have given it an actual
> network-aware DISPLAY setting.

Slurm will be happy, but your X server may not be...

chris@quad:~$ echo $DISPLAY
:0
chris@quad:~$ export DISPLAY=localhost:0
chris@quad:~$ xterm
xterm: Xt error: Can't open display: localhost:0

So you'll need to add an xauth cookie for that trick to work.
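
One way to do that (a hedged sketch, untested here; the cookie value is a 
placeholder) is to copy the cookie the X server already holds for :0 over 
to the localhost:0 display name:

chris@quad:~$ xauth list :0
quad/unix:0  MIT-MAGIC-COOKIE-1  <hex-cookie>
chris@quad:~$ xauth add localhost:0 MIT-MAGIC-COOKIE-1 <hex-cookie>
chris@quad:~$ export DISPLAY=localhost:0

That also assumes the X server is actually listening on TCP; many distros 
start it with -nolisten tcp these days.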

> But isn't the case being discussed where the submit host actually is
> running an X server, and also on the same (trusted/routable) network
> as the compute node?
> 
> In which case you don't want Slurm doing anything at all.  Just let the
> X client read DISPLAY from the environment propagated by Slurm.

Yes, but see above.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-23 Thread Chris Samuel
Hi Mahmood,

On Saturday, 24 November 2018 6:52:54 AM AEDT Mahmood Naderan wrote:

> >I suspect if you do:
> >echo $DISPLAY
> >it will say something like :0 and Slurm doesn't allow that at present.
> 
> Actually that is not applicable here. Please see below
> 
> [mahmood@rocks7 ~]$ echo $DISPLAY
> 
> :1

Sadly that's exactly what I'm saying.  Your $DISPLAY variable is a colon 
followed by a number, and that is precisely what Slurm forbids, though I'm 
not clear why.  The code checks like this:

if (display[0] == ':') {
        error("Cannot forward to local display. "
              "Can only use X11 forwarding with network displays.");
        exit(-1);
}

As your $DISPLAY starts with a : (signifying a local display) it will be 
rejected.

That code was introduced as part of a larger block in the commit below, so 
unfortunately no reasoning for the restriction is given.

commit e3140b7f8d96ced9dc85089caa65dd7c6be396fd
Author: Tim Wickberg 
Date:   Wed Sep 20 12:09:34 2017 -0600

Add new x11_util.c file to src/common.

Utility functions for new x11 forwarding implementation.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-23 Thread Chris Samuel
On Friday, 23 November 2018 7:34:42 PM AEDT Mahmood Naderan wrote:

> Now, the question is, why does the following error happen when we know that
> x11 support had been enabled during compilation?
> 
> [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A
> y8 -p RUBY xclock
> srun: error: Cannot forward to local display. Can only use X11 forwarding
> with network displays.

As explained elsewhere in this thread, it's a limitation in Slurm.

I suspect if you do:

echo $DISPLAY

it will say something like :0 and Slurm doesn't allow that at present.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] new user; ExitCode reporting

2018-11-23 Thread Chris Samuel
On Friday, 23 November 2018 10:21:09 PM AEDT Matthew Goulden wrote:

> I've spent some time reading through the (excellent, frankly) documentation
> for sbatch and job_exit_code and while learning a great deal nothing has
> explained this anomaly.

I suspect Slurm is trying to be helpful here: a process terminated by signal 
N exits with status N + 128, so sacct subtracts 128 from exit values greater 
than 128.  The bash manual page says:

   The return value of a simple command is its exit status, or 128+n if
   the command is terminated by signal n.
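
You can see that from the shell directly; for instance this should print 141, 
as SIGPIPE is signal 13 on Linux (a quick illustration, nothing Slurm 
specific):

$ bash -c 'kill -PIPE $$'; echo $?
141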

This is what sacct does (it appears the right value is in the DB):

if (exit_code != NO_VAL) {
        if (WIFSIGNALED(exit_code))
                tmp_int2 = WTERMSIG(exit_code);
        else if (WIFEXITED(exit_code))
                tmp_int = WEXITSTATUS(exit_code);
        if (tmp_int >= 128)
                tmp_int -= 128;
}

For you, 128 + 13 = 141, so sacct reports 13 (signal 13 being SIGPIPE).

*If* your job uses srun you can ask Slurm for the DerivedExitCode instead.  
That is the highest exit code from all of the job's steps, but it will be 
the number you expect as it's not converted by sacct.

$ sbatch --wrap 'srun bash -c "exit 141"'
Submitted batch job 1795583

$ sacct -j 1795583
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1795583            wrap    skylake   hpcadmin          1     FAILED     13:0
1795583.bat+      batch              hpcadmin          1     FAILED     13:0
1795583.ext+     extern              hpcadmin          1  COMPLETED      0:0
1795583.0          bash              hpcadmin          1     FAILED     13:0

$ sacct -j 1795583 -o jobid,jobname,state,derivedexitcode -X
JobID           JobName      State DerivedExitCode
------------ ---------- ---------- ---------------
1795583            wrap     FAILED           141:0


Hope that helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] $TMPDIR does not honor "TmpFS"

2018-11-22 Thread Chris Samuel
On Thursday, 22 November 2018 9:26:13 PM AEDT Christoph Brüning wrote:

> Hi Chris,

Hi Christoph!

[...]
> I was wondering if constantly making and deleting XFS projects has a
> considerable impact on performance and stability. So I'd be glad if you
> could share some of your experience with that setup.

It's been pretty transparent to the users.  The local disks on the nodes are 
only used for local scratch (the root filesystem is mounted from Lustre with 
some neat hacks to OneSIS and the kernel from our Lustre guru), so there's 
very little competition for the SSDs.

> Also, would you mind providing access to your prolog and epilog scripts?

Attached!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
#!/bin/bash

if [ "${SLURM_RESTART_COUNT}" == "" ]; then
   SLURM_RESTART_COUNT=0
fi

JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}

# Create a temporary directory and set an XFS quota on it to match the
# requested --tmp (or 100MB if not set)
if [ -d ${JOBSCRATCH} ]; then
    exec > >(tee "/tmp/quota.log") 2>&1
    set -x
    QUOTA=$(/apps/slurm/latest/bin/scontrol show JobId=${SLURM_JOB_ID} | egrep MinTmpDiskNode=[0-9] | awk -F= '{print $NF}')
    if [ "${QUOTA}" == "0" ]; then
        QUOTA=100M
    fi
    /usr/sbin/xfs_quota -x -c "project -s -p ${JOBSCRATCH} ${SLURM_JOB_ID}" /jobfs/local
    /usr/sbin/xfs_quota -x -c "limit -p bhard=${QUOTA} ${SLURM_JOB_ID}" /jobfs/local

    # Set up a directory to be used as ${JOBFS}
    /bin/mkdir ${JOBSCRATCH}/var_tmp/jobfs
    /bin/chown --reference=${JOBSCRATCH}/var_tmp/ ${JOBSCRATCH}/var_tmp/jobfs -v
    set +x
else
    echo "$(date): TMPDIR ${JOBSCRATCH} not there" >> /jobfs/local/slurm/slurmdprologfail.txt
fi

exit 0
#!/bin/bash
#
# Remove job's scratch directory

if [ "${SLURM_RESTART_COUNT}" == "" ]; then
   SLURM_RESTART_COUNT=0
fi

JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}
SHMSCRATCH=/dev/shm/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}

# Delete the scratch directory for the job (as long as it exists)
test -d ${JOBSCRATCH} && rm -rf ${JOBSCRATCH}
test -d ${SHMSCRATCH} && rm -rf ${SHMSCRATCH}

# Exit OK here to prevent the node getting marked down.

exit 0
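
(For anyone following along at home: scripts like these get wired in via 
slurm.conf; the paths below are illustrative rather than ours verbatim.)

Prolog=/etc/slurm/prolog.sh
Epilog=/etc/slurm/epilog.sh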


Re: [slurm-users] About x11 support

2018-11-22 Thread Chris Samuel
On Thursday, 22 November 2018 9:24:50 PM AEDT Tina Friedrich wrote:

> I really don't want to start a flaming discussion on this - but I don't
> think it's an unusual situation.

Oops, sorry, I wasn't intending to imply it wasn't a valid way to do it; it's 
just that across the many organisations I've helped with HPC systems down here 
it's not something I'd come across before.  Even the couple that had common 
authN/authZ configs between user workstations and clusters had the management 
nodes firewalled off, so the only access to the batch system was by ssh into 
the login nodes of the cluster.

I think it's good to hear from sites where this is the case because we can 
easily get stuck in our own little bubbles until something comes and trips us 
up like that.

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J.  wrote:

> We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm
> appears to be using backfill scheduling excessively.

What are your SchedulerParameters ?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Wednesday, 21 November 2018 12:16:04 AM AEDT Mahmood Naderan wrote:

> So, I am *guessing* that the latest version of slurm is not compatible with
> 1804 from CentOS. In other words, something has been added/fixed in the ssh
> library which is now causing some mismatches.

It's not getting that far; you first need to fix the problem that causes this 
error:

srun: error: Cannot forward to local display. Can only use X11 forwarding with 
network displays.

as srun looks like it's stopping for you there with the required --x11 flag.


We're running 18.05.3, built from source, with CentOS 7.5.  Haven't gone to 7.6 
yet.

One thing I just realised I'd not mentioned is that for this to work the user
needs to be able to SSH from the compute node back into the login node without
being prompted for any reason.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Wednesday, 21 November 2018 2:27:15 AM AEDT Christopher Benjamin Coffey 
wrote:

> Are you using the built in slurm x11 support? Or that spank plugin? We
> haven't been able to get the right combo of things in place to get the
> built in x11 to work.

We're using the built in X11 support with SSH host based authentication 
including from compute nodes back into the login node (that's important)!

Also you need to have configured your /etc/ssh/ssh_known_hosts files so the 
ssh client doesn't prompt to confirm host keys.
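
For reference, a minimal sketch of the OpenSSH side of that (assuming stock 
OpenSSH; exact details will vary by site):

# sshd_config on the destination (login) node:
HostbasedAuthentication yes
# ...plus the compute nodes listed in /etc/ssh/shosts.equiv and their
# host keys present in /etc/ssh/ssh_known_hosts

# ssh_config on the originating (compute) nodes:
HostbasedAuthentication yes
EnableSSHKeysign yes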

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 2:51:26 AM AEDT Mahmood Naderan wrote:

> With and without --x11, I am not able to see xclock on a compute node.
> 
> [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G -A
> y8 -p RUBY xclock
> srun: error: Cannot forward to local display. Can only use X11 forwarding
> with network displays.

So that looks like for some reason your display is set to :0 (or similar). Are 
you by some chance trying to run this on an X server on the console of rocks7?

> [mahmood@rocks7 ~]$ rocks run host compute-0-3 "yum list libssh2-devel"
> Warning: untrusted X11 forwarding setup failed: xauth key data not generated

That also looks like an error you should look into fixing first.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm missing non primary group memberships

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 10:12:59 PM AEDT Janne Blomqvist wrote:

> I reworked the logic so that it should only be required in some special
> weird cases. But that patch was several years ago, hopefully whatever
> bugs were caused by it have been ironed out by now (*knocking on wood*).

It's worked well for us at Swinburne (17.11.x and now 18.08.x) running with 
sssd and enumeration disabled.  Not a vast number of users though!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-19 Thread Chris Samuel
On Sunday, 18 November 2018 4:24:08 AM AEDT Mahmood Naderan wrote:

>  >What does this command say?
> >
> >scontrol show config | fgrep PrologFlags
> 
> [root@rocks7 ~]#  scontrol show config | fgrep PrologFlags
> PrologFlags = Alloc,Contain,X11
> 
> That means x11 has been compiled in the code (while Werner created the
> roll).

No, that means you've got the PrologFlags set in your slurm.conf; I'm 
afraid it doesn't mean Slurm is compiled with X11 support.

> >Check your slurmd logs on the compute node.  What errors are there?
> 
> In one terminal, I run the following command
> 
> [mahmood@rocks7 ~]$ srun --nodelist=compute-0-5 -n 1 -c 6 --mem=8G -A y8 -p
> RUBY xclock
> Error: Can't open display :1
> srun: error: compute-0-5: task 0: Exited with exit code 1

You forgot the --x11 flag to srun!

Can you try again with that flag please?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-16 Thread Chris Samuel
On Friday, 16 November 2018 10:26:31 PM AEDT Mahmood Naderan wrote:

> So, is it still possible to use spank even when the code is compiled for
> x11?

No. You need to recompile Slurm without X11 support.

What does this command say?

scontrol show config | fgrep PrologFlags

> Does that mean everything is ok?
> I wonder why the second command fails?

Check your slurmd logs on the compute node.  What errors are there?

> >Another thing is we had to set:
> >  * X11Parameters=local_xauthority
> 
> Where? sshd config file?

No, that's in slurm.conf.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] About x11 support

2018-11-15 Thread Chris Samuel
On Thursday, 15 November 2018 9:36:08 PM AEDT Mahmood Naderan wrote:

> Is there any update about native support of x11 in slurm v18?

It works here...

$ srun --x11 xdpyinfo   
srun: job 1744869 queued and waiting for resources
srun: job 1744869 has been allocated resources
name of display:    localhost:57.0
version number:    11.0
vendor string:    The X.Org Foundation
vendor release number:    11906000
X.Org version: 1.19.6
[...]

Remember that the internal version in Slurm uses libssh2 and so that imposes 
some restrictions on the types of keys that can be used, i.e. they have to be 
RSA keys.

Extra info here: https://slurm.schedmd.com/faq.html#x11
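
So a user with only (say) an ed25519 key would need to generate an RSA pair 
and authorise it to themselves, something like:

ssh-keygen -t rsa -b 4096
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys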

You can (apparently) still use the external plugin if you build Slurm without 
its internal X11 support.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] accrue_cnt underflow

2018-11-12 Thread Chris Samuel
On Tuesday, 6 November 2018 1:02:02 AM AEDT kamil wrote:

> Any idea what these mean and how to handle it?

No, but we've just upgraded and see the same.  I've opened a bug:

https://bugs.schedmd.com/show_bug.cgi?id=6016

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] constraints question

2018-11-12 Thread Chris Samuel
On Monday, 12 November 2018 9:30:27 AM AEDT Christopher Samuel wrote:

> Thanks! That's planned for us today (though we're not using constraints)

18.08.3 fixed it here; I now get the error I expected instead.

$ srun -C "broadwell|haswell" hostname
srun: error: Unable to allocate resources: Invalid feature specification

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Reserving a GPU

2018-11-11 Thread Chris Samuel
On Tuesday, 6 November 2018 5:30:31 AM AEDT Christopher Benjamin Coffey wrote:

> Can anyone else confirm that it is not possible to reserve a GPU? Seems a
> bit strange.

This looks like the bug that was referred to previously.

https://bugs.schedmd.com/show_bug.cgi?id=5771

Although looking at the manual page for scontrol in the current master it only 
says:

   TRES=
  Comma-separated list of TRES required for the reservation. Current
  supported TRES types with reservations are: CPU, Node, License and
  BB.

But it's early days yet for that release...

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] constraints question

2018-11-11 Thread Chris Samuel
On Tuesday, 6 November 2018 11:06:43 PM AEDT Tina Friedrich wrote:

> So what am I doing wrong with the 'or'?

I don't have node features defined (other than for KNL nodes), so I can't test 
your scenario, but I do see similar as I get the error:

$ srun -C "broadwell|haswell" --pty /bin/bash
srun: error: Unable to allocate resources: Invalid KNL configuration (MCDRAM 
or NUMA option)

but when using and I get:

$ srun -C "broadwell&haswell" --pty /bin/bash 
srun: error: Unable to allocate resources: Invalid feature specification

Just a rough guess, as I don't have time to trace the code, but the 
ESLURM_INVALID_KNL error only occurs in the KNL plugin, whereas 
ESLURM_INVALID_FEATURE occurs in slurmctld code.  So I'm wondering if the 
main code tries to satisfy the constraints, (for me) fails, and then drops 
through to the KNL plugin, which gets the last say on what the Slurm error 
number ends up being.

That would imply that for some reason your node features aren't being 
properly processed, whether because of the presence of the KNL plugin or for 
some other reason.  It would be worth disabling the KNL plugin to see what 
effect that has.

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm missing non primary group memberships

2018-11-09 Thread Chris Samuel
On Friday, 9 November 2018 2:47:51 AM AEDT Aravindh Sampathkumar wrote:

> navtp@console2:~> ssh c07b07 id
> uid=29865(navtp) gid=510(finland) groups=510(finland),508(nav),5001(ghpc)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Do you have SElinux configured by some chance?

If so you might want to check if it works with it disabled first..

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] bug 2119 with slurm 18.08.2

2018-11-09 Thread Chris Samuel
On Saturday, 10 November 2018 6:22:26 AM AEDT Brian Andrus wrote:

> There are no firewalls and I have always been able to do 'sacctmgr show
> clusters' as well as things like  'squeue -M ALL' from both the db
> server and the cluster head.

What does "sacctmgr list clusters" say for you?

Remember just because you can run squeue on the DB server and talk to the 
control daemon doesn't mean that the slurmctld has told the slurmdbd to use 
that same working IP address that squeue is getting via slurm.conf.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] virtual memory limit exceeded

2018-11-09 Thread Chris Samuel
On Friday, 9 November 2018 2:16:48 PM AEDT Noam Bernstein wrote:

> Can anyone shed some light on where the _virtual_ memory limit comes from? 
>
> We're getting jobs killed with the message
> slurmstepd: error: Step 3664.0 exceeded virtual memory limit (79348101120 > 
> 72638634393), being killed
>
> Is this a limit that's dictated by cgroup.conf

It's not cgroups; those limits are enforced by the kernel.  This is Slurm 
itself monitoring the job, deciding it has used too much memory and killing 
it.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] bug 2119 with slurm 18.08.2

2018-11-08 Thread Chris Samuel
On Friday, 9 November 2018 5:38:22 AM AEDT Brian Andrus wrote:

> Where, slurmctld is not picking up new accounts unless it is restarted.

This is usually because slurmdbd cannot connect back to the slurmctld on the 
management node to do the RPC to tell it that a new account/user/etc has 
appeared.   When you restart slurmctld it connects to slurmdbd and grabs all 
that information.  That can be because either slurmctld has registered an IP 
address for itself that slurmdbd cannot connect to or because of intervening 
firewalls/ACLs.

Check that the connection can be made, you can see the IP address & port 
number that slurmctld has registered with "sacctmgr show clusters".
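
A quick way to see just the relevant fields (the format list here is 
illustrative):

sacctmgr show clusters format=cluster,controlhost,controlport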

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Accounting - running with 'wrong' account on cluster

2018-11-07 Thread Chris Samuel
On Wednesday, 7 November 2018 3:46:01 PM AEDT Brian Andrus wrote:

> Ah. I was getting ahead of myself. I used 'limits' and I have no limits
> configured, only associations. Changed it to just associations and all is
> good.

Excellent! Well spotted..

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Chris Samuel

On 6/11/18 7:49 pm, Baker D.J. wrote:

The good news is that I am assured by SchedMD that the bug has been fixed 
in v18.08.3.


Looks like it's fixed in this commit.

commit 3d85c8f9240542d9e6dfb727244e75e449430aac
Author: Danny Auble 
Date:   Wed Oct 24 14:10:12 2018 -0600

Handle symbol resolution errors in the 18.08 slurmdbd.

Caused by b1ff43429f6426c when moving the slurmdbd agent internals.

Bug 5882.


Having said that, we will probably live with this issue 
rather than disrupt users with another upgrade so soon.


An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though, 
should it?  We just flip a symlink and the users see the new binaries, 
libraries, etc immediately, we can then restart daemons as and when we 
need to (in the right order of course, slurmdbd, slurmctld and then 
slurmd's).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Default value inconsistency between CompleteWait and KillWait in slurm.conf docs

2018-11-04 Thread Chris Samuel
On Monday, 5 November 2018 1:13:56 PM AEDT Kevin Buckley wrote:

> which would appear to be suggesting that the default for CompleteWait
> should be 32 seconds.

Not quite, the documentation says:

# To provide jobs with the minimum response time, a value of zero is
# recommended (no waiting). To minimize fragmentation of resources,
# a value equal to KillWait plus two is recommended.

So the default is 0 to match the first recommendation, but sites can choose 
the second recommendation should they want to.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] How to get num_nodes in job_submit.lua

2018-11-03 Thread Chris Samuel
On Sunday, 4 November 2018 12:27:59 AM AEDT Ade Fewings wrote:

> I think that, if specified, the ‘sbatch -N’ number comes through as
> min_nodes (& max_nodes if relevant) in the job descriptor.

That's correct, it does.  For instance, if you're using Lua it's:

job_desc.min_nodes
job_desc.max_nodes

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] slurmstepd crash 18.03 when using pmi2 interface

2018-11-02 Thread Chris Samuel
On Friday, 2 November 2018 11:06:11 PM AEDT Martijn Kruiten wrote:

> We pinpointed it to `ConstrainDevices=yes` in cgroup.conf. The solution
> was to set `/dev/*` in cgroup_allowed_devices_file.conf. We did not
> have anything there. We're now looking into the specific device that is
> needed by pmi2.

This is what we have working (with 17.11.x):

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/dev/ram
/dev/random
/dev/hfi*

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] slurmdbd: error: mysql_query failed: 1064

2018-10-30 Thread Chris Samuel
On Wednesday, 31 October 2018 5:08:13 AM AEDT 宋亚磊 wrote:

> Dear Chris,

Hi Yalei,

> Thank you very much!
> I downgraded the MySQL and reinstalled SLURM, it works now!

Wonderful!  So glad that fixed it for you.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] slurmdbd: error: mysql_query failed: 1064

2018-10-30 Thread Chris Samuel
On Tuesday, 30 October 2018 6:38:59 PM AEDT 宋亚磊 wrote:

> slurmdbd: error: mysql_query failed: 1064 You have an error in your SQL
> syntax; check the manual that corresponds to your MySQL server version for
> the right syntax to use near 'desc' at line 1

Unfortunately that seems to look like an incompatibility between slurmdbd and
that version of MySQL.   The syntax was deprecated in 8.0.12 and removed
just one point release later in 8.0.13! :-(

The release notes for MySQL 8 say:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-13.html#mysqld-8-0-13-sql-syntax



SQL Syntax Notes

Incompatible Change: The deprecated ASC or DESC qualifiers for GROUP By
clauses have been removed. Queries that previously relied on GROUP BY sorting
may produce results that differ from previous MySQL versions. To produce a
given sort order, provide an ORDER BY clause.



So you'll probably need to back up, downgrade MySQL and then
reimport your backup before restarting slurmdbd.

If you have a support contract I would strongly recommend
opening a bug on this with SchedMD.

Best of luck,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-25 Thread Chris Samuel
On Thursday, 25 October 2018 6:13:52 PM AEDT Ole Holm Nielsen wrote:

> Nice command, Chris!  I added a couple of usernames from CentOS 7 as
> seen below.

We're on CentOS7 too (for compute nodes); I guess we're a bit more minimal.

> However, defunct processes seem to escape cgroups, for example:

We don't see those here, but defunct (zombie) processes don't really exist; 
they're just caching the exit status until their parent gets around to 
wait()ing for them and then they can be reaped.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm not building with HWLOC 2.0.2

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 7:16:58 PM AEDT Andreas Henkel wrote:

> PS: sorry, I forgot to mention the Slurm version: it's 17.11.7

It's always worth checking the NEWS file in git for changes after the release 
you're on in case it's since been fixed.

https://github.com/SchedMD/slurm/blob/slurm-17.11/NEWS

For Slurm 17.11.9 it says:

 -- Enable support for hwloc version 2.0.1.

So you'll need to upgrade.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Chris Samuel
On Wednesday, 24 October 2018 8:20:26 PM AEDT Chris Samuel wrote:

> However, we've seen it now too.

For extra LULZ it's not consistent: I've even got two users' shells on the 
same compute node, one constrained and the other not, both started today (in 
the last 6 hours). :-/

Nothing reported in syslog either.

We're on 17.11.7 (for the moment, starting to plan upgrade to 18.08.x).

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-10-24 Thread Chris Samuel
On Friday, 24 August 2018 7:00:05 PM AEDT Christian Peter wrote:

> we're using the same distro.

Yeah, I think (because of the way we run things) that whilst the RPM upgrade 
for systemd had been installed into the OS image and sync'd out to the nodes, 
systemd hadn't been restarted, so we'd not noticed it by then.

However, we've seen it now too.

We'll try the disable/mask trick for `systemd-logind` too.

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Socket timed out on send/recv operation

2018-10-20 Thread Chris Samuel
On Friday, 19 October 2018 4:58:37 AM AEDT Kirk Main wrote:

> I'm a new administrator to Slurm and I've just got my new cluster up and
> running. We started getting a lot of "Socket timed out on send/recv
> operation" errors when submitting jobs, and also if you try to "squeue"
> while others are submitting jobs. The job does eventually run after about a
> minute, but the entire system feels very sluggish and obviously this isn't
> normal. Not sure whats going on here...

Hmm, you're trying to do HA for Slurm with NFS.  I suspect that's going to be 
killing you unless your NFS server is very very fast.

From conversations I've had with folks in the past, if you want to do HA you 
need shared storage that can sustain a lot of IOPS for it to really be usable.

Try it without HA first *AND* use local disk for your state directory, to see 
if the problem goes away.  If it does then you know you're going to need to 
find a different way to do that storage in future if you really want to do HA.

If it doesn't go away then you'll know there's something more fundamental 
going on, but from what you describe it really does sound like NFS latencies 
are the problem here.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Can frequent hold-release adversely affect slurm?

2018-10-20 Thread Chris Samuel
On Friday, 19 October 2018 3:59:17 AM AEDT Daniel Letai wrote:

> Do you have any recommendations, or might suggest a better approach to solve
> this problem?

Not sure it will help, but you can specify:

bf_max_job_array_resv=#

to tell Slurm how many array elements to do forward reservations for.  It 
won't affect how many jobs Slurm will start per cycle though, so I suspect it 
won't help here.
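
If you did want to experiment with it, it just gets appended to your existing 
SchedulerParameters in slurm.conf, e.g. (the value is illustrative):

SchedulerParameters=<your existing options>,bf_max_job_array_resv=64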

I was hoping the Slurm "Workload Characterization Key" (WCKey) feature would 
help, but it looks like it's only a way to label jobs for reporting purposes 
rather than making scheduling decisions on them.

I'm out of ideas sorry!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] requesting entire vs. partial nodes

2018-10-20 Thread Chris Samuel
On Saturday, 20 October 2018 9:57:16 AM AEDT Noam Bernstein wrote:

> If not, is there another way to do this?

You can use --exclusive for jobs that want whole nodes.

You will likely also want to use:

SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE

to ensure jobs are given one core (with all its associated threads) per task.

Also set DefMemPerCPU so that jobs get allocated a default amount of RAM per 
core if they forget to ask for it.
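
Putting that together, a minimal slurm.conf sketch (the memory figure is 
illustrative, size it to your nodes; DefMemPerCPU is in MB):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
DefMemPerCPU=4000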

> And however we achieve this, how does slurm decide what order to assign
> nodes to jobs in the presence of jobs that don't take entire nodes.  If we
> have a 2 16 core nodes and two 8 task jobs, are they going to be packed
> into a single node, or each on its own node (leaving no free node for
> another 16 task job that requires an entire node)?

As long as you don't use CR_LLN (least loaded node) as your select parameter 
and you don't use pack_serial_at_end in SchedulerParameters then Slurm (I 
believe) is meant to use a best fit algorithm.

However, the thing that can still happen is that when you have lots of 
variable size jobs with very different walltimes you can start off with a 
nicely packed system at the beginning but holes then open up as jobs finish. 
So hopefully you'll have a nice mix of job sizes that will fit those holes.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Resource sharing between different clusters

2018-10-19 Thread Chris Samuel
On Saturday, 20 October 2018 6:00:55 AM AEDT Cao, Lei wrote:

> Yes but I was a little confused by it. Does the cluster being shared run its
> own slurmctld and slurmds on its nodes, or it has to run multiple sets of
> slurmds, each of which belongs to a cluster that is sharing it?

My understanding (having never tried federation) is that each cluster will run 
its own slurmctld's and slurmds, but they must share the same slurmdbd.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread Chris Samuel
On Tuesday, 16 October 2018 2:47:34 PM AEDT Bill Broadley wrote:

> AllowedSwapSpace=0
> 
> So I expect jobs to not use swap.  Turns out if I run a 3GB ram process with
> sbatch --mem=1000 I just get a process that uses 1GB ram and 2GB of swap.

That's intended.  The manual page says:

   AllowedSwapSpace=
  Constrain  the  job cgroup swap space to this percentage of the
  allocated memory.  The default value is 0, which means that
  RAM+Swap will  be  limited  to  AllowedRAMSpace. 

You probably want this as well:

   MemorySwappiness=
  Configure the kernel's priority for swapping out anonymous pages
  (such as program data) verses file cache pages for the job
  cgroup.  Valid  values are  between  0 and 100, inclusive. A
  value of 0 prevents the kernel from swapping out program data.
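
So something like this in cgroup.conf (a sketch; test before deploying):

ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0
MemorySwappiness=0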

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Job walltime

2018-10-18 Thread Chris Samuel
On Wednesday, 17 October 2018 10:10:07 PM AEDT Andy Georges wrote:

> I am wondering is there is a way to set the job walltime in the job
> environment (to set $PBS_WALLTIME). It’s unclear to me how this information
> can be retrieved on the worker node, e.g., in the SPANK environment
> (prolog, or in each job step).

You can set arbitrary variables for a user from the task prolog script.

A quick *untested* example hack (caveat emptor, batteries not included):

echo "export SLURM_WALLTIME=$(squeue -j ${SLURM_JOB_ID} -o %l -h | head -n1)"

WARNING:  the head -n1 is there because if the job is the first element of a 
job array it'll return the walltimes of every element in the job array, not 
just the element in question.

Note that this means that jobs of longer than 1 day will get reported in the 
day-hour:minute:second format, for example "6-16:00:00".
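
For completeness: that echo would live in a task prolog script pointed to 
from slurm.conf, something like (path illustrative):

TaskProlog=/etc/slurm/task_prolog.sh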

Hope this helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] slurmdbd not showing job accounting

2018-10-17 Thread Chris Samuel
On Sunday, 14 October 2018 3:30:39 PM AEDT Steven Dick wrote:

> I've found that when creating a new cluster, slurmdbd does not
> function correctly right away.  It may be necessary to restart
> slurmdbd at several points during the slurm installation process to
> get everything working correctly.

That's... odd.  I've never seen that.

Worth trying by hand on a clean install running slurmdbd like this:

slurmdbd -Dvvv

to see if there's anything obvious showing up in the debug logs to indicate 
some problems.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] x11 forwarding not available?

2018-10-16 Thread Chris Samuel
On Wednesday, 17 October 2018 12:04:05 AM AEDT Jeffrey Frey wrote:

> Make sure you're using RSA keys in users' accounts

We use SSH's host based authentication instead (along with pam_slurm_adopt on 
compute nodes so users can only get into nodes they have a job on).

X11 forwarding works here.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] SLURMDBD fails trying to talk to MariaDB - Help debugging configuration

2018-10-11 Thread Chris Samuel

On 12/10/18 07:58, Aravindh Sampathkumar wrote:

I'm trying to setup a SLURM cluster in a virtual environment before 
actually deploying it for serious work. I hit a snag where Slurmdbd 
fails soon after starting because of trouble connecting to MariaDB.


I don't see any errors there, just that systemd is killing slurmdbd for 
some reason.


What happens if you run slurmdbd by hand as root? Like this:

slurmdbd -D -

That should run it in the foreground and output debug info to the screen.

--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel

On 11/10/18 01:27, Christopher Benjamin Coffey wrote:


That is interesting. It is disabled in 17.11.10:


Yeah, I seem to remember seeing a commit that disabled it in 17.11.x.

I don't think it's meant to work before 18.08.x (which is what the 
website will be talking about).


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Heterogeneous job one MPI_COMM_WORLD

2018-10-10 Thread Chris Samuel

On 10/10/18 05:07, Christopher Benjamin Coffey wrote:


Yet, we get an error: " srun: fatal: Job steps that span multiple
components of a heterogeneous job are not currently supported". But
the docs seem to indicate it should work?


Which version of Slurm are you on?  It was disabled by default in
17.11.x (and I'm not even sure it works if you enable it there) and
seems to be enabled by default in 18.08.x.

To check, see the _enable_pack_steps() function in src/srun/srun.c

All the best,
Chris (currently away in the UK)
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

2018-10-01 Thread Chris Samuel
On Saturday, 29 September 2018 1:18:24 AM AEST Ole Holm Nielsen wrote:

> Does anyone have a good explanation of usage of the Archive and Purge
> features for the Slurm database?  For example, how can the archived data
> be used for accounting etc.?

I've never archived data in Slurm.  At VLSCI we would get reporting questions 
about usage (even after systems had been decommissioned) that we needed to go 
back into Slurm's database to answer.  Luckily we had some beefy Percona 
MySQL servers in a cluster!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

2018-09-26 Thread Chris Samuel
On Tuesday, 25 September 2018 11:54:31 PM AEST Baker D. J.  wrote:

> That will certainly work, however the slurmctld (or in the case of my test
> node, the slurmd) will be killed. The logic is that at v17.02 the slurm rpm
> provides slurmctld and slurmd. So upgrading that rpm will destroy/kill the
> existing slurmctld or slurmd processes.

If you do that with --noscripts will it really kill the process?  Nothing 
should invoke the systemd commands with that, should it?  Or do you mean that 
taking the libraries, etc, out from underneath the running process will cause 
it to crash?

Might be worth testing that on on a VM to see if it will happen.

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

2018-09-25 Thread Chris Samuel
On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J.  wrote:

> I guess that the only solution is to upgrade all the slurm at once. That
> means that the slurmctld will be killed (unless it has been stopped first).

We don't use RPMs from Slurm [1], but the rpm command does have a --noscripts 
option to (allegedly, I've never used it) suppress the execution of pre/post 
install scripts.

A big warning: do not use systemctl to start the new slurmdbd for the 
first time when upgrading!

Stop the older one first (and then take a database dump) and then run the new 
slurmdbd process with the "-Dvvv" options (inside screen, just in case) so 
that you can watch its progress and systemd won't decide it's been taking too 
long to start and try to kill it part way through the database upgrades.

Once that's completed successfully then you can ^C it and start it up via the 
systemctl once more.

Hope that's useful!

All the best,
Chris

[1] - I've always installed into ${shared_local_area}/slurm/${version} and had 
a symlink called "latest" that points at the currently blessed version of 
Slurm.  Then I stop slurmdbd, upgrade that as above, then I can do slurmctld 
(with partitions marked down, just in case).  Once those are done I can 
restart slurmd's around the cluster.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Renaming a Reservation

2018-09-25 Thread Chris Samuel
On Tuesday, 25 September 2018 1:54:19 PM AEST Kevin Buckley wrote:

> Is there a way to rename a Reservation ?

I've never come across a way to do that, I've just had to delete and recreate.

Sorry Kevin!

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] swap size

2018-09-22 Thread Chris Samuel
On Saturday, 22 September 2018 4:19:09 PM AEST Raymond Wan wrote:

> SLURM's ability to suspend jobs must be storing the state in a
> location outside of this 512 GB.  So, you're not helping this by
> allocating more swap.

I don't believe that's the case.  My understanding is that in this mode it's 
just sending processes SIGSTOP and then launching the incoming job so you 
should really have enough swap for the previous job to get swapped out to in 
order to free up RAM for the incoming job.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Job allocating more CPUs than requested

2018-09-22 Thread Chris Samuel
On Saturday, 22 September 2018 2:35:34 PM AEST Ryan Novosielski wrote:

> We constrain using cgroups, and occasionally someone will request 1
> core (-n1 -c1) and then run something that asks for way more
> cores/threads, or that tries to use the whole machine. They won't
> succeed obviously. Is this any sort of problem?

Not really; at that point all their processes will just be contending for 
that one core.  Load average only reflects what's trying to run but can't, 
and in this case it says nothing about how difficult it is for other jobs to 
get things done, because of the core restriction.

I guess it's possible the next-level caches might get a workout, but unless 
you're restricting OS daemon processes to cores that are not used by Slurm 
you're probably still going to get some amount of cache pollution anyway.

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Job allocating more CPUs than requested

2018-09-21 Thread Chris Samuel
On Saturday, 22 September 2018 2:53:58 AM AEST Nicolas Bock wrote:

> shows as requesting 1 CPU when in queue, but then allocates all
> CPU cores once running. Why is that?

Do you mean that Slurm expands the cores requested to all the cores on the 
node or allocates the node in exclusive mode, or do you mean that the code 
inside the job uses all the cores on the node instead of what was requested?

The latter is often the case for badly behaved codes and that's why using 
cgroups to contain applications is so important.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Dealing with wrong things that users do

2018-09-20 Thread Chris Samuel
On Thursday, 20 September 2018 5:57:56 PM AEST Mahmood Naderan wrote:

> It seems that when their fluent job crashes for some reasons, or they
> decide to  close the fluent window without terminating the job or
> closing the terminal suddenly or ... the fluent processes remain in
> the node while the job is not listed in the output of squeue command.

If you use cgroups to contain jobs along with pam_slurm_adopt to put any SSH 
sessions into the jobs "extern" cgroup then Slurm should be able to track and 
clean up pretty much anything your users can throw at it.

https://slurm.schedmd.com/cgroups.html

https://slurm.schedmd.com/pam_slurm_adopt.html
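
A minimal sketch of the relevant settings (adjust to taste for your site):

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
PrologFlags=Contain       # creates the "extern" step pam_slurm_adopt uses

# cgroup.conf
ConstrainCores=yes
ConstrainRAMSpace=yes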

Best of luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Setting up a separate timeout for interactive jobs

2018-09-20 Thread Chris Samuel
On Thursday, 20 September 2018 1:50:39 AM AEST Siddharth Dalmia wrote:

> Is it possible to have a separate timeout for interactive jobs? Or can
> someone help me come up with a hack to do this?

I believe you should be able to catch interactive jobs in the submit filter by 
looking for the absence of a batch script.  Have a look at this bug:

https://bugs.schedmd.com/show_bug.cgi?id=3094

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] External provisioning for accounts and other things (?)

2018-09-19 Thread Chris Samuel
On Wednesday, 19 September 2018 5:00:58 AM AEST David Rhey wrote:

> First time caller, long-time listener. Does anyone use any sort of external
> tool (e.g. a form submission) that generates accounts for their Slurm
> environment (notably for new accounts/allocations)? An example of this
> would be: a group or user needs us to provision resources for them to run
> on and so they submit a form to us with information on their needs and we
> provision for them.

The Karaage cluster management software that was originally written by folks
at ${JOB-2} and which we used with Slurm at ${JOB-1} does all this.  I'm not
sure how actively maintained it is (as we have our own system at ${JOB}), but
it's on Github here:

https://github.com/Karaage-Cluster/karaage/

The Python code that handles the Slurm side of things is here:

https://github.com/Karaage-Cluster/karaage/blob/master/karaage/datastores/slurm.py

Hope that helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] email preferences

2018-09-15 Thread Chris Samuel
On Thursday, 13 September 2018 4:24:41 AM AEST Ariel Balter wrote:

> How do I set email preferences for this group?

https://lists.schedmd.com/cgi-bin/mailman/options/slurm-users

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Create users

2018-09-15 Thread Chris Samuel
On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote:

> Another way would be to make all your Linux users and then map that in to
> Slurm using sacctmgr.

At ${JOB} and ${JOB-1} we've wired user creation in Slurm into our online user 
management systems (both Django based & independently created), so when people 
are added/modified/deleted then it runs sacctmgr to keep everything in step.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm on POWER9

2018-09-15 Thread Chris Samuel

On 15/9/18 2:45 am, Keith Ball wrote:


So we figured out the problem with "slurmd -C": we had run rpmbuild
on the POWER9 node, but did not have the hwloc package installed. The
build process looks for this, and if not found, will apparently not
use hwloc/lstopo even if it is installed post-build.


Correct - autoconf will detect hwloc if the headers & library are
present there at compile time.  It links against it so it *must*
be there when you are compiling in order to use it.

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Meetup in Madrid before SLUG

2018-09-14 Thread Chris Samuel
On Saturday, 15 September 2018 12:25:18 AM AEST Jessica Nettelblad wrote:

> Anyone else hanging out in Madrid before Slurm User Group meeting? We're a
> bunch of people registered for SLUG who certainly will. Meet with us if you
> want to!

Would love to, but cannot attend for family reasons this year (have to travel 
to the UK in 2 weeks).  Hope you all have a great time!

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Elastic Compute

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:

> I believe the default value of this would prevent jobs from sharing a node. 

But the jobs _do_ share a node when the resources become available; it's just 
that the cloud part of Slurm is bringing up the wrong number of nodes compared 
to what it will actually use.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 2:05:51 AM AEST Mike Cammilleri wrote:

> Just an update: the cgroup.conf file could not be parsed when I added
> ConstrainKmemSpace=no. I guess this option is not compatible with our
> kernel/slurm versions on Ubuntu? Not sure.

I think that'll just be your version of Slurm - works happily on 17.11.x.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm on POWER9

2018-09-10 Thread Chris Samuel
Hi Keith,

On Tuesday, 11 September 2018 7:46:14 AM AEST Keith Ball wrote:

> 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on
> these systems.
[...]
> Anyone know if there is a way to get Slurm to recognize the true topology
> for POWER nodes?

IIRC Slurm uses hwloc for discovering topology, so "lstopo-no-graphics" might 
give you some insights into whether it's showing you the right config.

I'd be curious to see what "lscpu" and "slurmd -C" say as well.

> 2.) Another concern is the gres.conf. Slurm seems to have trouble taking
> processor ID's that are > "#Sockets". The true processor ID as given by
> nvidia-smi topo -m output will range up to 159, and slurm doesn't like
> this. Are we to use "Cores=" entries in gres.conf, and use the number of
> the physical cores, instead of what nvidia-smi outputs?

Again I *think* Slurm is using hwloc's logical CPU numbering for this, so 
lstopo should help - using a quick snippet on my local PC (HT enabled) here:

  Package L#0 + L3 L#0 (8192KB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#4)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
  PU L#2 (P#1)
  PU L#3 (P#5)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
  PU L#4 (P#2)
  PU L#5 (P#6)
L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
  PU L#6 (P#3)
  PU L#7 (P#7)

you can see that the logical numbering (L#0 and L#1) is done to be contiguous 
compared to how the firmware has enumerated the CPUs.

> 3.) A related gres.conf question: there seems to be no documentation of
> using "CPUs=" instead of "Cores=", yet I have seen several online examples
> using "CPUs=" (and I myself have used it on an x86 system without issue).
> Should one use "Cores" instead of "CPUs", when specifying binding to
> specific GPUs?

I think CPUs= was the older syntax which has been replaced with Cores=.

The gres.conf we use on our HPC cluster uses Cores= quite happily.

Name=gpu Type=p100 File=/dev/nvidia0 Cores=0-17
Name=gpu Type=p100 File=/dev/nvidia1 Cores=18-35

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:

> One workaround is to reboot the node whenever this happens.  Another is
> to set ConstrainKmemSpace=no is cgroup.conf (but AFAICS this option was
> added in slurm 17.02 and is not present in 16.05 that you're using).

Phew, we had to set ConstrainKmemSpace=no to avoid breaking Intel Omnipath so 
looks like we dodged a bullet there.  Nice work tracking it down!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
On Monday, 10 September 2018 9:39:28 PM AEST Patrick Goetz wrote:

> On 9/8/18 5:11 AM, John Hearns wrote:
>
> > Not an answer to your question - a good diagnostic for cgroups is the
> > utility 'lscgroups'
> 
> Where does one find this utility?

It's in the libcgroup-tools package in RHEL/CentOS and cgroup-tools in Debian.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Configuration issue on Ubuntu

2018-09-05 Thread Chris Samuel
On Wednesday, 5 September 2018 5:48:25 PM AEST Gennaro Oliva wrote:

> It is possible that Umut installed the slurm-wlm-emulator package
> together with the regular package and the emulated daemon was picked up
> by the alternatives system.

That sounds eminently possible, that's a great catch Gennaro!

Ah, just noticed you're the Debian package maintainer for Slurm. :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-31 Thread Chris Samuel
On Saturday, 1 September 2018 2:33:39 AM AEST Priedhorsky, Reid wrote:

> That is, it exceeds both the CPU count (1) and memory (1KiB) that I told
> Slurm it would use. This is what I want. Is allowing such exceedance a
> common configuration? I don’t want to rely on quirks of our site.

I think you can configure Slurm to do that, but in my experience sites are 
always doing their best to constrain jobs to what they ask for and so we use 
cgroups for this (tasks can only access the cores, memory and GPUs they 
request and the kernel will prevent them accessing anything else).

For your situation, using CR_Core as your SelectTypeParameters basically 
tells Slurm to ignore memory for scheduling.

> The drawback here is that for real daemons, I’ll need “sleep infinity”, so
> I’ll need to manually kill the srun. So, this is still a workaround. The
> ideal behavior would be to have Slurm not clean up processes when the job
> step completes, but instead at the end of the job.

You've got a race condition there though: the job doesn't complete until 
all the steps are done, and if you've got a step with processes that never 
end then the job will keep running until it hits its time limit (unless, as 
you say, you manually kill that step yourself).

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] [External] Re: serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-31 Thread Chris Samuel
On Friday, 31 August 2018 1:48:33 AM AEST Chaofeng Zhang wrote:

> This result should be CUDA_VISIBLE_DEVICES=NoDevFiles, and it really is
> NoDevFiles in 17.02. So this must be a bug in 17.11.7.

Looking at git, it looks like this code was refactored out of the GPU GRES plugin
and into some common GRES code for 17.11 in this commit:

commit 0e0cdd7d791ee48e5c4a44c307eea0d521ce91d0
Author: Danny Auble 
Date:   Thu Oct 5 15:35:00 2017 -0600

Convert the 3 different arrays used for devices in GRES into a nice 
structure.
Not only that, but also make it so the slurmd sends this information over to
the stepd on init.

This also makes it so GRES of the same name and different types can happen.


If you have a support contract for Slurm I would suggest opening a bug
with them about this change in behaviour, it feels like it's not expected.

However, this will not save you from users setting CUDA_VISIBLE_DEVICES
themselves and accessing GPUs they are not meant to, you really really do
need to use cgroups to stop that happening.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] serious bug about CUDA_VISBLE_DEVICES in the slurm 17.11.7

2018-08-30 Thread Chris Samuel
On Thursday, 30 August 2018 6:38:08 PM AEST Chaofeng Zhang wrote:

> The CUDA_VISIBLE_DEVICES can't be set to NoDevFiles in Slurm 17.11.7.  This
> worked when we used Slurm 17.02.

You probably should be using cgroups instead to constrain access to GPUs.  
Then it doesn't matter what you set CUDA_VISIBLE_DEVICES to be as processes 
will only be able to access what they requested.

Hope that helps!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Configuration issue on Ubuntu

2018-08-29 Thread Chris Samuel
On Wednesday, 29 August 2018 8:23:43 PM AEST Umut Arus wrote:

> Thank you Chris. After your suggestion I compiled the latest stable version
> on CentOS, and installed the Munge packages from the CentOS repository
> first. Now I'm getting the error below.
[...]
> slurmctld: debug3: Trying to load plugin 
> /root/sl/sl2/lib/slurm/crypto_munge.so

To me that looks like you managed to compile Slurm against a
version of Munge installed under root's home directory.

This is unlikely to be what you want.

If you build Slurm as a non-root user then it won't find that.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Configuration issue on Ubuntu

2018-08-28 Thread Chris Samuel
On Tuesday, 28 August 2018 11:43:54 PM AEST Umut Arus wrote:

> It seems the main problem is: slurmctld: fatal: No front end nodes defined

Frontend nodes are for IBM BlueGene and Cray systems where you cannot run 
slurmd on the compute nodes themselves so a proxy system must be used instead 
(at $JOB-1 we used this on our BG/Q system).  I strongly suspect you are not 
running on either of those!

https://slurm.schedmd.com/slurm.conf.html

# These options may only be used on systems configured and built with the
# appropriate parameters (--have-front-end, --enable-bluegene-emulation)
# or a system determined to have the appropriate architecture by the
# configure script (BlueGene or Cray systems).

If you built Slurm yourself you'll need to check that you didn't use those 
arguments by mistake and that configure didn't enable them in error; if this 
is an Ubuntu package then it's probably a bug in how they packaged it!

Best of luck,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-28 Thread Chris Samuel
On Tuesday, 28 August 2018 10:21:45 AM AEST Chris Samuel wrote:

> That won't happen on a well configured Slurm system as it is Slurm's role to
> clear up any processes from that job left around once that job exits.

Sorry Reid, for some reason I misunderstood your email and the fact you were 
talking about job steps! :-(

One other option in this case is to add, say, 2 cores per node for the 
daemons to the overall job request and then do this in your jobs:

srun --ntasks-per-node=1 -c 2 ./foo.py &

and ensure that foo.py doesn't exit after the daemons launch (if you are using 
cgroups then those daemons should be contained within the job step's cgroup, so 
you should be able to spot their PIDs easily enough).

That then gives you the rest of the cores to play with, so you would launch 
future job steps on n-2 cores per node (you could use the environment 
variables SLURM_CPUS_PER_TASK & SLURM_NTASKS_PER_NODE to avoid having to 
hard-code these, for instance).

Of course, at the end your batch script would need to kill off that first 
job step.
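
Putting that together, a rough sketch of such a batch script (node and core
counts, and the foo.py / work commands, are just placeholders):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16

# Step 0: start the daemons on 2 cores per node; foo.py must not exit.
srun --ntasks-per-node=1 -c 2 ./foo.py &

# Subsequent steps get the remaining cores on each node.
srun --ntasks-per-node=1 -c $(( SLURM_CPUS_PER_TASK - 2 )) ./work

# Kill off the daemon step (step 0) so the job can end.
scancel ${SLURM_JOB_ID}.0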

Would that help?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] how can users start their worker daemons using srun?

2018-08-27 Thread Chris Samuel
On Tuesday, 28 August 2018 8:15:55 AM AEST Priedhorsky, Reid wrote:

> I am trying to figure out how to advise users on starting worker daemons in
> their allocations using srun. That is, I want to be able to run “srun foo”,
> where foo starts some child process and then exits, and the child
> process(es) persist and wait for work.

That won't happen on a well configured Slurm system, as it is Slurm's role to 
clear up any processes left around from that job once it exits.  This is why 
cgroups and pam_slurm_adopt are so useful: you can track and kill those off 
far more easily.

If you want processes to stick around you either need to ask for enough time 
in the job and ensure that the script doesn't exit (and thus signal the end of 
the job) until those daemons are done or you will need to find a way outside 
of Slurm to do it.

One possible way for the latter would be to configure something like systemd 
to allow specific users to run daemons as themselves.   Then you could let 
them submit a job where they do:

systemctl start --user mydaemon.service

to start it up (and check it has started successfully before exiting).

There's a bit about how to do this here (which I've just started using for a 
side radio-astronomy project at the observatory I volunteer at):

https://www.brendanlong.com/systemd-user-services-are-amazing.html
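
As a rough sketch, such a unit might look like this in
~/.config/systemd/user/mydaemon.service (the name and path are made up):

[Unit]
Description=My worker daemon

[Service]
ExecStart=%h/bin/mydaemon

[Install]
WantedBy=default.target

You'd normally also need lingering enabled for that user (loginctl
enable-linger) so their daemons can survive without a login session.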

Hope this helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-23 Thread Chris Samuel
On Tuesday, 21 August 2018 6:17:59 PM AEST Chris Samuel wrote:

> My apologies - I've just tested here (with Slurm 17.11.7) and you are indeed
> correct, they only appear when launched with sbatch and salloc and not when
> you launch jobs directly with srun!

I think the confusion is because they are *input* environment variables to 
srun, so srun consumes them.  It doesn't create them.  I hadn't noticed this 
until I searched back to where I found them in the manual page and saw they 
were listed as:

INPUT ENVIRONMENT VARIABLES
       Some srun options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  Note: Command line options will always override
       these settings.


Which is why they are set by sbatch/salloc but not directly inside of srun.

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] pam_slurm_adopt does not constrain memory?

2018-08-23 Thread Chris Samuel
On Thursday, 23 August 2018 12:27:59 AM AEST Christian Peter wrote:

> however, another shell started by an SSH login is handled by
> pam_slurm_adopt. that process is only affected by the freezer and
> cpuset cgroups setup as "/slurm/uid_5001/job_410318/step_extern". it
> lacks the configuration of the "memory" cgroup. (see output below)

I don't see that on our CentOS 7.5 system, which distro are you using?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-21 Thread Chris Samuel
On Monday, 20 August 2018 9:21:39 PM AEST Juan A. Cordero Varelaq wrote:

> I am just running an interactive job with "srun -I --pty /bin/bash" and
> then run "echo $SLURM_MEM_PER_NODE", but it shows nothing. Does it have
> to be defined in any conf file?

My apologies - I've just tested here (with Slurm 17.11.7) and you are indeed 
correct, they only appear when launched with sbatch and salloc and not when 
you launch jobs directly with srun!

Also you only ever get ${SLURM_MEM_PER_CPU} *or* ${SLURM_MEM_PER_NODE} but not 
both together, so you'll need to check for both.
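
So a batch script that wants the per-node figure has to cope with either - a
minimal sketch (Slurm reports these values in megabytes):

if [ -n "${SLURM_MEM_PER_CPU}" ]; then
    MEM_MB=$(( SLURM_MEM_PER_CPU * SLURM_CPUS_ON_NODE ))
else
    MEM_MB=${SLURM_MEM_PER_NODE}
fi
echo "Memory on this node: ${MEM_MB} MB"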

Hope this helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Determine usage for a QOS?

2018-08-20 Thread Chris Samuel
On Tuesday, 21 August 2018 2:28:27 AM AEST Kilian Cavalotti wrote:

> I _think_ that "scontrol show assoc_mgr" could get you close. We're
> not using TRESMins with our QOSes, so it's just a hunch, but I would
> look there, that's the closest I could think of of a representation of
> the various counters and limits the controller keeps in memory.

Awesome, thanks Kilian!

$ scontrol show assoc_mgr QOS=astac_oz045 | fgrep UsageRaw=
UsageRaw=18641632.000000

Looking promising...

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Slurm Environment Variable for Memory

2018-08-20 Thread Chris Samuel
On Monday, 20 August 2018 4:43:57 PM AEST Juan A. Cordero Varelaq wrote:

> That variable does not exist somehow on my environment. Is it possible
> my Slurm version (17.02.3) does not include it?

They should be there; from the NEWS file they were introduced in 2.3.0.rc1.
Is something else nuking your shell's environment, perhaps?

17.02.11 is the last released version of 17.02.x and all previous versions
have been pulled from the SchedMD website due to CVE-2018-10995.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






[slurm-users] Taking a break from slurm-users

2018-05-12 Thread Chris Samuel
Hey folks,

I'm going to be unsubscribing from slurm-users for a while as I'll be 
travelling to the US & UK for a number of weeks & I don't want to drown in 
email. 

I'll be back...

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] User limits for multiple associated accounts

2018-05-11 Thread Chris Samuel
On Friday, 11 May 2018 11:15:49 PM AEST Mahmood Naderan wrote:

> Excuse me... I see the output of squeue which says
> 170   IACTIVE bash  mahmood PD   0:00  1 (AssocGrpMemLimit)
> 
> I don't understand why the memory limit is reached?

That's based on what your job requests, not what is currently in use.

So the sum of the requested memory of all jobs running in that association 
doesn't leave enough permitted resources free to allow this job to begin.
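
If you want to see the limit and what is currently counted against it,
something along these lines should show both (format fields illustrative):

sacctmgr show assoc where user=mahmood format=User,Account,GrpTRES%30
squeue -u mahmood -t RUNNING,PENDING -o "%.10i %.9P %.8T %.10m"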

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] How to check if there's a reservation

2018-05-11 Thread Chris Samuel
Hey Prentice,

On Friday, 11 May 2018 6:23:06 AM AEST Prentice Bisbal wrote:

> They would like to have their submission framework automatically
> detect if there's a reservation that may interfere with their jobs, and
> act accordingly.

As an additional data point, there is also srun's "--test-only" option, which 
will give you an estimate of when a job of particular dimensions might start. 

So this would let them test whether it is likely to be successfully completed 
within their deadline.

You don't need to tell it what you're going to run either, you can just do:

[csamuel@farnarkle1 ~]$ srun --test-only -t 7-0 -c 32
srun: Job 130288 to start at 2018-05-12T16:43:59 using 32 processors on john2

[csamuel@farnarkle1 ~]$ srun --test-only -t 7-0 -c 32 -n 64
srun: Job 130289 to start at 2018-05-14T10:36:02 using 2048 processors on 
bryan[1-8],john[1-4,6-10,15-31,34-35,37-45,52-53,65-66,72-86]

It does result in a job being allocated which will never appear in your 
accounting though, so you'll need to be prepared for that.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Historical License Usage by Jobs

2018-05-11 Thread Chris Samuel
On Saturday, 12 May 2018 12:47:29 AM AEST Barry Moore wrote:

> This works perfectly, I appreciate the pointer.

Great to hear!  My pleasure.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Issue with salloc

2018-05-11 Thread Chris Samuel
On Saturday, 12 May 2018 3:35:39 PM AEST Mahmood Naderan wrote:

> Although I specified one compute node in an interactive partition, the
> salloc doesn't ssh to that node.

salloc doesn't do that.

We use a 2-line script called "sinteractive" to do this; it's really simple.

#!/bin/bash
exec srun "$@" --pty -u "${SHELL}" -l

That's it..
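
So for example (partition and resources illustrative):

sinteractive -p interactive -n 1 -c 6 --mem=8G --time=2:00:00

and any options just get passed straight through to srun.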

Hope that helps!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Memory oversubscription and sheduling

2018-05-11 Thread Chris Samuel
Hey Michael!

On Friday, 11 May 2018 1:00:24 AM AEST Michael Jennings wrote:

> I'm surprised to hear that; this is the first time I've ever heard
> that in regards to SLURM.  I'd only ever heard folks complain about
> TORQUE having that issue.

Hmm, you might well be right, I might have done that work before we switched 
to Slurm on x86 (2013 - we always ran Slurm on BG/P and BG/Q).  So yes, it 
could have been because we saw that issue on Torque instead and I assumed it 
would do the same on Slurm.

Still, even in a threaded mode it would end up with threads blocked in the 
dreaded 'D' state if the health-check scripts end up blocking, wouldn't it?

Nice to hear NHC can work around that!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Historical License Usage by Jobs

2018-05-11 Thread Chris Samuel
On Friday, 11 May 2018 4:54:32 AM AEST Barry Moore wrote:

> Is it possible to track all jobs which requested a specific license? I am
> using Slurm 16.05.6. I looked through `sacct ... --format=all`, but maybe I
> am missing something.

I don't think licenses are stored in Slurmdbd by default; I think you need to 
add them to the AccountingStorageTRES definition.

https://slurm.schedmd.com/tres.html
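
For example, something like this in slurm.conf (the license name is
illustrative):

AccountingStorageTRES=license/matlab

after which the requests should show up in sacct, e.g.:

sacct -X --format=JobID,User,ReqTRES%40

though it will of course only cover jobs that run after the change.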

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] --uid , --gid option is root only now :'(

2018-05-11 Thread Chris Samuel
On Friday, 11 May 2018 4:48:16 AM AEST Christopher Benjamin Coffey wrote:

> What was the reasoning in making this change? Do people not trust the folks
> in the slurm administrator group to allow this behavior? Seems odd.

The change was here:

https://github.com/SchedMD/slurm/commit/52086a9bc0ff2aefbac468e2ec19d2a8687a9797

Which references this bug:

https://bugs.schedmd.com/show_bug.cgi?id=4101

where Tim wrote:

# although I'm confident the original intent was never to expose this
# to non-root users, as this was done as a compatibility mechanism
# with Moab/Maui which would have been running as root anyways.

So it appears they didn't think it was being used in this way at all.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-11 Thread Chris Samuel
On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:

> Eric, my advice would be to definitely learn the Modules system and
> implement modules for your users.

I will echo that, and the suggestion of shared storage (we use our Lustre 
filesystem for that).  I would also suggest looking at a system to help you 
automate building of software packages.   Not only does this help replicate 
builds, but it also gives you access to the community who write the recipes 
for them - and that itself can be very valuable.

We use Easybuild (which also automates the creation of software modules - and 
I would suggest using the Lmod system for that):

https://easybuilders.github.io/easybuild/

But there's also Spack too:

https://spack.io/

As another resource (as we are going off topic from Slurm here), I would 
suggest the Beowulf list as a mailing list that deals with Linux-based HPC 
systems of many different scales.  Disclosure: I now caretake the list, but 
it's been going since the 1990s.

http://beowulf.org/

All the best!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] srun --reboot in sbatch

2018-05-10 Thread Chris Samuel
On Monday, 7 May 2018 11:42:03 PM AEST Tueur Volvo wrote:

> why ? can i have srun --reboot in sbatch file ?

It doesn't make sense to reboot the node part way through running your job; 
you're just going to kill the running job.

Instead add this near the top of your batch script:

#SBATCH --reboot

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] slurm reboot node with spank plugin

2018-05-10 Thread Chris Samuel
On Wednesday, 9 May 2018 10:17:17 PM AEST Tueur Volvo wrote:

> I currently use a plugin node feature like knl
> but i don't like use node feature because i must write "feature" in
> slurm.conf file

Actually you don't.  The knl_generic plugin does that work for us: it 
populates the features available and what the currently selected config is:

[csamuel@farnarkle2 etc]$ sinfo -o "%10N %15b %f" -p knl
NODELIST   ACTIVE_FEATURES AVAIL_FEATURES
gina[1-4]  cache,quad  cache,hybrid,flat,a2a,snc2,snc4,hemi,quad

None of those features are in our config files.

Hope that helps!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Memory oversubscription and sheduling

2018-05-10 Thread Chris Samuel
On Monday, 7 May 2018 11:58:38 PM AEST Cory Holcomb wrote:

> Thank you, for the reply  I was beginning to wonder if my message was seen.

It's a busy list at times. :-)

> While I understand how batch systems work, if you have a system daemon that
> develops a memory leak and consumes the memory outside of allocation.

Understood.

> Not checking the used memory on the box before dispatch seems like a good
> way to black hole a bunch of jobs.

This is why Slurm has support for healthcheck scripts that can run regularly 
as well as before/after a job is launched.  These can knock nodes offline.  
It's 
documented in the slurm.conf manual page.

For instance there's the LBNL Node Health Check (NHC) system that plugs into 
both Slurm and Torque.

https://slurm.schedmd.com/SUG14/node_health_check.pdf

https://github.com/mej/nhc
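
Wiring a checker like NHC in is only a few lines of slurm.conf (the values
here are illustrative - see the slurm.conf manual page):

HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=300
HealthCheckNodeState=ANY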

At ${JOB-1} we would run our in-house health check from cron and write to a 
file in /dev/shm, so that all the actual Slurm health check script would do is 
send that to Slurm (and raise an error if the file was missing).  This was 
because we used to see health checks block due to issues, and so slurmd would 
lock up running them.  Decoupling them fixed that.

Best of luck,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-10 Thread Chris Samuel
On Thursday, 10 May 2018 1:02:36 AM AEST Eric F. Alemany wrote:

> All seem good for now

Great news!

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Splitting mpi rank output

2018-05-10 Thread Chris Samuel
On Thursday, 10 May 2018 2:25:49 AM AEST Christopher Benjamin Coffey wrote:

> I have a user trying to use %t to split the mpi rank outputs into different
> files and it's not working. I verified this too. Any idea why this might
> be? This is the first that I've heard of a user trying to do this.

I think they want to use that as an argument to srun, not sbatch.

I don't know why it doesn't work for sbatch; I'm guessing it doesn't get 
passed on in the environment?  From the look of the srun manual page it 
probably should set SLURM_STDOUTMODE.  But then you'd get both the batch 
output and rank 0 going to the first one.  Seems like a bug to me.

However, I can confirm that it works if you pass it to srun instead.

[csamuel@farnarkle1 tmp]$ cat test-rank.sh
#!/bin/bash
#SBATCH --ntasks=10
#SBATCH --ntasks-per-node=1

srun -o foo-%t.out hostname

[csamuel@farnarkle1 tmp]$ ls -ltr
total 264
-rw-rw-r-- 1 csamuel hpcadmin 89 May 10 17:34 test-rank.sh
-rw-rw-r-- 1 csamuel hpcadmin  0 May 10 17:34 slurm-127420.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-9.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-8.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-7.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-6.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-5.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-4.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-3.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-2.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-1.out
-rw-rw-r-- 1 csamuel hpcadmin  7 May 10 17:34 foo-0.out


[csamuel@farnarkle1 tmp]$ more foo-*
::::::::::::::
foo-0.out
::::::::::::::
john37
::::::::::::::
foo-1.out
::::::::::::::
john38
::::::::::::::
foo-2.out
::::::::::::::
john39
::::::::::::::
foo-3.out
::::::::::::::
john40
::::::::::::::
foo-4.out
::::::::::::::
john41
::::::::::::::
foo-5.out
::::::::::::::
john42
::::::::::::::
foo-6.out
::::::::::::::
john43
::::::::::::::
foo-7.out
::::::::::::::
john44
::::::::::::::
foo-8.out
::::::::::::::
john45
::::::::::::::
foo-9.out
::::::::::::::
john46

Hope that helps,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-10 Thread Chris Samuel
On Thursday, 10 May 2018 12:27:29 AM AEST Mahmood Naderan wrote:

> To be honest, I see many commands in the manual that look similar for
> a not professional user. For example, restarting slurmd, slurmctl and
> now scontrol reconfigure and they look confusing. Do you agree with
> that?

The commands for starting/stopping daemons are OS- and distro-specific; 
scontrol is the Slurm command for interacting with the control daemon 
slurmctld.

I suspect the hard part is that there are some things that can only be set by 
restarting daemons, as they require rewiring data structures too deeply inside 
Slurm (adding nodes is one of these things).

So for those "scontrol reconfigure" is not enough.  However, for the vast 
majority of cases it's the better way to do things.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] slurm reboot node with spank plugin

2018-05-09 Thread Chris Samuel
On Wednesday, 9 May 2018 9:16:37 PM AEST Tueur Volvo wrote:

> if i use srun --reboot hostname, how to tell him to update the kernel before
> rebooting ?

Ah, now I understand why you mention a spank plugin: that would allow you 
to add a new command line option for sbatch to specify a kernel and then 
somehow talk to your cluster management system to provision a node with that 
configuration - is that right?

The KNL code might provide some ideas there as that needs to reboot nodes to 
change hardware configuration and has to get slurmd to run code on the node 
first to change the BIOS settings for the desired HBM mode. 

The BlueGene/Q code allowed users to select an image to boot, but that relies 
on a heap of IBM code that is designed for their defunct BG/Q systems (sigh).

> if I understand what you mean, I have to use a job-submit plugin ?

That's what I was thinking, but now you've mentioned what you want to do I'm 
not so sure it's a good fit.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] slurm reboot node with spank plugin

2018-05-09 Thread Chris Samuel
On Wednesday, 9 May 2018 7:09:12 PM AEST Tueur Volvo wrote:

> Hello, i have question, it's possible to reboot slurm node in spank plugin
> before execute job ?

I don't know about that, but sbatch has a --reboot flag and you could use a
submit filter to set it.  We do the opposite and always strip it out in our
Lua submit filter.

[...]
-- Users are not allowed to reboot nodes; silently clear the flag
-- and log what we did.
if submit_uid ~= 0 and
   job_desc.reboot ~= slurm.NO_VAL16 then
    job_desc.reboot = slurm.NO_VAL16
    slurm.log_info("clear reboot flag for user %d", submit_uid)
end
[...]

You can't log the job ID at that stage of course because it's not been
accepted by slurmctld yet. :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-09 Thread Chris Samuel
On Wednesday, 9 May 2018 6:09:08 PM AEST Werner Saar wrote:

> I think, the problem was:
> the python script
> /opt/rocks/lib/python2.7/site-packages/rocks/commands/sync/slurm/__init__py,
> which is called by the command rocks sync slurm
> did not restart slurmd on the Head-Node.

Depending on the changes it might be enough to do scontrol reconfigure.

But generally it's best for the site to decide what happens when pushing out 
config changes so you may want to make that configurable.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-07 Thread Chris Samuel
On Tuesday, 8 May 2018 9:40:53 AM AEST Eric F. Alemany wrote:

> I followed the link as well as the instruction on “Securing the
> installation” and “Testing the installation”

Great.

> The only thing that i am not able to do is:  Check if a credential can be
> remotely decoded

So one possibility there is that the clocks are out of step between the nodes. 
Usually that's configured via NTP to have a common reference source.

That's pretty standard: if you're running an HPC system with a distributed 
filesystem like GPFS or Lustre then you need the clocks in lockstep for it to 
function properly.
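
For the record, the remote decode test from that guide is just something like
(hostname illustrative):

munge -n | ssh compute-0-1 unmunge

and clock skew between the nodes will show up as a credential error there.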

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] slurmdbd: mysql/accounting errors on 17.11.6 upgrade

2018-05-07 Thread Chris Samuel
On Tuesday, 8 May 2018 6:19:16 AM AEST Tina Fora wrote:

> slurmdbd: error: mysql_query failed: 1062 Duplicate entry
> '3508-1399520701' for key 'id_job'

That doesn't look good, not sure what to advise there.  Do you have a backup 
of the database from before you started?

If you've got a support contract I would be opening a bug with SchedMD now.

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-07 Thread Chris Samuel
On Tuesday, 8 May 2018 8:38:47 AM AEST Eric F. Alemany wrote:

> I thought i did but I will do it again

If that doesn't work then check the "Securing the Installation" and "Testing 
the Installation" parts of the munge docs here (ignore the installation part):

https://github.com/dun/munge/wiki/Installation-Guide

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Nodes are down after 2-3 minutes.

2018-05-07 Thread Chris Samuel
On Tuesday, 8 May 2018 8:21:46 AM AEST Eric F. Alemany wrote:

> copied the /etc/munge/munge.key from the master to all the nodes.
> Checked the date on master and nodes - OK
> 
> systemctl restart slurmctld  on master
> 
> systemctl restart slurmd on all nodes

Did you restart munged as well?  That's what's reading the key, not Slurm.

Munge is just an external service that Slurm talks to.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] "Low socket*core*thre" - solution?

2018-05-07 Thread Chris Samuel
On Tuesday, 8 May 2018 2:27:07 AM AEST Mahmood Naderan wrote:

> So the trick was to UNDRAIN the node and not RESUME it.

That's strange, because UNDRAIN only does a subset of what RESUME does.

   "UNDRAIN"  clears  the  node from  being  drained  (like  "RESUME"),
   but will not change the node's base state (e.g. "DOWN").

Have you been restarting slurmd after changing the config file in the past?

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



