[slurm-users] Re: Upgrade node while jobs running

2024-08-02 Thread Christopher Samuel via slurm-users

G'day Sid,

On 7/31/24 5:02 pm, Sid Young via slurm-users wrote:

I've been waiting for nodes to become idle before upgrading them; however, 
some jobs take a long time. If I try to remove all the packages I assume 
that kills the slurmstepd program and with it the job.


Are you looking to do a Slurm upgrade, an OS upgrade, or both?

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



[slurm-users] Re: Can Not Use A Single GPU for Multiple Jobs

2024-06-21 Thread Christopher Samuel via slurm-users

On 6/21/24 3:50 am, Arnuld via slurm-users wrote:

I have 3500+ GPU cores available. You mean each GPU job requires at 
least one CPU? Can't we run a job with just a GPU, without any CPUs?


No, Slurm has to launch the batch script on compute node cores, and that 
script then has the job of launching the user's application, which will run 
something on the node that accesses the GPU(s).


Even with srun directly from a login node there are still processes that 
have to run on the compute node, and those need at least a core (some 
may need more, depending on the application).
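
For illustration, even a GPU-only workload ends up requesting at least one 
CPU, e.g. (the script name here is a placeholder):

sbatch --gres=gpu:1 --ntasks=1 --cpus-per-task=1 gpu_job.sh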


--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: Unsupported RPC version by slurmctld 19.05.3 from client slurmd 22.05.11

2024-06-17 Thread Christopher Samuel via slurm-users

On 6/17/24 7:24 am, Bjørn-Helge Mevik via slurm-users wrote:


Also, server must be newer than client.


This is the major issue for the OP - the version rule is:

slurmdbd >= slurmctld >= slurmd and clients

and no more than the permitted skew in versions.

Plus, of course, you have to deal with config file compatibility issues 
between versions.
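
A sketch of the resulting order (assuming systemd services and RPM packages 
named slurm*; the package command is illustrative only):

systemctl stop slurmdbd  && dnf -y upgrade 'slurm*' && systemctl start slurmdbd
systemctl stop slurmctld && dnf -y upgrade 'slurm*' && systemctl start slurmctld
# then, rolling across the compute nodes:
dnf -y upgrade 'slurm*' && systemctl restart slurmd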


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-23 Thread Christopher Samuel via slurm-users

On 5/22/24 3:33 pm, Brian Andrus via slurm-users wrote:


A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without GPU support for those nodes and with 
it for the ones that have them.


FWIW we have both GPU and non-GPU nodes but we use the same RPMs we 
build on both (they all boot the same SLES15 OS image though).


--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: Location of Slurm source packages?

2024-05-15 Thread Christopher Samuel via slurm-users

Hi Jeff!

On 5/15/24 10:35 am, Jeffrey Layton via slurm-users wrote:

I have an Ubuntu 22.04 server where I installed Slurm from the Ubuntu 
packages. I now want to install pyxis, but it says I need the Slurm 
sources. In Ubuntu 22.04, is there a package that has the source code? 
How do I download the sources I need from GitHub?


You shouldn't need GitHub; this should give you what you are after 
(especially the "Download slurm-wlm" section at the end):


https://packages.ubuntu.com/source/jammy/slurm-wlm
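
For example (assuming deb-src entries are enabled in your apt sources):

apt-get source slurm-wlm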

Hope that helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Christopher Samuel via slurm-users

On 5/6/24 3:19 pm, Nuno Teixeira via slurm-users wrote:


Fixed with:


[...]


Thanks and sorry for the noise as I really missed this detail :)


So glad it helped! Best of luck with this work.

--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-06 Thread Christopher Samuel via slurm-users

On 5/6/24 6:38 am, Nuno Teixeira via slurm-users wrote:


Any clues about "elf_aarch64" and "aarch64elf" mismatch?


As I mentioned, I think this is coming from the FreeBSD patching that's 
being done to the upstream Slurm sources; specifically, it looks like 
elf_aarch64 is being injected here:


/usr/bin/sed -i.bak -e 's|"/proc|"/compat/linux/proc|g' \
  -e 's|(/proc)|(/compat/linux/proc)|g' \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmstepd/req.c

/usr/bin/find \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/api \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/plugins/openapi \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sacctmgr \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/sackd \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scontrol \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrontab \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/scrun \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmctld \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/slurmd/slurmd \
  /wrkdirs/usr/ports/sysutils/slurm-wlm/work/slurm-23.11.6/src/squeue \
  -name Makefile.in | /usr/bin/xargs /usr/bin/sed -i.bak -e 's|-r -o|-r -m elf_aarch64 -o|'


So I guess that will need to be fixed to match what FreeBSD supports.

I don't think this is a Slurm issue from what I see there.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: FreeBSD/aarch64: ld: error: unknown emulation: elf_aarch64

2024-05-04 Thread Christopher Samuel via slurm-users

On 5/4/24 4:24 am, Nuno Teixeira via slurm-users wrote:


Any clues?

 > ld: error: unknown emulation: elf_aarch64


All I can think is that your ld doesn't like elf_aarch64; from the log 
you're posting it looks like that's being injected by the FreeBSD ports 
system. Looking at the man page for ld on Linux, it says:


  -m emulation
   Emulate the emulation linker.  You can list the available 
emulations with the --verbose or -V options.


So I'd guess you'd need to look at what that version of ld supports and 
then update the ports system to match.
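
For example, with GNU ld you can list the supported emulations like this 
(this is the GNU ld behaviour; lld, which FreeBSD uses, may report them 
differently):

ld -V | grep -A20 'Supported emulations'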


Good luck!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-11 Thread Christopher Samuel via slurm-users

On 4/10/24 10:41 pm, archisman.pathak--- via slurm-users wrote:


In our case, that node has been removed from the cluster and cannot be
added back right now ( is being used for some other work ). What can we
do in such a case?


Mark the node as "DOWN" in Slurm; this is what we do when we get jobs 
caught in this state (and there's nothing else running on the node, in the 
case of our shared nodes).
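
For reference, something along these lines (the node name is a placeholder):

scontrol update NodeName=node001 State=DOWN Reason="jobs stuck completing"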


Best of luck!
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: Is SWAP memory mandatory for SLURM

2024-03-04 Thread Christopher Samuel via slurm-users

On 3/3/24 23:04, John Joseph via slurm-users wrote:


Is SWAP a mandatory requirement


All our compute nodes are diskless, so no swap on them.

--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo

2024-02-23 Thread Christopher Samuel via slurm-users

Hi Robert,

On 2/23/24 17:38, Robert Kudyba via slurm-users wrote:

We switched over from using systemctl for tmp.mount and changed to zram, 
e.g.,

modprobe zram
echo 20GB > /sys/block/zram0/disksize
mkfs.xfs /dev/zram0
mount -o discard /dev/zram0 /tmp

[...]
> [2024-02-23T20:26:15.881] [530.extern] error: setup_x11_forward: 
failed to create temporary XAUTHORITY file: Permission denied


Where do you set the permissions on /tmp? What do you set them to?
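
For reference, a freshly made filesystem mounts root-owned with mode 0755, so 
a world-writable sticky /tmp would normally need something like this after the 
mount (a guess at the likely missing step, not a confirmed diagnosis):

chmod 1777 /tmp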

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] sacct --name --status filtering

2024-01-10 Thread Christopher Samuel

On 1/10/24 19:39, Drucker, Daniel wrote:

What am I misunderstanding about how sacct filtering works here? I would 
have expected the second command to show the exact same results as the 
first.


You need to specify --end NOW for this to work as expected. From the man 
page:


  WITHOUT --jobs AND WITH --state specified:
  --starttime defaults to Now.
  --endtime defaults to --starttime and to Now if --starttime is not 
specified.


E.g.:

> sacct --starttime $(date -d "7 days ago" +"%Y-%m-%d") -X --format JobID,JobName,State,Elapsed --name bash

       JobID    JobName      State    Elapsed
------------ ---------- ---------- ----------
570741             bash  COMPLETED   00:00:02
570742             bash  COMPLETED   00:00:02
570743             bash     FAILED   00:00:01
570744             bash     FAILED   00:00:01
570745             bash     FAILED   00:00:01
570746             bash  COMPLETED   00:00:02
570747             bash  COMPLETED   00:00:02
570748             bash  COMPLETED   00:00:02

> sacct --starttime $(date -d "7 days ago" +"%Y-%m-%d") -X --format JobID,JobName,State,Elapsed --name bash --state COMPLETED

       JobID    JobName      State    Elapsed
------------ ---------- ---------- ----------
>

> sacct --starttime $(date -d "7 days ago" +"%Y-%m-%d") -X --format JobID,JobName,State,Elapsed --name bash --state COMPLETED --end now

       JobID    JobName      State    Elapsed
------------ ---------- ---------- ----------
570741             bash  COMPLETED   00:00:02
570742             bash  COMPLETED   00:00:02
570746             bash  COMPLETED   00:00:02
570747             bash  COMPLETED   00:00:02
570748             bash  COMPLETED   00:00:02


Hope this helps!
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] parastation (mpi)

2023-11-24 Thread Christopher Samuel

On 11/24/23 06:16, Heckes, Frank wrote:

My colleagues are using these toolchains on the Jülich cluster (especially 
Juwels). My question is whether these eb files can be shared? I would 
be interested especially in the ones using NVHPC as the core module.


If Jülich developed that toolchain then I think you'd need to ask them 
whether they are agreeable to sharing them.


Does anyone know whether ParaStation MPI is still an active project, 
because the GitHub doesn't show many recent changes?


There are a number of different repos under that umbrella, and whilst 
psmpi does look active, it seems the psmgmt one has had more commits 
recently. So it does look active to me.


https://github.com/ParaStation/

All the best.
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] SLURM , maximum scalable instance is which one

2023-11-06 Thread Christopher Samuel

On 10/29/23 03:13, John Joseph wrote:


I'd like to know what the maximum scaled-up instance of SLURM is so far.


Cori (which we retired mid-year) had ~12,000 compute nodes in case that 
helps.


--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

2023-10-24 Thread Christopher Samuel

On 10/24/23 12:39, Tim Schneider wrote:

Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME 
", the node goes in "mix@" state (not drain), but no new jobs get 
scheduled until the node reboots. Essentially I get draining behavior, 
even though the node's state is not "drain". Note that this behavior is 
caused by "nextstate=RESUME"; if I leave that away, jobs get scheduled 
as expected. Does anyone have an idea why that could be?


The intent of the "ASAP" flag for "scontrol reboot" is to not let any 
more jobs onto a node until it has rebooted.


IIRC that was from work we sponsored, the idea being that (for how our 
nodes are managed) we would build new images with the latest software 
stack, test them on a separate test system and then once happy bring 
them over to the production system and do an "scontrol reboot ASAP 
nextstate=resume reason=... $NODES" to ensure that from that point 
onwards no new jobs would start in the old software configuration, only 
the new one.


Also slurmctld would know that these nodes are due to come back in 
"ResumeTimeout" seconds after the reboot is issued and so could plan for 
them as part of scheduling large jobs, rather than thinking there was no 
way it could do so and letting lots of smaller jobs get in the way.


Hope that helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-16 Thread Christopher Samuel

On 10/16/23 08:22, Groner, Rob wrote:


It is my understanding that it is a different issue than pmix.


That's my understanding too. The PMIx issue wasn't in Slurm, it was in 
the PMIx code that Slurm was linked to. This CVE is for Slurm itself.


--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Fairshare: Penalising unused memory rather than used memory?

2023-10-16 Thread Christopher Samuel

On 10/11/23 07:27, Cristian Huza wrote:

I recall there was a built in tool named seff (slurm efficiency), not 
sure if it is still maintained


"seff" is in the Slurm sources in the contribs/seff directory, if you're 
building RPMs from them then it's in the "slurm-contribs" RPM.
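
Once installed it's just run against a completed job id, e.g.:

seff 12345678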


--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Site factor plugin example?

2023-10-16 Thread Christopher Samuel

On 10/13/23 10:10, Angel de Vicente wrote:

But, in any case, I would still be interested in a site factor plugin 
example, because I might revisit this in the future.


I don't know if you saw, but there is a skeleton example in the Slurm 
sources:


src/plugins/site_factor/none

Not sure if that helps?

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Unconfigured GPUs being allocated

2023-08-02 Thread Christopher Samuel

On 7/14/23 1:10 pm, Wilson, Steven M wrote:

It's not so much whether a job may or may not access the GPU but rather 
which GPU(s) is(are) included in $CUDA_VISIBLE_DEVICES. That is what 
controls what our CUDA jobs can see and therefore use (within any 
cgroups constraints, of course). In my case, Slurm is sometimes setting 
$CUDA_VISIBLE_DEVICES to a GPU that is not in the Slurm configuration 
because it is intended only for driving the display and not GPU 
computations.


Sorry I didn't see this before! Yeah that does sound different, I 
wouldn't expect that. :-(


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] slurmdbd database usage

2023-08-02 Thread Christopher Samuel

On 8/2/23 2:30 pm, Sandor wrote:

I am looking to track accounting and job data. Slurm requires the use of 
MySQL or MariaDB. Has anyone created the needed tables within PostgreSQL 
and then had slurmdbd write to it? Any problems?


From memory (and confirmed by git) support for Postgres was removed 
from Slurm way back in 2013 before the 14.03 release (the first one 
using dates as version numbers).


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Christopher Samuel

On 7/14/23 10:20 am, Wilson, Steven M wrote:

I upgraded Slurm to 23.02.3 but I'm still running into the same problem. 
Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still 
being made available to jobs, so we end up with compute jobs being run on 
GPUs which should only be used for driving the display.


I think this is expected - it's not that Slurm is making them available, 
it's that it's unaware of them and so doesn't control them in the way it 
does for the GPUs it does know about. So you get the default behaviour 
(any process can access them).


If you want to stop them being accessed from Slurm you'd need to find a 
way to prevent that access via cgroups games or similar.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Trying to update from slurm 19.05 to slurm 23.02 but I can't figure out how to allow users to reboot nodes...

2023-06-06 Thread Christopher Samuel

On 6/6/23 1:33 pm, Heinz, Michael wrote:


I've gone through the man pages for slurm.conf but I can't find anything about 
how to define who the admins are? Is there still a way to do this with slurm or 
has the ability been removed?


Looks like that was disabled over 3 years ago.

commit dd111a52bf23d79efcfe9d5688e15cbc768bb22b
Author: Brian Christiansen 
Date:   Fri Jan 31 14:24:40 2020 -0700

Disable sbatch, salloc, srun --reboot for non-admins

Bug 7767

That bug is private it seems.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Temporary Stop User Submission

2023-05-26 Thread Christopher Samuel

On 5/25/23 4:16 pm, Markuske, William wrote:

I have a badly behaving user that I need to speak with and want to 
temporarily disable their ability to submit jobs. I know I can change 
their account settings to stop them. Is there another way to set a block 
on a specific username that I can lift later without removing the 
user/account associations?


There are many ways to do this; our way is to set their QOS to one 
called "batchdisable", and that QOS has "MaxJobsPerUser=0" set on it.


One of the benefits of that is it's easy to see everyone who's been 
blocked from submitting jobs.
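
A rough sketch of that setup with sacctmgr (the username is a placeholder):

sacctmgr add qos batchdisable
sacctmgr modify qos batchdisable set MaxJobsPerUser=0
# then, to block a particular user:
sacctmgr modify user where name=someuser set qos=batchdisable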


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Usage gathering for GPUs

2023-05-24 Thread Christopher Samuel

On 5/24/23 11:39 am, Fulton, Ben wrote:


Hi,


Hi Ben,

The release notes for 23.02 say “Added usage gathering for gpu/nvml 
(Nvidia) and gpu/rsmi (AMD) plugins”.


How would I go about enabling this?


I can only comment on the NVIDIA side (as those are the GPUs we have), but 
for that you need Slurm built with NVML support and running with 
"AutoDetect=NVML" in gres.conf; that information is then stored in 
slurmdbd as part of the TRES usage data.
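
The config side of that is small; a sketch (node names and GPU counts are 
examples only):

# gres.conf on the GPU nodes (needs a Slurm build linked against NVML)
AutoDetect=nvml

# slurm.conf
GresTypes=gpu
NodeName=gpu[01-04] Gres=gpu:4 ...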


For example to grab a job step for a test code I ran the other day:

csamuel@perlmutter:login01:~> sacct -j 9285567.0 -Pno TRESUsageInAve | tr , \\n | fgrep gpu

gres/gpumem=493120K
gres/gpuutil=76

Hope that helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] [EXTERNAL] Re: Question about PMIX ERROR messages being emitted by some child of srun process

2023-05-23 Thread Christopher Samuel

On 5/23/23 10:33 am, Pritchard Jr., Howard wrote:


Thanks Christopher,


No worries!


This doesn't seem to be related to Open MPI at all except that for our 5.0.0 
and newer one has to use PMix to talk to the job launcher.
I built MPICH 4.1 on Perlmutter using the --with-pmix option and see a similar 
message from srun --mpi=pmix


That's right, these messages are coming from PMIx code rather than MPI.


I too noticed that if I set PMIX_DEBUG=1 the chatter from srun stops.


Yeah, it looks like setting PMIX_DEBUG to anything (I tried "hello") 
stops these messages from being emitted.


Slurm RPMs with that patch will go on to Perlmutter in the Thursday 
maintenance.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Question about PMIX ERROR messages being emitted by some child of srun process

2023-05-22 Thread Christopher Samuel

Hi Tommi, Howard,

On 5/22/23 12:16 am, Tommi Tervo wrote:


23.02.2 contains a PMIx permission regression; it may be worth checking 
whether that's the case here?


I confirmed I could replicate the UNPACK-INADEQUATE-SPACE messages 
Howard is seeing on a test system, so I tried that patch on that same 
system without any change. :-(


Looking at the PMIx code base the messages appear to come from that code 
(the triggers are in src/mca/bfrops/) and I saw I could set 
PMIX_DEBUG=verbose to get more info on the problem, but when I set that 
these messages go away entirely. :-/


Very odd.

--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] From an initial installation cannot start slurmctld with a slurmdbd running

2023-05-17 Thread Christopher Samuel

Hi Lawrence,

On 5/17/23 3:26 pm, Sorrillo, Lawrence wrote:


Here is the error I get:

slurmctld: fatal: Can not recover assoc_usage state, incompatible 
version, got 9728 need >= 8704 <= 9216,


The slurm version is:  20.11.9


That error seems to appear when slurmctld is loading usage data from an 
on-disk cache of data (the "assoc_usage" file) - the function that 
throws that error is called here:


/* Now load the usage from a flat file since it isn't kept in
   the database
*/
load_assoc_usage();

It's telling you that the data file was written with a version of Slurm 
ahead of where it's at.


With my little cheat sheet:

> ./slurmver
SLURM_23_02_PROTOCOL_VERSION = 9984
SLURM_22_05_PROTOCOL_VERSION = 9728
SLURM_21_08_PROTOCOL_VERSION = 9472
SLURM_20_11_PROTOCOL_VERSION = 9216
SLURM_20_02_PROTOCOL_VERSION = 8960
SLURM_19_05_PROTOCOL_VERSION = 8704
SLURM_18_08_PROTOCOL_VERSION = 8448

That tells us the data file was written by Slurm 22.05.x, so my guess is 
that version was tested and the "assoc_usage" file that's being read 
here wasn't cleaned up afterwards.
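
The ./slurmver helper itself isn't included here; a rough equivalent, assuming 
the protocol macros are still defined as ((major << 8) | minor) in 
src/common/slurm_protocol_common.h of a Slurm source tree, would be something 
like:

awk '/#define SLURM_[0-9_]+_PROTOCOL_VERSION/ {
       maj = $3; gsub(/[^0-9]/, "", maj)
       min = $7; gsub(/[^0-9]/, "", min)
       printf "%s = %d\n", $2, maj * 256 + min
     }' src/common/slurm_protocol_common.h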


Hope that helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] PreemptExemptTime

2023-03-07 Thread Christopher Samuel

On 3/7/23 6:46 am, Groner, Rob wrote:

Our global settings are PreemptMode=SUSPEND,GANG and 
PreemptType=preempt/partition_prio.  We have a high priority partition 
that nothing should ever preempt, and an open partition that is always 
preemptable.  In between is a burst partition.  It can be preempted if 
the high priority partition needs the resources.  That's the partition 
we'd like to guarantee a 1 hour run time on.  Looking at the sacctmgr 
man page, it gives this info on QOS


Just a quick comment: here you're talking about both partitions and 
QOS's with respect to preemption; I think for this you need to pick just 
one of those options and only use those configs. For instance, we just 
use QOS's for preemption and our exempt time works in that case.
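
For the QOS-only route, the exempt time is just a QOS attribute, e.g. (the 
QOS name is a placeholder):

sacctmgr modify qos burst set PreemptExemptTime=01:00:00

together with PreemptType=preempt/qos in slurm.conf.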


Hope this helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results

2023-02-18 Thread Christopher Samuel

On 2/10/23 11:06 am, Analabha Roy wrote:

I'm having some complex issues coordinating OpenMPI, SLURM, and DMTCP in 
my cluster.


If you're looking to try checkpointing MPI applications you may want to 
experiment with the MANA ("MPI-Agnostic, Network-Agnostic MPI") plugin 
for DMTCP here: https://github.com/mpickpt/mana


We (NERSC) are collaborating with the developers and it is installed on 
Cori (our older Cray system) for people to experiment with. The 
documentation for it may be useful to others who'd like to try it out - 
it's got a nice description of how it works too which even I as a 
non-programmer can understand. 
https://docs.nersc.gov/development/checkpoint-restart/mana/


Pay special attention to the caveats in our docs though!

I've not used it myself, though I'm peripherally involved to give advice 
on system related issues.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Christopher Samuel

On 1/19/23 5:01 am, Stefan Staeglich wrote:


Hi,


Hiya,


I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on every one of the compute nodes. This makes 
sense only if it is executed there.


That's right, it's only executed on compute nodes.


But the man page slurm.conf of 21.08.x states:
UnkillableStepProgram
   Must be executable by user SlurmUser.  The file must be
accessible by the primary and backup control machines.

So I would expect it's executed on the controller node.


That's strange, my slurm.conf man page from a system still running 21.08 
says:


UNKILLABLE STEP PROGRAM SCRIPT
   This program can be used to take special actions to clean up
   the unkillable processes and/or notify system administrators.
   The program will be run as SlurmdUser (usually "root") on
   the compute node where UnkillableStepTimeout was triggered.

Ah, I see, there's a later "FILE AND DIRECTORY PERMISSIONS" part which 
has the text that you've found - that part's wrong! :-)
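
For reference, the slurm.conf side looks like this (the script path is just 
an example):

UnkillableStepProgram=/usr/local/sbin/unkillable_step.sh
UnkillableStepTimeout=180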


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Interactive jobs using "srun --pty bash" and MPI

2022-11-02 Thread Christopher Samuel

On 11/2/22 4:45 pm, Juergen Salk wrote:


However, instead of using `srun --pty bash´ for launching interactive jobs, it
is now recommended to use `salloc´ and have 
`LaunchParameters=use_interactive_step´
set in slurm.conf.


+1 on that, this is what we've been using since it landed.
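
For reference, the pieces involved are just (a sketch):

# slurm.conf
LaunchParameters=use_interactive_step

# users then start interactive jobs with plain salloc, e.g.:
salloc --nodes=1 --cpus-per-task=4 --time=1:00:00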

--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Prolog and job_submit

2022-10-31 Thread Christopher Samuel

On 10/31/22 5:46 am, Davide DelVento wrote:


Thanks for helping me find workarounds.


No worries!


My only other thought is that you might be able to use node features &
job constraints to communicate this without the user realising.


I am not sure I understand this approach.


I was just trying to think of things that could get into the Prolog that 
runs as root that you could use as a signal to it. Job constraints 
seemed the most reasonable choice.



Are you saying that while job_submit.lua can't directly add an
environment variable that the prolog can see, it can add a
constraint which will become an environment variable that the prolog
can see?


That's correct - the difference being that Slurm, not the user, is in 
control of its presence and the possible values it can have (as it's 
constrained by what you've chosen for the name of the node feature).



Would that work if that feature is available in all nodes?


Yes, that should work just fine I believe.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Rolling reboot with at most N machines down simultaneously?

2022-08-03 Thread Christopher Samuel

On 8/3/22 8:37 am, Phil Chiu wrote:

Therefore my problem is this - "Reboot all nodes, permitting N nodes to 
be rebooting simultaneously."


I think currently the only way to do that would be to have a script that 
does:


* issue the `scontrol reboot ASAP nextstate=resume [...]` for 3 nodes
* wait for 1 to come back to being online
* issue an `scontrol reboot` for another node
* wait for 1 more to come back
* lather rinse repeat.

This does assume you've got your nodes configured to come back cleanly 
on a reboot with slurmd up and no manual intervention required (which is 
what we do).
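
A very rough sketch of such a wrapper (the node list variable, the threshold 
and the "@"-state test are all assumptions, not a tested tool):

#!/bin/bash
# Reboot the nodes in $NODELIST, keeping at most $MAX queued for reboot at once.
MAX=3
for node in $(scontrol show hostnames "$NODELIST"); do
    # sinfo appends "@" to the state of nodes with a pending "scontrol reboot",
    # so wait until fewer than $MAX nodes are still marked that way.
    while [ "$(sinfo -h -N -o '%N %T' | grep -c '@')" -ge "$MAX" ]; do
        sleep 60
    done
    scontrol reboot ASAP nextstate=resume reason="rolling reboot" "$node"
done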


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Rate-limiting sbatch and srun

2022-07-19 Thread Christopher Samuel

On 7/18/22 3:45 pm, gphipps wrote:

Every so often one of our users accidentally writes a “fork-bomb” 
that submits thousands of sbatch and srun requests per second. It is a 
giant DDOS attack on our scheduler. Is there a way of rate limiting 
these requests before they reach the daemon?


Yes there is, you can use the Slurm cli_filter to do this.

https://slurm.schedmd.com/cli_filter_plugins.html

If you use the lua plugin you can write what you need in that; though of 
course it would need careful thought as you would need somewhere to 
store state on the node (writeable by users), a way of counting the 
frequency of the RPCs and introducing increasing delays (up to a point) 
if it's out of control and then decaying that delay time down when the 
RPCs from that user cease/decrease.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] How do you make --export=NONE the default behavior for our cluster?

2022-06-04 Thread Christopher Samuel

On 6/3/22 11:39 am, Ransom, Geoffrey M. wrote:

Adding “--export=NONE” to the job avoids the problem, but I’m not seeing 
a way to change this default behavior for the whole cluster.


There's an SBATCH_EXPORT environment variable that you could set for 
users to force that (at $JOB-1 we used to do that).
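
For example, something like this in a site-wide profile script (the file name 
is arbitrary):

# /etc/profile.d/slurm_export.sh
export SBATCH_EXPORT=NONE

Users can still override it per job with --export=... on the sbatch command 
line.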


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Christopher Samuel

On 5/29/22 3:09 pm, byron wrote:

  This is the first time I've done an upgrade of Slurm and I had been 
hoping to do a rolling upgrade, as opposed to waiting for all the jobs to 
finish on all the compute nodes and then switching across, but I don't see 
how I can do it with this setup. Does anyone have any experience of this?


We do rolling upgrades with:

scontrol reboot ASAP nextstate=resume reason="some-useful-reason" 
[list-of-nodes]


But you do need to have RebootProgram defined and an appropriate 
ResumeTimeout set to allow enough time for your node to reboot (and of 
course your system must be configured to boot into a production ready 
state when rebooted, including starting up slurmd).
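
For reference, the slurm.conf side of that is just something like (the 
values are examples):

RebootProgram=/sbin/reboot
ResumeTimeout=600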


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Christopher Samuel

On 5/17/22 12:00 pm, Paul Edmon wrote:

Database upgrades can also take a while if your database is large. 
I definitely recommend backing up prior to the upgrade, as well as running 
slurmdbd -Dv rather than the systemd service; if the upgrade takes a 
long time, systemd will kill it preemptively due to unresponsiveness, which 
will create all sorts of problems.


+lots to this - it's our SOP when doing upgrades as it takes hours to do 
so on a busy system that's been around for a while.


We also take routine backups and then when we're looking to do an 
upgrade I'll use one of those on a test system to see how it goes.
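
A minimal backup along those lines (assuming the default slurm_acct_db 
database name and a "slurm" database user):

mysqldump --single-transaction -u slurm -p slurm_acct_db > slurm_acct_db.sql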


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] SLURM: reconfig

2022-05-05 Thread Christopher Samuel

On 5/5/22 7:08 am, Mark Dixon wrote:

I'm confused how this is supposed to be achieved in a configless 
setting, as slurmctld isn't running to distribute the updated files to 
slurmd.


That's exactly what happens with configless mode: slurmds retrieve 
their config from the slurmctld and will grab it again on an "scontrol 
reconfigure". There's no reason to stop slurmctld for this.


So your slurm.conf should only exist on the slurmctld node - this is how 
we operate on our latest system.
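
For reference, the moving parts are small; a sketch (the controller hostname 
is a placeholder):

# slurm.conf on the slurmctld node
SlurmctldParameters=enable_configless

# slurmd on the compute nodes started pointing at the controller, e.g.:
slurmd --conf-server ctld-host:6817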


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] SLURM: reconfig

2022-05-05 Thread Christopher Samuel

On 5/5/22 5:17 am, Steven Varga wrote:

Thank you for the quick reply! I know I am pushing my luck here: is it 
possible to modify slurm: src/common/[read_conf.c, node_conf.c] 
src/slurmctld/[read_config.c, ...] such that the state can be maintained 
dynamically? -- or cheaper to write a job manager with less features but 
supporting dynamic nodes from ground up?


I had said currently, because it looks like you will be in luck with the 
next release (though it sounds like it needs a little config):


From https://github.com/SchedMD/slurm/blob/master/RELEASE_NOTES:

 -- Allow nodes to be dynamically added and removed from the system. Configure
    MaxNodeCount to accomodate nodes created with dynamic node registrations
    (slurmd -Z --conf="") and scontrol.


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] SLURM: reconfig

2022-05-04 Thread Christopher Samuel

On 5/4/22 7:26 pm, Steven Varga wrote:

I am wondering what is the best way to update node changes, such as 
addition and removal of nodes to SLURM. The excerpts below suggest a 
full restart, can someone confirm this?


You are correct, you need to restart slurmctld and slurmd daemons at 
present.  See https://slurm.schedmd.com/faq.html#add_nodes


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] sbatch - accept jobs above limits

2022-02-09 Thread Christopher Samuel

On 2/8/22 11:41 pm, Alexander Block wrote:

I'm just discussing a familiar case with SchedMD right now (ticket 
13309). But it seems that it is not possible with Slurm to submit jobs 
that request features/configuration that are not available at the moment 
of submission.


Does --hold not allow that for you?

--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] sbatch - accept jobs above limits

2022-02-08 Thread Christopher Samuel

On 2/8/22 2:26 pm, z1...@arcor.de wrote:


These jobs should be accepted, if a suitable node will be active soon.
For example, these jobs could be in PartitionConfig.


From memory if you submit jobs with the `--hold` option then you should 
find they are successfully accepted - I've used that in the past (and 
just checked that it still works with 20.11.8, assuming nobody has snuck 
a node with 2TB of RAM in whilst I wasn't looking).
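
E.g. (the job script and job id are placeholders):

sbatch --hold job.sh      # accepted even if nothing can currently satisfy it
scontrol release 12345    # later, once a suitable node exists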


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Christopher Samuel

On 1/31/22 9:25 pm, Brian Andrus wrote:


touch /etc/nologin

That will prevent new logins.


It's also useful that if you put a message in /etc/nologin then users 
who are trying to login will get that message before being denied.
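
For example:

echo "Login node down for maintenance, back later today" > /etc/nologin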


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Christopher Samuel

On 1/31/22 9:00 pm, Christopher Samuel wrote:


That would basically be the way


Thinking further on this, a better way would be to mark your partitions 
down, as it's likely you've got fewer partitions than compute nodes.

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Stopping new jobs but letting old ones end

2022-01-31 Thread Christopher Samuel

On 1/31/22 4:41 pm, Sid Young wrote:

I need to replace a faulty DIMM chip in our login node so I need to stop 
new jobs being kicked off while letting the old ones end.


I thought I would just set all nodes to drain to stop new jobs from 
being kicked off...


That would basically be the way, but is there any reason why compute 
jobs shouldn't start whilst the login node is down?


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Questions about scontrol reconfigure / reconfig

2022-01-16 Thread Christopher Samuel

On 1/16/22 7:41 pm, Nicolas Greneche wrote:


I added a new compute node in the config file, so NodeName becomes:


When adding a node you need to restart slurmctld and all the slurmd's as 
they (currently) can only rebuild their internal structures for this at 
that time. This is meant to be addressed in a future major Slurm release 
(can't remember which one sorry).


https://slurm.schedmd.com/faq.html#add_nodes

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Christopher Samuel

On 12/1/21 5:51 am, Gestió Servidors wrote:

I can’t synchronize beforehand with “ntpdate” because when I run “ntpdate -s 
my_NTP_server”, I only receive the message “ntpdate: no server suitable for 
synchronization found”…


Yeah, you'll need to make sure your NTP infrastructure is working first. 
There is useful information (including NTP background info) here:


https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-configuring_ntp_using_ntpd

and for chronyd (rather than ntpd):

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-configuring_ntp_using_the_chrony_suite

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] random allocation of resources

2021-12-01 Thread Christopher Samuel

On 12/1/21 3:27 pm, Brian Andrus wrote:

If you truly want something like this, you could have a wrapper script 
look at available nodes, pick a random one and set the job to use that 
node.


Alternatively you could have a cron job that periodically adjusts nodes' 
`weight` to change which ones Slurm will prefer to use over time 
(everything else being equal, Slurm picks nodes with the lowest weight).
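
E.g. (the node name and value are placeholders):

scontrol update NodeName=node001 Weight=50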


Hope this helps!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Job Preemption Time

2021-11-22 Thread Christopher Samuel

On 11/22/21 8:28 pm, Jeherul Islam wrote:

Is there any way to configure slurm, that the High Priority job waits 
for a certain amount of time(say 24 hours), before it preempts the other 
job?


Not quite, but you can set PreemptExemptTime which says how long a job 
must have run for before it can be considered eligible for preemption.


In other words if that's set to 1 hour and there's a low priority job
that was submitted 55 minutes ago and a new high priority job comes
along it won't be able to preempt it for another 5 minutes.

You can set it on a QOS for instance so that different QOS's can
have different minimum times.

https://slurm.schedmd.com/qos.html#preemption

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Can't use cgroups on debian 11 : unable to get parameter 'tasks' for '/sys/fs/cgroup/cpuset/'

2021-11-16 Thread Christopher Samuel

On 11/16/21 8:04 am, Arthur Toussaint wrote:

I've seen people having those kinds of problems, but no one seems to be 
able to solve it and keep the cgroups


Debian Bullseye switched to cgroups v2 by default which Slurm doesn't 
support yet, you'll need to switch back to the v1 cgroups. The release 
notes have info on how to do this here:


https://www.debian.org/releases/stable/amd64/release-notes/ch-information.en.html#openstack-cgroups

Short version is it says you need to add this to the kernel boot params:

systemd.unified_cgroup_hierarchy=false 
systemd.legacy_systemd_cgroup_controller=false


The name of that second one looks a bit misleading, it's described in 
the systemd man page as:


https://manpages.debian.org/testing/systemd/systemd.1.en.html

> Takes effect if the full unified cgroup hierarchy is not used (see 
previous option). When specified without an argument or with a true 
argument, disables the use of "hybrid" cgroup hierarchy (i.e. a 
cgroups-v2 tree used for systemd, and legacy cgroup hierarchy[10], 
a.k.a. cgroups-v1, for other controllers), and forces a full "legacy" 
mode. When specified with a false argument, enables the use of "hybrid" 
hierarchy.
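
On Debian that typically means appending those parameters to 
GRUB_CMDLINE_LINUX in /etc/default/grub and then (a sketch):

update-grub && reboot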


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Unable to start slurmd service

2021-11-16 Thread Christopher Samuel

On 11/16/21 7:07 am, Jaep Emmanuel wrote:

> root@ecpsc10:~# scontrol show node ecpsc10
[...]
>State=DOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
[...]

    Reason=Node unexpectedly rebooted [slurm@2021-11-16T14:41:04]


This is why the node isn't considered available; as others have already
noted, you will need to resume the node.
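
E.g.:

scontrol update NodeName=ecpsc10 State=RESUME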

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] draining nodes due to failed killing of task?

2021-08-08 Thread Christopher Samuel

On 8/7/21 11:47 pm, Adrian Sevcenco wrote:

yes, the jobs that are running have a part of file saving if they are 
killed,

saving which depending of the target can get stuck ...
i have to think for a way to take a processes snapshot when this happens ..


Slurm does let you request a signal a certain amount of time before the 
job is due to end; you could make your job use that to do the checkpoint 
in advance of the end of the job, so you don't hit this problem.


Look at the --signal option in "man sbatch".
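
For example, to have SIGUSR1 delivered to the batch shell 10 minutes before 
the time limit (the values are just an illustration):

#SBATCH --signal=B:USR1@600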

Best of luck!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Users Logout when job die or complete

2021-07-10 Thread Christopher Samuel

Hi Andrea,

On 7/9/21 3:50 am, Andrea Carotti wrote:


ProctrackType=proctrack/pgid


I suspect this is the cause of your problems; my bet is that it is 
incorrectly identifying the user's login processes as being part of the 
job and thinking it needs to tidy them up in addition to any processes 
left over from the job. It also seems to be more for BSD systems than Linux.


At the very least you'd want:

ProctrackType=proctrack/linuxproc

Though I'd strongly suggest looking at cgroups for this, see:

https://slurm.schedmd.com/slurm.conf.html#OPT_ProctrackType

and:

https://slurm.schedmd.com/cgroups.html

Best of luck!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] How to avoid a feature?

2021-07-02 Thread Christopher Samuel

On 7/1/21 7:08 am, Brian Andrus wrote:


I have a partition where one of the nodes has a node-locked license.
That license is not used by everyone that uses the partition.


This might be a case for using a reservation on that node with the 
MaxStartDelay flag to set the maximum amount of time (in minutes) that 
jobs that need to run in the reservation are willing to wait for a job 
on the node to clean up and exit.


The candidate jobs need to use the --signal flag with the R option to 
specify how many seconds of warning they would need to clean up before 
being preempted.


If the amount of time they say they need is less than the MaxStartDelay 
then they are candidates to run on those nodes _outside_ of the 
reservation, and when the actual work comes along they will get told to 
get out of the way and, if they fail to, they'll get killed.


I presume people have to request a license in Slurm to get sent to that 
node so you could automatically add that reservation to jobs that 
request the license.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Exposing only requested CPUs to a job on a given node.

2021-07-01 Thread Christopher Samuel

On 7/1/21 3:26 pm, Sid Young wrote:

I have exactly the same issue with a user who needs the reported cores 
to reflect the requested cores. If you find a solution that works please 
share. :)


The number of CPUs in the system vs the number of CPUs you can access 
are very different things. You can use the "nproc" command to find the 
number of CPUs you can access.


From a software side of things this is why libraries like "hwloc" 
exist, so you can determine what is accessible in a portable way.


https://www.open-mpi.org/projects/hwloc/

It lives on the Open-MPI website, but it doesn't use Open-MPI (Open-MPI 
uses it).


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Christopher Samuel

On 6/4/21 11:04 am, Ahmad Khalifa wrote:


Because there are failing GPUs that I'm trying to avoid.


Could you remove them from your gres.conf and adjust slurm.conf to match?

If you're using cgroups enforcement for devices (ConstrainDevices=yes in 
cgroup.conf) then that should render them inaccessible to jobs.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] DMTCP or MANA with Slurm?

2021-05-28 Thread Christopher Samuel

On 5/27/21 12:26 pm, Prentice Bisbal wrote:

Given the lack of traffic on the mailing list and lack of releases, I'm 
beginning to think that both of these projects are all but abandoned.


They're definitely actively working on it - I've given them a heads up 
on this to let them know how it's being perceived. Thanks for mentioning it!


All the best!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Drain node from TaskProlog / TaskEpilog

2021-05-24 Thread Christopher Samuel

On 5/24/21 3:02 am, Mark Dixon wrote:

Does anyone have advice on automatically draining a node in this 
situation, please?


We do some health checks via a node epilog set with the "Epilog" 
setting, including queueing node reboots with "scontrol reboot".


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] inconsistent CUDA_VISIBLE_DEVICES with srun vs sbatch

2021-05-20 Thread Christopher Samuel

On 5/19/21 1:41 pm, Tim Carlson wrote:

but I still don't understand how with "shared=exclusive" srun gives one 
result and sbatch gives another.


I can't either, but I can reproduce it with Slurm 20.11.7. :-/

--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] nodes going to down* and getting stuck in that state

2021-05-20 Thread Christopher Samuel

On 5/19/21 9:15 pm, Herc Silverstein wrote:


Does anyone have an idea of what might be going on?


To add to the other suggestions, I would say that checking the slurmctld 
and slurmd logs to see what it is saying is wrong is a good place to start.


Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Christopher Samuel

On 5/14/21 1:45 am, Diego Zuccato wrote:


Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster  TRES Name  Allocated     Down  PLND Dow     Idle  Reserved  Reported
--------- ---------- ---------- -------- --------- -------- --------- ---------
      oph        cpu     81.93%    0.00%     0.00%   15.85%     2.22%   100.00%
      oph        mem     80.60%    0.00%     0.00%   19.40%     0.00%   100.00%


The "Reserved" column is the one you're interested in, it's indicating 
that for the 13th some jobs were waiting for CPUs, not memory.


You can look at a longer reporting period by specifying a start date,
something like:

sreport -t percent -T cpu,mem cluster utilization start=2021-01-01

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Christopher Samuel

On 5/14/21 1:45 am, Diego Zuccato wrote:


It just doesn't recognize 'ALL'. It works if I specify the resources.


That's odd, what does this say?

sreport --version

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Determining Cluster Usage Rate

2021-05-14 Thread Christopher Samuel

On 5/13/21 3:08 pm, Sid Young wrote:


Hi All,


Hiya,

Is there a way to define an effective "usage rate" of a HPC Cluster 
using the data captured in the slurm database.


Primarily I want to see if it can be helpful in presenting to the 
business a case for buying more hardware for the HPC  :)


I have a memory that it's possible to use "sreport" to show you what 
amount of time jobs were waiting for what TRES - in other words whether 
they were waiting for CPUs, memory, GPUs, etc (or some combination).


Ah here you go..

sreport -t percent -T ALL cluster utilization

That breaks things down by all the trackable resources on your system.

Hope that helps!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Grid engine slaughtering parallel jobs when any one of them fails (copy)

2021-04-19 Thread Christopher Samuel

Hi Robert,

On 4/16/21 12:39 pm, Robert Peck wrote:

Please can anyone suggest how to instruct SLURM not to massacre ALL my 
jobs because ONE (or a few) node(s) fails?


You will also probably want this for your srun: --kill-on-bad-exit=0

What does the scontrol command below show?

scontrol show config | fgrep KillOnBadExit

From the manual page:

   -K, --kill-on-bad-exit[=0|1]
  Controls whether or not to terminate a step if any task
  exits with a non-zero exit code. If this option is not
  specified, the default action will  be  based  upon
  the  Slurm  configuration parameter of KillOnBadExit.
  If this option is specified, it will take precedence over
  KillOnBadExit. An option argument of zero will not
  terminate the job. A non-zero argument or no argument
  will terminate the job.  Note: This option takes
  precedence over the -W, --wait option to terminate the
  job immediately  if  a  task  exits with a non-zero exit
  code.  Since this option's argument is optional, for
  proper parsing the single letter option must be followed
  immediately with the value and not include a space between
  them. For example "-K1" and not "-K 1".


Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] PartitionName default

2021-04-07 Thread Christopher Samuel
On 4/7/21 11:48 am, Administração de Sistemas do Centro de 
Bioinformática wrote:


Unfortunately, I still don't know how to use any other value to 
PartitionName.


We've got about 20 different partitions on our large Cray system, with a 
variety of names (our submit filter system directs jobs to the right 
location based on what the user requests and has access to):


cat /etc/slurm/slurm.conf | awk '/^PartitionName/ {print $1}'
PartitionName=system
PartitionName=system_shared
PartitionName=debug_hsw
PartitionName=debug_knl
PartitionName=jupyter
PartitionName=regular_hsw
PartitionName=regular_knl
PartitionName=regularx_hsw
PartitionName=regularx_knl
PartitionName=resv
PartitionName=resv_shared
PartitionName=benchmark
PartitionName=realtime_shared
PartitionName=realtime
PartitionName=shared
PartitionName=interactive
PartitionName=genepool
PartitionName=genepool_shared
PartitionName=genepool_resv
PartitionName=genepool_resv_shared

I've not had issues with naming partitions in the past, though I can 
imagine `default` could cause confusion as there is a `default=yes` 
setting you can put on the one partition you want as the default choice.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Rate Limiting of RPC calls

2021-02-09 Thread Christopher Samuel

On 2/9/21 5:08 pm, Paul Edmon wrote:

1. Being on the latest release: A lot of work has gone into improving 
RPC throughput, if you aren't running the latest 20.11 release I highly 
recommend upgrading.  20.02 also was pretty good at this.


We've not gone to 20.11 on production systems yet, but I can vouch for 
20.02 being far better than previous versions for scheduling performance.


We also use the cli_filter lua plugin to write our own RPC limiting 
mechanism using a local directory for per-user files. The big advantage 
of this is that it does the rate limiting client side, so excess requests 
don't get sent to the slurmctld in the first place. Yes, it is theoretically 
possible for users to discover and work around this, but the intent here 
is to catch accidental/naive use rather than anything malicious.


Also getting users to use `sacct` rather than `squeue` to check what 
state a job is in can help a lot too, it reduces the load on slurmctld.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] only 1 job running

2021-01-28 Thread Christopher Samuel

On 1/27/21 9:28 pm, Chandler wrote:

Hi list, we have a new cluster setup with Bright cluster manager. 
Looking into a support contract there, but trying to get community 
support in the mean time.  I'm sure things were working when the cluster 
was delivered, but I provisioned an additional node and now the 
scheduler isn't quite working right.


Did you restart the slurm daemons when you added the new node?  Some 
internal data structures (bitmaps) are built based on the number of 
nodes and they need to be rebuilt with a restart in this situation.


https://slurm.schedmd.com/faq.html#add_nodes

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-26 Thread Christopher Samuel

On 1/26/21 12:10 pm, Ole Holm Nielsen wrote:

What I don't understand is, is it actually *required* to make the NVIDIA 
libraries available to Slurm?  I didn't do that, and I'm not aware of 
any problems with our GPU nodes so far.  Of course, our GPU nodes have 
the libraries installed and the /dev/nvidia? devices are present.


You only need it if you want to use NVML autodetection of GPUs; we don't 
have any NVIDIA software in the OS image we use to build our vast array 
of RPMs, and they work just fine on our GPU nodes.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Defining an empty partition

2021-01-05 Thread Christopher Samuel

On 12/18/20 4:45 am, Tina Friedrich wrote:

Yeah, I had that problem as well (trying to set up a partition that 
didn't have any nodes - they're not here yet).


You can define nodes in Slurm that don't exist yet with State=FUTURE; that 
means slurmctld basically ignores them until you change that state 
setting (either with scontrol or by updating your config).
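
E.g. a placeholder definition in slurm.conf (names and sizes are examples):

NodeName=node[101-116] CPUs=64 RealMemory=256000 State=FUTURE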


I've used that before, and in fact added some nodes in that state 
yesterday on one of our test HPCs.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Scripts run slower in slurm?

2020-12-15 Thread Christopher Samuel

On 12/14/20 11:20 pm, Alpha Experiment wrote:


It is called using the following submission script:
#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
source testenv1/bin/activate
python3 multithread_example.py


You're not asking for a number of cores, so you'll likely only be 
getting a single core to use here.  You'll likely need something like:


#SBATCH -c 64

for it to get access to more cores.

Also in your config I noticed:

NodeName=localhost

I'd suggest you use the actual name for your compute nodes, I don't 
think that's going to work out too well with more than 1 node. :-)


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Trouble installing slurm-20.02.4-1.amzn2.x86_64 libnvidia-ml.so.1

2020-12-04 Thread Christopher Samuel

Hi Drew,

On 12/4/20 11:32 am, Mullen, Drew wrote:


Error: Package: slurm-20.02.4-1.amzn2.x86_64 (/slurm-20.02.4-1.amzn2.x86_64)

    Requires: libnvidia-ml.so.1()(64bit



That looks like it's fixed in 20.02.5 (the current release is 20.02.6):

--
commit 1be5492c274e170451ed18763e7eeea826f57cb7
Author: Tim McMullan 
Date:   Tue Aug 11 11:32:26 2020 -0400

slurm.spec - don't depend on libnvidia-ml to allow manual cuda installs

Bug 9525
--

Hope this helps!

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] update_node / reason set to: slurm.conf / state set to DRAINED

2020-11-05 Thread Christopher Samuel

Hi Kevin,

On 11/4/20 6:00 pm, Kevin Buckley wrote:


In looking at the SlurmCtlD log we see pairs of lines as follows

  update_node: node nid00245 reason set to: slurm.conf
  update_node: node nid00245 state set to DRAINED


I'd go looking in your healthcheck scripts; I took a quick look at the 
source last night and couldn't see anything that looked related, and 
it's not a message I remember seeing before.


Also take a look in the slurmd logs on the node for that time, to see if 
there's anything that correlates there.


Good luck!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm Upgrade

2020-11-04 Thread Christopher Samuel

Hi Navin,

On 11/4/20 10:14 pm, navin srivastava wrote:

I have already built a new server slurm 20.2 with the latest DB. my 
question is,  shall i do a mysqldump into this server from existing 
server running with version slurm version 17.11.8


This won't work - you must upgrade your 17.11 database to 19.05.x first, 
then you can upgrade from 19.05.x to 20.02.
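
A rough sketch of that path (database name and versions are illustrative - take 
a backup first and check the RELEASE_NOTES at each step):

# on the existing 17.11 host, back up the accounting database:
mysqldump slurm_acct_db > slurm_acct_db_backup.sql

# install slurmdbd 19.05.x and run it in the foreground once so it
# converts the 17.11 schema; stop it when the conversion finishes:
slurmdbd -D -vvv

# then repeat the same with slurmdbd 20.02.x before starting the new slurmctld:
slurmdbd -D -vvv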


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Nodes not returning from DRAINING

2020-10-28 Thread Christopher Samuel

On 10/28/20 6:27 am, Diego Zuccato wrote:


Strangely the core file seems corrupted (maybe because it's from a
4-node job and they all try to write to the same file?):


You can set a pattern for core file names to prevent that, usually the 
PID is in the name, but you can put the hostname in there too.


https://man7.org/linux/man-pages/man5/core.5.html

See the section: "Naming of core dump files"
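
As a sketch (the file name and pattern are just an example), on each compute 
node you could set something like:

echo 'kernel.core_pattern = core.%e.%h.%p' > /etc/sysctl.d/90-coredump.conf
sysctl --system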

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-23 Thread Christopher Samuel

Hi Paul,

On 10/23/20 10:13 am, Paul Raines wrote:


Any clues as to why pam_slurm_adopt thinks there is no job?


Do you have PrologFlags=Contain in your slurm.conf?

Contain
At job allocation time, use the ProcTrack plugin to create a job
container on all allocated compute nodes. This container may be
used for user processes not launched under Slurm control, for
example pam_slurm_adopt may place processes launched through
a direct user login into this container. If using pam_slurm_adopt,
then ProcTrackType must be set to either proctrack/cgroup or
proctrack/cray_aries. Setting the Contain implicitly sets the
Alloc flag.
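
i.e. something along these lines in slurm.conf (plus the matching PAM setup), 
followed by a restart of the daemons:

ProcTrackType=proctrack/cgroup
PrologFlags=Contain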

Hope that helps!
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] SLES 15 rpmbuild from 20.02.5 tarball wants munge-libs: system munge RPMs don't provide it

2020-10-22 Thread Christopher Samuel

On 10/21/20 6:32 pm, Kevin Buckley wrote:


If you install SLES 15 SP1 from the Q2 ISOs so that you have Munge but
not the Slurm 18 that comes on the media, and then try to "rpmbuild -ta"
against a vanilla Slurm 20.02.5 tarball, you should get the error I did.


Ah, yes, that looks like it was a packaging bug fixed in subsequent updates!

# for i in libmunge2-0.5.*.rpm; do echo $i; rpm --provides -qp $i | fgrep munge-libs; done

libmunge2-0.5.13-4.3.1.x86_64.rpm
libmunge2-0.5.13-4.6.1.x86_64.rpm
munge-libs = 0.5.13
libmunge2-0.5.14-4.9.1.x86_64.rpm
munge-libs = 0.5.14

Well spotted - so applying the SLES updates before trying to build Slurm 
should fix that. The reason I got confused is the systems I've got 
access to already have those updates in place.


I suspect that also explains "Bug 6752 - Missing munge-libs dependency 
on SLES 15" that HPE opened and a separate bug of ours was marked as a 
duplicate of.  Thanks!


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] [External] Limit usage outside reservation

2020-10-22 Thread Christopher Samuel

On 10/22/20 12:20 pm, Burian, John wrote:


This doesn't help you now, but Slurm 20.11 is expected to have "magnetic 
reservations," which are reservations that will adopt jobs that don't specify a 
reservation but otherwise meet the restrictions of the reservation:


Magnetic reservations are in 20.02 already.

https://slurm.schedmd.com/reservations.html
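
As a sketch (the name, users, times and nodes below are made up), a reservation 
is made magnetic with the MAGNETIC flag:

scontrol create reservation ReservationName=grp_maint users=alice,bob \
    starttime=now duration=8:00:00 nodes=node[01-04] flags=magnetic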

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] SLES 15 rpmbuild from 20.02.5 tarball wants munge-libs: system munge RPMs don't provide it

2020-10-20 Thread Christopher Samuel

On 10/20/20 12:49 am, Kevin Buckley wrote:


only have, as listed before, Munge 0.5.13.


I guess the question is (going back to your initial post):

> error: Failed build dependencies:
>munge-libs is needed by slurm-20.02.5-1.x86_64

Had you installed libmunge2 before trying this build?

rpmbuild can't install it for you if you've not already got it in place.

It should work once installed - assuming yours also shows:

# fgrep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
# rpm -q libmunge2 --provides | tail -n1
munge-libs = 0.5.14


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] CUDA environment variable not being set

2020-10-08 Thread Christopher Samuel

Hi Sajesh,

On 10/8/20 4:18 pm, Sajesh Singh wrote:


  Thank you for the tip. That works as expected.


No worries, glad it's useful. Do be aware that the core bindings for the 
GPUs would likely need to be adjusted for your hardware!


Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] CUDA environment variable not being set

2020-10-08 Thread Christopher Samuel

On 10/8/20 3:48 pm, Sajesh Singh wrote:


   Thank you. Looks like the fix is indeed the missing file 
/etc/slurm/cgroup_allowed_devices_file.conf


No, you don't want that, that will allow all access to GPUs whether 
people have requested them or not.


What you want is in gres.conf and looks like (hopefully not line wrapped!):

NodeName=nodes[01-18] Name=gpu Type=v100 File=/dev/nvidia0 Cores=0,2,4,6,8
NodeName=nodes[01-18] Name=gpu Type=v100 File=/dev/nvidia1 Cores=10,12,14,16,18
NodeName=nodes[01-18] Name=gpu Type=v100 File=/dev/nvidia2 Cores=20,22,24,26,28
NodeName=nodes[01-18] Name=gpu Type=v100 File=/dev/nvidia3 Cores=30,32,34,36,38


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] CUDA environment variable not being set

2020-10-08 Thread Christopher Samuel

Hi Sajesh,

On 10/8/20 11:57 am, Sajesh Singh wrote:

debug:  common_gres_set_env: unable to set env vars, no device files 
configured


I suspect the clue is here - what does your gres.conf look like?
Does it list the devices in /dev for the GPUs?

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Current status of checkpointing

2020-08-14 Thread Christopher Samuel

On 8/14/20 6:17 am, Stefan Staeglich wrote:


what's the current status of the checkpointing support in SLURM?


There isn't any these days, there used to be support for BLCR but that's 
been dropped as BLCR is no more.


I know from talking with SchedMD they are of the opinion that any 
current checkpoint/resume code (such as DMTCP [1]) should be supported 
via the user's batch script and not in Slurm itself.
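
As a very rough sketch of what that can look like in a batch script (the 
application name and checkpoint interval are placeholders, and a real script 
would also handle restarting from an existing checkpoint):

#!/bin/bash
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 12:00:00

# checkpoint roughly every hour under DMTCP's control
dmtcp_launch --interval 3600 ./my_app

# a follow-up job would restart from the newest checkpoint instead:
# dmtcp_restart ckpt_*.dmtcp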


All the best,
Chris

[1] - https://github.com/dmtcp/dmtcp

--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Christopher Samuel

On 8/6/20 10:13 am, Jason Simms wrote:

Later this month, I will have to bring down, patch, and reboot all nodes 
in our cluster for maintenance. The two options available to set nodes 
into a maintenance mode seem to be either: 1) creating a system-wide 
reservation, or 2) setting all nodes into a DRAIN state.


We use both. :-)

So for cases where we need to do a system wide outage for some reason we 
will put reservations on in advance to ensure the system is drained for 
the maintenance.


But for rolling upgrades we will build a new image, set nodes to use it 
and then do something like:


scontrol reboot ASAP nextstate=resume reason="Rolling upgrade" [nodes]

That will allow running jobs to complete, drain all the nodes and when 
idle they'll reboot into the new image and resume themselves once 
they're back up and slurmd has started and checked in.


We use the same mechanism when we need to reboot nodes for other 
maintenance activities, say when huge pages are too fragmented and the 
only way to reclaim them is to reboot the node (these checks happen in 
the node epilog).


We paid for enhancements to Slurm 18.08 to ensure that slurmctld took 
these node states into account when scheduling jobs so that large jobs 
(as in requiring most of the nodes in the system) do not lose their 
scheduling window when a node has to be rebooted for this reason.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] cgroup limits not created for jobs

2020-07-26 Thread Christopher Samuel

On 7/26/20 12:21 pm, Paul Raines wrote:


Thank you so much.  This also explains my GPU CUDA_VISIBLE_DEVICES missing
problem in my previous post.


I've missed that, but yes, that would do it.


As a new SLURM admin, I am a bit surprised at this default behavior.
Seems like a way for users to game the system by never running srun.


This is because by default salloc only requests a job allocation, it 
expects you to use srun to run an application on a compute node. But 
yes, it is non-obvious (as evidenced by the number of "sinteractive" and 
other scripts out there that folks have written not realising about the 
SallocDefaultCommand config option - I wrote one back in 2013!).



The only limit I suppose that is being really enforced at that point
is walltime?


Well the user isn't on the compute node so there's nothing really else 
to enforce.


I guess I need to research srun and SallocDefaultCommand more, but is 
there some way to set some kind of separate walltime limit on a job for 
the time a salloc has to run srun?  It is not clear if one can make a 
SallocDefaultCommand that does "srun ..." that really covers all 
possibilities.


An srun inside of a salloc (just like an sbatch) should not be able to 
exceed the time limit for the job allocation.


If it helps this is the SallocDefaultCommand we use for our GPU nodes:

srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 -G 0 --gpus-per-task=0 --gpus-per-node=0 --gpus-per-socket=0 --pty --preserve-env --mpi=none -m block $SHELL


We have to give all those possible permutations to not use various GPU 
GRES as otherwise this srun will consume them if the salloc asked for it 
and then when the user tries to "srun" their application across the 
nodes it will block as there won't be any available on this first node.


Of course, because of this the user can't see the GPUs until they run 
srun, which can confuse some people, but it's unavoidable for this use 
case.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Restart Job after sudden reboot of the node

2020-07-24 Thread Christopher Samuel

On 7/24/20 12:28 pm, Saikat Roy wrote:


If SLURM restarts automatically, is there any way to stop it?


If you would rather Slurm not start scheduling jobs when it is restarted 
then you can set your partitions to have `State=DOWN` in slurm.conf.


That way should the node running slurmctld reboot then it won't start 
scheduling jobs until you tell it to.
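
For example (partition and node names are placeholders):

# slurm.conf
PartitionName=batch Nodes=node[01-32] Default=YES State=DOWN

# once you're happy for scheduling to start again:
scontrol update PartitionName=batch State=UP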


For compute nodes I believe Slurm should detect any node that reboots 
and mark it "DOWN" with the reason set to "Node unexpectedly rebooted".


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] [EXT] Jobs Immediately Fail for Certain Users

2020-07-07 Thread Christopher Samuel

On 7/7/20 5:57 pm, Jason Simms wrote:


Failed to look up user weissp: No such process


That looks like the user isn't known to the node.  What do these say:

id weissp
getent passwd weissp

Which version of Slurm is this?

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Are SLURM_JOB_USER and SLURM_JOB_UID always constant and available

2020-05-20 Thread Christopher Samuel

On 5/20/20 7:23 pm, Kevin Buckley wrote:


Are they set as part of the job payload creation, and so would ignore
and node local lookup, or set as the job gets allocated to the various
nodes it will run on?


Looking at git, it's a bit of both:

src/slurmd/slurmd/req.c:

setenvf(, "SLURM_JOB_UID", "%u", job_env->uid);
[...]
setenvf(, "SLURM_JOB_USER", "%s", job_env->user_name);

so the variables get set on the slurmd side (as you'd expect) but from 
data that is sent along with the job.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] additional jobs killed by scancel.

2020-05-13 Thread Christopher Samuel

On 5/11/20 9:52 am, Alastair Neil wrote:

[2020-05-10T00:26:05.202] [533900.batch] sending 
REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 9


This caught my eye, Googling for it found a single instance, from 2019 
on the list again about jobs on a node mysteriously dying.


The resolution was (courtesy of Uwe Seher):

# The system is an opensuse leap 15 installation and slurm
# comes from the repository. By default a slurm.epilog.clean
# skript is installed which kills everything that belongs to
# the user when a job is finished including other jobs,
# ssh-sessions and so on. I do not know if other distributions
# do the same or if the script is broken, but removing it
# solved the problem.

Hope that helps!

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-07 Thread Christopher Samuel

On 5/7/20 6:08 AM, Riebs, Andy wrote:


Alternatively, you could switch to MariaDB; I've been using that for years.


Debian switched to only having MariaDB in 2017 with the release of 
Debian 9 (Stretch), as a derivative distro I'm surprised that Ubuntu 
still packages MySQL.


I'd second Andy's suggestion.

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Munge decode failing on new node

2020-04-22 Thread Christopher Samuel

On 4/22/20 12:56 PM, dean.w.schu...@gmail.com wrote:


There is a third user account on all machines in the cluster that is the
user account for using the cluster.  That account has uid 1000 on all four
worker nodes, but on the controller it is 1001.  So that is probably why the
question marks.


You need to have identical UIDs everywhere for this to work.

I would strongly suggest using something like LDAP to ensure that your 
users have identical representation everywhere.
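
A quick way to spot mismatches (hostnames and username here are just examples):

for h in controller node0{1..4}; do
    printf '%-12s ' "$h"
    ssh "$h" id clusteruser
done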


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08 Thread Christopher Samuel

Hi Robert,

On 4/8/20 7:08 AM, Robert Kudyba wrote:

and the NVIDIA Management Library (NVML) is installed on the node and 
was found during Slurm configuration


That's the key phrase - when whoever compiled Slurm ran ./configure 
*before* compilation it was on a system without the nvidia libraries and 
headers present, so Slurm could not compile that support in.


You'll need to redo the build on a system with the nvidia libraries and 
headers in order for this to work.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-07 Thread Christopher Samuel

On 4/7/20 2:48 PM, Robert Kudyba wrote:


How can I get this to work by loading the correct Bright module?


You can't - you will need to recompile Slurm.

The error says:

Apr 07 16:52:33 node001 slurmd[299181]: fatal: We were configured to 
autodetect nvml functionality, but we weren't able to find that lib when 
Slurm was configured.


So when Slurm was built the libraries you are telling it to use now were 
not detected and so the configure script disabled that functionality as 
it would not otherwise have been able to compile.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Accounting Information from slurmdbd does not reach slurmctld

2020-03-19 Thread Christopher Samuel

On 3/19/20 4:05 AM, Pascal Klink wrote:


However, there was no real answer given why this happened. So we thought that 
maybe this time someone may have an idea.


To me it sounds like either your slurmctld is not correctly registering 
with slurmdbd, or if it has then slurmdbd cannot connect back to slurmctld.


What does this say?

sacctmgr show clusters

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-12 Thread Christopher Samuel

On 3/12/20 9:37 PM, Kirill 'kkm' Katsnelson wrote:

Aaah, that's a cool find! I never really looked inside my nodes for more 
than a year since I debugged all my stuff so it "just works". They are 
conjured out of nothing and dissolve back into nothing after 10 minutes 
of inactivity. But good to know! In the cloud, changing the amount of 
RAM and the number and even type of CPUs is all too easy.


Also on some architectures doing that discovery can take time, so having 
it cached can be useful (slurmd will just read it once on startup).


For us that's on a ramdisk filesystem (as Cray XC nodes have no local 
disk) so it vanishes every time the node reboots.


My bet is that Mike's nodes have persistent storage and have an old copy 
of this file, hence the weird discrepancy he's seeing.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Block interactive shell sessions

2020-03-05 Thread Christopher Samuel

On 3/5/20 9:22 AM, Luis Huang wrote:

We would like to block certain nodes from accepting interactive jobs. Is 
this possible on slurm?


My suggestion would be to make a partition for interactive jobs that 
only contains the nodes you want them to run on, and then use a job 
submit filter to direct jobs submitted without a batch script to that 
partition only (and prevent people from modifying the partition for 
those jobs once submitted).
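
As a sketch, the partition side of that might look like this in slurm.conf 
(names and limits are placeholders), with the routing itself done in a job 
submit plugin such as job_submit/lua:

PartitionName=interactive Nodes=node[01-04] MaxTime=08:00:00 State=UP
PartitionName=batch       Nodes=node[05-64] Default=YES State=UP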


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm 19.05 X11-forwarding

2020-02-29 Thread Christopher Samuel

On 2/28/20 8:56 PM, Pär Lundö wrote:

I thought that I could run the srun-command with X11-forwarding called 
from an sbatch-jobarray-script and get the X11-forwarding to my display.


No, I believe X11 forwarding can only work when you run "srun --x11" 
directly on a login node, not from inside a batch script.


(You should not need to be logged into a compute node either)

See:

https://slurm.schedmd.com/faq.html#x11
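
i.e. roughly (the hostname is a placeholder):

# from your workstation, with X11 forwarding to the login node:
ssh -Y user@login-node

# then on the login node itself:
srun --x11 --pty xterm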

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm 17.11 and configuring backfill and oversubscribe to allow concurrent processes

2020-02-27 Thread Christopher Samuel

On 2/27/20 11:23 AM, Robert Kudyba wrote:

OK so does SLURM support MPS and if so what version? Would we need to 
enable cons_tres and use, e.g., --mem-per-gpu?


Slurm 19.05 (and later) supports MPS - here's the docs from the most 
recent release of 19.05:


https://slurm.schedmd.com/archive/slurm-19.05.5/gres.html

It does require the use of cons_tres for MPS.
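
As a sketch of what that looks like (node names, CPU count and the MPS count 
are illustrative - see the gres.html page above for the details):

# slurm.conf
SelectType=select/cons_tres
GresTypes=gpu,mps
NodeName=node[01-04] CPUs=40 Gres=gpu:1,mps:100

# gres.conf on the GPU nodes
Name=gpu File=/dev/nvidia0
Name=mps Count=100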

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm version 20.02.0 is now available

2020-02-25 Thread Christopher Samuel

On 2/25/20 11:41 AM, Dean Schulze wrote:

I'm very interested in the "configless" setup for slurm.  Is the setup 
for configless documented somewhere?


Looks like the website has already been updated for the 20.02 
documentation, and it looks like it's here:


https://slurm.schedmd.com/configless_slurm.html
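
The short version (hostnames are placeholders) is that slurmctld serves the 
config and slurmd fetches it at startup, roughly:

# slurm.conf on the slurmctld host
SlurmctldParameters=enable_configless

# on the compute nodes there is no local slurm.conf; instead point
# slurmd at the controller (or rely on DNS SRV records):
slurmd --conf-server ctld-host:6817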

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] How to use Autodetect=nvml in gres.conf

2020-02-07 Thread Christopher Samuel

Hi Dean,

On 2/7/20 8:03 AM, dean.w.schu...@gmail.com wrote:


I just checked the .deb package that I build from source and there is nothing 
in it that has nv or cuda in its name.

Are you sure that slurm distributes nvidia binaries?


SchedMD only distributes sources, it's up to distros how they package it.

I suspect you'll need to build it yourself if you want NVML support, I 
doubt many distros will want to be distributing builds linked against 
non-free nvidia libraries.
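
In other words, the build has to happen on a machine that has the NVML headers 
and library present, something like (the prefix is just an example):

# with CUDA/NVML installed on the build host:
./configure --prefix=/opt/slurm    # check the output reports NVML was found
make -j
sudo make install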


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] sbatch sending the working directory from the controller to the node

2020-01-22 Thread Christopher Samuel

On 1/21/20 11:27 AM, Dean Schulze wrote:

The sbatch docs say nothing about why the node gets the pwd from the 
controller.  Why would slurm send a directory to a node that may not 
exist on the node and expect it to use it?


That's a pretty standard expectation from a cluster, that the filesystem 
you are working in on the node you are submitting from is the same as 
the one that's on the compute nodes.  Otherwise there's a lot of messy 
staging of files you'll need to do.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


