[slurm-users] question about hyperthreaded CPUs, --hint=nomultithread and multi-jobstep jobs

2023-05-23 Thread Hans van Schoot

Hi all,

I am getting some unexpected behavior with SLURM on a multithreaded CPU 
(AMD Ryzen 7950X), in combination with a job that uses multiple jobsteps 
and a program that prefers to run without hyperthreading.


My job consists of a simple shell script that does multiple srun 
executions, and normally (on non-multithreaded nodes) the srun commands 
will only start when resources are available inside my allocation. Example:


sbatch -N 1 -n 16 mytestjob.sh

mytestjob.sh contains:

#!/bin/bash
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
wait

srun 1 and 2 will start immediately, srun 3 will start as soon as one of 
the first two jobsteps is finished, and srun 4 will again wait until 
some cores are available.
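For reference, this queueing of jobsteps is easy to check afterwards with sacct, which should list each jobstep with its own start and end time, the later steps starting after the first two. With JOBID standing in for the actual job id, something like:

sacct -j JOBID --format=JobID,Start,End,Elapsed,NCPUS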


Now I would like this same behavior (no multithreading, one task per 
core) on a node with 16 multithreaded cores (32 CPUs in SLURM, 
ThreadsPerCore=2), so I submit with the following command:

sbatch -N 1 --hint=nomultithread -n 16 mytestjob.sh

Slurm correctly reserves the whole node for this, and srun without 
additional directions would launch someMPIprog with 16 MPI ranks.
Unfortunately, in the multi-jobstep situation this causes all four srun 
invocations to start immediately, resulting in 4x 8 MPI ranks running at 
the same time, and thus multithreading. As I specified 
--hint=nomultithread, I would have expected the same behavior as on the 
non-multithreaded node: srun 1 and 2 launch directly, and srun 3 and 4 
wait for CPU resources to become available.
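As a quick way to see which logical CPUs each rank actually lands on, every srun line can be asked to report its binding (a sketch only; the option is spelled --cpu_bind on this Slurm version, newer releases also accept --cpu-bind):

srun -n 8 --cpu_bind=verbose someMPIprog &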



So far I've been able to find two hacky ways of getting around this problem:
- do not use --hint=nomultithread, and instead limit via memory 
(--mem-per-cpu=4000). This is a bit ugly: it only reserves half the 
compute node and seems to bind to the wrong CPU cores.
- set --cpus-per-task=2 instead of --hint=nomultithread, but this causes 
OpenMP to kick in if the MPI program supports it (see the sketch below).
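For the second workaround, the OpenMP side effect can probably be suppressed by fixing the thread count in the job script. A rough sketch, assuming someMPIprog honours the standard OpenMP environment variable:

#!/bin/bash
# variant of mytestjob.sh for the --cpus-per-task=2 workaround
export OMP_NUM_THREADS=1   # keep each MPI rank on a single thread
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
wait

srun exports the environment to the tasks, so the variable reaches all ranks.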



To me this feels like a bit of a bug in SLURM: I tell it not to 
multithread, but it still schedules jobsteps that cause the CPU to 
multithread.


Is there another way of getting the non-multithreaded behavior without 
disabling multithreading in BIOS?


Best regards and many thanks in advance!
Hans van Schoot


Some additional information:
- I'm running slurm 18.08.4
- This is my node configuration in scontrol:
scontrol show nodes compute-7-0
NodeName=compute-7-0 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=0 CPUTot=32 CPULoad=1.00
   AvailableFeatures=rack-7,32CPUs
   ActiveFeatures=rack-7,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.210 NodeHostName=compute-7-0 Version=18.08
   OS=Linux 6.1.8-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jan 23 12:57:27 EST 2023
   RealMemory=64051 AllocMem=0 FreeMem=62681 Sockets=1 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=20511200 Owner=N/A MCS_label=N/A
   Partitions=zen4
   BootTime=2023-04-04T14:14:55 SlurmdStartTime=2023-04-17T12:32:38
   CfgTRES=cpu=32,mem=64051M,billing=47
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s



Re: [slurm-users] Disabling SWAP space: will it affect SLURM working?

2023-12-06 Thread Hans van Schoot

Hi Joseph,

This might depend on the rest of your configuration, but in general swap 
should not be needed for anything on Linux.

BUT: you might get OOM killer messages in your system logs, and SLURM 
might fall victim to the OOM killer (OOM = Out Of Memory) if you run 
applications on the compute node that eat up all your RAM. Swap does not 
prevent this, but it makes it less likely to happen. I've seen the OOM 
killer take out slurm daemon processes on compute nodes with swap; 
usually slurm recovers just fine after the application that ate up all 
the RAM gets killed by the OOM killer.

My compute nodes are not configured to monitor memory usage of jobs. If 
you have memory configured as a managed resource in your SLURM setup, 
and you leave a bit of headroom for the OS itself (e.g. only hand out a 
maximum of 250GB RAM to jobs on your 256GB RAM nodes), you should be fine.
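To illustrate what I mean by memory as a managed resource with some headroom, a minimal slurm.conf fragment could look like the sketch below. The node and partition names and the exact numbers are made up; adjust them to your own hardware:

# slurm.conf fragment (illustrative only)
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory   # schedule cores and memory together

# Advertise less memory than is physically present, so roughly 6 GB stays
# free for the OS and the slurm daemons even when jobs use their full limit.
NodeName=node[01-04] CPUs=32 RealMemory=250000

PartitionName=main Nodes=node[01-04] Default=YES State=UP

Job memory requests then count against the 250000 MB advertised per node rather than against the full 256 GB.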


cheers,
Hans


ps. I'm just a happy slurm user/admin, not an expert, so I might be 
wrong about everything :-)




On 06-12-2023 05:57, John Joseph wrote:

Dear All,
Good morning,

We have a 4-node SLURM instance [256 GB RAM in each node] which we 
installed and which is working fine. We have 2 GB of swap space on each 
node, and to make full use of the system we would like to disable the 
swap memory.

I would like to know: if I disable the swap partition, will it affect 
SLURM functionality?

Advice requested.
Thanks,
Joseph John
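For completeness, disabling swap itself is an OS-level change rather than a SLURM one. The usual steps are roughly the following (a sketch; device names and fstab entries differ per system):

# turn off all active swap devices right away
sudo swapoff -a

# make it permanent by commenting out the swap line(s) in /etc/fstab,
# then verify:
swapon --show   # should print nothing
free -h         # the Swap line should show 0B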