Re: [slurm-users] srun and --cpus-per-task
Hello all,

Thanks for the useful observations. Here are some further env vars:

# non-problematic case
$ srun -c 3 --partition=gpu-2080ti env
SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=4
SLURM_NTASKS=1
SLURM_NPROCS=1
SLURM_CPUS_PER_TASK=3
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=1
SLURM_STEP_TASKS_PER_NODE=1
SLURM_CPUS_ON_NODE=4
SLURM_NODEID=0
SLURM_PROCID=0
SLURM_LOCALID=0
SLURM_GTIDS=0

# problematic case - prints two sets of env vars
$ srun -c 1 --partition=gpu-2080ti env
SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0
SLURM_PROCID=0
SLURM_LOCALID=0
SLURM_GTIDS=0,1
SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0
SLURM_PROCID=1
SLURM_LOCALID=1
SLURM_GTIDS=0,1

Please note the values of SLURM_PROCID, SLURM_LOCALID and SLURM_GTIDS.

@Hermann Schwärzler: how do you plan to handle this bug? We have
currently set SLURM_NTASKS_PER_NODE=1 clusterwide.

Best,
Durai

On Fri, Mar 25, 2022 at 12:45 PM Juergen Salk wrote:
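The doubled environment listing above is consistent with the srun man page rule quoted later in this thread: without -n, srun allocates as many tasks per node as fit under the -c restriction. A minimal bash sketch of that arithmetic, with the values hard-coded from the env listings above (not queried from a live Slurm):

```shell
# Sketch: derive the implied task count the way srun's "-c without -n"
# rule describes it: tasks = floor(CPUs available on the node / -c value).
# The input values are copied from the env listings above.

implied_ntasks() {
  local cpus_on_node=$1 cpus_per_task=$2
  echo $(( cpus_on_node / cpus_per_task ))
}

# Problematic case: srun -c 1 reports SLURM_CPUS_ON_NODE=2
implied_ntasks 2 1   # -> 2, matching SLURM_NTASKS=2 (hence "foo" twice)

# Non-problematic case: srun -c 3 reports SLURM_CPUS_ON_NODE=4
implied_ntasks 4 3   # -> 1, matching SLURM_NTASKS=1
```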
Re: [slurm-users] srun and --cpus-per-task
Hi Bjørn-Helge,

that's very similar to what we did as well in order to avoid confusion
with core vs. thread vs. CPU counts when hyperthreading is kept
enabled in the BIOS.

Setting CPUs to the number of physical cores (not the number of
hardware threads) tells Slurm to only schedule physical cores.

We have

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

and

NodeName=DEFAULT CPUs=48 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2

This is for compute nodes that have 2 sockets with 24 physical cores
each (2 x 24 cores in total) and hyperthreading enabled in the BIOS.
(Although, in general, we do not encourage our users to make use of
hyperthreading, we have decided to leave it enabled in the BIOS, as
there are some corner cases that are known to benefit from it.)

With this setting, Slurm also shows the total physical core count
instead of the thread count, and treats the --mem-per-cpu option as
"--mem-per-core", which in our case is what most of our users expect.

As to the number of tasks spawned with `--cpus-per-task=1`, I think
this is intended behavior. The following sentence from the srun man
page is probably relevant:

-c, --cpus-per-task=<ncpus>

    If -c is specified without -n, as many tasks will be allocated per
    node as possible while satisfying the -c restriction.

In our configuration, we allow multiple jobs of the same user to run
on a node (ExclusiveUser=yes), and we get

$ srun -c 1 echo foo | wc -l
1
$

However, with CPUs set to the thread count instead of the core count,
I guess this would have been 2 lines of output, because the smallest
unit to schedule for a job is 1 physical core, which allows 2 tasks to
run with hyperthreading enabled.

In case of exclusive node allocation for jobs (i.e. no node sharing
allowed), Slurm gives all cores of a node to the job, which allows
even more tasks to be spawned:

$ srun --exclusive -c 1 echo foo | wc -l
48
$

The 48 lines correspond exactly to the number of physical cores on the
node. Again, with CPUs set to the thread count instead of the core
count, I would expect 2 x 48 = 96 lines of output, but I did not test
that.
Best regards
Jürgen

* Bjørn-Helge Mevik [220325 08:49]:
> For what it's worth, we have a similar setup, with one crucial
> difference: we are handing out physical cores to jobs, not
> hyperthreads, and we are *not* seeing this behaviour:
>
> $ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
> srun: job 5371678 queued and waiting for resources
> srun: job 5371678 has been allocated resources
> foo
> $ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
> srun: job 5371680 queued and waiting for resources
> srun: job 5371680 has been allocated resources
> foo
>
> We have
>
> SelectType=select/cons_tres
> SelectTypeParameters=CR_CPU_Memory
>
> and node definitions like
>
> NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000
>
> (so we set CPUs to the number of *physical cores*, not *hyperthreads*).
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo

--
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471
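Juergen's expected line counts above follow from simple multiplication over the node geometry in his NodeName line. A small bash sketch (geometry hard-coded from that line, not read from a live slurm.conf):

```shell
# Sketch of the line-count reasoning above for an exclusive allocation
# with -c 1: one task is spawned per schedulable CPU on the node.
# Geometry copied from the NodeName=DEFAULT line in the mail above.
sockets=2
cores_per_socket=24
threads_per_core=2

physical_cores=$(( sockets * cores_per_socket ))           # CPUs=48 in slurm.conf
hardware_threads=$(( physical_cores * threads_per_core ))  # 96 with hyperthreading

echo "CPUs set to core count:   expect $physical_cores lines of foo"
echo "CPUs set to thread count: expect $hardware_threads lines of foo"
```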
Re: [slurm-users] srun and --cpus-per-task
Hermann Schwärzler writes:

> Do you happen to know if there is a difference between setting CPUs
> explicitly like you do it and not setting it but using
> "ThreadsPerCore=1"?
>
> My guess is that there is no difference and in both cases only the
> physical cores are "handed out to jobs". But maybe I am wrong?

I don't think we've ever tried that. But I'd be sceptical about
"lying" to Slurm about the actual hardware structure - it might
confuse the CPU binding if Slurm and the kernel have different
pictures of the hardware.

--
Bjørn-Helge
Re: [slurm-users] srun and --cpus-per-task
Hi Bjørn-Helge, hi everyone,

ok, I see. I also just re-read the documentation and found this in the
description of the "CPUs" option:

"This can be useful when you want to schedule only the cores on a
hyper-threaded node. If CPUs is omitted, its default will be set equal
to the product of Boards, Sockets, CoresPerSocket, and ThreadsPerCore."

Do you happen to know if there is a difference between setting CPUs
explicitly like you do it and not setting it but using
"ThreadsPerCore=1"?

My guess is that there is no difference and in both cases only the
physical cores are "handed out to jobs". But maybe I am wrong?

Regards,
Hermann

On 3/25/22 8:49 AM, Bjørn-Helge Mevik wrote:
> For what it's worth, we have a similar setup, with one crucial
> difference: we are handing out physical cores to jobs, not
> hyperthreads, and we are *not* seeing this behaviour: [...]
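The documented default quoted above can be worked through for the node definition Bjørn-Helge posted. A bash sketch with the geometry hard-coded from that NodeName line; Boards is assumed to be 1 here, since it does not appear in the posted definition:

```shell
# Sketch of the documented default for CPUs:
#   Boards * Sockets * CoresPerSocket * ThreadsPerCore
# Geometry copied from the NodeName line in this thread; Boards=1 is
# an assumption (the value is not shown in the posted config).
boards=1
sockets=2
cores_per_socket=20
threads_per_core=2

default_cpus=$(( boards * sockets * cores_per_socket * threads_per_core ))

echo "$default_cpus"  # 80: the default would count hyperthreads,
                      # whereas the posted config overrides it with CPUs=40
```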
Re: [slurm-users] srun and --cpus-per-task
For what it's worth, we have a similar setup, with one crucial
difference: we are handing out physical cores to jobs, not
hyperthreads, and we are *not* seeing this behaviour:

$ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371678 queued and waiting for resources
srun: job 5371678 has been allocated resources
foo
$ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371680 queued and waiting for resources
srun: job 5371680 has been allocated resources
foo

We have

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

and node definitions like

NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=182784 Gres=localscratch:330G Weight=1000

(so we set CPUs to the number of *physical cores*, not *hyperthreads*).

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
Re: [slurm-users] srun and --cpus-per-task
Hi Durai,

I see the same thing as you on our test cluster that has
ThreadsPerCore=2 configured in slurm.conf.

The double foo goes away with this:

srun --cpus-per-task=1 --hint=nomultithread echo foo

Having multithreading enabled leads to, imho, surprising behaviour of
Slurm. My impression is that using it makes the concept of "a CPU" in
Slurm somewhat fuzzy. It becomes unclear and ambiguous what you get
when using the CPU-related options of srun, sbatch and salloc: is it a
CPU core or is it a CPU thread?

I think what you found is a bug. If you run

for c in {4..1}
do
    echo "## $c ###"
    srun -c $c bash -c 'echo $SLURM_CPU_BIND_LIST'
done

you will get:

## 4 ###
0x003003
## 3 ###
0x003003
## 2 ###
0x001001
## 1 ###
0x01,0x001000
0x01,0x001000

You see: requesting 4 and 3 CPUs results in the same CPU binding, as
both need two CPU cores with 2 threads each. In the "3" case one of
the threads stays unused, but of course it is not free for another
job.

In the "1" case I would expect to see the same binding as in the "2"
case. If you combine the two values in the list you *do* get the same
value, but obviously it's a list of two values, and this might be the
origin of the problem.

It is probably related to what's mentioned in the documentation for
'--ntasks': "[...] The default is one task per node, but note that the
--cpus-per-task option will change this default."
Regards,
Hermann

On 3/24/22 1:37 PM, Durai Arasan wrote:
> Hello Slurm users,
>
> We are experiencing strange behavior with srun executing commands
> twice, but only when setting --cpus-per-task=1:
>
> $ srun --cpus-per-task=1 --partition=gpu-2080ti echo foo
> srun: job 1298286 queued and waiting for resources
> srun: job 1298286 has been allocated resources
> foo
> foo
>
> This is not seen when --cpus-per-task has another value:
>
> $ srun --cpus-per-task=3 --partition=gpu-2080ti echo foo
> srun: job 1298287 queued and waiting for resources
> srun: job 1298287 has been allocated resources
> foo
>
> Nor is it seen when specifying --ntasks:
>
> $ srun -n1 --cpus-per-task=1 --partition=gpu-2080ti echo foo
> srun: job 1298288 queued and waiting for resources
> srun: job 1298288 has been allocated resources
> foo
>
> Relevant slurm.conf settings are:
>
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core_Memory
>
> # example node configuration
> NodeName=slurm-bm-58 NodeAddr=xxx.xxx.xxx.xxx Procs=72 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=354566 Gres=gpu:rtx2080ti:8 Feature=xx_v2.38 State=UNKNOWN
>
> On closer inspection of job variables in the "--cpus-per-task=1"
> case, the following variables have wrongly acquired a value of 2 for
> no reason:
>
> SLURM_NTASKS=2
> SLURM_NPROCS=2
> SLURM_TASKS_PER_NODE=2
> SLURM_STEP_NUM_TASKS=2
> SLURM_STEP_TASKS_PER_NODE=2
>
> Can you see what could be wrong?
>
> Best,
> Durai
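For anyone who wants to sanity-check Hermann's masks by hand: the hex values in SLURM_CPU_BIND_LIST are plain CPU bitmasks, so a few lines of bash are enough to turn them into CPU id lists. This is just bit arithmetic over the values quoted in this thread, not a Slurm interface; which physical core a given pair of ids maps to depends on the node's topology.

```shell
# Sketch (bash): decode a Slurm CPU-bind hex mask into the CPU ids it
# covers. Bit n set in the mask means CPU id n is in the binding.
decode_mask() {
  local mask=$(( $1 )) cpu=0 out=""
  while [ "$mask" -ne 0 ]; do
    if (( mask & 1 )); then
      out+="${out:+,}$cpu"
    fi
    mask=$(( mask >> 1 ))
    cpu=$(( cpu + 1 ))
  done
  echo "$out"
}

decode_mask 0x003003   # -> 0,1,12,13 (the "-c 4" and "-c 3" binding)
decode_mask 0x001001   # -> 0,12 (the "-c 2" binding)
decode_mask 0x01       # -> 0 (first entry of the "-c 1" pair)
```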