Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Durai Arasan
Hello all,

Thanks for the useful observations. Here are some further environment variables:

# non problematic case
$ srun -c 3 --partition=gpu-2080ti env

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=4
SLURM_NTASKS=1
SLURM_NPROCS=1
SLURM_CPUS_PER_TASK=3
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=1
SLURM_STEP_TASKS_PER_NODE=1
SLURM_CPUS_ON_NODE=4
SLURM_NODEID=0


*SLURM_PROCID=0*
*SLURM_LOCALID=0*
*SLURM_GTIDS=0*


# problematic case - prints two sets of env vars
$ srun -c 1 --partition=gpu-2080ti env

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0

*SLURM_PROCID=0*
*SLURM_LOCALID=0*

*SLURM_GTIDS=0,1*

SRUN_DEBUG=3
SLURM_JOB_CPUS_PER_NODE=2
SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_CPUS_PER_TASK=1
SLURM_STEP_ID=0
SLURM_STEPID=0
SLURM_NNODES=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_NUM_NODES=1
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2
SLURM_CPUS_ON_NODE=2
SLURM_NODEID=0



*SLURM_PROCID=1*
*SLURM_LOCALID=1*
*SLURM_GTIDS=0,1*
Please see the highlighted variables (SLURM_PROCID, SLURM_LOCALID and
SLURM_GTIDS). @Hermann Schwärzler, how do you plan to deal with this bug? We
have currently set SLURM_NTASKS_PER_NODE=1 cluster-wide.

Best,
Durai


On Fri, Mar 25, 2022 at 12:45 PM Juergen Salk wrote:

> Hi Bjørn-Helge,
>
> that's very similar to what we did as well in order to avoid confusion with
> Core vs. Threads vs. CPU counts when Hyperthreading is kept enabled in the
> BIOS.
>
> Adding CPUs=<number of physical cores> (not CPUs=<number of hyperthreads>)
> will tell Slurm to only schedule physical cores.
>
> We have
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
>
> and
>
> NodeName=DEFAULT CPUs=48 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2
>
> This is for compute nodes that have 2 sockets, 2 x 24 physical cores
> with hyperthreading enabled in the BIOS. (Although, in general, we do
> not encourage our users to make use of hyperthreading, we have decided
> to leave it enabled in the BIOS as there are some corner cases that
> are known to benefit from hyperthreading.)
>
> With this setting Slurm does also show the total physical core
> counts instead of the thread counts and also treats the --mem-per-cpu
> option as "--mem-per-core" which is in our case what most of our users
> expect.
>
> As to the number of tasks spawned with `--cpus-per-task=1´, I think this
> is intended behavior. The following sentence from the srun manpage is
> probably relevant:
>
> -c, --cpus-per-task=<ncpus>
>
>   If -c is specified without -n, as many tasks will be allocated per
>   node as possible while satisfying the -c restriction.
>
> In our configuration, we allow multiple jobs to run for the same user
> on a node (ExclusiveUser=yes) and we get
>
> $ srun -c 1 echo foo | wc -l
> 1
> $
>
> However, in case of CPUs=<number of hyperthreads> instead of CPUs=<number of physical cores>,
> I guess, this would have been 2 lines of output, because the smallest
> unit to schedule for a job is 1 physical core which allows 2 tasks to
> run with hyperthreading enabled.
>
> In case of exclusive node allocation for jobs (i.e. no node
> sharing allowed) Slurm would give all cores of a node to the job
> which allows even more tasks to be spawned:
>
> $ srun --exclusive -c 1 echo foo | wc -l
> 48
> $
>
> 48 lines correspond exactly to the number of physical cores on the
> node. Again, with CPUs=<number of hyperthreads> instead of CPUs=<number of physical cores>, I
> would expect 2 x 48 = 96 lines of output, but I did not test that.
>
> Best regards
> Jürgen
>
>
> * Bjørn-Helge Mevik  [220325 08:49]:
> > For what it's worth, we have a similar setup, with one crucial
> > difference: we are handing out physical cores to jobs, not hyperthreads,
> > and we are *not* seeing this behaviour:
> >
> > $ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo
> foo
> > srun: job 5371678 queued and waiting for resources
> > srun: job 5371678 has been allocated resources
> > foo
> > $ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo
> foo
> > srun: job 5371680 queued and waiting for resources
> > srun: job 5371680 has been allocated resources
> > foo
> >
> > We have
> >
> > SelectType=select/cons_tres
> > SelectTypeParameters=CR_CPU_Memory
> >
> > and node definitions like
> >
> > NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2
> RealMemory=182784 Gres=localscratch:330G Weight=1000
> >
> > (so we set CPUs to the number of *physical cores*, not *hyperthreads*).
> >
> > --
> > Regards,
> > Bjørn-Helge Mevik, dr. scient,
> > Department for Research Computing, University of Oslo
> >
>
>
>
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
>
>


Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Juergen Salk
Hi Bjørn-Helge,

that's very similar to what we did as well in order to avoid confusion with
Core vs. Threads vs. CPU counts when Hyperthreading is kept enabled in the
BIOS. 

Adding CPUs=<number of physical cores> (not CPUs=<number of hyperthreads>)
will tell Slurm to only schedule physical cores.

We have 

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

and

NodeName=DEFAULT CPUs=48 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 

This is for compute nodes that have 2 sockets, 2 x 24 physical cores
with hyperthreading enabled in the BIOS. (Although, in general, we do
not encourage our users to make use of hyperthreading, we have decided
to leave it enabled in the BIOS as there are some corner cases that
are known to benefit from hyperthreading.)

With this setting Slurm does also show the total physical core
counts instead of the thread counts and also treats the --mem-per-cpu
option as "--mem-per-core" which is in our case what most of our users
expect.

As to the number of tasks spawned with `--cpus-per-task=1´, I think this 
is intended behavior. The following sentence from the srun manpage is
probably relevant:

-c, --cpus-per-task=<ncpus>

  If -c is specified without -n, as many tasks will be allocated per
  node as possible while satisfying the -c restriction.
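
So on a node where Slurm hands out hyperthreads, -c 1 leaves room for a
second task on the allocated physical core.  Explicitly fixing the task
count with -n should avoid that (an untested sketch for the reported setup):

$ srun -n 1 -c 1 echo foo | wc -l
1
$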

In our configuration, we allow multiple jobs to run for the same user
on a node (ExclusiveUser=yes) and we get 

$ srun -c 1 echo foo | wc -l
1
$

However, in case of CPUs=<number of hyperthreads> instead of CPUs=<number of physical cores>,
I guess, this would have been 2 lines of output, because the smallest
unit to schedule for a job is 1 physical core which allows 2 tasks to
run with hyperthreading enabled. 

In case of exclusive node allocation for jobs (i.e. no node
sharing allowed) Slurm would give all cores of a node to the job 
which allows even more tasks to be spawned:

$ srun --exclusive -c 1 echo foo | wc -l
48
$

48 lines correspond exactly to the number of physical cores on the
node. Again, with CPUs=<number of hyperthreads> instead of CPUs=<number of physical cores>, I
would expect 2 x 48 = 96 lines of output, but I did not test that. 

Best regards
Jürgen


* Bjørn-Helge Mevik  [220325 08:49]:
> For what it's worth, we have a similar setup, with one crucial
> difference: we are handing out physical cores to jobs, not hyperthreads,
> and we are *not* seeing this behaviour:
> 
> $ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
> srun: job 5371678 queued and waiting for resources
> srun: job 5371678 has been allocated resources
> foo
> $ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
> srun: job 5371680 queued and waiting for resources
> srun: job 5371680 has been allocated resources
> foo
> 
> We have
> 
> SelectType=select/cons_tres
> SelectTypeParameters=CR_CPU_Memory
> 
> and node definitions like
> 
> NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 
> RealMemory=182784 Gres=localscratch:330G Weight=1000
> 
> (so we set CPUs to the number of *physical cores*, not *hyperthreads*).
> 
> -- 
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
> 



-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471



Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Bjørn-Helge Mevik
Hermann Schwärzler  writes:

> Do you happen to know if there is a difference between setting CPUs
> explicitly like you do and not setting it but using
> "ThreadsPerCore=1"?
>
> My guess is that there is no difference and in both cases only the
> physical cores are "handed out to jobs". But maybe I am wrong?

I don't think we've ever tried that.  But I'd be sceptical about "lying"
to Slurm about the actual hardware structure - it might confuse the cpu
binding if Slurm and the kernel have different pictures of the hardware.
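(Comparing the slurm.conf node definition with what "slurmd -C" reports on
the node itself should at least show whether the two pictures diverge.)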

-- 
Bjørn-Helge




Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Hermann Schwärzler

Hi Bjørn-Helge,
hi everyone,

ok, I see. I also just re-read the documentation to find this in the 
description of the "CPUs" option:


"This can be useful when you want to schedule only the cores on a 
hyper-threaded node. If CPUs is omitted, its default will be set equal 
to the product of Boards, Sockets, CoresPerSocket, and ThreadsPerCore."
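
(For the node definition quoted below that default would be 1 x 2 x 20 x 2
= 80 hyperthreads, whereas CPUs=40 restricts scheduling to the 40 physical
cores.)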


Do you happen to know if there is a difference between setting CPUs 
explicitly like you do and not setting it but using "ThreadsPerCore=1"?


My guess is that there is no difference and in both cases only the 
physical cores are "handed out to jobs". But maybe I am wrong?


Regards,
Hermann

On 3/25/22 8:49 AM, Bjørn-Helge Mevik wrote:

For what it's worth, we have a similar setup, with one crucial
difference: we are handing out physical cores to jobs, not hyperthreads,
and we are *not* seeing this behaviour:

$ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371678 queued and waiting for resources
srun: job 5371678 has been allocated resources
foo
$ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371680 queued and waiting for resources
srun: job 5371680 has been allocated resources
foo

We have

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

and node definitions like

NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 
RealMemory=182784 Gres=localscratch:330G Weight=1000

(so we set CPUs to the number of *physical cores*, not *hyperthreads*).





[slurm-users] user_name attribute missing in job response from slurm rest api method jobs

2022-03-25 Thread Filip Holka
Hello,

Let me explain a problem I have with one of the methods of the Slurm REST API,
namely the "jobs" method. When I issue the call:

curl -H "X-SLURM-USER-NAME:holka" -H "X-SLURM-USER-TOKEN:${SLURM_JWT}" 
http://172.20.102.31:6820/slurm/v0.0.37/jobs 


I receive the response (which is, as expected, an array of jobs).  However,
some of the jobs (so far I have only noticed it for some PENDING jobs) do not
have the "user_name" attribute set.  (On the other hand, the standard Slurm
command-line utilities like squeue display the username correctly.)

 …
"jobs": [
 {
   "account": "ktokar",
   "accrue_time": 1640268477,
   "admin_comment": "",
   "array_job_id": 0,
   "array_task_id": null,
…
   ],
   "group_id": 1003,
   "job_id": 1097151,
   "job_resources": {
   },
   "job_state": "PENDING”,
…
   "tres_alloc_str": "",
   "user_id": 1081,
   "user_name": "",
   "wckey": "",
   "current_working_directory": "\/lustre\/home\/ktokar\/test\/NbO2"
 },
…

The same happens when I call the "job" method with a particular job id:

curl -H "X-SLURM-USER-NAME:holka" -H "X-SLURM-USER-TOKEN:${SLURM_JWT}" 
http://172.20.102.31:6820/slurm/v0.0.37/job/1097151


We have a mobile app for users that shows some statistical and accounting data
and relies on the Slurm REST API, so this inconsistency in the jobs method is
causing us some problems.  We are currently running version 21.08.4.  Could
anyone help with that?
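(As a stop-gap we could resolve the numeric user_id on the client side, e.g.
with "getent passwd 1081 | cut -d: -f1", assuming the UID namespace matches on
the querying host, but it would be nicer if user_name were filled in
consistently by the API.)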

Thanks and regards, Filip.


#  Filip Holka
#  HPC administration and monitoring
#  Computing Centre of Slovak Academy of Sciences
#  Dúbravská cesta 9
#  84535 Bratislava
#  Slovakia
#
#  Office phone: +421 2 3229 3110
#  Cell phone: +421 904 942 684
#  E-mail: filip.ho...@savba.sk



Re: [slurm-users] Disable exclusive flag for users

2022-03-25 Thread Pankaj Dorlikar
thanks

-Original Message-
From: slurm-users  On Behalf Of Ward 
Poelmans
Sent: 25 March 2022 13:10
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Disable exclusive flag for users

Hi PVD,

On 25/03/2022 01:55, pankajd wrote:

> We have slurm 21.08.6 and GPUs in our compute nodes. We want to restrict / 
> disable the use of "exclusive" flag in srun for users. How should we do it?


You can check for the flag in the job_submit.lua plugin and reject it if it's 
used while also requesting gpus?

Ward








Re: [slurm-users] Disable exclusive flag for users

2022-03-25 Thread Bjørn-Helge Mevik
pankajd  writes:

> We have slurm 21.08.6 and GPUs in our compute nodes. We want to restrict /
> disable the use of "exclusive" flag in srun for users. How should we do it?

Two options would be to use the cli_filter plugin or the job_submit
plugin.  If you want the enforcement to be guaranteed, the job_submit
plugin is the place (cli_filter can be circumvented by the user).

For instance, in job_submit.lua:

   if job_desc.shared == 0 or job_desc.shared == 2 or job_desc.shared == 3 then
      slurm.user_msg("Warning! Please do not use --exclusive unless you " ..
                     "really know what you are doing.  Your job might be " ..
                     "accounted for more CPUs than it actually uses, " ..
                     "sometimes many times more.  There are better ways to " ..
                     "specify using whole nodes, for instance using all " ..
                     "cpus on the node or all memory on the node.")
   end

or in cli_filter.lua:

   is_bad_exclusive = { exclusive = true, user = true, mcs = true }
   if is_bad_exclusive[options["exclusive"]] then
      slurm.log_info("Warning! Please do not use --exclusive unless you " ..
                     "really know what you are doing.  Your job might be " ..
                     "accounted for more CPUs than it actually uses, " ..
                     "sometimes many times more.  There are better ways to " ..
                     "specify using whole nodes, for instance using all " ..
                     "cpus on the node or all memory on the node.")
   end

(Both of these just warn, but they should be easy to change to reject the
job instead.)
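
A rejecting variant of the job_submit version might look roughly like this
(an untested sketch, assuming the same job_desc.shared encoding as above):

   function slurm_job_submit(job_desc, part_list, submit_uid)
      if job_desc.shared == 0 or job_desc.shared == 2 or job_desc.shared == 3 then
         slurm.user_msg("--exclusive is disabled on this cluster; please " ..
                        "request whole nodes via CPUs or memory instead.")
         -- returning an error instead of slurm.SUCCESS rejects the job
         return slurm.ERROR
      end
      return slurm.SUCCESS
   end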

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Bjørn-Helge Mevik
For what it's worth, we have a similar setup, with one crucial
difference: we are handing out physical cores to jobs, not hyperthreads,
and we are *not* seeing this behaviour:

$ srun --cpus-per-task=1 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371678 queued and waiting for resources
srun: job 5371678 has been allocated resources
foo
$ srun --cpus-per-task=3 -t 10 --mem-per-cpu=1g -A nnk -q devel echo foo
srun: job 5371680 queued and waiting for resources
srun: job 5371680 has been allocated resources
foo

We have

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory

and node definitions like

NodeName=DEFAULT CPUs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 
RealMemory=182784 Gres=localscratch:330G Weight=1000

(so we set CPUs to the number of *physical cores*, not *hyperthreads*).

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo





Re: [slurm-users] Disable exclusive flag for users

2022-03-25 Thread Ward Poelmans

Hi PVD,

On 25/03/2022 01:55, pankajd wrote:


We have slurm 21.08.6 and GPUs in our compute nodes. We want to restrict / disable the 
use of "exclusive" flag in srun for users. How should we do it?



You can check for the flag in the job_submit.lua plugin and reject it if it's 
used while also requesting gpus?
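
Something along these lines might work as a starting point (an untested
sketch; it assumes --exclusive ends up in job_desc.shared as the values
0, 2 or 3, and that GPU requests show up in one of the tres_per_* fields,
e.g. "gres:gpu:2"):

   function slurm_job_submit(job_desc, part_list, submit_uid)
      local exclusive = (job_desc.shared == 0 or job_desc.shared == 2
                         or job_desc.shared == 3)
      -- GPUs may be requested via --gres, --gpus, --gpus-per-task, ...
      local tres = (job_desc.tres_per_node or job_desc.tres_per_job
                    or job_desc.tres_per_socket or job_desc.tres_per_task or "")
      if exclusive and string.find(tres, "gpu") then
         slurm.user_msg("--exclusive is not allowed for GPU jobs.")
         return slurm.ERROR
      end
      return slurm.SUCCESS
   end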

Ward

