[slurm-users] Disable exclusive flag for users
Hi,

We have Slurm 21.08.6 and GPUs in our compute nodes. We want to restrict / disable the use of the "--exclusive" flag in srun for users. How should we do it?

--
Thanks and regards,
PVD
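A common way to do this (not mentioned in the thread, so treat it as a sketch) is a job_submit plugin that rejects jobs requesting exclusive access at submission time. The Lua below is hypothetical: the field name job_desc.shared and the convention that a value of 0 indicates --exclusive are assumptions about the job_submit/lua interface, so verify them against your Slurm version's documentation. In production the script would live at /etc/slurm/job_submit.lua with JobSubmitPlugins=lua set in slurm.conf; here it is just written to the current directory.

```shell
# Sketch: generate a job_submit.lua that rejects --exclusive requests.
# (Hypothetical field semantics -- check your Slurm version's docs.)
cat > job_submit.lua <<'EOF'
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- shared == 0 is assumed to indicate an --exclusive request
    if job_desc.shared == 0 then
        slurm.log_user("--exclusive is disabled on this cluster")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
EOF
echo "wrote job_submit.lua"
```

After installing it, a submission with --exclusive should be rejected with the logged message, while ordinary jobs pass through untouched.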
Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task
If you want to have the same number of processes per node, like:

#PBS -l nodes=4:ppn=8

then what I am doing (maybe there is another way?) is:

#SBATCH --ntasks-per-node=8
#SBATCH --nodes=4
#SBATCH --mincpus=8

This is because "--ntasks-per-node" is actually the "maximum number of tasks per node" and "--nodes=4" means "minimum number of nodes". I'm sure other variations (specifying --ntasks=32, --mincpus=8 and --nodes=4-4) might do it too, but this one is what I've been using.

I remember being surprised when coming over from Torque to find that "--ntasks-per-node" and "--nodes" did not mean what they so obviously seemed to mean.

Steve

On Thu, Mar 24, 2022 at 7:56 PM David Henkemeyer wrote:
> Thank you! We recently converted from PBS, and I was converting "ppn=X" to "-n X". Does it make more sense to convert "ppn=X" to "--cpus-per-task=X"?
> [earlier quoted replies trimmed; the messages from Thomas M. Payerle and David Henkemeyer appear in full below]
Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task
Thank you! We recently converted from PBS, and I was converting "ppn=X" to "-n X". Does it make more sense to convert "ppn=X" to "--cpus-per-task=X"?

Thanks again
David

On Thu, Mar 24, 2022 at 3:54 PM Thomas M. Payerle wrote:
> [quoted text trimmed; Thomas M. Payerle's reply appears in full below]

--
Sent from Gmail Mobile
Re: [slurm-users] How to open a slurm support case
...it is a bit arcane, but it's not like we're funding lavish lifestyles with our support payments. I would prefer to see a slightly more differentiated support system, but this suffices...

On Thu, Mar 24, 2022 at 6:06 PM Sean Crosby wrote:
> Hi Jeff,
> The support system is here - https://bugs.schedmd.com/
> Create an account, log in, and when creating a request, select your site from the Site selection box.
> Sean
> [remaining quoted text trimmed]
Re: [slurm-users] How to open a slurm support case
Hi Jeff,

The support system is here - https://bugs.schedmd.com/

Create an account, log in, and when creating a request, select your site from the Site selection box.

Sean

From: slurm-users on behalf of Jeffrey R. Lang
Sent: Friday, 25 March 2022 08:48
To: slurm-users@lists.schedmd.com
Subject: [EXT] [slurm-users] How to open a slurm support case
> [quoted text trimmed; Jeff's original message appears in full below]
Re: [slurm-users] How to open a slurm support case
Jeff,

I will reach out to you directly.

-Jason

On Thu, Mar 24, 2022 at 3:51 PM Jeffrey R. Lang wrote:
> [quoted text trimmed; Jeff's original message appears in full below]

--
Jason Booth
Director of Support, SchedMD LLC
Commercial Slurm Development and Support
[slurm-users] How to open a slurm support case
Can someone provide me with instructions on how to open a support case with SchedMD?

We have a support contract, but nowhere on their website can I find a link to open a case with them.

Thanks,
Jeff
Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task
Although all three cases ("-N 1 --cpus-per-task 64 -n 1", "-N 1 --cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause Slurm to allocate 64 cores to the job, there can (and will) be differences in other respects.

The variable SLURM_NTASKS will be set to the argument of the -n (aka --ntasks) flag, and other Slurm variables will differ as well.

More importantly, as others noted, srun will launch $SLURM_NTASKS processes. The mpirun/mpiexec/etc binaries of most MPI libraries will (if compiled with support for Slurm) act similarly (and indeed, I believe most use srun under the hood).

If you are just using sbatch and launching a single process using 64 threads, then the different options are probably equivalent for most intents and purposes. The same goes if you are doing a loop to start 64 single-threaded processes. But those are simplistic cases, and just happen to "work" even though you are "abusing" the scheduler options. And even the cases wherein it "works" are subject to unexpected failures (e.g. if one substitutes srun for sbatch).

The differences are most clear when the -N 1 flag is not given. Generally, SLURM_NTASKS should be the number of MPI or similar tasks you intend to start. By default, it is assumed the tasks can support distributed-memory parallelism, so the scheduler assumes that it can launch tasks on different nodes (the -N 1 flag you mentioned would override that). Each such task is assumed to need --cpus-per-task cores, which the scheduler assumes need shared-memory parallelism (i.e. must be on the same node).

So without the -N 1, "--cpus-per-task 64 -n 1" will require 64 cores on a single node, whereas "-n 64 --cpus-per-task 1" can result in the job being assigned anything from 64 cores on a single node to a single core on each of 64 nodes, or any combination in between totaling 64 cores. The "--cpus-per-task 32 -n 2" case will either assign one node with 64 cores or 2 nodes with 32 cores each.

As I said, although there are some simple cases where the different invocations are mostly functionally equivalent, I would recommend trying to use the proper arguments --- "abusing" the arguments might work for a while but will likely bite you in the end. E.g., the 64-thread case should do "--cpus-per-task 64", and the launching-processes-in-a-loop case should _probably_ do "-n 64" (assuming it can handle the tasks being assigned to different nodes).

On Thu, Mar 24, 2022 at 3:35 PM David Henkemeyer <david.henkeme...@gmail.com> wrote:
> [quoted text trimmed; David's original question appears in full below]

--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads    paye...@umd.edu
5825 University Research Park        (301) 405-6135
University of Maryland
College Park, MD 20740-3831
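Tom's core accounting can be sanity-checked without a cluster: a request needs ntasks blocks of cpus-per-task cores each, and only the block (not the whole job) must fit on a single node. A minimal shell sketch of that arithmetic (the echo format is ours, not Slurm output):

```shell
# Each combination allocates ntasks * cpus_per_task cores in total, but
# cpus-per-task is the largest unit that must land on a single node.
check() {
  local ntasks=$1 cpus=$2
  echo "-n $ntasks --cpus-per-task $cpus => $(( ntasks * cpus )) cores total, per-node chunk >= $cpus"
}
check 1 64   # one 64-core block: must fit on one node
check 64 1   # 64 one-core blocks: may spread over up to 64 nodes
check 2 32   # two 32-core blocks: one node (64 cores) or two (32 each)
```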
[slurm-users] Help with failing job execution
My site recently updated to Slurm 21.08.6 and for the most part everything went fine. Two Ubuntu nodes, however, are having issues: slurmd cannot execve the jobs on the nodes. As an example:

[jrlang@tmgt1 ~]$ salloc -A ARCC --nodes=1 --ntasks=20 -t 1:00:00 --bell --nodelist=mdgx01 --partition=dgx /bin/bash
salloc: Granted job allocation 2328489
[jrlang@tmgt1 ~]$ srun hostname
srun: error: task 0 launch failed: Slurmd could not execve job
srun: error: task 1 launch failed: Slurmd could not execve job
[the same error repeats for tasks 2 through 19]

Looking in slurmd-mdgx01.log we only see:

[2022-03-24T14:44:02.408] [2328501.interactive] error: Failed to invoke task plugins: one of task_p_pre_setuid functions returned error
[2022-03-24T14:44:02.409] [2328501.interactive] error: job_manager: exiting abnormally: Slurmd could not execve job
[2022-03-24T14:44:02.411] [2328501.interactive] done with job

Note that this issue didn't occur with Slurm 20.11.8. Any ideas what could be causing it? Because I'm stumped.

Jeff
Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task
"Will launch 64 instances of your application, each bound to a single cpu"

This is true for srun, but not for sbatch. A while back, we did an experiment using "hostname" to verify.

On Thu, Mar 24, 2022 at 12:47 PM Ralph Castain wrote:
> [quoted text trimmed; Ralph's reply appears in full below]

--
Sent from Gmail Mobile
Re: [slurm-users] Question about sbatch options: -n, and --cpus-per-task
Well, there is indeed a difference - and it is significant.

> On Mar 24, 2022, at 12:32 PM, David Henkemeyer wrote:
>
> -n 64 (leaving cpus-per-task to be the default of 1)

Will launch 64 instances of your application, each bound to a single cpu.

> --cpus-per-task 64 (leaving -n to be the default of 1)

Will run ONE instance of your application (no binding if the node has 64 cpus - otherwise, the proc will be bound to 64 cpus).

> --cpus-per-task 32 -n 2

Will run TWO instances of your application, each bound to 32 cpus.

> As far as I can tell, there is no functional difference. But if there is even a subtle difference, I would love to know what it is!
>
> Thanks
> David
[slurm-users] Question about sbatch options: -n, and --cpus-per-task
Assuming -N is 1 (meaning, this job needs only one node), then is there a difference between any of these 3 flag combinations:

-n 64 (leaving cpus-per-task to be the default of 1)
--cpus-per-task 64 (leaving -n to be the default of 1)
--cpus-per-task 32 -n 2

As far as I can tell, there is no functional difference. But if there is even a subtle difference, I would love to know what it is!

Thanks
David
--
Sent from Gmail Mobile
Re: [slurm-users] Make sacct show short job state codes?
Here is an example command for getting parseable output from sacct of all completed jobs during a specific period of time:

$ sacct -p -X -a -S 032322 -E 032422 -o JobID,User,State -s ca,cd,f,to,pr,oom

The fields are separated by | and can easily be parsed by awk. Example output:

JobID|User|State|
4753873_126|catscr|TIMEOUT|
4753873_129|catscr|TIMEOUT|
4753873_136|catscr|FAILED|

I hope this helps.

/Ole

On 3/24/22 14:47, Brian Andrus wrote:
> [quoted text trimmed; Brian's reply appears in full below]
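Brian's awk post-processing idea, applied to Ole's parseable output (a sketch; the sample lines come from Ole's example, and the short codes TO/F/CA mirror the abbreviations sacct accepts on input):

```shell
# Shorten long state names in "|"-separated sacct-style output.
# Substituting on field $3 only, so user or job names are never touched.
printf '%s\n' \
  'JobID|User|State|' \
  '4753873_126|catscr|TIMEOUT|' \
  '4753873_136|catscr|FAILED|' |
awk -F'|' 'BEGIN{OFS="|"} {
  sub("TIMEOUT","TO",$3)
  sub("FAILED","F",$3)
  sub("CANCELLED","CA",$3)
  print
}'
```

This prints the header unchanged, then "4753873_126|catscr|TO|" and "4753873_136|catscr|F|". In real use, replace the printf with the sacct command.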
Re: [slurm-users] srun and --cpus-per-task
Hi Durai,

I see the same thing as you on our test cluster that has ThreadsPerCore=2 configured in slurm.conf. The double "foo" goes away with this:

srun --cpus-per-task=1 --hint=nomultithread echo foo

Having multithreading enabled leads, imho, to surprising behaviour of Slurm. My impression is that using it makes the concept of "a CPU" in Slurm somewhat fuzzy. It becomes unclear and ambiguous what you get when using the cpu-related options of srun, sbatch and salloc: is it a CPU core or is it a CPU thread?

I think what you found is a bug. If you run

for c in {4..1}
do
  echo "## $c ###"
  srun -c $c bash -c 'echo $SLURM_CPU_BIND_LIST'
done

you will get:

## 4 ###
0x003003
## 3 ###
0x003003
## 2 ###
0x001001
## 1 ###
0x01,0x001000
0x01,0x001000

You see: requesting 4 and 3 CPUs results in the same cpu binding, as both need two CPU cores with 2 threads each. In the "3" case one of the threads stays unused but of course is not free for another job. In the "1" case I would expect to see the same binding as in the "2" case. If you combine the two values in the list you *do* get the same value, but obviously it's a list of two values, and this might be the origin of the problem. It is probably related to what's mentioned in the documentation for '--ntasks': "[...] The default is one task per node, but note that the --cpus-per-task option will change this default."

Regards,
Hermann

On 3/24/22 1:37 PM, Durai Arasan wrote:
> [quoted text trimmed; Durai's original message appears in full below]
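To check Hermann's reading of those masks, each hex value can be expanded into the CPU ids it covers (a small sketch, no cluster needed; 0x003003 and 0x001001 are the masks from his example output):

```shell
# Expand a Slurm CPU-bind hex mask into the list of CPU ids it selects.
mask_to_cpus() {
  local bits=$(( $1 )) i ids=""
  for (( i = 0; bits >> i; i++ )); do
    (( (bits >> i) & 1 )) && ids="$ids $i"
  done
  echo "$1 -> cpus:$ids"
}
mask_to_cpus 0x003003   # bits 0, 1, 12, 13: two cores, two threads each
mask_to_cpus 0x001001   # bits 0, 12: one core, both threads
```

So the "4"/"3" mask covers CPUs 0, 1, 12 and 13, while the "2" mask covers 0 and 12, matching Hermann's interpretation of two full cores versus one.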
Re: [slurm-users] Make sacct show short job state codes?
I don't think that is part of sacct options. Feature request maybe.

Meanwhile, awk would be your friend here. Just post-process by piping the output to awk and doing the substitutions before printing the output, e.g.:

sacct | awk '{sub("CANCELLED","CA");sub("RUNNING","RU");print}'

Just add a 'sub' command for each substitution. It is tedious to set up but will do the trick. You can also specify the specific field to do any substitution on.

Brian Andrus

On 3/24/2022 6:12 AM, Chip Seraphine wrote:
> [quoted text trimmed; Chip's original message appears in full below]
Re: [slurm-users] Make sacct show short job state codes?
Hi Chip,

Use the sacct -p or --parsable option to get the complete output delimited by |

/Ole

On 3/24/22 14:12, Chip Seraphine wrote:
> [quoted text trimmed; Chip's original message appears in full below]
[slurm-users] Make sacct show short job state codes?
I’m trying to shave a few columns off of some sacct output, and while it will happily accept the short codes (e.g. CA instead of CANCELLED) I can’t find a way to get it to report them. Shaving down the columns using %N in --format just results in a truncated version of the long code, which is often not the same thing.

Does anyone know if/how this can be done?

--
Chip Seraphine
Linux Admin (Grid)
E: cseraph...@drwholdings.com
M: 773 412 2608
[slurm-users] srun and --cpus-per-task
Hello Slurm users,

We are experiencing strange behavior with srun executing commands twice, but only when setting --cpus-per-task=1:

$ srun --cpus-per-task=1 --partition=gpu-2080ti echo foo
srun: job 1298286 queued and waiting for resources
srun: job 1298286 has been allocated resources
foo
foo

This is not seen when --cpus-per-task is another value:

$ srun --cpus-per-task=3 --partition=gpu-2080ti echo foo
srun: job 1298287 queued and waiting for resources
srun: job 1298287 has been allocated resources
foo

Nor when explicitly specifying --ntasks:

$ srun -n1 --cpus-per-task=1 --partition=gpu-2080ti echo foo
srun: job 1298288 queued and waiting for resources
srun: job 1298288 has been allocated resources
foo

Relevant slurm.conf settings are:

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# example node configuration
NodeName=slurm-bm-58 NodeAddr=xxx.xxx.xxx.xxx Procs=72 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 RealMemory=354566 Gres=gpu:rtx2080ti:8 Feature=xx_v2.38 State=UNKNOWN

On closer inspection of the job variables in the "--cpus-per-task=1" case, the following variables have wrongly acquired a value of 2 for no apparent reason:

SLURM_NTASKS=2
SLURM_NPROCS=2
SLURM_TASKS_PER_NODE=2
SLURM_STEP_NUM_TASKS=2
SLURM_STEP_TASKS_PER_NODE=2

Can you see what could be wrong?

Best,
Durai
Re: [slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs
cgroups can control access to devices (e.g. /dev/nvidia0), which is how I understand it to work.

-Sean

On Thu, Mar 24, 2022 at 4:27 AM wrote:
> [quoted text trimmed; the full exchange appears in the next message]
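The fix discussed in this thread comes down to a single line in the cgroup configuration (fragment; slurmd on the compute nodes must be restarted after changing it):

```shell
# /etc/slurm/cgroup.conf (fragment)
ConstrainDevices=yes    # restrict device files (e.g. /dev/nvidia*) to those allocated to the job
```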
Re: [slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs
Well, this is indeed the point. We didn't set ConstrainDevices=yes in cgroup.conf. After adding this, the GPU restriction works as expected.

But what is the relation between GPU restriction and cgroups? I never heard that cgroups can limit GPU card usage. Isn't it a feature of CUDA or the NVIDIA driver?

From: Sean Maxwell
Sent: 23 March 2022 23:05
To: Slurm User Community List
Subject: Re: [slurm-users] how to locate the problem when slurm failed to restrict gpu usage of user jobs

Hi,

If you are using cgroups for task/process management, you should verify that your /etc/slurm/cgroup.conf has the following line:

ConstrainDevices=yes

I'm not sure about the missing environment variable, but the absence of the above in cgroup.conf is one way the GPU devices can be unconstrained in the jobs.

-Sean

On Wed, Mar 23, 2022 at 10:46 AM taleinterve...@sjtu.edu.cn wrote:

Hi, all:

We found a problem that a slurm job with an argument such as --gres gpu:1 wasn't restricted in its GPU usage; the user could still see all GPU cards on the allocated nodes.

Our gpu node has 4 cards, with its gres.conf being:

> cat /etc/slurm/gres.conf
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia0 CPUs=0-15
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia1 CPUs=16-31
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia2 CPUs=32-47
Name=gpu Type=NVlink_A100_40GB File=/dev/nvidia3 CPUs=48-63

And for a test, we submitted a simple job batch like:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=a100
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:1
#SBATCH --reservation="gpu test"
hostname
nvidia-smi
echo end

Then in the out file nvidia-smi showed all 4 GPU cards, but we expected to see only the 1 allocated GPU card.

The official Slurm documentation says it will set the CUDA_VISIBLE_DEVICES env var to restrict the GPU cards available to the user. But we didn't find such a variable in the job environment. We only confirmed it does exist in the prolog script environment, by adding the debug command "echo $CUDA_VISIBLE_DEVICES" to the slurm prolog script.

So how does Slurm cooperate with the NVIDIA tools to make the job user see only its allocated GPU card? What is the requirement on the NVIDIA GPU drivers, the CUDA toolkit, or any other part to help Slurm correctly restrict GPU usage?