Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

2023-07-14 Thread Williams, Jenny Avis
Thanks, Hermann, for the feedback.

My reason for posting was to ask for a review of the systemd unit file for 
slurmd so that this "nudging" would not be necessary.

I'd like to explore that a little more -- it looks like cgroupsv2 cpusets are 
working for us in this configuration, except for having to "nudge" the daemon 
to start with the steps originally listed.  

This document from Red Hat explicitly describes enabling cpusets under cgroupsv2 
on RHEL 8 -- this at least appears to be working in our configuration.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/using-cgroups-v2-to-control-distribution-of-cpu-time-for-applications_managing-monitoring-and-updating-the-kernel

This document is where I got the steps to get the daemon working and cpusets 
enabled.  I've checked the contents of job_*/cpuset.cpus under /s
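
For anyone following along, the kind of "nudge" I mean looks roughly like this 
(only a sketch -- the exact paths depend on where systemd places the slurmd 
unit in the cgroup hierarchy on your nodes):

  # enable the cpuset controller for child cgroups, from the root on down
  echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
  echo "+cpuset" >> /sys/fs/cgroup/system.slice/cgroup.subtree_control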

Regards,
Jenny 


-Original Message-
From: slurm-users  On Behalf Of Hermann 
Schwärzler
Sent: Thursday, July 13, 2023 6:45 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed 
to get daemon to start

Hi Jenny,

ok, I see. You are using the exact same Slurm version and a very similar OS 
version/distribution as we do.

You have to consider that cpuset support is not available in cgroup/v2 in 
kernel versions below 5.2 (see "Cgroups v2 controllers" in "man cgroups" on 
your system). So some of the warnings/errors you see - at least "Controller 
cpuset is not enabled" - are expected (and slurmd should start nevertheless).
This, by the way, is one of the reasons why we stick with cgroup/v1 for the time being.
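
To double-check what your kernel actually exposes at the cgroup v2 root, 
something like this should do (just a sketch):

  uname -r
  cat /sys/fs/cgroup/cgroup.controllers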

We did some tests with cgroups/v2 and in our case slurmd started with no 
problems (except the error/warning regarding the cpuset controller). But we 
have a slightly different configuration. You use
JobAcctGatherType   = jobacct_gather/cgroup
ProctrackType   = proctrack/cgroup
TaskPlugin  = cgroup,affinity
CgroupPlugin= cgroup/v2

We use for the respective settings:
JobAcctGatherType   = jobacct_gather/linux
ProctrackType   = proctrack/cgroup
TaskPlugin  = task/affinity,task/cgroup
CgroupPlugin= (null) - i.e. we don't set that one in cgroup.conf

Maybe using the same settings as we do helps in your case?
Please be aware that you should change JobAcctGatherType only when there are no 
running job steps!
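
Spelled out as config-file lines, that would be roughly the following (just a 
sketch -- adapt it to the rest of your slurm.conf):

  # slurm.conf
  JobAcctGatherType=jobacct_gather/linux
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/affinity,task/cgroup
  # cgroup.conf: simply leave the CgroupPlugin line out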

Regards,
Hermann


On 7/12/23 16:50, Williams, Jenny Avis wrote:
> The systems have only cgroup/v2 enabled
>   # mount |egrep cgroup
>   cgroup2 on /sys/fs/cgroup type cgroup2 
> (rw,nosuid,nodev,noexec,relatime,nsdelegate)
> Distribution and kernel
>   RedHat 8.7
>   4.18.0-348.2.1.el8_5.x86_64
> 
> 
> 
> -Original Message-
> From: slurm-users  On Behalf Of 
> Hermann Schwärzler
> Sent: Wednesday, July 12, 2023 4:36 AM
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes 
> needed to get daemon to start
> 
> Hi Jenny,
> 
> I *guess* you have a system that has both cgroup/v1 and cgroup/v2 enabled.
> 
> Which Linux distribution are you using? And which kernel version?
> What is the output of
> mount | grep cgroup
> What if you do not restrict the cgroup-version Slurm can use to
> cgroup/v2 but omit "CgroupPlugin=..." from your cgroup.conf?
> 
> Regards,
> Hermann
> 
> On 7/11/23 19:41, Williams, Jenny Avis wrote:
>> Additional configuration information -- /etc/slurm/cgroup.conf
>>
>> CgroupAutomount=yes
>>
>> ConstrainCores=yes
>>
>> ConstrainRAMSpace=yes
>>
>> CgroupPlugin=cgroup/v2
>>
>> AllowedSwapSpace=1
>>
>> ConstrainSwapSpace=yes
>>
>> ConstrainDevices=yes
>>
>> *From:* Williams, Jenny Avis
>> *Sent:* Tuesday, July 11, 2023 10:47 AM
>> *To:* slurm-us...@schedmd.com
>> *Subject:* cgroupv2 + slurmd - external cgroup changes needed to get 
>> daemon to start
>>
>> Progress on getting slurmd to start under cgroupv2
>>
>> Issue: slurmd 22.05.6 will not start when using cgroupv2
>>
>> Expected result: even after reboot slurmd will start up without 
>> needing to manually add lines to /sys/fs/cgroup files.
>>
>> When started as service the error is:
>>
>> # systemctl status slurmd
>>
>> * slurmd.service - Slurm node daemon
>>
>>      Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; 
>> vendor preset: disabled)
>>
>>     Drop-In: /etc/systemd/system/slurmd.service.d
>>
>>      `-extendUnit.conf
>>
>>      Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23 
>> EDT; 2s ago
>>
>>     Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS 
>> (code=exited, status=1/FAILURE)
>>
>> Main PID: 11395 (code=exited, status=1/FAILURE)
>>
>> Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node 
>> daemon.
>>
>> Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd 
>> version 22.05.6 started
>>
>> Jul 11 10:29:23 

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
I haven't seen anything that allows disabling a defined Gres device. It does 
seem to work if I also define the GPUs that I don't want to use and then submit 
jobs specifically to the other GPUs with an option like 
"--gpu=gpu:rtx_2080_ti:1". I suppose if I set the GPU Type to "COMPUTE" for the 
GPUs I want to use for computing and "UNUSED" for those that I don't, this 
scheme might work (e.g., --gpu=gpu:COMPUTE:3), but then every job submission 
would be required to have this option set. Not a very workable solution.
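
Just to spell that (untested) scheme out, the config for oryx would look 
roughly like this -- the type names are made up and I'm assuming the GT 710 is 
/dev/nvidia0:

  # gres.conf
  Nodename=oryx Name=gpu Type=COMPUTE File=/dev/nvidia1
  Nodename=oryx Name=gpu Type=UNUSED File=/dev/nvidia0
  # slurm.conf
  NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:COMPUTE:1,gpu:UNUSED:1

and every job would then have to request the COMPUTE type explicitly.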

Thanks!
Steve

From: slurm-users  on behalf of Feng 
Zhang 
Sent: Friday, July 14, 2023 3:09 PM
To: Slurm User Community List 
Subject: Re: [slurm-users] Unconfigured GPUs being allocated


Very interesting issue.

I am guessing there might be a workaround: since oryx has 2 GPUs, could you
define both of them but disable the GT 710? Does Slurm support this?

Best,

Feng

Best,

Feng


On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M  wrote:
>
> Hi,
>
> I manually configure the GPUs in our Slurm configuration (AutoDetect=off in 
> gres.conf) and everything works fine when all the GPUs in a node are 
> configured in gres.conf and available to Slurm.  But we have some nodes where 
> a GPU is reserved for running the display and is specifically not configured 
> in gres.conf.  In these cases, Slurm includes this unconfigured GPU and makes 
> it available to Slurm jobs.  Using a simple Slurm job that executes 
> "nvidia-smi -L", it will display the unconfigured GPU along with as many 
> configured GPUs as requested by the job.
>
> For example, in a node configured with this line in slurm.conf:
> NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:RTX2080TI:1
> and this line in gres.conf:
> Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1
> I will get the following results from a job running "nvidia-smi -L" that 
> requested a single GPU:
> GPU 0: NVIDIA GeForce GT 710 (UUID: 
> GPU-21fe15f0-d8b9-b39e-8ada-8c1c8fba8a1e)
> GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: 
> GPU-0dc4da58-5026-6173-1156-c4559a268bf5)
>
> But in another node that has all GPUs configured in Slurm like this in 
> slurm.conf:
> NodeName=beluga CoreSpecCount=1 CPUs=16 RealMemory=128500 
> Gres=gpu:TITANX:2
> and this line in gres.conf:
> Nodename=beluga Name=gpu Type=TITANX File=/dev/nvidia[0-1]
> I get the expected results from the job running "nvidia-smi -L" that 
> requested a single GPU:
> GPU 0: NVIDIA RTX A5500 (UUID: GPU-3754c069-799e-2027-9fbb-ff90e2e8e459)
>
> I'm running Slurm 22.05.5.
>
> Thanks in advance for any suggestions to help correct this problem!
>
> Steve



Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
It's not so much whether a job may or may not access the GPU but rather which 
GPU(s) are included in $CUDA_VISIBLE_DEVICES. That is what controls what our 
CUDA jobs can see and therefore use (within any cgroup constraints, of course). 
In my case, Slurm is sometimes setting $CUDA_VISIBLE_DEVICES to a GPU that is 
not in the Slurm configuration, because that GPU is intended only for driving 
the display and not for GPU computations.
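
A quick way to see what a job is actually handed (just a sketch -- adjust the 
GPU request to your own gres/partition names):

  $ srun --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES'

CUDA applications only enumerate the devices listed in that variable, which is 
why the display GPU showing up there is the real problem.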

Thanks for your thoughts!

Steve

From: slurm-users  on behalf of 
Christopher Samuel 
Sent: Friday, July 14, 2023 1:57 PM
To: slurm-users@lists.schedmd.com 
Subject: Re: [slurm-users] Unconfigured GPUs being allocated


On 7/14/23 10:20 am, Wilson, Steven M wrote:

> I upgraded Slurm to 23.02.3 but I'm still running into the same problem.
> Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still
> being made available to jobs so we end up with compute jobs being run on
> GPUs which should only be used for driving the display.

I think this is expected - it's not that Slurm is making them available,
it's that it's unaware of them and so doesn't control them in the way it
does for the GPUs it does know about. So you get the default behaviour
(any process can access them).

If you want to stop them being accessed from Slurm you'd need to find a
way to prevent that access via cgroups games or similar.

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Feng Zhang
Very interesting issue.

I am guessing there might be a workaround: since oryx has 2 GPUs, could you
define both of them but disable the GT 710? Does Slurm support this?

Best,

Feng

Best,

Feng


On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M  wrote:
>
> Hi,
>
> I manually configure the GPUs in our Slurm configuration (AutoDetect=off in 
> gres.conf) and everything works fine when all the GPUs in a node are 
> configured in gres.conf and available to Slurm.  But we have some nodes where 
> a GPU is reserved for running the display and is specifically not configured 
> in gres.conf.  In these cases, Slurm includes this unconfigured GPU and makes 
> it available to Slurm jobs.  Using a simple Slurm job that executes 
> "nvidia-smi -L", it will display the unconfigured GPU along with as many 
> configured GPUs as requested by the job.
>
> For example, in a node configured with this line in slurm.conf:
> NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:RTX2080TI:1
> and this line in gres.conf:
> Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1
> I will get the following results from a job running "nvidia-smi -L" that 
> requested a single GPU:
> GPU 0: NVIDIA GeForce GT 710 (UUID: 
> GPU-21fe15f0-d8b9-b39e-8ada-8c1c8fba8a1e)
> GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: 
> GPU-0dc4da58-5026-6173-1156-c4559a268bf5)
>
> But in another node that has all GPUs configured in Slurm like this in 
> slurm.conf:
> NodeName=beluga CoreSpecCount=1 CPUs=16 RealMemory=128500 
> Gres=gpu:TITANX:2
> and this line in gres.conf:
> Nodename=beluga Name=gpu Type=TITANX File=/dev/nvidia[0-1]
> I get the expected results from the job running "nvidia-smi -L" that 
> requested a single GPU:
> GPU 0: NVIDIA RTX A5500 (UUID: GPU-3754c069-799e-2027-9fbb-ff90e2e8e459)
>
> I'm running Slurm 22.05.5.
>
> Thanks in advance for any suggestions to help correct this problem!
>
> Steve



Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Christopher Samuel

On 7/14/23 10:20 am, Wilson, Steven M wrote:

> I upgraded Slurm to 23.02.3 but I'm still running into the same problem. 
> Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still 
> being made available to jobs so we end up with compute jobs being run on 
> GPUs which should only be used for driving the display.


I think this is expected - it's not that Slurm is making them available, 
it's that it's unaware of them and so doesn't control them in the way it 
does for the GPUs it does know about. So you get the default behaviour 
(any process can access them).


If you want to stop them being accessed from Slurm you'd need to find a 
way to prevent that access via cgroups games or similar.
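
One cgroup-related knob that might be worth testing -- no promises, I haven't 
checked whether it also covers devices that are absent from gres.conf -- is 
device constraining in cgroup.conf:

  # cgroup.conf
  ConstrainDevices=yes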


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
I upgraded Slurm to 23.02.3 but I'm still running into the same problem. 
Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still being 
made available to jobs, so we end up with compute jobs being run on GPUs which 
should only be used for driving the display.

Any ideas?

Thanks,
Steve

From: Wilson, Steven M
Sent: Tuesday, June 27, 2023 9:50 AM
To: slurm-users@lists.schedmd.com 
Subject: Unconfigured GPUs being allocated

Hi,

I manually configure the GPUs in our Slurm configuration (AutoDetect=off in 
gres.conf) and everything works fine when all the GPUs in a node are configured 
in gres.conf and available to Slurm.  But we have some nodes where a GPU is 
reserved for running the display and is specifically not configured in 
gres.conf.  In these cases, Slurm includes this unconfigured GPU and makes it 
available to Slurm jobs.  Using a simple Slurm job that executes "nvidia-smi 
-L", it will display the unconfigured GPU along with as many configured GPUs as 
requested by the job.

For example, in a node configured with this line in slurm.conf:
NodeName=oryx CoreSpecCount=2 CPUs=8 RealMemory=64000 Gres=gpu:RTX2080TI:1
and this line in gres.conf:
Nodename=oryx Name=gpu Type=RTX2080TI File=/dev/nvidia1
I will get the following results from a job running "nvidia-smi -L" that 
requested a single GPU:
GPU 0: NVIDIA GeForce GT 710 (UUID: 
GPU-21fe15f0-d8b9-b39e-8ada-8c1c8fba8a1e)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: 
GPU-0dc4da58-5026-6173-1156-c4559a268bf5)

But in another node that has all GPUs configured in Slurm like this in 
slurm.conf:
NodeName=beluga CoreSpecCount=1 CPUs=16 RealMemory=128500 Gres=gpu:TITANX:2
and this line in gres.conf:
Nodename=beluga Name=gpu Type=TITANX File=/dev/nvidia[0-1]
I get the expected results from the job running "nvidia-smi -L" that requested 
a single GPU:
GPU 0: NVIDIA RTX A5500 (UUID: GPU-3754c069-799e-2027-9fbb-ff90e2e8e459)

I'm running Slurm 22.05.5.

Thanks in advance for any suggestions to help correct this problem!

Steve