Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed to get daemon to start

2023-07-14 Thread Williams, Jenny Avis
Thanks, Herman, for the feedback. My reason for posting was to request a review of the systemd unit file for slurmd so that this "nudging" would not be necessary. I'd like to explore that a little more -- it looks like cgroupsv2 cpusets are working for us in this configuration, except for
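For context, the usual way to avoid external cgroup "nudging" on cgroupv2 systems is a systemd drop-in that delegates the cgroup subtree to slurmd. A minimal sketch, assuming a standard systemd layout and a unit named slurmd.service (the drop-in filename is a placeholder):

```shell
# Hypothetical drop-in; requires root. Verify the unit name on your system.
mkdir -p /etc/systemd/system/slurmd.service.d
cat > /etc/systemd/system/slurmd.service.d/delegate.conf <<'EOF'
[Service]
Delegate=Yes
EOF
systemctl daemon-reload
systemctl restart slurmd
```

With Delegate=Yes, systemd hands slurmd its own cgroup subtree to manage, so the daemon can create the controllers it needs at startup.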

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
I haven't seen anything that allows for disabling a defined Gres device. It does seem to work if I define the GPUs that I don't want to use and then specifically submit jobs to the other GPUs by type, e.g. "--gres=gpu:rtx_2080_ti:1". I suppose if I set the GPU Type to be "COMPUTE" for the
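The submission side of that workaround might look like the following sketch (the Gres type name comes from the thread; the job script name is a placeholder):

```shell
# Request only the compute card by Gres type:
sbatch --gres=gpu:rtx_2080_ti:1 train.sh
# Equivalent with the per-job GPU flag available in Slurm >= 19.05:
sbatch --gpus=rtx_2080_ti:1 train.sh
```

Typed requests only steer scheduling toward the named device; they do not by themselves prevent a job from seeing an unallocated GPU, which is the problem discussed below.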

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
It's not so much whether a job may or may not access the GPU, but rather which GPU(s) are included in $CUDA_VISIBLE_DEVICES. That is what controls what our CUDA jobs can see and therefore use (within any cgroup constraints, of course). In my case, Slurm is sometimes setting
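A quick way to see what Slurm actually exposed to a job is to inspect $CUDA_VISIBLE_DEVICES from inside the job script. A minimal sketch (run under sbatch or srun; prints "unset" when Slurm did not set the variable):

```shell
#!/bin/bash
# Show which GPU indices Slurm exposed to this job.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
# Count the visible devices (0 if the variable is unset or empty).
if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
  ngpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
else
  ngpus=0
fi
echo "visible GPU count: $ngpus"
```

Comparing this count against the job's --gres request makes the misallocation easy to spot in the job's stdout.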

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Feng Zhang
Very interesting issue. I am guessing there might be a workaround: since oryx has 2 GPUs, you could define both of them but disable the GT 710? Does Slurm support this? Best, Feng On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M wrote: > > Hi, > > I manually configure the
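Defining both of oryx's GPUs with explicit types might be sketched as below (device paths and the gt_710 type name are assumptions; verify the actual devices with nvidia-smi before using anything like this):

```shell
# Hypothetical gres.conf entries for both cards on oryx:
cat >> /etc/slurm/gres.conf <<'EOF'
NodeName=oryx Name=gpu Type=rtx_2080_ti File=/dev/nvidia0
NodeName=oryx Name=gpu Type=gt_710     File=/dev/nvidia1
EOF
# slurm.conf must advertise the same totals, e.g.:
#   NodeName=oryx Gres=gpu:rtx_2080_ti:1,gpu:gt_710:1
```

With both cards defined and typed, jobs that request gpu:rtx_2080_ti never get handed the GT 710, which approximates "disabling" it without a dedicated disable knob.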

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Christopher Samuel
On 7/14/23 10:20 am, Wilson, Steven M wrote: I upgraded Slurm to 23.02.3 but I'm still running into the same problem. Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still being made available to jobs so we end up with compute jobs being run on GPUs which should only be

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
I upgraded Slurm to 23.02.3 but I'm still running into the same problem. Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still being made available to jobs so we end up with compute jobs being run on GPUs which should only be used ... Any ideas? Thanks, Steve
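Beyond $CUDA_VISIBLE_DEVICES masking, Slurm can hard-fence jobs off from unallocated GPUs via cgroup device constraints. A sketch, assuming the cgroup task plugin is in use (check your existing cgroup.conf and slurm.conf before appending anything):

```shell
# Hypothetical cgroup.conf addition: enforce device access at the cgroup level.
cat >> /etc/slurm/cgroup.conf <<'EOF'
ConstrainDevices=yes
EOF
# slurm.conf must also load the cgroup task plugin, e.g.:
#   TaskPlugin=task/cgroup,task/affinity
```

With ConstrainDevices=yes, even a GPU that leaks into $CUDA_VISIBLE_DEVICES would be unopenable by the job, though devices absent from gres.conf entirely may fall outside what the plugin knows to constrain.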