Re: [slurm-users] Unconfigured GPUs being allocated

2023-08-02 Thread Christopher Samuel
On 7/14/23 1:10 pm, Wilson, Steven M wrote: It's not so much whether a job may or may not access the GPU but rather which GPU(s) is(are) included in $CUDA_VISIBLE_DEVICES. That is what controls what our CUDA jobs can see and therefore use (within any cgroups constraints, of course). In my

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-19 Thread Wilson, Steven M
, 2023 5:32 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Unconfigured GPUs being allocated Further testing and looking at the source code confirms what looks to me like a bug in Slurm. GPUs that are not configured in gres.conf are detected by slurmd in the system and discarded

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-18 Thread Wilson, Steven M
ffect upon the actual environment of the job. Steve From: Wilson, Steven M Sent: Friday, July 14, 2023 4:10 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Unconfigured GPUs being allocated It's not so much whether a job may or may not access the GPU

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
From: slurm-users on behalf of Feng Zhang Sent: Friday, July 14, 2023 3:09 PM To: Slurm User Community List Subject: Re: [slurm-users] Unconfigured GPUs being allocated

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
-users@lists.schedmd.com Subject: Re: [slurm-users] Unconfigured GPUs being allocated

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Feng Zhang
Very interesting issue. I am guessing there might be a workaround: since oryx has 2 GPUs, you could define both of them but disable the GT 710? Does Slurm support this? Best, Feng On Tue, Jun 27, 2023 at 9:54 AM Wilson, Steven M wrote: > > Hi, > > I manually configure the

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Christopher Samuel
On 7/14/23 10:20 am, Wilson, Steven M wrote: I upgraded Slurm to 23.02.3 but I'm still running into the same problem. Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still being made available to jobs so we end up with compute jobs being run on GPUs which should only be

Re: [slurm-users] Unconfigured GPUs being allocated

2023-07-14 Thread Wilson, Steven M
I upgraded Slurm to 23.02.3 but I'm still running into the same problem. Unconfigured GPUs (those absent from gres.conf and slurm.conf) are still being made available to jobs so we end up with compute jobs being run on GPUs which should only be used Any ideas? Thanks, Steve

[slurm-users] Unconfigured GPUs being allocated

2023-06-27 Thread Wilson, Steven M
Hi, I manually configure the GPUs in our Slurm configuration (AutoDetect=off in gres.conf) and everything works fine when all the GPUs in a node are configured in gres.conf and available to Slurm. But we have some nodes where a GPU is reserved for running the display and is specifically not