[slurm-users] Correct way to do logrotation

2023-10-16 Thread Taras Shapovalov
Hello, In the past it was recommended to reconfigure the Slurm daemons in the logrotate script; sending a signal, I believe, was also an accepted approach. But recently I retested manual log rotation and I see that removing a log file (for slurmctld, slurmdbd or slurmd) does not affect the logging of the daemo

[slurm-users] Two gpu types on one node: gres/gpu count reported lower than configured (1 < 5)

2023-10-16 Thread Gregor Hagelueken
Hi, We have an Ubuntu server (22.04) with currently 5 GPUs (1 x l40 and 4 x rtx_a5000). I am trying to configure Slurm such that a user can select either the l40 or a5000 GPUs for a particular job. I have configured my slurm.conf and gres.conf files similarly to this old thread: https://groups.

Re: [slurm-users] Two gpu types on one node: gres/gpu count reported lower than configured (1 < 5)

2023-10-16 Thread Feng Zhang
Try scontrol update NodeName=heimdall state=DOWN Reason="gpu issue" and then scontrol update NodeName=heimdall state=RESUME to see if it will work. Probably just SLURM daemon having a hiccup after you made changes. Best, Feng On Mon, Oct 16, 2023 at 10:43 AM Gregor Hagelueken wrote: > > Hi,

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-16 Thread Groner, Rob
It is my understanding that it is a different issue than pmix. So to be fully protected, you would need to build the latest/fixed pmix and rebuild slurm using that (or just keep pmix disabled), AND have this latest version of slurm with their fix for their own vulnerability. Rob

Re: [slurm-users] Site factor plugin example?

2023-10-16 Thread Reed Dier
Hi Angel and Loris, I hope this will be of at least some help, as I was tasked with trying to get site factor implemented in our cluster for the sake of making conformant, predictable priority values that were “pretty” and round, and I was not able to find any good documentation for it either.

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-16 Thread Christopher Samuel
On 10/16/23 08:22, Groner, Rob wrote: It is my understanding that it is a different issue than pmix. That's my understanding too. The PMIx issue wasn't in Slurm, it was in the PMIx code that Slurm was linked to. This CVE is for Slurm itself. -- Chris Samuel : http://www.csamuel.org/ : B

Re: [slurm-users] Slurm versions 23.02.6 and 22.05.10 are now available (CVE-2023-41914)

2023-10-16 Thread Kilian Cavalotti
Those CVEs are indeed for different software (one for PMIx, one for Slurm), even though they're ultimately for the same kind of underlying problem (chown() being used instead of lchown(), which could lead to taking over privileged files). The Slurm patches include more fixes related to permissions
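
For readers unfamiliar with the chown()/lchown() distinction mentioned in the last message, here is a minimal Python sketch of the general behaviour. It is not taken from the Slurm or PMIx patches; the function names owner_change_targets and demo and the file names are hypothetical, chosen only for illustration. It assumes a POSIX system where symlinks can be created. The calls pass -1 for uid and gid, which leaves ownership unchanged, so no privileges are needed.

```python
import os
import tempfile


def owner_change_targets(directory):
    """Return the inodes that a chown-style and an lchown-style call act on."""
    target = os.path.join(directory, "target.txt")  # hypothetical file name
    link = os.path.join(directory, "link")          # hypothetical symlink name
    with open(target, "w") as fh:
        fh.write("data\n")
    os.symlink(target, link)

    target_inode = os.lstat(target).st_ino
    followed_inode = os.stat(link).st_ino   # stat follows the link, as chown does
    link_inode = os.lstat(link).st_ino      # lstat stays on the link, as lchown does

    # Harmless calls: -1 means "leave uid/gid unchanged".
    os.chown(link, -1, -1)    # resolves the symlink and acts on target.txt
    os.lchown(link, -1, -1)   # acts on the symlink itself

    return {"target": target_inode, "chown": followed_inode, "lchown": link_inode}


def demo():
    """Run the comparison in a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return owner_change_targets(tmp)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch: if an unprivileged user can replace a path with a symlink before a privileged process calls chown() on it, the ownership change lands on whatever the link points to, whereas lchown() only ever affects the link.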