Re: [slurm-users] [EXT] rejecting jobs that exceed QOS limits
Ah, should have found that. Thanks.

On Sat, 29 May 2021 12:08am, Sean Crosby wrote:

> Hi Paul,
>
> Try
>
>     sacctmgr modify qos gputest set flags=DenyOnLimit
>
> Sean
>
> From: slurm-users on behalf of Paul Raines
> Sent: Saturday, 29 May 2021 12:48
> To: slurm-users@lists.schedmd.com
> Subject: [EXT] [slurm-users] rejecting jobs that exceed QOS limits
>
> I want to dedicate one of our GPU servers for testing where users are
> only allowed to run 1 job at a time using 1 GPU and 8 cores of the
> server. So I put one server in a partition on its own and set up a QOS
> for it as follows:
>
>     sacctmgr add qos gputest
>     sacctmgr modify qos gputest set priority=20
>     sacctmgr modify qos gputest set MaxJobsPerUser=1
>     sacctmgr modify qos gputest set MaxTRESPerUser=cpu=8,gres/gpu=1
>     sacctmgr show qos format=name,priority,MaxTRESPerUser,MaxJobsPerUser
>
> In slurm.conf I have:
>
>     AccountingStorageEnforce=safe,qos
>     AccountingStorageTRES=Billing,CPU,Energy,Mem,Node,FS/Disk,Pages,VMem,gres/gpu
>     EnforcePartLimits=ALL
>
> This works, but when I submit a job asking for 2 or more GPUs, instead
> of being immediately rejected it queues but never runs. Same if I ask
> for more than 8 cores.
>
> Is there a way to get it immediately rejected?
Re: [slurm-users] [EXT] rejecting jobs that exceed QOS limits
Hi Paul,

Try

    sacctmgr modify qos gputest set flags=DenyOnLimit

Sean

From: slurm-users on behalf of Paul Raines
Sent: Saturday, 29 May 2021 12:48
To: slurm-users@lists.schedmd.com
Subject: [EXT] [slurm-users] rejecting jobs that exceed QOS limits

I want to dedicate one of our GPU servers for testing where users are
only allowed to run 1 job at a time using 1 GPU and 8 cores of the
server. So I put one server in a partition on its own and set up a QOS
for it as follows:

    sacctmgr add qos gputest
    sacctmgr modify qos gputest set priority=20
    sacctmgr modify qos gputest set MaxJobsPerUser=1
    sacctmgr modify qos gputest set MaxTRESPerUser=cpu=8,gres/gpu=1
    sacctmgr show qos format=name,priority,MaxTRESPerUser,MaxJobsPerUser

In slurm.conf I have:

    AccountingStorageEnforce=safe,qos
    AccountingStorageTRES=Billing,CPU,Energy,Mem,Node,FS/Disk,Pages,VMem,gres/gpu
    EnforcePartLimits=ALL

This works, but when I submit a job asking for 2 or more GPUs, instead
of being immediately rejected it queues but never runs. Same if I ask
for more than 8 cores.

Is there a way to get it immediately rejected?
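
For anyone finding this thread later, the QOS setup from the original post and the DenyOnLimit flag suggested in the reply can be collapsed into one short script. This is only a sketch under a few assumptions: the `run` wrapper, the `APPLY` variable, and the `setup_qos` function are names made up here and are not part of Slurm; `-i` is added so sacctmgr does not stop for confirmation; and `flags` is added to the `format=` list so the new flag shows up in the listing. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create the gputest QOS with the limits from this thread and
# make Slurm reject over-limit jobs at submit time (DenyOnLimit).
# Dry run by default; set APPLY=1 to actually call sacctmgr.

QOS=gputest

# Hypothetical helper: print the command, run it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

setup_qos() {
    # Create the QOS.
    run sacctmgr -i add qos "$QOS"

    # Same limits as in the original post, set in a single call,
    # plus the flag from the reply.
    run sacctmgr -i modify qos "$QOS" set priority=20 MaxJobsPerUser=1 \
        MaxTRESPerUser=cpu=8,gres/gpu=1 flags=DenyOnLimit

    # List the result; "flags" is an addition to the original format list.
    run sacctmgr show qos format=name,priority,MaxTRESPerUser,MaxJobsPerUser,flags
}

setup_qos
```

Run it with `APPLY=1` to make the changes. After that, a request over the limit, for example `sbatch --qos=gputest --gres=gpu:2 --wrap=hostname` submitted to the test partition, should be refused at submit time instead of sitting in the queue.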