Re: [slurm-users] [EXT] rejecting jobs that exceed QOS limits

2021-05-29 Thread Paul Raines



Ah, should have found that.  Thanks.


On Sat, 29 May 2021 12:08am, Sean Crosby wrote:


Hi Paul,

Try

sacctmgr modify qos gputest set flags=DenyOnLimit

Sean
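
For anyone finding this thread later, here is a minimal sketch of applying the flag and checking the result. It is not taken from the original posts. The partition name and the `run` dry-run wrapper are assumptions for illustration, and the script only prints the commands unless DRY_RUN=0 is set. Changing a QOS with sacctmgr normally requires Slurm administrator rights.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a real cluster to actually run them.
DRY_RUN="${DRY_RUN:-1}"
QOS="gputest"                       # QOS name used in the thread
PARTITION="${PARTITION:-gputest}"   # assumption: the thread does not name the partition

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Ask Slurm to reject jobs at submission when they exceed the QOS limits
# (-i commits the change without the interactive confirmation prompt).
run sacctmgr -i modify qos "$QOS" set flags=DenyOnLimit

# Check that the flag is now listed next to the existing limits.
run sacctmgr show qos where name="$QOS" format=name,flags,MaxTRESPerUser,MaxJobsPerUser

# A request for 2 GPUs should now fail at submission instead of pending.
run sbatch --partition="$PARTITION" --qos="$QOS" --gres=gpu:2 --wrap="hostname"
```

With the flag in place, the last sbatch call is expected to return an error straight away rather than a job ID. The exact message depends on the Slurm version, so it is not reproduced here.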

From: slurm-users on behalf of Paul Raines

Sent: Saturday, 29 May 2021 12:48
To: slurm-users@lists.schedmd.com 
Subject: [EXT] [slurm-users] rejecting jobs that exceed QOS limits

I want to dedicate one of our GPU servers to testing, where
users are only allowed to run 1 job at a time using 1 GPU and
8 cores of the server.  So I put one server in a partition on its
own and set up a QOS for it as follows:

 sacctmgr add qos gputest
 sacctmgr modify qos gputest set priority=20
 sacctmgr modify qos gputest set MaxJobsPerUser=1
 sacctmgr modify qos gputest set MaxTRESPerUser=cpu=8,gres/gpu=1
 sacctmgr show qos format=name,priority,MaxTRESPerUser,MaxJobsPerUser

In slurm.conf I have:

AccountingStorageEnforce=safe,qos
AccountingStorageTRES=Billing,CPU,Energy,Mem,Node,FS/Disk,Pages,VMem,gres/gpu
EnforcePartLimits=ALL


This works, but when I submit a job asking for 2 or more GPUs, instead
of being immediately rejected it queues but never runs. The same happens
if I ask for more than 8 cores.

Is there a way to get it immediately rejected?
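
A job held back this way stays in the PENDING state, and squeue can show the reason the scheduler records for it. The following is a small sketch for checking that, not part of the original post. The column widths in the format string are arbitrary choices.

```shell
# List the current user's pending jobs with the scheduler's reason for each.
# %i = job ID, %P = partition, %T = state, %r = reason the job is waiting
SQUEUE_FORMAT="%.10i %.12P %.10T %r"

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -t PENDING -o "$SQUEUE_FORMAT"
else
    # Not on a Slurm node: say so instead of failing with "command not found".
    echo "squeue not found; run this on a Slurm login node" >&2
fi
```

A reason that mentions the QOS limit suggests the job is being held by the QOS rather than waiting for free resources.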






Re: [slurm-users] [EXT] rejecting jobs that exceed QOS limits

2021-05-28 Thread Sean Crosby
Hi Paul,

Try

sacctmgr modify qos gputest set flags=DenyOnLimit

Sean
