Re: [slurm-users] PreemptExemptTime
Could you give me an idea of what your partition and QOS settings are? I've tried the following and I'm getting odd results:

slurm.conf:
  PreemptType: preempt/qos
  PreemptMode: 'SUSPEND,GANG'
  PreemptExemptTime: '00:00:00'

Partitions:
  PartitionName=DEFAULT OverSubscribe=FORCE:1 Nodes=slurm[2-4]
  PartitionName=active Default=YES QOS=normal
  PartitionName=hipri Default=NO QOS=expedite

QOS:
  sacctmgr -i modify qos where name=normal set PreemptExemptTime=00:03:00 PreemptMode=SUSPEND
  sacctmgr -i modify qos where name=expedite set PreemptExemptTime=-1 PreemptMode=OFF

I took these settings directly from the Google Groups thread I linked before, and I'm seeing what he saw: no preemption happens. Even if a job from "active" is already running, Slurm lets me submit jobs to hipri that take up resources as though the active job weren't there. In other words, there's some time sharing going on, even though the jobs are in different partitions. The online docs indicate that jobs from different partitions should NOT time-share. I also don't see the "active" job get preempted once it has run past its PreemptExemptTime; it never gets preempted.

Thanks.
Rob

From: slurm-users on behalf of Christopher Samuel
Sent: Tuesday, March 7, 2023 3:40 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] PreemptExemptTime
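The QOS settings in Rob's message set PreemptMode and PreemptExemptTime but no Preempt list on either QOS. Under PreemptType=preempt/qos, the sacctmgr man page describes the QOS Preempt field as the list of other QOSes a QOS can preempt, so a minimal sketch of letting expedite preempt normal might look like the following. The QOS names come from the message; switching normal from SUSPEND to REQUEUE is an assumption, made because the man page excerpt quoted in this thread says PreemptExemptTime is only honored for REQUEUE and CANCEL, and it presumes jobs in normal can be requeued on this cluster.

```shell
# Sketch only: QOS names are taken from the settings above; the rest is assumed.

# Allow jobs in the "expedite" QOS to preempt jobs in the "normal" QOS.
sacctmgr -i modify qos where name=expedite set Preempt=normal

# PreemptExemptTime is documented as honored only for REQUEUE and CANCEL,
# so this sketch moves "normal" from SUSPEND to REQUEUE (assumes requeueable jobs).
sacctmgr -i modify qos where name=normal set PreemptMode=REQUEUE PreemptExemptTime=00:03:00

# Show what the accounting database now holds for the preemption fields.
sacctmgr show qos format=Name,Preempt,PreemptMode,PreemptExemptTime
```

If suspension for normal has to stay, the exempt time would presumably not apply, which is the trade-off raised later in this thread.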
Re: [slurm-users] PreemptExemptTime
From what I'm reading in the man pages, it seems like PreemptExemptTime isn't compatible with suspending jobs instead of requeuing/cancelling them, no matter whether it's set at the partition or QOS level. Am I reading that correctly? We currently give users the option of submitting to a partition that lets their job be suspended, or to one where they're requeued. Do we have to give that up to offer a minimum run time on jobs (for just one partition/QOS)?

Rob
Re: [slurm-users] PreemptExemptTime
Yeah, it did seem like there was quite a bit of overlap between what partitions could do and what QOS could do. So, choose one or the other to accomplish our preemption/suspension goals? I'll see if I can do that. I think I'll look at QOS, because partitions don't have the preempt exempt time (unless that comes from the global setting).

Thanks.
Rob
Re: [slurm-users] PreemptExemptTime
On 3/7/23 6:46 am, Groner, Rob wrote:

> Our global settings are PreemptMode=SUSPEND,GANG and
> PreemptType=preempt/partition_prio. We have a high priority partition
> that nothing should ever preempt, and an open partition that is always
> preemptable. In between is a burst partition. It can be preempted if
> the high priority partition needs the resources. That's the partition
> we'd like to guarantee a 1 hour run time on. Looking at the sacctmgr
> man page, it gives this info on QOS

Just a quick comment: here you're talking about both partitions and QOSes with respect to preemption. I think for this you need to pick just one of those options and only use those configs. For instance, we just use QOSes for preemption, and our exempt time works in that case.

Hope this helps!

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
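Chris's suggestion of driving preemption from QOSes alone could be sketched for the three tiers in Rob's quoted message roughly as below. The QOS names high, burst and open are hypothetical placeholders, the one-hour guarantee comes from the quoted message, and the ordering in which burst may preempt open is an assumption. REQUEUE is used for burst because the man page text quoted elsewhere in this thread limits PreemptExemptTime to REQUEUE and CANCEL; that in turn assumes burst jobs are requeueable.

```shell
# Sketch of a QOS-only preemption layout; QOS names are hypothetical.
# Assumes slurm.conf already contains:
#   PreemptType=preempt/qos
#   PreemptMode=SUSPEND,GANG
sacctmgr -i add qos high
sacctmgr -i add qos burst
sacctmgr -i add qos open

# "high" may preempt both lower tiers; "burst" may preempt "open" (assumed ordering).
sacctmgr -i modify qos where name=high set Preempt=burst,open
sacctmgr -i modify qos where name=burst set Preempt=open

# Jobs in "high" are never preempted.
sacctmgr -i modify qos where name=high set PreemptMode=OFF

# "burst" gets the one-hour guarantee; exempt time needs REQUEUE or CANCEL.
sacctmgr -i modify qos where name=burst set PreemptMode=REQUEUE PreemptExemptTime=01:00:00

# "open" keeps suspend-style preemption.
sacctmgr -i modify qos where name=open set PreemptMode=SUSPEND

# Jobs are assumed to request a tier explicitly, e.g.:
#   sbatch --qos=burst job.sh
```

The design choice here is that partitions become plain resource pools and every preemption decision lives in the QOS table, which is the separation Chris describes.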
[slurm-users] PreemptExemptTime
I found a thread about this topic that's a year old and at that time seemed to give no hope; I'm just wondering if the situation has changed. My testing so far isn't encouraging.

The thread (here: https://groups.google.com/g/slurm-users/c/yhnSVBoohik) talks about wanting to give lower-priority jobs some amount of guaranteed run time. That's what we're trying to do.

Our global settings are PreemptMode=SUSPEND,GANG and PreemptType=preempt/partition_prio. We have a high-priority partition that nothing should ever preempt, and an open partition that is always preemptable. In between is a burst partition. It can be preempted if the high-priority partition needs the resources. That's the partition we'd like to guarantee a 1-hour run time on.

Looking at the sacctmgr man page, it gives this info on QOS:

  PreemptExemptTime
    Specifies a minimum run time for jobs of this QOS before they are
    considered for preemption. This QOS option takes precedence over the
    global PreemptExemptTime. This is only honored for PreemptMode=REQUEUE
    and PreemptMode=CANCEL.

This sounds like exactly what we want. So I went into the burst QOS we have available on the burst partition, set a PreemptExemptTime of 30 seconds and a PreemptMode of CANCEL, and tested. Whenever something of higher priority came along, my job was immediately cancelled; no exempt time was applied.

Am I not understanding how this is supposed to work, or am I asking for an impossible Slurm configuration?

Thanks,
Rob
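When a QOS-level setting appears to be ignored, one place to start might be comparing what slurmctld is running with against what the accounting database holds for the QOS and the partition. The commands below are a sketch; the QOS name burst comes from the message, while using burst as the partition name is an assumption. One point worth checking against the slurm.conf man page for the installed version: it describes per-QOS PreemptMode in the context of PreemptType=preempt/qos, whereas this setup uses preempt/partition_prio.

```shell
# Cluster-wide preemption settings as slurmctld currently sees them.
scontrol show config | grep -i preempt

# QOS-level preemption fields for the burst QOS.
sacctmgr show qos where name=burst format=Name,Preempt,PreemptMode,PreemptExemptTime

# Partition-level settings, including any PreemptMode or QOS attached to it
# (partition name "burst" is assumed).
scontrol show partition burst
```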