Re: [slurm-users] PreemptExemptTime

2023-03-10 Thread Groner, Rob
Could you give me an idea of what your partition and qos settings are?  I've 
tried the following and I'm getting odd results:

slurm.conf
PreemptType: preempt/qos
PreemptMode: 'SUSPEND,GANG'
PreemptExemptTime: '00:00:00'

Partitions:
PartitionName=DEFAULT OverSubscribe=FORCE:1 Nodes=slurm[2-4]
PartitionName=active Default=YES QOS=normal
PartitionName=hipri Default=NO QOS=expedite

QOS:
sacctmgr -i modify qos where name=normal set PreemptExemptTime=00:03:00 PreemptMode=SUSPEND
sacctmgr -i modify qos where name=expedite set PreemptExemptTime=-1 PreemptMode=OFF

I took these settings directly from the Google Groups thread I linked before, 
and I'm seeing what the poster there saw: no preemption happens.  What I see is 
that, even if a job from "active" is already running, Slurm will let me submit 
jobs from hipri that take up resources as though the active job wasn't there.  
In other words, there's some time sharing going on, even though the jobs are in 
different partitions.  The online docs indicate that jobs from different 
partitions should NOT be time sharing.  I also do not see the "active" job 
getting preempted once it has run through its PreemptExemptTime.  It never gets 
preempted.
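
For anyone reproducing this, here is a minimal sketch of how the QOS side could be inspected. It assumes the two QOS names from the listing above. The "run" helper is hypothetical and only prints the commands unless APPLY=1 is set. One thing the listing does not show is a Preempt list on "expedite"; with PreemptType=preempt/qos, which QOS may preempt which is taken from that list. The second command is an assumption about what might be missing, not something confirmed in this thread.

```shell
# Hypothetical dry-run helper: prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show what the accounting database holds for each QOS.
run sacctmgr show qos format=Name,Priority,Preempt,PreemptMode,PreemptExemptTime

# Assumption: allow "expedite" to preempt "normal" (not in the settings quoted above).
run sacctmgr -i modify qos where name=expedite set Preempt=normal
```

If the Preempt column comes back empty for "expedite", that would be consistent with no preemption being attempted, though other causes are possible.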

Thanks.

Rob



From: slurm-users on behalf of Christopher Samuel
Sent: Tuesday, March 7, 2023 3:40 PM
To: slurm-users@lists.schedmd.com 
Subject: Re: [slurm-users] PreemptExemptTime

On 3/7/23 6:46 am, Groner, Rob wrote:

> Our global settings are PreemptMode=SUSPEND,GANG and
> PreemptType=preempt/partition_prio.  We have a high priority partition
> that nothing should ever preempt, and an open partition that is always
> preemptable.  In between is a burst partition.  It can be preempted if
> the high priority partition needs the resources.  That's the partition
> we'd like to guarantee a 1 hour run time on.  Looking at the sacctmgr
> man page, it gives this info on QOS

Just a quick comment, here you're talking about both partitions and
QOS's with respect to preemption, I think for this you need to pick just
one of those options and only use those configs. For instance we just
use QOS's for preemption and our exempt time works in that case.

Hope this helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




Re: [slurm-users] PreemptExemptTime

2023-03-10 Thread Groner, Rob
From what I'm reading in the man pages, it seems like PreemptExemptTime isn't 
compatible with suspending jobs instead of requeueing/cancelling them, no 
matter whether it is set at the partition or QOS level.  Am I reading that 
correctly?  We currently give users the option of submitting to a partition 
that lets their job be suspended, or to one where they're requeued.  Do we have 
to give that up to offer a minimum run time on jobs (for just one 
partition/QOS)?

Rob






Re: [slurm-users] PreemptExemptTime

2023-03-10 Thread Groner, Rob
Ya, it did seem like there was quite a bit of overlap between what partitions 
could do and what qos could do.  So, choose one or the other to accomplish our 
preemption/suspension goals?  I'll see if I can do that.  I think I'll look at 
qos, because partitions don't have the preempt exempt time (unless that comes 
from the global setting).

Thanks.

Rob






Re: [slurm-users] PreemptExemptTime

2023-03-07 Thread Christopher Samuel

On 3/7/23 6:46 am, Groner, Rob wrote:

> Our global settings are PreemptMode=SUSPEND,GANG and
> PreemptType=preempt/partition_prio.  We have a high priority partition
> that nothing should ever preempt, and an open partition that is always
> preemptable.  In between is a burst partition.  It can be preempted if
> the high priority partition needs the resources.  That's the partition
> we'd like to guarantee a 1 hour run time on.  Looking at the sacctmgr
> man page, it gives this info on QOS


Just a quick comment, here you're talking about both partitions and 
QOS's with respect to preemption, I think for this you need to pick just 
one of those options and only use those configs. For instance we just 
use QOS's for preemption and our exempt time works in that case.


Hope this helps!

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] PreemptExemptTime

2023-03-07 Thread Groner, Rob
I found a thread about this topic that's a year old and at that time seemed to 
give no hope.  I'm just wondering if the situation has changed.  My testing so 
far isn't encouraging.

In the thread (here: https://groups.google.com/g/slurm-users/c/yhnSVBoohik) it 
talks about wanting to give lower priority jobs some amount of guaranteed run 
time.  That's what we're trying to do.

Our global settings are PreemptMode=SUSPEND,GANG and 
PreemptType=preempt/partition_prio.  We have a high priority partition that 
nothing should ever preempt, and an open partition that is always preemptable.  
In between is a burst partition.  It can be preempted if the high priority 
partition needs the resources.  That's the partition we'd like to guarantee a 1 
hour run time on.  Looking at the sacctmgr man page, it gives this info on QOS:

PreemptExemptTime
  Specifies a minimum run time for jobs of this QOS before they are
  considered for preemption. This QOS option takes precedence over the
  global PreemptExemptTime. This is only honored for PreemptMode=REQUEUE
  and PreemptMode=CANCEL.

This sounds like exactly what we want.  So I went into the burst QOS we have 
available on the burst partition and I set a PreemptExemptTime of 30 seconds 
and a PreemptMode of CANCEL, and tested.  Whenever something of a higher 
priority came along, my job was immediately cancelled; no exempt time was 
utilized.
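
Roughly, the test setup described above would look like the sketch below. The QOS name "burst" and the exact option spelling are assumptions based on the description. The "run" helper is hypothetical and only prints the commands unless APPLY=1 is set.

```shell
# Hypothetical dry-run helper: prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Assumed QOS name "burst": 30 second exempt time, cancel on preemption.
run sacctmgr -i modify qos where name=burst set PreemptExemptTime=00:00:30 PreemptMode=cancel

# Read the values back to confirm they were stored.
run sacctmgr show qos where name=burst format=Name,PreemptMode,PreemptExemptTime
```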

Am I not understanding how this is supposed to work, or am I asking for an 
impossible slurm configuration?

Thanks,

Rob