[slurm-dev] Re: Dynamic, small partition?

2017-07-24 Thread dani
I believe GrpNodes=20 is what you're looking for. From the man page
(https://slurm.schedmd.com/sacctmgr.html):

GrpNodes=
    Maximum number of nodes running jobs are able to be allocated in
    aggregate for this association and all associations which are
    children of this association.
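
Something along these lines should work - an untested sketch, with the QOS
and user names just placeholders:

  sacctmgr add qos long_small GrpNodes=20
  sacctmgr modify user name=steffen set qos+=long_small

Jobs would then be submitted with --qos=long_small; on TRES-based releases
the same limit can be written as GrpTRES=node=20, and it can equally be set
on an account/association instead of a QOS.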

  


Hope that helps,
--Dani_L.

On 24/07/2017 15:02, Steffen Grunewald wrote:

> On Wed, 2017-07-19 at 09:52:37 -0600, Nicholas McCollum wrote:
> > You could try a QoS with
> > Flags=DenyOnLimit,OverPartQOS,PartitionTimeLimit Priority=
> > 
> > Depending on how you have your accounting set up, you could tweak some
> > of the GrpTRES, MaxTRES, MaxTRESPU and MaxJobsPU to try to limit
> > resource usage down to your 20 node limit.  I'm not sure, off the top
> > of my head, how to define a hard limit for max nodes that a QoS can
> > use.
> > 
> > You could use accounting to prevent unauthorized users from submitting
> > to that QoS.
> > 
> > If the QoS isn't going to be used often, you could have only one job
> > running in that QoS at a time, and use job_submit.lua to set the
> > MaxNodes submitted at job submission to 20.
> > 
> > if string.match(job_desc.qos, "special") then
> >   job_desc.max_nodes = 20
> > end
> > 
> > Just a couple of ideas for you, there's probably a way better way to do
> > it!

> Thanks for hurling such a long list of ideas at me.
> 
> Since there's already an overly long list of QoSes, I decided to have a look
> at the job_submit.lua path first.
> 
> Only to discover that our installation doesn't know about that plugin,
> and that it would take me the better part of a few days to find out how
> to overcome this.
> 
> Since our micro version is no longer available in the source archives,
> I'd have to upgrade to the latest micro release first, which adds
> unwanted complexity and the risk of having to scrub my vacation.
> 
> I'm not too fluent in Lua, and there seems to be no safe platform to
> debug a job_submit.lua script.
> Nevertheless, I'm looking into this.
> 
> Your suggestion would limit the number of nodes assigned to a single
> job - that can be achieved by MaxNodes=20 in the Partition definition.
> Actually I don't want to trim a job's size, neither at submission time
> nor later.
> 
> I'm still in doubt whether a job_submit.lua script can fully handle my task:
> It's not about limiting a job's resources, it's about limiting jobs
> being allocated and run in a particular partition. So what I could do
> at submit time: modify the "uhold" state of the job according to the
> current usage of the special partition, and let the user retry. Would
> I "see" this job again (in job_submit.lua)?
> Documentation of what this plugin can do, and what it cannot, is rather
> sparse, and this spontaneous request is only one item on my list.
> 
> Perhaps I'm looking for the wrong kind of tool - what I'd need is a
> plugin that decides whether a job is eligible for allocation at all,
> and so far I've been unsuccessful writing search queries to find such
> a plugin. Does it exist?
> 
> Thanks,
>  Steffen
> 
> > --
> > Nicholas McCollum
> > HPC Systems Administrator
> > Alabama Supercomputer Authority
> > 
> > On Wed, 2017-07-19 at 09:12 -0600, Steffen Grunewald wrote:
> > > Is it possible to define, and use, a subset of all available nodes in a
> > > partition *without* explicitly setting aside a number of nodes (in a
> > > static Nodes=... definition)?
> > > Let's say I want, starting with a 100-node cluster, to make 20 nodes
> > > available for jobs needing an extended MaxTime and Priority (compared
> > > with the defaults) - and if these 20 nodes have been allocated, no more
> > > nodes will be available to jobs submitted to this particular partition,
> > > but the 20 nodes may cover a subset of all nodes changing over time
> > > (as it will not be in use very often)?
> > > 
> > > Can this be done with Slurm's built-in functionality, 15.08.8 or later?
> > > Any pointers are welcome...
> > > 
> > > Thanks,
> > >  S


[slurm-dev] Re: Thoughts on GrpCPURunMins as primary constraint?

2017-07-24 Thread Ryan Cox


Corey,

We almost exclusively use GrpCPURunMins as well as 3 or 7 day walltime 
limits depending on the partition.  For my (somewhat rambling) thoughts 
on the matter, see 
http://tech.ryancox.net/2014/04/scheduler-limit-remaining-cputime-per.html. 
It generally works pretty well.


We also have https://marylou.byu.edu/simulation/grpcpurunmins.php to 
simulate various settings, though it needs some improvement such as a 
realistic maximum.


sshare -l (TRESRunMins) should have the live stats you're looking for.
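
As a concrete (untested) sketch - the account name and the number are
placeholders, and on pre-TRES releases the option is spelled GrpCPURunMins:

  sacctmgr modify account name=physics set GrpTRESRunMins=cpu=1000000
  sshare -l -A physics

The sshare line then shows the current TRESRunMins usage to compare against
that cap.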

Ryan

On 07/24/2017 02:39 PM, Corey Keasling wrote:


> Hi Slurm-Dev,
> 
> I'm currently designing and testing what will ultimately be a small
> Slurm cluster of about 60 heterogeneous nodes (five different
> generations of hardware).  Our user-base is also diverse, with need
> for fast turnover of small, sequential jobs and for long-duration
> parallel codes (e.g., 16 cores for several months).
> 
> In the past we limited users by how many cores they could allocate at
> any one time.  This has the drawback that no distinction is made
> between, say, 128 cores for 2 hours and 128 cores for 2 months.  We
> want users to be able to run on a large portion of the cluster when it
> is available while ensuring that they cannot take advantage of an idle
> period to start jobs which will monopolize it for weeks.
> 
> Limiting by GrpCPURunMins seems like a good answer.  I think of it as
> allocating computational area (i.e., cores*minutes) and not just width
> (cores).  I'd love to know if anyone has any experience or thoughts on
> imposing limits in this way.  Also, is anyone aware of a simple way to
> calculate remaining "area"?  I can use squeue or sacct to ultimately
> derive how much of a limit is in use by looking at remaining wall-time
> and core count, but if there's something built in - or pre-existing -
> it would be nice to know.
> 
> It's worth noting that the cluster is divided into several partitions
> with most nodes existing in several.  This is partially political (to
> give groups increased priority on nodes they helped pay for) and
> partially practical (to ensure users explicitly request slow nodes
> instead of having jobs just dumped on the ancient Opterons).  Also, each
> user gets their own Account, so the QoS Grp limits apply to each human
> separately.  Accounts would also have absolute core limits.
> 
> Thank you for your thoughts!
> 
> Corey



--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University


[slurm-dev] Thoughts on GrpCPURunMins as primary constraint?

2017-07-24 Thread Corey Keasling


Hi Slurm-Dev,

I'm currently designing and testing what will ultimately be a small 
Slurm cluster of about 60 heterogeneous nodes (five different 
generations of hardware).  Our user-base is also diverse, with need for 
fast turnover of small, sequential jobs and for long-duration parallel 
codes (e.g., 16 cores for several months).


In the past we limited users by how many cores they could allocate at 
any one time.  This has the drawback that no distinction is made 
between, say, 128 cores for 2 hours and 128 cores for 2 months.  We want 
users to be able to run on a large portion of the cluster when it is 
available while ensuring that they cannot take advantage of an idle 
period to start jobs which will monopolize it for weeks.


Limiting by GrpCPURunMins seems like a good answer.  I think of it as 
allocating computational area (i.e., cores*minutes) and not just width 
(cores).  I'd love to know if anyone has any experience or thoughts on 
imposing limits in this way.  Also, is anyone aware of a simple way to 
calculate remaining "area"?  I can use squeue or sacct to ultimately 
derive how much of a limit is in use by looking at remaining wall-time 
and core count, but if there's something built in - or pre-existing - it 
would be nice to know.
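
Roughly, I mean something along these lines - just a sketch, with a
placeholder account name and only approximate handling of squeue's
time-left format:

  # cores x remaining minutes, summed over one account's running jobs
  squeue -h -A myaccount -t RUNNING -o "%C %L" | awk '
    { nd = split($2, d, "-"); days = (nd == 2) ? d[1] : 0
      nt = split(d[nd], t, ":")
      mins = (nt >= 3) ? t[1] * 60 + t[2] : t[1]
      total += $1 * (days * 1440 + mins) }
    END { printf "%d CPU-minutes still allocated\n", total }'

Something maintained by Slurm itself would obviously be preferable, though.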


It's worth noting that the cluster is divided into several partitions 
with most nodes existing in several.  This is partially political (to 
give groups increased priority on nodes they helped pay for) and 
partially practical (to ensure users explicitly request slow nodes
instead of having jobs just dumped on the ancient Opterons).  Also, each user
gets their own Account, so the QoS Grp limits apply to each human separately.
Accounts would also have absolute core limits.


Thank you for your thoughts!

Corey

--
Corey Keasling
Software Manager
JILA Computing Group
University of Colorado-Boulder
440 UCB Room S244
Boulder, CO 80309-0440
303-492-9643


[slurm-dev] Re: Elapsed time for slurm job

2017-07-24 Thread Loris Bennett

Dear Sema,

You need to set up accounting first:

  https://slurm.schedmd.com/accounting.html
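
In short, that means running slurmdbd against a MySQL/MariaDB database and
pointing slurm.conf at it.  A very rough sketch (host name and credentials
are placeholders; the page above has the actual procedure):

  # slurm.conf
  AccountingStorageType=accounting_storage/slurmdbd
  AccountingStorageHost=dbd-host
  JobAcctGatherType=jobacct_gather/linux

  # slurmdbd.conf
  DbdHost=dbd-host
  StorageType=accounting_storage/mysql
  StorageUser=slurm
  StoragePass=changeme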

You obviously won't have data for jobs which ran before accounting was
set up.  When you have done this, you will be able to do something like

  sacct -j 123456 -o jobid,elapsed

for subsequent jobs.

Read 'man sacct' for more info.

Regards

Loris

Sema Atasever  writes:

> Re: [slurm-dev] Re: Elapsed time for slurm job 
>
> Dear Loris,
>
> When I try this command (sacct -o 2893,elapsed) I get this error message,
> unfortunately:
>
> SLURM accounting storage is disabled
>
> How can I solve this problem?
>
> Regards, Sema.
>
> On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett  
> wrote:
>
>  Sema Atasever  writes:
>
>  > Elapsed time for slurm job
>  >
>  > Dear Friends,
>  >
>  > How can I retrieve the elapsed time of a Slurm job once it has completed?
>  >
>  > Thanks in advance.
>
>  sacct -o jobid,elapsed
>
>  See 'man sacct' or 'sacct -e' for the full list of fields.
>
>  Cheers,
>
>  Loris
>
>  --
>  Dr. Loris Bennett (Mr.)
>  ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>
>

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


[slurm-dev] Re: Elapsed time for slurm job

2017-07-24 Thread Sema Atasever
Dear Loris,

When I try this command (*sacct -o 2893,elapsed*) I get this error message,
unfortunately:

*SLURM accounting storage is disabled*

How can I solve this problem?

Regards, Sema.

On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett 
wrote:

>
> Sema Atasever  writes:
>
> > Elapsed time for slurm job
> >
> > Dear Friends,
> >
> > How can I retrieve the elapsed time of a Slurm job once it has completed?
> >
> > Thanks in advance.
>
> sacct -o jobid,elapsed
>
> See 'man sacct' or 'sacct -e' for the full list of fields.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>


[slurm-dev] Re: Elapsed time for slurm job

2017-07-24 Thread Loris Bennett

Sema Atasever  writes:

> Elapsed time for slurm job 
>
> Dear Friends,
>
> How can I retrieve the elapsed time of a Slurm job once it has completed?
>
> Thanks in advance.

sacct -o jobid,elapsed

See 'man sacct' or 'sacct -e' for the full list of fields.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


[slurm-dev] Elapsed time for slurm job

2017-07-24 Thread Sema Atasever
Dear Friends,

How can I retrieve the elapsed time of a Slurm job once it has completed?

Thanks in advance.


[slurm-dev] Re: Dynamic, small partition?

2017-07-24 Thread Steffen Grunewald

On Wed, 2017-07-19 at 09:52:37 -0600, Nicholas McCollum wrote:
> You could try a QoS with
> Flags=DenyOnLimit,OverPartQOS,PartitionTimeLimit Priority= 
> 
> Depending on how you have your accounting set up, you could tweak some
> of the GrpTRES, MaxTRES, MaxTRESPU and MaxJobsPU to try to limit
> resource usage down to your 20 node limit.  I'm not sure, off the top
> of my head, how to define a hard limit for max nodes that a QoS can
> use.  
> 
> You could use accounting to prevent unauthorized users from submitting
> to that QoS.
> 
> If the QoS isn't going to be used often, you could have only one job
> running in that QoS at a time, and use job_submit.lua to set the
> MaxNodes submitted at job submission to 20.
> 
> if string.match(job_desc.qos, "special") then
>   job_desc.max_nodes = 20
> end
> 
> 
> Just a couple of ideas for you, there's probably a way better way to do
> it!

Thanks for hurling such a long list of ideas at me.

Since there's already an overly long list of QoSes, I decided to have a look
at the job_submit.lua path first.

Only to discover that our installation doesn't know about that plugin,
and that it would take me the better part of a few days to find out how
to overcome this.

Since our micro version is no longer available in the source archives,
I'd have to upgrade to the latest micro release first, which adds
unwanted complexity and the risk of having to scrub my vacation.

I'm not too fluent in Lua, and there seems to be no safe platform to
debug a job_submit.lua script.
Nevertheless, I'm looking into this.

Your suggestion would limit the number of nodes assigned to a single
job - that can be achieved by MaxNodes=20 in the Partition definition.
Actually I don't want to trim a job's size, neither at submission time
nor later.

I'm still in doubt whether a job_submit.lua script can fully handle my task:
It's not about limiting a job's resources, it's about limiting jobs
being allocated and run in a particular partition. So what I could do
at submit time: modify the "uhold" state of the job according to the
current usage of the special partition, and let the user retry. Would
I "see" this job again (in job_submit.lua)?
Documentation of what this plugin can do, and what it cannot, is rather
sparse, and this spontaneous request is only one item on my list. 

Perhaps I'm looking for the wrong kind of tool - what I'd need is a 
plugin that decides whether a job is eligible for allocation at all,
and so far I've been unsuccessful writing search queries to find such
a plugin. Does it exist?

Thanks,
 Steffen

> 
> -- 
> Nicholas McCollum
> HPC Systems Administrator
> Alabama Supercomputer Authority
> 
> On Wed, 2017-07-19 at 09:12 -0600, Steffen Grunewald wrote:
> > Is it possible to define, and use, a subset of all available nodes in a
> > partition *without* explicitly setting aside a number of nodes (in a
> > static Nodes=... definition)?
> > Let's say I want, starting with a 100-node cluster, to make 20 nodes
> > available for jobs needing an extended MaxTime and Priority (compared
> > with the defaults) - and if these 20 nodes have been allocated, no more
> > nodes will be available to jobs submitted to this particular partition,
> > but the 20 nodes may cover a subset of all nodes changing over time
> > (as it will not be in use very often)?
> > 
> > Can this be done with Slurm's built-in functionality, 15.08.8 or
> > later?
> > Any pointers are welcome...
> > 
> > Thanks,
> >  S