[slurm-dev] Re: Dynamic, small partition?
I believe GrpNodes=20 is what you're looking for. From the man page,
https://slurm.schedmd.com/sacctmgr.html:

    GrpNodes=<max nodes>
        Maximum number of nodes running jobs are able to be allocated in
        aggregate for this association and all associations which are
        children of this association.

Hope that helps,
--Dani_L.

On 24/07/2017 15:02, Steffen Grunewald wrote:
> On Wed, 2017-07-19 at 09:52:37 -0600, Nicholas McCollum wrote:
> > You could try a QoS with
> > Flags=DenyOnLimit,OverPartQOS,PartitionTimeLimit Priority=
> >
> > Depending on how you have your accounting set up, you could tweak
> > some of the GrpTRES, MaxTRES, MaxTRESPU and MaxJobsPU limits to
> > bring resource usage down to your 20-node limit. I'm not sure, off
> > the top of my head, how to define a hard limit on the number of
> > nodes a QoS can use.
> >
> > You could use accounting to prevent unauthorized users from
> > submitting to that QoS.
> >
> > If the QoS isn't going to be used often, you could allow only one
> > job to run in that QoS at a time, and use job_submit.lua to set
> > MaxNodes to 20 at job submission:
> >
> > if string.match(job_desc.qos, "special") then
> >   job_desc.max_nodes = 20
> > end
> >
> > Just a couple of ideas for you; there's probably a far better way
> > to do it!
>
> Thanks for hurling such a long list of ideas at me.
>
> Since there's already an overly long list of QoSes, I decided to look
> at the job_submit.lua path first - only to discover that our
> installation doesn't know about that plugin, and that it would take me
> the better part of a few days to find out how to overcome this. Since
> our micro version is no longer available in the source archives, I'd
> have to upgrade to the latest micro release first, which adds unwanted
> complexity and the risk of having to scrub my vacation. I'm not too
> fluent in Lua, and there seems to be no safe platform on which to
> debug a job_submit.lua script. Nevertheless, I'm looking into this.
>
> Your suggestion would limit the number of nodes assigned to a single
> job - that can be achieved with MaxNodes=20 in the partition
> definition. Actually, I don't want to trim a job's size, neither at
> submission time nor later. I still doubt that a job_submit.lua script
> can fully handle my task: it's not about limiting a single job's
> resources, it's about limiting the jobs being allocated and run in a
> particular partition in aggregate.
>
> What I could do at submit time is set the job's hold ("uhold") state
> according to the current usage of the special partition, and let the
> user retry. But would I "see" this job again in job_submit.lua?
> Documentation of what this plugin can and cannot do is rather sparse,
> and this spontaneous request is only one item on my list.
>
> Perhaps I'm looking for the wrong kind of tool - what I'd need is a
> plugin that decides whether a job is eligible for allocation at all,
> and so far I've been unsuccessful in writing search queries to find
> such a plugin. Does it exist?
>
> Thanks,
>  Steffen
>
> > --
> > Nicholas McCollum
> > HPC Systems Administrator
> > Alabama Supercomputer Authority
> >
> > On Wed, 2017-07-19 at 09:12 -0600, Steffen Grunewald wrote:
> > > Is it possible to define, and use, a subset of all available
> > > nodes in a partition *without* explicitly setting aside a number
> > > of nodes (in a static Nodes=... definition)?
> > >
> > > Let's say, starting with a 100-node cluster, I want to make 20
> > > nodes available for jobs needing an extended MaxTime and Priority
> > > (compared with the defaults) - and once these 20 nodes have been
> > > allocated, no more nodes will be available to jobs submitted to
> > > this particular partition, but the 20 nodes may cover a changing
> > > subset of all nodes over time (as the partition will not be in
> > > use very often)?
> > >
> > > Can this be done with Slurm's built-in functionality, 15.08.8 or
> > > later? Any pointers are welcome...
> > >
> > > Thanks,
> > >  S
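Dani's GrpNodes suggestion might be applied roughly as follows. This is a hedged sketch, not a tested recipe: the account name is a placeholder, and on Slurm releases newer than the ones discussed here the same limit is expressed as GrpTRES=node=20.

```shell
# Sketch: cap the nodes that one account (and all of its child
# associations) may hold in aggregate at 20. "myaccount" is a
# placeholder name, not taken from this thread.
sacctmgr modify account myaccount set GrpNodes=20

# Check what got set:
sacctmgr show assoc format=account,user,grpnodes
```

Because the limit applies to the association in aggregate, it caps the sum of nodes across all running jobs rather than the size of any single job, which is the distinction Steffen draws above.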
[slurm-dev] Re: Thoughts on GrpCPURunMins as primary constraint?
Corey,

We almost exclusively use GrpCPURunMins, together with 3- or 7-day
walltime limits depending on the partition. For my (somewhat rambling)
thoughts on the matter, see
http://tech.ryancox.net/2014/04/scheduler-limit-remaining-cputime-per.html.
It generally works pretty well.

We also have https://marylou.byu.edu/simulation/grpcpurunmins.php to
simulate various settings, though it needs some improvement, such as a
realistic maximum.

sshare -l (the TRESRunMins column) should have the live stats you're
looking for.

Ryan

On 07/24/2017 02:39 PM, Corey Keasling wrote:
> Hi Slurm-Dev,
>
> I'm currently designing and testing what will ultimately be a small
> Slurm cluster of about 60 heterogeneous nodes (five different
> generations of hardware). Our user base is also diverse, with a need
> for fast turnover of small sequential jobs as well as for
> long-duration parallel codes (e.g., 16 cores for several months).
>
> In the past we limited users by how many cores they could allocate at
> any one time. This has the drawback that no distinction is made
> between, say, 128 cores for 2 hours and 128 cores for 2 months. We
> want users to be able to run on a large portion of the cluster when
> it is available, while ensuring that they cannot take advantage of an
> idle period to start jobs which will monopolize it for weeks.
>
> Limiting by GrpCPURunMins seems like a good answer. I think of it as
> allocating computational area (i.e., cores*minutes) and not just
> width (cores). I'd love to know if anyone has any experience or
> thoughts on imposing limits in this way.
>
> Also, is anyone aware of a simple way to calculate the remaining
> "area"? I can use squeue or sacct to derive how much of a limit is in
> use by looking at remaining wall time and core count, but if there's
> something built in - or pre-existing - it would be nice to know.
>
> It's worth noting that the cluster is divided into several
> partitions, with most nodes existing in several. This is partially
> political (to give groups increased priority on nodes they helped pay
> for) and partially practical (to ensure users explicitly request slow
> nodes instead of jobs just being dumped on ancient Opterons). Also,
> each user gets their own Account, so the QoS Grp limits apply to each
> person separately. Accounts would also have absolute core limits.
>
> Thank you for your thoughts!
>
> Corey

-- 
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
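Corey's "remaining area" question can be answered roughly from live queue data by summing cores times remaining minutes over running jobs. The sketch below shows the arithmetic with invented sample numbers piped into awk instead of a live squeue call (the commented-out squeue line uses real --Format field names, but treat the whole thing as an untested illustration):

```shell
# Sketch: committed CPU-minutes = sum over running jobs of
# (allocated cores) * (remaining minutes). On a live cluster the input
# would come from something like:
#   squeue -h -t RUNNING -O NumCPUs:10,TimeLeft:15
# Here two invented jobs stand in: 16 cores with 2 days left, and
# 128 cores with 1.5 hours left.
printf '16 2-00:00:00\n128 01:30:00\n' | awk '
{
  n = split($2, t, "[-:]")            # TimeLeft: D-HH:MM:SS or HH:MM:SS
  if (n == 4)      mins = t[1]*1440 + t[2]*60 + t[3]
  else if (n == 3) mins = t[1]*60 + t[2]
  else             mins = t[1]
  total += $1 * mins                  # cores * remaining minutes
}
END { print total " CPU-minutes still committed" }'
# -> 57600 CPU-minutes still committed
```

Subtracting that total from the configured GrpCPURunMins value gives the free "area"; sshare -l reports much the same number per association once accounting is in place.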
[slurm-dev] Thoughts on GrpCPURunMins as primary constraint?
Hi Slurm-Dev,

I'm currently designing and testing what will ultimately be a small
Slurm cluster of about 60 heterogeneous nodes (five different
generations of hardware). Our user base is also diverse, with a need
for fast turnover of small sequential jobs as well as for long-duration
parallel codes (e.g., 16 cores for several months).

In the past we limited users by how many cores they could allocate at
any one time. This has the drawback that no distinction is made
between, say, 128 cores for 2 hours and 128 cores for 2 months. We want
users to be able to run on a large portion of the cluster when it is
available, while ensuring that they cannot take advantage of an idle
period to start jobs which will monopolize it for weeks.

Limiting by GrpCPURunMins seems like a good answer. I think of it as
allocating computational area (i.e., cores*minutes) and not just width
(cores). I'd love to know if anyone has any experience or thoughts on
imposing limits in this way.

Also, is anyone aware of a simple way to calculate the remaining
"area"? I can use squeue or sacct to derive how much of a limit is in
use by looking at remaining wall time and core count, but if there's
something built in - or pre-existing - it would be nice to know.

It's worth noting that the cluster is divided into several partitions,
with most nodes existing in several. This is partially political (to
give groups increased priority on nodes they helped pay for) and
partially practical (to ensure users explicitly request slow nodes
instead of jobs just being dumped on ancient Opterons). Also, each user
gets their own Account, so the QoS Grp limits apply to each person
separately. Accounts would also have absolute core limits.

Thank you for your thoughts!

Corey

-- 
Corey Keasling
Software Manager
JILA Computing Group
University of Colorado-Boulder
440 UCB Room S244
Boulder, CO 80309-0440
303-492-9643
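The per-account "area" limit described above could be imposed roughly as follows. This is a hedged sketch only: the account name and the 2,000,000 CPU-minute figure are invented for illustration, not recommendations.

```shell
# Sketch: cap the aggregate CPU-minutes committed by one account's
# running jobs. Account name and value are placeholders.
sacctmgr modify account someaccount set GrpCPURunMins=2000000

# Live consumption per association then shows up in the TRESRunMins
# column of:
sshare -l
```

With this limit, 128 cores for 2 hours consumes 15,360 CPU-minutes of headroom while 128 cores for 2 months consumes about 11 million, so the two cases the text contrasts are no longer treated alike.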
[slurm-dev] Re: Elapsed time for slurm job
Dear Sema,

You need to set up accounting first:

https://slurm.schedmd.com/accounting.html

You obviously won't have data for jobs which ran before accounting was
set up. Once you have done this, for subsequent jobs you will be able
to do something like

  sacct -j 123456 -o jobid,elapsed

Read 'man sacct' for more info.

Regards

Loris

Sema Atasever writes:

> Dear Loris,
>
> When I try this command (sacct -o 2893,elapsed) I get this error
> message, unfortunately:
>
>   SLURM accounting storage is disabled
>
> How do I solve this problem?
>
> Regards, Sema.
>
> On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett wrote:
>
>> Sema Atasever writes:
>>
>>> Dear Friends,
>>>
>>> How can I retrieve elapsed time if the slurm job has completed?
>>>
>>> Thanks in advance.
>>
>> sacct -o jobid,elapsed
>>
>> See 'man sacct' or 'sacct -e' for the full list of fields.
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Mr.)
>> ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de
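The "accounting storage is disabled" error means slurmctld has no accounting backend configured. A minimal sketch of what that setup might look like, with placeholder values and the slurmdbd/database installation itself not shown (see the accounting guide linked above):

```shell
# Assumed minimal slurm.conf fragment for job accounting; "dbhost" is a
# placeholder, and slurmdbd plus its MySQL/MariaDB backend must already
# be running for this to work:
#   AccountingStorageType=accounting_storage/slurmdbd
#   AccountingStorageHost=dbhost
#   JobAcctGatherType=jobacct_gather/linux

# After restarting the daemons, the elapsed time of a finished job
# (job id is an example) can be queried with:
sacct -j 123456 -o jobid,jobname,elapsed,state
```

Note that sacct takes the job id via -j; the earlier attempt `sacct -o 2893,elapsed` passed the job id where a field name was expected, which would fail even with accounting enabled.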
[slurm-dev] Re: Elapsed time for slurm job
Dear Loris,

When I try this command (sacct -o 2893,elapsed) I get this error
message, unfortunately:

  SLURM accounting storage is disabled

How do I solve this problem?

Regards, Sema.

On Mon, Jul 24, 2017 at 4:25 PM, Loris Bennett wrote:

> Sema Atasever writes:
>
>> Dear Friends,
>>
>> How can I retrieve elapsed time if the slurm job has completed?
>>
>> Thanks in advance.
>
> sacct -o jobid,elapsed
>
> See 'man sacct' or 'sacct -e' for the full list of fields.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de
[slurm-dev] Re: Elapsed time for slurm job
Sema Atasever writes:

> Dear Friends,
>
> How can I retrieve elapsed time if the slurm job has completed?
>
> Thanks in advance.

sacct -o jobid,elapsed

See 'man sacct' or 'sacct -e' for the full list of fields.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de
[slurm-dev] Elapsed time for slurm job
Dear Friends,

How can I retrieve the elapsed time of a Slurm job after it has
completed?

Thanks in advance.
[slurm-dev] Re: Dynamic, small partition?
On Wed, 2017-07-19 at 09:52:37 -0600, Nicholas McCollum wrote:
> You could try a QoS with
> Flags=DenyOnLimit,OverPartQOS,PartitionTimeLimit Priority=
>
> Depending on how you have your accounting set up, you could tweak some
> of the GrpTRES, MaxTRES, MaxTRESPU and MaxJobsPU limits to bring
> resource usage down to your 20-node limit. I'm not sure, off the top
> of my head, how to define a hard limit on the number of nodes a QoS
> can use.
>
> You could use accounting to prevent unauthorized users from submitting
> to that QoS.
>
> If the QoS isn't going to be used often, you could allow only one job
> to run in that QoS at a time, and use job_submit.lua to set MaxNodes
> to 20 at job submission:
>
> if string.match(job_desc.qos, "special") then
>   job_desc.max_nodes = 20
> end
>
> Just a couple of ideas for you; there's probably a far better way to
> do it!

Thanks for hurling such a long list of ideas at me.

Since there's already an overly long list of QoSes, I decided to look at
the job_submit.lua path first - only to discover that our installation
doesn't know about that plugin, and that it would take me the better
part of a few days to find out how to overcome this. Since our micro
version is no longer available in the source archives, I'd have to
upgrade to the latest micro release first, which adds unwanted
complexity and the risk of having to scrub my vacation. I'm not too
fluent in Lua, and there seems to be no safe platform on which to debug
a job_submit.lua script. Nevertheless, I'm looking into this.

Your suggestion would limit the number of nodes assigned to a single
job - that can be achieved with MaxNodes=20 in the partition
definition. Actually, I don't want to trim a job's size, neither at
submission time nor later. I still doubt that a job_submit.lua script
can fully handle my task: it's not about limiting a single job's
resources, it's about limiting the jobs being allocated and run in a
particular partition in aggregate.

What I could do at submit time is set the job's hold ("uhold") state
according to the current usage of the special partition, and let the
user retry. But would I "see" this job again in job_submit.lua?
Documentation of what this plugin can and cannot do is rather sparse,
and this spontaneous request is only one item on my list.

Perhaps I'm looking for the wrong kind of tool - what I'd need is a
plugin that decides whether a job is eligible for allocation at all,
and so far I've been unsuccessful in writing search queries to find
such a plugin. Does it exist?

Thanks,
 Steffen

> --
> Nicholas McCollum
> HPC Systems Administrator
> Alabama Supercomputer Authority
>
> On Wed, 2017-07-19 at 09:12 -0600, Steffen Grunewald wrote:
> > Is it possible to define, and use, a subset of all available nodes
> > in a partition *without* explicitly setting aside a number of nodes
> > (in a static Nodes=... definition)?
> >
> > Let's say, starting with a 100-node cluster, I want to make 20 nodes
> > available for jobs needing an extended MaxTime and Priority
> > (compared with the defaults) - and once these 20 nodes have been
> > allocated, no more nodes will be available to jobs submitted to this
> > particular partition, but the 20 nodes may cover a changing subset
> > of all nodes over time (as the partition will not be in use very
> > often)?
> >
> > Can this be done with Slurm's built-in functionality, 15.08.8 or
> > later? Any pointers are welcome...
> >
> > Thanks,
> >  S
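The floating 20-node cap described in the original question can, if memory serves, be approximated without any plugin by attaching a QoS to the partition (partition QoS, and the OverPartQOS flag that interacts with it, are features of the 15.08 era being discussed). A hedged sketch with placeholder names and values:

```shell
# Sketch: a "floating" 20-node partition spanning all 100 nodes. The
# QoS is attached to the partition, so its GrpNodes limit caps the
# partition's aggregate allocation - which 20 nodes are in use may
# change over time, with no static node set reserved.
sacctmgr add qos part_long
sacctmgr modify qos part_long set GrpNodes=20 Flags=DenyOnLimit

# slurm.conf (all names and values are placeholders):
#   PartitionName=long Nodes=node[001-100] MaxTime=14-00:00:00 \
#       Priority=10 QOS=part_long
```

Because the limit lives on the partition's QoS rather than on each job, jobs beyond the 20-node aggregate simply wait (or are rejected under DenyOnLimit) instead of having their size trimmed, which matches the behaviour asked for above.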