Re: [slurm-users] Simple free for all cluster
Thanks Chris, will likely need it :)

John

On Sat, Oct 10, 2020 at 04:19:06PM -0700, Chris Samuel wrote:
> On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:
>
> > I currently don't have a MaxTime defined, because how do I know how long a
> > job will take? Most jobs on my cluster require no more than 3-4 days, but
> > in some cases at other campuses, I know that jobs can run for weeks. I
> > suppose even setting a time limit such as 4 weeks would be overkill, but at
> > least it's not infinite. I'm curious what others use as that value, and how
> > you arrived at it.
>
> My journey over the last 16 years in HPC has been one of decreasing time
> limits. Back in 2003 with VPAC's first Linux cluster we had no time limits;
> we then introduced a 90-day limit so we could plan quarterly maintenances
> (and yes, we had users whose jobs legitimately ran longer than that, so they
> had to learn to checkpoint). At VLSCI we had 30-day limits (life sciences,
> so many long-running, poorly scaling jobs), then when I was at Swinburne it
> was a 7-day limit, and now here at NERSC we've got 2-day limits.
>
> It really is down to what your use cases are and how much influence you have
> over your users. It's often the HPC sysadmin's responsibility to try and
> find the balance between good utilisation, effective use of the system, and
> reaching the desired science/research/development outcomes.
>
> Best of luck!
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

--
j...@sdf.org SDF Public Access UNIX System - http://sdf.org
Re: [slurm-users] Simple free for all cluster
On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:

> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but at
> least it's not infinite. I'm curious what others use as that value, and how
> you arrived at it.

My journey over the last 16 years in HPC has been one of decreasing time
limits. Back in 2003 with VPAC's first Linux cluster we had no time limits;
we then introduced a 90-day limit so we could plan quarterly maintenances
(and yes, we had users whose jobs legitimately ran longer than that, so they
had to learn to checkpoint). At VLSCI we had 30-day limits (life sciences,
so many long-running, poorly scaling jobs), then when I was at Swinburne it
was a 7-day limit, and now here at NERSC we've got 2-day limits.

It really is down to what your use cases are and how much influence you have
over your users. It's often the HPC sysadmin's responsibility to try and find
the balance between good utilisation, effective use of the system, and
reaching the desired science/research/development outcomes.

Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Simple free for all cluster
Hi Jason,

we intend to have a maximum wallclock time of 5 days. We chose this to have
the possibility to do timely maintenance without disturbing and killing the
users' jobs. Yet we see that some users and/or codes need a longer runtime;
that is why we set the MaxTime for the partitions to 30 days. Our users must
write a proposal if they need a larger amount of core hours, and there they
have to justify why they need a longer runtime than 5 days. This maximum
time is limited by the association created for the triple (account, user,
partition). When we need to do timely maintenance, we kill such long-running
jobs; our users know that. Our default time is set to 15 minutes.

Best
Marcus

On 06.10.2020 at 16:53, Jason Simms wrote:
> FWIW, I define the DefaultTime as 5 minutes, which effectively means that
> for any "real" job users must actually define a time. It helps users get
> into that habit, because in the absence of a DefaultTime, most will not
> even bother to think critically and carefully about what time limit is
> actually reasonable, which is important for, e.g., effective job backfill
> and scheduling estimations.
>
> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but
> at least it's not infinite. I'm curious what others use as that value, and
> how you arrived at it.
>
> Warmest regards,
> Jason
>
> On Tue, Oct 6, 2020 at 5:55 AM John H <j...@sdf.org> wrote:
>
> > Yes, I hadn't considered that! Thanks for the tip, Michael; I shall do
> > that.
> >
> > John
> >
> > On Fri, Oct 02, 2020 at 01:49:44PM, Renfro, Michael wrote:
> > > Depending on the users who will be on this cluster, I'd probably
> > > adjust the partition to have a defined, non-infinite MaxTime, and
> > > maybe a lower DefaultTime. Otherwise, it would be very easy for
> > > someone to start a job that reserves all cores until the nodes get
> > > rebooted, since all they have to do is submit a job with no explicit
> > > time limit (which would then use DefaultTime, which itself has a
> > > default value of MaxTime).
>
> --
> Jason L. Simms, Ph.D., M.P.H.
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632

--
Dipl.-Inf. Marcus Wagner
IT Center
Group: Linux Systems Group
Department: Systems and Operations
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social media channels of the IT Center:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
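For the mechanics: an association-level wall-clock limit of the kind Marcus
describes can be set with sacctmgr. A minimal sketch, where the account,
user, and partition names are made up for illustration:

    # Hypothetical: grant user 'alice' in account 'proj42' a 30-day
    # MaxWall on partition 'long', via the (account, user, partition)
    # association. All names here are placeholders.
    sacctmgr modify user where name=alice account=proj42 partition=long \
             set MaxWall=30-00:00:00

MaxWall uses Slurm's days-hours:minutes:seconds format, so 30-00:00:00 is
30 days.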
Re: [slurm-users] Simple free for all cluster
On 06/10/20 16:53, Jason Simms wrote:
> FWIW, I define the DefaultTime as 5 minutes, which effectively means that
> for any "real" job users must actually define a time. It helps users get
> into that habit, because in the absence of a DefaultTime, most will not
> even bother to think critically and carefully about what time limit is
> actually reasonable, which is important for, e.g., effective job backfill
> and scheduling estimations.

+1

> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but
> at least it's not infinite. I'm curious what others use as that value, and
> how you arrived at it.

We're currently using 24h, and will go up to 72h. It's a compromise between
users requesting more time for their jobs and how long they're willing to
wait for their jobs to start. Checkpointing is always needed anyway.

--
Diego Zuccato
DIFA - Dept. of Physics and Astronomy
IT Services
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
Re: [slurm-users] Simple free for all cluster
Our MaxTime and DefaultTime are 14 days. Setting a high DefaultTime was a
convenience to our users (and the support team) but has evolved into a
mistake, because it impacts backfill. Under high load we'll see small
backfill jobs take over: the estimated start and end times of "DefaultTime"
jobs are wildly incorrect, so the backfill algorithm is less likely to
calculate a delay to the larger, highest-priority jobs and keeps backfilling
smaller ones. I've tuned many of the backfill SchedulerParameters, but
there's no replacement for an accurate time estimate. Default values also
become difficult to change once hundreds of submit scripts omit a time limit
and rely on them.

Jason, I think setting a small DefaultTime limit is a good approach. We've
considered resetting our default to 1 minute to force jobs to specify a
time, but will (likely) target an average-ish value now that we have stats
from a couple of million jobs.

- Sebastian

--
Sebastian Smith
High-Performance Computing Engineer
Office of Information Technology
University of Nevada, Reno - http://www.unr.edu/
1664 North Virginia Street MS 0291
work-phone: 775-682-5050
email: stsm...@unr.edu
website: http://rc.unr.edu

From: slurm-users on behalf of Jason Simms
Sent: Tuesday, October 6, 2020 7:53 AM
To: Slurm User Community List
Subject: Re: [slurm-users] Simple free for all cluster

FWIW, I define the DefaultTime as 5 minutes, which effectively means that
for any "real" job users must actually define a time. It helps users get
into that habit, because in the absence of a DefaultTime, most will not even
bother to think critically and carefully about what time limit is actually
reasonable, which is important for, e.g., effective job backfill and
scheduling estimations.

I currently don't have a MaxTime defined, because how do I know how long a
job will take? Most jobs on my cluster require no more than 3-4 days, but in
some cases at other campuses, I know that jobs can run for weeks. I suppose
even setting a time limit such as 4 weeks would be overkill, but at least
it's not infinite. I'm curious what others use as that value, and how you
arrived at it.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 5:55 AM John H <j...@sdf.org> wrote:

> Yes, I hadn't considered that! Thanks for the tip, Michael; I shall do that.
>
> John
>
> On Fri, Oct 02, 2020 at 01:49:44PM, Renfro, Michael wrote:
> > Depending on the users who will be on this cluster, I'd probably adjust
> > the partition to have a defined, non-infinite MaxTime, and maybe a lower
> > DefaultTime. Otherwise, it would be very easy for someone to start a job
> > that reserves all cores until the nodes get rebooted, since all they have
> > to do is submit a job with no explicit time limit (which would then use
> > DefaultTime, which itself has a default value of MaxTime).

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
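For anyone wanting to experiment, the backfill knobs Sebastian mentions live
in SchedulerParameters in slurm.conf. A hedged illustration only; the values
here are arbitrary examples, not recommendations:

    # Illustrative backfill tuning; see "man slurm.conf" for semantics.
    # bf_window (minutes) should cover the longest allowed job
    # (14 days = 20160 minutes); bf_resolution is in seconds.
    SchedulerParameters=bf_window=20160,bf_resolution=600,bf_continue,bf_max_job_test=1000

None of this helps much when every job claims the same DefaultTime, which is
Sebastian's point: the planner can only be as good as the estimates it is fed.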
Re: [slurm-users] Simple free for all cluster
FWIW, I define the DefaultTime as 5 minutes, which effectively means that
for any "real" job users must actually define a time. It helps users get
into that habit, because in the absence of a DefaultTime, most will not even
bother to think critically and carefully about what time limit is actually
reasonable, which is important for, e.g., effective job backfill and
scheduling estimations.

I currently don't have a MaxTime defined, because how do I know how long a
job will take? Most jobs on my cluster require no more than 3-4 days, but in
some cases at other campuses, I know that jobs can run for weeks. I suppose
even setting a time limit such as 4 weeks would be overkill, but at least
it's not infinite. I'm curious what others use as that value, and how you
arrived at it.

Warmest regards,
Jason

On Tue, Oct 6, 2020 at 5:55 AM John H wrote:

> Yes, I hadn't considered that! Thanks for the tip, Michael; I shall do that.
>
> John
>
> On Fri, Oct 02, 2020 at 01:49:44PM, Renfro, Michael wrote:
> > Depending on the users who will be on this cluster, I'd probably adjust
> > the partition to have a defined, non-infinite MaxTime, and maybe a lower
> > DefaultTime. Otherwise, it would be very easy for someone to start a job
> > that reserves all cores until the nodes get rebooted, since all they have
> > to do is submit a job with no explicit time limit (which would then use
> > DefaultTime, which itself has a default value of MaxTime).

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
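With a deliberately small DefaultTime like this, users request wall time
explicitly in their batch scripts. A minimal sketch; the job name, program,
and 2-day value are placeholders, not anything from Jason's setup:

    #!/bin/bash
    #SBATCH --job-name=example         # placeholder job name
    #SBATCH --time=2-00:00:00          # explicit request: 2 days (days-hh:mm:ss)
    srun ./my_program                  # hypothetical executable

Jobs submitted without --time then get only the 5-minute default, which is
short enough to make users notice and fix it quickly.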
Re: [slurm-users] Simple free for all cluster
Yes, I hadn't considered that! Thanks for the tip, Michael; I shall do that.

John

On Fri, Oct 02, 2020 at 01:49:44PM, Renfro, Michael wrote:
> Depending on the users who will be on this cluster, I'd probably adjust the
> partition to have a defined, non-infinite MaxTime, and maybe a lower
> DefaultTime. Otherwise, it would be very easy for someone to start a job
> that reserves all cores until the nodes get rebooted, since all they have
> to do is submit a job with no explicit time limit (which would then use
> DefaultTime, which itself has a default value of MaxTime).
[slurm-users] Simple free for all cluster
Hi All

Hope you are all keeping well in these difficult times. I have set up a
small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 16 GB RAM) without
scheduling or accounting, as it isn't really needed. I'm just looking for
confirmation that it's configured correctly to allow the controller to 'see'
all resources and allocate incoming jobs to the most readily available node
in the cluster. I can see jobs are being delivered to different nodes, but
want to ensure I haven't inadvertently done anything to render it
sub-optimal (even in such a simple use case!).

Thanks very much for any assistance; here is my cfg:

#
# SLURM.CONF
ControlMachine=slnode1
BackupController=slnode2
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
MinJobAge=86400
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_MEMORY
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES
NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=16017
PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP

John

--
j...@sdf.org SDF Public Access UNIX System - http://sdf.org
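As a quick sanity check on whether the controller sees all the resources
(not specific to this config, just the standard tools):

    sinfo -Nel                    # per-node view: CPUs, memory, state as slurmctld sees them
    scontrol show node slnode1    # full detail for one node, to compare against the NodeName line

If the CPUs and RealMemory reported there match the hardware, allocation
decisions are being made with the right picture of the cluster.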
Re: [slurm-users] Simple free for all cluster
Depending on the users who will be on this cluster, I'd probably adjust the
partition to have a defined, non-infinite MaxTime, and maybe a lower
DefaultTime. Otherwise, it would be very easy for someone to start a job
that reserves all cores until the nodes get rebooted, since all they have to
do is submit a job with no explicit time limit (which would then use
DefaultTime, which itself has a default value of MaxTime).

On 10/2/20, 7:37 AM, "slurm-users on behalf of John H" wrote:

    Hi All

    Hope you are all keeping well in these difficult times. I have set up a
    small Slurm cluster of 8 compute nodes (4 x 1-core CPUs, 16 GB RAM)
    without scheduling or accounting, as it isn't really needed. I'm just
    looking for confirmation that it's configured correctly to allow the
    controller to 'see' all resources and allocate incoming jobs to the most
    readily available node in the cluster. I can see jobs are being
    delivered to different nodes, but want to ensure I haven't inadvertently
    done anything to render it sub-optimal (even in such a simple use
    case!).

    Thanks very much for any assistance; here is my cfg:

    #
    # SLURM.CONF
    ControlMachine=slnode1
    BackupController=slnode2
    MpiDefault=none
    ProctrackType=proctrack/pgid
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
    SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
    SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    StateSaveLocation=/var/spool/slurm-llnl
    SwitchType=switch/none
    TaskPlugin=task/none
    #
    # TIMERS
    MinJobAge=86400
    #
    # SCHEDULING
    FastSchedule=1
    SchedulerType=sched/backfill
    SelectType=select/cons_res
    SelectTypeParameters=CR_CPU_MEMORY
    #
    # LOGGING AND ACCOUNTING
    AccountingStorageType=accounting_storage/none
    ClusterName=cluster
    JobAcctGatherType=jobacct_gather/none
    SlurmctldDebug=3
    SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
    SlurmdDebug=3
    SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
    #
    # COMPUTE NODES
    NodeName=slnode[1-8] CPUs=4 Boards=1 SocketsPerBoard=4 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=16017
    PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=INFINITE State=UP

    John
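Applied to the partition line above, that suggestion might look something
like the following; the 7-day MaxTime and 1-hour DefaultTime are placeholder
values for illustration, not a recommendation:

    # Hypothetical revision: finite MaxTime plus a modest DefaultTime.
    PartitionName=sl Nodes=slnode[1-8] Default=YES MaxTime=7-00:00:00 DefaultTime=01:00:00 State=UP

With this in place, a job submitted without --time runs for at most an hour
rather than holding its cores indefinitely.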