Re: [slurm-users] slurm node weights
I believe this is so that small jobs will naturally go on older, slower nodes first - leaving the bigger, better ones for jobs that actually need them. I don't think the two statements really contradict each other: Slurm always allocates the lowest-weight nodes that satisfy a job, and the manpage simply assumes you want to keep the big nodes free for the jobs that need them - your setup inverts that convention to steer everything onto the new hardware, which works just as well.

Merlin
--
Merlin Hartley
IT Support Engineer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

> On 5 Sep 2019, at 16:48, Douglas Duckworth wrote:
>
> Hello
>
> We added some newer Epyc nodes, with NVMe scratch, to our cluster and so want jobs to run on these over others. So we added "Weight=100" to the older nodes and left the new ones blank. Indeed, ceteris paribus, srun reveals that the faster nodes will accept jobs over older ones.
>
> We have the desired outcome, though I am a bit confused by two statements in the manpage <https://slurm.schedmd.com/slurm.conf.html> that seem contradictory:
>
> "All things being equal, jobs will be allocated the nodes with the lowest weight which satisfies their requirements."
>
> "...larger weights should be assigned to nodes with more processors, memory, disk space, higher processor speed, etc."
>
> 100 is larger than 1, and we do see jobs preferring the new nodes, which have the default weight of 1. Yet we're also told to assign larger weights to faster nodes?
>
> Thanks!
> Doug
>
> --
> Thanks,
>
> Douglas Duckworth, MSc, LFCS
> HPC System Administrator
> Scientific Computing Unit <https://scu.med.cornell.edu/>
> Weill Cornell Medicine
> E: d...@med.cornell.edu
> O: 212-746-6305
> F: 212-746-8690
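For example, a minimal slurm.conf sketch of the manpage's convention (node names and specs here are hypothetical):

    # Hypothetical nodes - lower weight is allocated first, so small
    # jobs fill the old nodes and the big Epyc nodes stay free for
    # jobs that actually need them. Douglas's setup simply inverts
    # the weights to make jobs prefer the new hardware instead.
    NodeName=old[01-10]  CPUs=16 RealMemory=64000  Weight=1
    NodeName=epyc[01-04] CPUs=64 RealMemory=256000 Weight=100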
Re: [slurm-users] maximum size of array jobs
max_array_tasks
    Specify the maximum number of tasks that can be included in a job array. The default limit is MaxArraySize, but this option can be used to set a lower limit. For example, max_array_tasks=1000 and MaxArraySize=100001 would permit a maximum task ID of 100000, but limit the number of tasks in any single job array to 1000.

https://slurm.schedmd.com/slurm.conf.html

SchedulerParameters=max_array_tasks=1000
MaxArraySize=100000

See commit:
https://github.com/SchedMD/slurm/commit/09c13fb292a4a6a56b4078de840aae0d4db70309

--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

> On 26 Feb 2019, at 14:27, Jeffrey Frey wrote:
>
> Also see https://slurm.schedmd.com/slurm.conf.html for MaxArraySize/MaxJobCount.
>
> We just went through a user-requested adjustment to MaxArraySize to bump it from 1000 to 10000; as the documentation states, since each index of an array job is essentially "a job," you must be sure to also adjust MaxJobCount (from 10000 to 100000 in our case). Adjusting MaxJobCount requires a restart of slurmctld; though the documentation doesn't state it, so does adjusting MaxArraySize (scontrol reconfigure will succeed but leave the previous limit in effect, see https://bugs.schedmd.com/show_bug.cgi?id=6553).
>
> "MaxArraySize" is a bit of a misnomer, since it's really 1 + the top of the valid range of indices - "MaxArrayIndex" would be more apt. Our users were very happy with Grid Engine's allowance of any index range and striding that produces no more than "max_aj_tasks" indices; since moving to Slurm they're forced to come up with their own index-mapping functionality at times, and the relatively low MaxArraySize versus what we had in GridEngine (75000) has been especially frustrating for them.
>
> So far the 10000/100000 combo hasn't come close to exhausting resources on our slurmctld nodes; but we haven't actually submitted a couple of 10000-index array jobs plus enough other jobs to hit 100000 active jobs, so current memory usage isn't an adequate measure of usage under load. Since the slurm.conf documentation states:
>
>> Performance can suffer with more than a few hundred thousand jobs.
>
> we're reluctant to increase MaxJobCount too much higher.
>
>> On Feb 26, 2019, at 3:18 AM, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:
>>
>> On 2/26/19 9:07 AM, Marcus Wagner wrote:
>>> Does anyone know why, per default, the number of array elements is limited to 1000?
>>> We have one user who would like to have 100k array elements!
>>> What is more difficult for the scheduler: one array job with 100k elements, or 100k non-array jobs?
>>> Where did you set the limit? Do your users use array jobs at all?
>>
>> Google is your friend :-)
>>
>> https://slurm.schedmd.com/job_array.html
>>
>>> A new configuration parameter has been added to control the maximum job array size: MaxArraySize. The smallest index that can be specified by a user is zero and the maximum index is MaxArraySize minus one. The default value of MaxArraySize is 1001. The maximum MaxArraySize supported in Slurm is 4000001. Be mindful about the value of MaxArraySize, as job arrays offer an easy way for users to submit large numbers of jobs very quickly.
>>
>> /Ole
>
> ::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE 19716
> Office: (302) 831-6034  Mobile: (302) 419-4976
> ::
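A sketch of the combined change discussed above (the values are illustrative, matching Jeffrey's figures):

    # slurm.conf - raise the array index ceiling and the total job
    # ceiling together: each array index counts as a job, so
    # MaxJobCount must leave headroom for whole arrays.
    MaxArraySize=10000
    MaxJobCount=100000
    # Optionally cap how many tasks a single array may contain:
    SchedulerParameters=max_array_tasks=1000

Then restart slurmctld for the new limits to take effect - per bug 6553 and the MaxJobCount documentation, "scontrol reconfigure" is not enough for either setting.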
Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?
Using sacct [1] - assuming you have accounting [2] enabled:

sacct -j <jobid>

Hope this helps!

Merlin

[1] https://slurm.schedmd.com/sacct.html
[2] https://slurm.schedmd.com/accounting.html

--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

> On 15 Feb 2019, at 10:05, hu...@sugon.com wrote:
>
> Dear there,
>
> How do I view the CPU usage of historical jobs on each compute node? This command (scontrol show job <jobid> --detail) can only get the CPU usage of a currently running job on each compute node.
>
> Appreciatively,
> Menglong
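For per-node, per-step detail, something like this may be more useful (the job ID is hypothetical; the field names come from the sacct man page):

    # For each step of job 12345, show which nodes it ran on, how many
    # CPUs were allocated, and how much CPU time it actually consumed.
    sacct -j 12345 --format=JobID,JobName,NodeList,AllocCPUS,TotalCPU,Elapsed,State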
Re: [slurm-users] How to request ONLY one CPU instead of one socket or one node?
Seems like you aren't specifying a --mem option, so the default would be to ask for a whole node's worth of RAM - thus you would use the whole node for each job.

Hope this is useful!

Merlin
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

> On 14 Feb 2019, at 02:21, Wang, Liaoyuan <wan...@alfred.edu> wrote:
>
> Dear there,
>
> I wrote an analytic program to analyze my data. The analysis takes around twenty days for all data for one species. When I submit my job to the cluster, it always requests one node instead of one CPU. I am wondering how I can request ONLY one CPU using the "sbatch" command? Below is my batch file. Any comments and help would be highly appreciated.
>
> Appreciatively,
> Leon
>
> #!/bin/sh
>
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=1
> #SBATCH -t 45-00:00:00
> #SBATCH -J 9625%j
> #SBATCH -o 9625.out
> #SBATCH -e 9625.err
>
> /home/scripts/wcnqn.auto.pl
>
> ===
> Where wcnqn.auto.pl is my program. 9625 denotes the species number.
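A sketch of the fix Merlin suggests - adding an explicit memory request so the job no longer claims the whole node's RAM (the 4G figure is an assumption; size it to what the analysis actually needs):

    #!/bin/sh
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=4G        # assumed figure - request only what the job needs
    #SBATCH -t 45-00:00:00
    #SBATCH -J 9625%j
    #SBATCH -o 9625.out
    #SBATCH -e 9625.err

    /home/scripts/wcnqn.auto.pl

With --mem set, Slurm can pack other jobs onto the node's remaining CPUs and memory instead of leaving them idle.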
Re: [slurm-users] Reserve CPUs/MEM for GPUs
You could instead only allow the cpu partition to use 192G RAM and 20 CPUs on those nodes...

--
Merlin Hartley

> On 13 Feb 2019, at 07:38, Quirin Lohr wrote:
>
> Hi all,
>
> we have a slurm cluster running on nodes with 2x18 cores, 256GB RAM and 8 GPUs. Is there a way to reserve a bare minimum of two CPUs and 8GB RAM for each GPU, so a high-CPU job cannot render the GPUs "unusable"?
>
> Thanks in advance
> Quirin
> --
> Quirin Lohr
> Systemadministration
> Technische Universität München
> Fakultät für Informatik
> Lehrstuhl für Bildverarbeitung und Mustererkennung
>
> Boltzmannstrasse 3
> 85748 Garching
>
> Tel. +49 89 289 17769
> Fax +49 89 289 17757
>
> quirin.l...@in.tum.de
> www.vision.in.tum.de

--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom
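A minimal slurm.conf sketch of that approach (node and partition names are hypothetical; MaxCPUsPerNode and MaxMemPerNode are per-partition limits): cap the CPU-only partition at 20 cores / 192G per node, so 16 cores and 64G - 2 cores plus 8G per GPU - always remain for GPU jobs.

    # Nodes: 2x18 cores, 256G RAM, 8 GPUs each (names hypothetical).
    GresTypes=gpu
    NodeName=vis[01-04] CPUs=36 RealMemory=262144 Gres=gpu:8
    # CPU-only jobs can never take more than 20 cores / 192G (196608 MB)
    # per node, leaving 2 cores + 8G per GPU for the gpu partition.
    PartitionName=cpu Nodes=vis[01-04] MaxCPUsPerNode=20 MaxMemPerNode=196608
    PartitionName=gpu Nodes=vis[01-04]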
Re: [slurm-users] jobs stuck in ReqNodeNotAvail,
damn autocorrect - I meant:

# scontrol show job 6982

--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom

> On 29 Nov 2017, at 16:08, Merlin Hartley <merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
> Can you give us the output of
> # control show job 6982
>
> Could be an issue with requesting too many CPUs or something…
>
> Merlin
> --
> Merlin Hartley
> Computer Officer
> MRC Mitochondrial Biology Unit
> Cambridge, CB2 0XY
> United Kingdom
>
>> On 29 Nov 2017, at 15:21, Christian Anthon <ant...@rth.dk> wrote:
>>
>> Hi,
>>
>> I have a problem with a newly set up slurm-17.02.7-1.el6.x86_64 where jobs seem to be stuck in ReqNodeNotAvail:
>>
>> 6982 panic Morgens ferro PD 0:00 1 (ReqNodeNotAvail, UnavailableNodes:)
>> 6981 panic SPEC    ferro PD 0:00 1 (ReqNodeNotAvail, UnavailableNodes:)
>>
>> The nodes are fully allocated in terms of memory, but not all CPU resources are consumed:
>>
>> PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
>> _default  up    infinite     19 mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> _default  up    infinite     11 alloc alone[02-08,10-13]
>> fastlane  up    infinite     19 mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> fastlane  up    infinite     11 alloc alone[02-08,10-13]
>> panic     up    infinite     19 mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> panic     up    infinite     12 alloc alone[02-08,10-13,15]
>> free*     up    infinite     19 mix   clone[05-11,25-29,31-32,36-37,39-40,45]
>> free*     up    infinite     11 alloc alone[02-08,10-13]
>>
>> Possibly relevant lines in slurm.conf (full slurm.conf attached):
>>
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU_Memory
>> TaskPlugin=task/none
>> FastSchedule=1
>>
>> Any advice?
>>
>> Cheers, Christian.
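To confirm Christian's diagnosis that memory rather than CPUs is exhausted, something like this (a sketch; format specifiers per the sinfo man page) shows the CPU and memory picture per node at a glance:

    # %C = CPUs as allocated/idle/other/total, %m = configured memory (MB),
    # %e = free memory (MB) - nodes with idle CPUs but no free memory are
    # exactly what keeps jobs pending under CR_CPU_Memory.
    sinfo -N -o "%N %C %m %e"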