[slurm-users] Backfill pushing jobs back
Hello,

Last year I posted on this forum looking for some help with backfill in Slurm. We are currently using Slurm 19.05.8 and we find that backfilled (smaller) jobs tend to push back large jobs on our cluster. Chris Samuel replied to our post with the following:

> This sounds like a problem that we had at NERSC (small jobs pushing back
> multi-thousand node jobs), and we carried a local patch for which Doug
> managed to get upstreamed in 20.02.x (I think it landed in 20.02.3, but
> 20.02.6 is the current version).

We looked through the release notes and, sure enough, there is a reference to a job starvation patch (in 20.02.2), although I'm not sure that it is the relevant one:

> -- Fix scheduling issue when there are not enough nodes available to run a
>    job resulting in possible job starvation.

We decided to download and install the latest production version of Slurm, 20.11.2. One of my team members managed the installation and ran his backfill tests, only to find that the backfill issue described above was still present.

Should we wind back to version 20.02.6 and install/test that instead? Could someone please advise us? It would seem odd for a recent version of Slurm to still have a backfill issue that starves larger jobs. We're wondering whether we have forgotten to configure something very fundamental, for example.

Best regards,
David
Re: [slurm-users] Backfill pushing jobs back
Hello,

Could I please follow up on the Slurm patch that relates to smaller jobs pushing large jobs back? My colleague downloaded and installed the most recent production version of Slurm today and tells me that it did not appear to resolve the issue. Just to note, we are currently running v19.05.8 and finding that the backfill mechanism pushes large jobs back. In theory, should the latest Slurm help us sort that issue out? I understand that we're testing v20.11.2, but I should confirm that with my colleague tomorrow.

Does anyone have any comments, please? Is there any parameter that we need to set to activate the backfill patch, for example?

Best regards,
David

From: slurm-users on behalf of Chris Samuel
Sent: 09 December 2020 16:37
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Backfill pushing jobs back

Hi David,

On 9/12/20 3:35 am, David Baker wrote:

> We see the following issue with smaller jobs pushing back large jobs. We
> are using slurm 19.05.8 so not sure if this is patched in newer releases.

This sounds like a problem that we had at NERSC (small jobs pushing back multi-thousand node jobs), and we carried a local patch for which Doug managed to get upstreamed in 20.02.x (I think it landed in 20.02.3, but 20.02.6 is the current version).

Hope this helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Backfill pushing jobs back
Hi Chris,

Thank you for your reply. It isn't long since we upgraded to Slurm v19, but it sounds like we should start to look actively at v20, since this issue is causing significant problems on our cluster. We'll download and install v20 on our dev cluster and experiment.

Best regards,
David

From: slurm-users on behalf of Chris Samuel
Sent: 09 December 2020 16:37
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Backfill pushing jobs back

Hi David,

On 9/12/20 3:35 am, David Baker wrote:

> We see the following issue with smaller jobs pushing back large jobs. We
> are using slurm 19.05.8 so not sure if this is patched in newer releases.

This sounds like a problem that we had at NERSC (small jobs pushing back multi-thousand node jobs), and we carried a local patch for which Doug managed to get upstreamed in 20.02.x (I think it landed in 20.02.3, but 20.02.6 is the current version).

Hope this helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Backfill pushing jobs back
Hi David,

On 9/12/20 3:35 am, David Baker wrote:

> We see the following issue with smaller jobs pushing back large jobs. We
> are using slurm 19.05.8 so not sure if this is patched in newer releases.

This sounds like a problem that we had at NERSC (small jobs pushing back multi-thousand node jobs), and we carried a local patch for which Doug managed to get upstreamed in 20.02.x (I think it landed in 20.02.3, but 20.02.6 is the current version).

Hope this helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
[slurm-users] Backfill pushing jobs back
Hello,

We see the following issue with smaller jobs pushing back large jobs. We are using Slurm 19.05.8, so we are not sure whether this is patched in newer releases.

With a 4 node test partition I submit 3 jobs as 2 users:

    ssh hpcdev1@navy51 'sbatch --nodes=3 --ntasks-per-node=40 --partition=backfilltest --time=120 --wrap="sleep 7200"'
    ssh hpcdev2@navy51 'sbatch --nodes=4 --ntasks-per-node=40 --partition=backfilltest --time=60 --wrap="sleep 3600"'
    ssh hpcdev2@navy51 'sbatch --nodes=4 --ntasks-per-node=40 --partition=backfilltest --time=60 --wrap="sleep 3600"'

Then I increase the priority of the pending jobs significantly. Reading the manual, my understanding is that nodes should be held for these jobs.

    for job in $(squeue -h -p backfilltest -t pd -o %i); do scontrol update job ${job} priority=10; done

    squeue -p backfilltest -o "%i | %u | %C | %Q | %l | %S | %T"
    JOBID | USER | CPUS | PRIORITY | TIME_LIMIT | START_TIME | STATE
    28482 | hpcdev2 | 160 | 10 | 1:00:00 | N/A | PENDING
    28483 | hpcdev2 | 160 | 10 | 1:00:00 | N/A | PENDING
    28481 | hpcdev1 | 120 | 50083 | 2:00:00 | 2020-12-08T09:44:15 | RUNNING

So, there is one node free in our 4 node partition. Naturally, a small job with a walltime of less than 1 hour could run on that node, but we are also seeing backfill start longer jobs.

    backfilltest up 2-12:00:00 3 alloc reddev[001-003]
    backfilltest up 2-12:00:00 1 idle reddev004

    ssh hpcdev3@navy51 'sbatch --nodes=1 --ntasks-per-node=40 --partition=backfilltest --time=720 --wrap="sleep 432000"'

    squeue -p backfilltest -o "%i | %u | %C | %Q | %l | %S | %T"
    JOBID | USER | CPUS | PRIORITY | TIME_LIMIT | START_TIME | STATE
    28482 | hpcdev2 | 160 | 10 | 1:00:00 | N/A | PENDING
    28483 | hpcdev2 | 160 | 10 | 1:00:00 | N/A | PENDING
    28481 | hpcdev1 | 120 | 50083 | 2:00:00 | 2020-12-08T09:44:15 | RUNNING
    28484 | hpcdev3 | 40 | 37541 | 12:00:00 | 2020-12-08T09:54:48 | RUNNING

Is this expected behaviour? It is also weird that the pending jobs don't have a start time.
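To make the timing in the squeue output concrete, here is a rough arithmetic sketch. It assumes, purely for illustration, that every job runs to its full time limit, and it uses the start times and limits shown above. The function and variable names are made up for this example and are not part of Slurm.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; names below are hypothetical, not Slurm APIs.
# Times are minutes after midnight on 2020-12-08, taken from the squeue output.

to_min() {  # HH:MM:SS -> whole minutes (seconds are dropped)
  h=${1%%:*}
  rest=${1#*:}
  m=${rest%%:*}
  h=${h#0}   # strip one leading zero so "09" is not read as octal
  m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

running_start=$(to_min 09:44:15)   # job 28481 (3 nodes)
running_limit=120                  # --time=120
backfill_start=$(to_min 09:54:48)  # job 28484 (1 node)
backfill_limit=720                 # --time=720

# Earliest time all 4 nodes could be free for job 28482 if nothing else had started.
earliest_4node=$(( running_start + running_limit ))
# Latest time the backfilled job may hold the fourth node.
backfill_end=$(( backfill_start + backfill_limit ))

if [ "$backfill_end" -gt "$earliest_4node" ]; then
  delay=$(( backfill_end - earliest_4node ))
  echo "backfilled job can delay the 4-node job by up to ${delay} minutes"
else
  delay=0
  echo "backfilled job ends in time; no delay"
fi
```

With the values above, the 4-node job could otherwise have started at about 11:44, while the 1-node job may hold reddev004 until about 21:54, so the worst-case delay is 610 minutes.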
I have increased the backfill parameters significantly, but it doesn't seem to affect this at all.

    SchedulerParameters=bf_window=14400,bf_resolution=2400,bf_max_job_user=80,bf_continue,default_queue_depth=1000,bf_interval=60

Best regards,
David
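For readers checking the values above, this small sketch converts them to friendlier units. It relies on my understanding of the slurm.conf documentation: bf_window is given in minutes, while bf_resolution and bf_interval are given in seconds. The get_param helper is hypothetical, and the partition limit is the 2-12:00:00 shown in the sinfo output earlier in this message.

```shell
#!/usr/bin/env bash
# Sketch only: get_param is a made-up helper, not a Slurm command.
params='bf_window=14400,bf_resolution=2400,bf_max_job_user=80,bf_continue,default_queue_depth=1000,bf_interval=60'

# Print the value of key $1 from the comma-separated list (runs in a subshell).
get_param() (
  IFS=,
  for kv in $params; do
    case $kv in
      "$1="*) echo "${kv#*=}"; exit 0 ;;
    esac
  done
  exit 1
)

bf_window_min=$(get_param bf_window)          # minutes (assumed unit, per slurm.conf)
bf_resolution_sec=$(get_param bf_resolution)  # seconds (assumed unit, per slurm.conf)
bf_interval_sec=$(get_param bf_interval)      # seconds (assumed unit, per slurm.conf)

# Partition time limit 2-12:00:00 expressed in minutes.
part_limit_min=$(( 2 * 1440 + 12 * 60 ))

echo "bf_window:     $(( bf_window_min / 1440 )) days"
echo "bf_resolution: $(( bf_resolution_sec / 60 )) minutes"
echo "bf_interval:   ${bf_interval_sec} seconds"

if [ "$bf_window_min" -ge "$part_limit_min" ]; then
  echo "bf_window covers the partition time limit (${part_limit_min} minutes)"
else
  echo "bf_window is shorter than the partition time limit (${part_limit_min} minutes)"
fi
```

Under those unit assumptions the window is 10 days, which is longer than the 60-hour partition limit, so the window length by itself does not look like the reason the pending jobs have no start time.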