Re: [slurm-users] associations, limits, qos
On 25/01/21 14:46, Durai Arasan wrote:
> Jobs submitted with sbatch cannot run on multiple partitions. The job
> will be submitted to the partition where it can start first. (from
> sbatch reference)

Did I misunderstand, or can heterogeneous jobs work around this limitation?

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
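[Editorial note in the digest: the following sketch is not from the original message. It shows one way a heterogeneous job can place its components in different partitions; the partition names partA/partB and the programs ./task_a and ./task_b are placeholders. On Slurm releases older than 20.02 the separator directive is "#SBATCH packjob" rather than "hetjob".]

#!/bin/bash
#SBATCH --job-name=het_example
#SBATCH --partition=partA
#SBATCH --ntasks=4
#SBATCH hetjob
#SBATCH --partition=partB
#SBATCH --ntasks=1

# Launch one program per heterogeneous component; the batch script itself
# runs on the first node of component 0.
srun --het-group=0 ./task_a &
srun --het-group=1 ./task_b &
wait

This differs from "sbatch --partition=partA,partB", which only picks whichever single partition can start the job first.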
Re: [slurm-users] Fairshare tree after SLURM upgrade
On 1/29/21 8:03 AM, Gestió Servidors wrote:
> I'm going to upgrade my SLURM version from 17.11.5 to 19.05.1. I know this is not the latest version, but I manage another cluster that is also running this version. My question is: during the process I need to upgrade "slurmdbd". Will all the fairshare tree (with rawusage, effectvusage, fairshare, etc.) be kept in the new version after upgrading?

Beware: you can upgrade by at most 2 Slurm major versions! This is well known; see a summary in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

All data in your database will be migrated correctly to the new Slurm version. This assumes that your upgrade process worked without errors! Older MySQL versions may have problems! Therefore it is critical to first test the database upgrade on a test system. Please see the above page for advice on upgrade testing.

/Ole
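[Editorial note in the digest: the commands below are not from Ole's message, just a minimal sketch of a database backup before the slurmdbd upgrade. The database name slurm_acct_db is the Slurm default; adjust it and the file names to your site.]

# Stop slurmdbd so no new records arrive while dumping.
systemctl stop slurmdbd

# Dump the accounting database so the upgrade can be rolled back or
# rehearsed on a test host.
mysqldump --single-transaction slurm_acct_db > slurm_acct_db_$(date +%F).sql

# On the test host, load the dump and run the new slurmdbd in the
# foreground with verbose output to watch the schema conversion:
#   slurmdbd -D -vvv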
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/29/21 3:51 AM, taleinterve...@sjtu.edu.cn wrote:
> The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

I think the sreport command is normally used to generate accounting reports. I have described this in my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting#accounting-reports

I would like to understand how you have chosen to calculate the user cost of a given job using the sacct command. The sacct command reports accounting for each individual job, so which sacct options do you use to get the total cost value for a user with many jobs?

/Ole

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
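[Editorial note in the digest: not part of the thread, just a sketch of one way to total a user's usage with sacct. The user name and date range are placeholders, and charging in core-hours is an assumption; CPUTimeRAW is the core-seconds (Elapsed * AllocCPUS) recorded by Slurm accounting.]

#!/bin/bash
# Sum core-seconds for one user over a billing period and print core-hours.
USER=someuser
sacct -u "$USER" -S 2021-01-01 -E 2021-01-31 -X -n -P \
      --format=JobID,CPUTimeRAW |
  awk -F'|' '{s += $2} END {printf "core-hours: %.1f\n", s/3600}'

An alternative for aggregate per-user totals is sreport, e.g. "sreport cluster AccountUtilizationByUser start=2021-01-01 end=2021-02-01", which sums usage from the same database.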
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/29/21 3:51 AM, taleinterve...@sjtu.edu.cn wrote:
> Thanks for the help. The doc page is useful and we can get the actual job id now.

I'm glad that you solved the problem.

> The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

I understand, but I think there ought to be a way to set the charging cost (money) of specific jobs in the database to zero. I will post a separate message about that.

/Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark, Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
[slurm-users] Fairshare tree after SLURM upgrade
Hello,

I'm going to upgrade my SLURM version from 17.11.5 to 19.05.1. I know this is not the latest version, but I manage another cluster that is also running this version. My question is: during the process I need to upgrade "slurmdbd". Will all the fairshare tree (with rawusage, effectvusage, fairshare, etc.) be kept in the new version after upgrading?

Thanks.
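[Editorial note in the digest: one way to verify this yourself, not from the original message, is to snapshot the fairshare tree with sshare before and after the upgrade and compare. The output file names are arbitrary; note that RawUsage decays over time, so small differences are expected.]

# Dump the full fairshare tree (all accounts and users, long format,
# which includes RawUsage, EffectvUsage and FairShare) before the upgrade.
sshare -a -l > fairshare_before_upgrade.txt

# ... perform the slurmdbd / slurmctld upgrade ...

sshare -a -l > fairshare_after_upgrade.txt
diff fairshare_before_upgrade.txt fairshare_after_upgrade.txt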
Re: [slurm-users] [EXT]Re: only 1 job running
Thanks for the explanation, Brian. Turning on IOMMU seems to have helped, as did adding sharing to slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU

Now all the CPUs are being used on all the compute nodes, so things are working as expected. Thanks to everyone else on the list who helped also, Andy, Ole, Chris, appreciate it! Looking forward to helping out where I can as well.

Brian Andrus wrote on 1/28/21 15:50:
> Yep, looks like you are on the right track. If the CPU count does not make sense to Slurm, it will drain the node and jobs will not be able to start on it. There does seem to be more to it, though. Detailed info about a job and a node would help.
>
> The 'Priority' pending jobs you can ignore; those aren't starting because another job is supposed to go first. That is the one with 'Resources' as the reason. Resources means the scheduler has allocated the resources on the node such that there aren't any left to be used.
>
> My bet here is that you aren't specifying memory. If you don't specify it, Slurm assumes all memory on the node for the job. So even if you are only using 1 CPU, all the memory is allocated, leaving none for any other job to run on the unallocated CPUs.
>
> Brian Andrus
>
> On 1/28/2021 2:15 PM, Chandler wrote:
>> Brian Andrus wrote on 1/28/21 13:59:
>>> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?
>> Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads
>> The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
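[Editorial note in the digest: for reference, a sketch of the relevant slurm.conf lines; the CR_CPU_Memory alternative and the DefMemPerCPU value are only suggestions, not what the poster used.]

# slurm.conf (excerpt) - allocate individual CPUs to jobs instead of whole nodes
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# If memory should also be a consumable resource (so jobs must request it),
# use CR_CPU_Memory instead and give unlabelled jobs a sensible default:
#SelectTypeParameters=CR_CPU_Memory
#DefMemPerCPU=4000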
Re: [slurm-users] how are array jobs stored in the slurmdb database?
Thanks for the help. The doc page is useful and we can get the actual job id now.

The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
Re: [slurm-users] [EXT]Re: only 1 job running
Yep, looks like you are on the right track. If the CPU count does not make sense to Slurm, it will drain the node and jobs will not be able to start on it. There does seem to be more to it, though. Detailed info about a job and a node would help.

The 'Priority' pending jobs you can ignore; those aren't starting because another job is supposed to go first. That is the one with 'Resources' as the reason. Resources means the scheduler has allocated the resources on the node such that there aren't any left to be used.

My bet here is that you aren't specifying memory. If you don't specify it, Slurm assumes all memory on the node for the job. So even if you are only using 1 CPU, all the memory is allocated, leaving none for any other job to run on the unallocated CPUs.

Brian Andrus

On 1/28/2021 2:15 PM, Chandler wrote:
> Brian Andrus wrote on 1/28/21 13:59:
>> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?
> Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads
> The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
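[Editorial note in the digest: to illustrate Brian's point about memory, a sketch not taken from the thread; the 4G figure and program name are placeholders. When each job requests only the memory it needs, several jobs can share a node's CPUs.]

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
# Request only the memory the job actually needs; without --mem (or a
# DefMemPerCPU default in slurm.conf) the job may be granted all of the
# node's memory, blocking other jobs from the remaining CPUs.
#SBATCH --mem=4G
./my_program

The same effect can be achieved per CPU with "--mem-per-cpu", or cluster-wide with DefMemPerCPU in slurm.conf so unlabelled jobs get a sane default.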
Re: [slurm-users] [EXT]Re: only 1 job running
Brian Andrus wrote on 1/28/21 13:59:
> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?

Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads

The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
Re: [slurm-users] [EXT]Re: only 1 job running
You are getting close :)

You can see why n010 is able to have multiple jobs: it shows more resources available.

What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?

Brian Andrus

On 1/28/2021 12:52 PM, Chandler wrote:
> OK, I'm getting this same output on nodes n[011-013]:
>
> # slurmd -C
> NodeName=n011
> slurmd: error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from your configuration now.
> slurmd: Considering each NUMA node as a socket
> slurmd: error: Thread count (255) not multiple of core count (128)
> CPUs=255 Boards=1 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1031878
> UpTime=86-20:59:54
>
> but on n010 it looks like:
>
> # slurmd -C
> NodeName=n010 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031887
> UpTime=20-00:01:31
Re: [slurm-users] [EXT]Re: only 1 job running
OK, I'm getting this same output on nodes n[011-013]:

# slurmd -C
NodeName=n011
slurmd: error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from your configuration now.
slurmd: Considering each NUMA node as a socket
slurmd: error: Thread count (255) not multiple of core count (128)
CPUs=255 Boards=1 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1031878
UpTime=86-20:59:54

but on n010 it looks like:

# slurmd -C
NodeName=n010 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031887
UpTime=20-00:01:31
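[Editorial note in the digest: not from the thread. The slurm.conf node definition would normally be made to match what "slurmd -C" reports on a healthy node; a sketch based on the n010 output above, with RealMemory rounded down slightly as a safety margin (that margin is an assumption, adjust to your site).]

# slurm.conf (excerpt) - node definition matching the hardware slurmd detects
NodeName=n[010-013] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031000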
Re: [slurm-users] only 1 job running
Christopher Samuel wrote on 1/28/21 12:50:
> Did you restart the slurm daemons when you added the new node? Some internal data structures (bitmaps) are built based on the number of nodes and they need to be rebuilt with a restart in this situation.
> https://slurm.schedmd.com/faq.html#add_nodes

OK, this seems to have helped a little. After restarting the services, and killing the task and restarting it as well, more jobs are running. Still:

1. n[011-013] were in drain state; I had to update those to "idle".
2. n[011-013] are also now running 8 jobs each, but they should be running 16.

n010 is running 16 jobs now as expected. The squeue output looks like:

JOBID  PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
  247  defq       cromwell  smrtanal  PD  0:00      1  (Resources)
  248  defq       cromwell  smrtanal  PD  0:00      1  (Priority)
  ...
  278  defq       cromwell  smrtanal  PD  0:00      1  (Priority)
  207  defq       cromwell  smrtanal   R  5:01      1  n010
  ...
  222  defq       cromwell  smrtanal   R  5:01      1  n010
  223  defq       cromwell  smrtanal   R  4:55      1  n011
  ...
  230  defq       cromwell  smrtanal   R  4:55      1  n011
  231  defq       cromwell  smrtanal   R  4:55      1  n012
  ...
  238  defq       cromwell  smrtanal   R  4:55      1  n012
  239  defq       cromwell  smrtanal   R  4:55      1  n013
  ...
  246  defq       cromwell  smrtanal   R  4:55      1  n013

I need to get the (Priority) and (Resources) jobs running on n[011-013]...
Re: [slurm-users] [EXT]Re: only 1 job running
Ahh. On one of the new nodes, do:

slurmd -C

The output of that will tell you what those settings should be. I suspect they are off, which forces them into drain mode.

Brian Andrus

On 1/28/2021 12:25 PM, Chandler wrote:
> Andy Riebs wrote on 1/28/21 07:53:
>> If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.
> Well, I don't know what's different; that was months ago. Pretty sure I just added n010 to PartitionName and updated
> NodeName=n[010-013] Procs=256 CoresPerSocket=64 Sockets=2 ThreadsPerCore=2
> since I turned on multi-threading.
Re: [slurm-users] [EXT]Re: only 1 job running
On 28-01-2021 21:21, Chandler wrote:
> Brian Andrus wrote on 1/28/21 12:07:
>> scontrol update state=resume nodename=n[011-013]
> I tried that but got:
> slurm_update error: Invalid node state specified

As Chris Samuel said, you must restart the Slurm daemons when adding (or removing) nodes! See a summary in
https://wiki.fysik.dtu.dk/niflheim/SLURM#add-and-remove-nodes

/Ole
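[Editorial note in the digest: not part of Ole's message. On a systemd-based install the restart typically amounts to the following; host names, the n011 example, and the log path are assumptions to adapt to your site.]

# On the controller, after distributing the edited slurm.conf to all hosts:
systemctl restart slurmctld

# On every compute node (for example via pdsh/clush or configuration management):
systemctl restart slurmd

# Then verify that the nodes register cleanly and check any drain reason:
sinfo -N -l
scontrol show node n011 | grep -i reason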
Re: [slurm-users] [EXT]Re: only 1 job running
Andy Riebs wrote on 1/28/21 07:53:
> If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.

Well, I don't know what's different; that was months ago. Pretty sure I just added n010 to PartitionName and updated

NodeName=n[010-013] Procs=256 CoresPerSocket=64 Sockets=2 ThreadsPerCore=2

since I turned on multi-threading.
Re: [slurm-users] [EXT]Re: only 1 job running
Brian Andrus wrote on 1/28/21 12:07:
> scontrol update state=resume nodename=n[011-013]

I tried that but got:

slurm_update error: Invalid node state specified
Re: [slurm-users] only 1 job running
On 1/27/21 9:28 pm, Chandler wrote:
> Hi list, we have a new cluster setup with Bright Cluster Manager. Looking into a support contract there, but trying to get community support in the meantime. I'm sure things were working when the cluster was delivered, but I provisioned an additional node and now the scheduler isn't quite working right.

Did you restart the slurm daemons when you added the new node? Some internal data structures (bitmaps) are built based on the number of nodes, and they need to be rebuilt with a restart in this situation.

https://slurm.schedmd.com/faq.html#add_nodes

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] only 1 job running
Heh. Your nodes are drained. Do:

scontrol update state=resume nodename=n[011-013]

If they go back into a drained state, you need to look into why. That will be in the slurmctld log. You can also see it with 'sinfo -R'.

Brian Andrus

On 1/27/2021 10:18 PM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of n[011-013], so now 4 jobs are running but the rest are still queued. They should all be running. After some more searching, I guess resource sharing needs to be turned on? Can you help with doing that? I also attached the slurm.conf. Thanks
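[Editorial note in the digest: generic commands for finding the drain reason, not from this thread; n011 and the log path are examples, adjust to your installation.]

# Show the reason Slurm recorded for drained/down nodes
sinfo -R

# Or for a specific node, look for the Reason= field:
scontrol show node n011

# The slurmctld log usually has more detail (path varies by site):
grep -i drain /var/log/slurm/slurmctld.log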
[slurm-users] sbatch output logs get truncated
This has started happening after upgrading slurm from 20.02 to the latest 20.11. It seems like something exits too early, before slurm, or whatever else is writing that file, has a chance to flush the final output buffer to disk.

For example, take this very simple batch script, which gets submitted via sbatch:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --exclusive
set -e
echo A
echo B
sleep 5
echo C

The resulting slurm-$jobid.out file contains only

A
B

The final echo never gets written to the output file. A lot of users print a final result status at the end, which then never hits the logs, so this is a major problem for them. The scripts run to completion just fine; it's only the end of the log that is missing. For example, touching some file after the "echo C" will touch that file as expected.

The behaviour is also not at all consistent. Sometimes the output log is written as expected, with no recognizable pattern, though this seems to be the exception; the majority of the time it's truncated. This was never an issue before the recent slurm update.
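[Editorial note in the digest: not from the original report. One way to narrow this down, under the assumption that the loss happens in Slurm's output handling rather than in the job itself, is to redirect the script's output to a file you manage yourself and compare it with slurm-$jobid.out; if the self-managed file always ends with C, the job is fine and only the Slurm-managed log is truncated. A sketch, with the output file name chosen arbitrarily:]

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --exclusive
# Redirect all further output to a file the shell flushes and closes itself
# on exit (SLURM_JOB_ID is set by Slurm in the job environment).
exec > "my-output-${SLURM_JOB_ID}.log" 2>&1
set -e
echo A
echo B
sleep 5
echo C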
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
Re: [slurm-users] only 1 job running
Hi Chandler,

If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.

Andy

On 1/28/2021 1:18 AM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of n[011-013], so now 4 jobs are running but the rest are still queued. They should all be running. After some more searching, I guess resource sharing needs to be turned on? Can you help with doing that? I also attached the slurm.conf. Thanks
[slurm-users] how are array jobs stored in the slurmdb database?
Hello,

The question background is:

From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.

But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

Thanks.
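[Editorial note in the digest: not from the original message. A generic way to see the underlying job IDs that array tasks are stored under, using the example array job ID 123456 from above:]

# Show both the array-style ID and the raw JobID for each array task
# (allocations only, no job steps).
sacct -j 123456 -X --format=JobID,JobIDRaw,State,Elapsed

Each started array task has its own JobIDRaw, which should correspond to the id_job value found in the accounting database's job_table.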