[slurm-users] Limit concurrent gpu resources
Hi everyone,

We have a single node with 8 GPUs. Users often pile up lots of pending jobs and use all 8 at the same time, so a user who just wants to do a short debug run on one GPU has to wait too long for a GPU to free up. Is there a way with gres.conf or QOS to limit the number of concurrent GPUs in use across all users? Most jobs submitted are single jobs, so they request a GPU with --gres=gpu:1 but submit many (no array), and our gres.conf looks like the following:

    Name=gpu File=/dev/nvidia0 #CPUs=0,1,2,3
    Name=gpu File=/dev/nvidia1 #CPUs=4,5,6,7
    Name=gpu File=/dev/nvidia2 #CPUs=8,9,10,11
    Name=gpu File=/dev/nvidia3 #CPUs=12,13,14,15
    Name=gpu File=/dev/nvidia4 #CPUs=16,17,18,19
    Name=gpu File=/dev/nvidia5 #CPUs=20,21,22,23
    Name=gpu File=/dev/nvidia6 #CPUs=24,25,26,27
    Name=gpu File=/dev/nvidia7 #CPUs=28,29,30,31

I thought of insisting that they submit the jobs as an array and limit with %7, but maybe there's a more elegant solution using the config. Any tips appreciated.

Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
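One approach is a QOS with a TRES limit on GPUs, attached to the partition. The following is only a sketch, assuming accounting with slurmdbd is already enabled; the QOS name `gpulimit` and partition/node names are placeholders:

```shell
# Sketch: cap GPUs so one stays free for debug runs (names are examples).
# Requires slurmdbd and GPU TRES tracking, i.e. in slurm.conf:
#   AccountingStorageTRES=gres/gpu
sacctmgr add qos gpulimit
sacctmgr modify qos gpulimit set GrpTRES=gres/gpu=7        # 7 GPUs total across all jobs in this QOS
# or, to cap each user individually instead of the aggregate:
# sacctmgr modify qos gpulimit set MaxTRESPerUser=gres/gpu=7

# Then attach the QOS to the partition in slurm.conf, e.g.:
#   PartitionName=gpu Nodes=gpunode01 QOS=gpulimit ...
# and reconfigure:
scontrol reconfigure
```

With GrpTRES the eighth GPU is never allocated through that partition, so a short debug job requesting --gres=gpu:1 can still start promptly.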
Re: [slurm-users] Seff error with Slurm-18.08.1
Thanks for this. We'll try the workaround script. It's not mission-critical, but our users have gotten accustomed to seeing these metrics at the end of each run and it's nice to have. We are currently doing this in a test VM environment, so by the time we actually do the upgrade to the cluster perhaps the fix will be available.

Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu

From: slurm-users on behalf of Chris Samuel
Sent: Tuesday, November 6, 2018 5:03 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Seff error with Slurm-18.08.1

On 6/11/18 7:49 pm, Baker D.J. wrote:

> The good news is that I am assured by SchedMD that the bug has been fixed
> in v18.08.3.

Looks like it's fixed in this commit:

    commit 3d85c8f9240542d9e6dfb727244e75e449430aac
    Author: Danny Auble
    Date:   Wed Oct 24 14:10:12 2018 -0600

        Handle symbol resolution errors in the 18.08 slurmdbd.
        Caused by b1ff43429f6426c when moving the slurmdbd agent internals.
        Bug 5882.

> Having said that we will probably live with this issue
> rather than disrupt users with another upgrade so soon.

An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though, should it? We just flip a symlink and the users see the new binaries, libraries, etc. immediately; we can then restart daemons as and when we need to (in the right order of course: slurmdbd, slurmctld and then the slurmd's).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
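The symlink-flip upgrade Chris describes can be sketched as follows. This is only an illustration under assumed paths (`/opt/slurm*` is hypothetical; adjust to your install layout and service manager):

```shell
# Sketch: in-place minor upgrade by repointing a version symlink.
# Assumes both versions are installed side by side under /opt.
ln -sfn /opt/slurm-18.08.3 /opt/slurm   # users immediately see the new binaries/libraries

# Restart daemons in dependency order, as and when convenient:
systemctl restart slurmdbd              # database daemon first
systemctl restart slurmctld             # then the controller
# finally slurmd on each compute node, e.g. with pdsh:
# pdsh -w node[01-08] systemctl restart slurmd
```

Because minor releases within 18.08.x keep RPC and state compatibility, running jobs are not disturbed by restarting the daemons in this order.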
Re: [slurm-users] Seff error with Slurm-18.08.1
I'm also interested in this issue since I've come across the same error today. We built Slurm-18.08.1 with the contribs packages on Ubuntu Bionic and seff is also complaining:

    $ /s/slurm/bin/seff 36
    perl: error: plugin_load_from_file: dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): /s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
    perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
    perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
    perl: error: plugin_load_from_file: dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): /s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
    perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
    perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
    Job not found.

Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu

From: slurm-users on behalf of Miguel A. Sánchez
Sent: Tuesday, October 23, 2018 10:26 AM
To: slurm-us...@schedmd.com
Subject: [slurm-users] Seff error with Slurm-18.08.1

Hi all,

I have updated my Slurm from version 17.11.0 to 18.08.1. With the previous version (17.11.0) the seff tool was working fine, but with 18.08.1, when I try to run seff I receive the following error message:

    # ./seff
    perl: error: plugin_load_from_file: dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so): /usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
    perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
    perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
    perl: error: plugin_load_from_file: dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so): /usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: node_record_count
    perl: error: Couldn't load specified plugin name for accounting_storage/slurmdbd: Dlopen of plugin file failed
    perl: error: cannot create accounting_storage context for accounting_storage/slurmdbd
    Job not found.
    #

Both Slurm installations have been compiled from source on the same computer, but only the seff that was compiled under 17.11.0 works fine. To compile the seff tool, from the Slurm source tree:

    cd contribs
    make
    make install

I think the problem is in the perlapi. Could it be a bug? Any idea how I can fix this problem? Thanks a lot.

--
Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)
Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34 93 316 0522 | Fax: +34 93 3160 550
e-mail: miguelangel.sanc...@upf.edu
Re: [slurm-users] can't create memory group (cgroup)
Just an update: the cgroup.conf file could not be parsed when I added ConstrainKmemSpace=no. I guess this option is not compatible with our kernel/Slurm versions on Ubuntu? Not sure. For now we took the lazy way out and rebooted nodes. Will try the kernel options or a full Slurm update as time allows.

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Mike Cammilleri
Sent: Monday, September 10, 2018 9:49 AM
To: Slurm User Community List
Subject: Re: [slurm-users] can't create memory group (cgroup)

Thanks everyone for your responses. It looks like the two suggestions were:

1. Add "cgroup_enable=memory swapaccount=1" to the kernel command line by adding it to the GRUB_CMDLINE_LINUX variable in /etc/default/grub
2. Add ConstrainKmemSpace=no in cgroup.conf

From this information I think option 2 is the least troublesome, so we'll give that a shot first. Changing the kernel options would be the second try, I suppose. Eventually we'll upgrade Slurm and OS versions, but you know: when things are functional and work is getting done, it's hard to justify during an academic semester.

--mike

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Chris Samuel
Sent: Monday, September 10, 2018 6:49 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] can't create memory group (cgroup)

On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:

> One workaround is to reboot the node whenever this happens. Another
> is to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option
> was added in slurm 17.02 and is not present in 16.05 that you're using).

Phew, we had to set ConstrainKmemSpace=no to avoid breaking Intel Omnipath, so it looks like we dodged a bullet there. Nice work tracking it down!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
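Option 1 above (the kernel command-line change) can be sketched as follows on a Debian/Ubuntu system; this is an illustration only, and the exact existing contents of GRUB_CMDLINE_LINUX on your nodes must be preserved:

```shell
# Sketch: enable the memory cgroup controller and swap accounting at boot.
# Edit /etc/default/grub so the variable includes the two new tokens, e.g.:
#   GRUB_CMDLINE_LINUX="... cgroup_enable=memory swapaccount=1"
sudo update-grub    # regenerates grub.cfg (grub2-mkconfig on RHEL-style systems)
sudo reboot         # the new command line only takes effect on the next boot

# After reboot, verify the controller is active:
grep memory /proc/cgroups    # the 'enabled' column should read 1
```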
[slurm-users] can't create memory group (cgroup)
Hi everyone,

I'm getting this error lately for everyone's jobs, which results in memory not being constrained via the cgroups plugin:

    slurmstepd: error: task/cgroup: unable to add task[pid=21681] to memory cg '(null)'
    slurmstepd: error: jobacct_gather/cgroup: unable to instanciate user 3691 memory cgroup

The result is that no uid_ directories are created under /sys/fs/cgroup/memory. Here is our cgroup.conf file:

    CgroupAutomount=yes
    CgroupReleaseAgentDir="/etc/cgroup"
    CgroupMountpoint=/sys/fs/cgroup
    ConstrainCores=yes
    ConstrainDevices=no
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    AllowedSwapSpace=0

We are using jobacct_gather/cgroup:

    # ACCOUNTING
    JobAcctGatherType=jobacct_gather/cgroup

The partition is configured like this:

    PartitionName=long Nodes=marzano[05-13] PriorityTier=30 Default=NO MaxTime=5-0 State=UP OverSubscribe=FORCE:1

We are using Slurm 16.05.6 on Ubuntu 14.04 LTS. Any ideas how to get cgroups going again?
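A few first checks when the memory cgroup cannot be created, sketched below. This assumes cgroup v1 (as on Ubuntu 14.04); it only diagnoses, it does not fix:

```shell
# Sketch: is the memory controller present, enabled, and mounted?
grep memory /proc/cgroups          # 'enabled' column should be 1
mount | grep cgroup                # look for a line mounting the memory controller
ls /sys/fs/cgroup/memory           # slurm's uid_* directories should appear here when healthy

# On affected kernels the controller can exhaust its internal kmem
# accounting IDs over time, after which new memory cgroups fail to
# create until a reboot (the failure mode discussed in this thread).
```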
Re: [slurm-users] Are these threads actually unused?
I should also mention that of course we are aware that R is a single-threaded application, but users can be doing all sorts of things within their R scripting. In this particular case the user is using the FLARE package, I believe. Often they are seeking to do embarrassingly parallel types of tasks.

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Mike Cammilleri
Sent: Tuesday, February 13, 2018 10:31 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Are these threads actually unused?

I posted a question similar to this a couple months ago regarding CPU utilization, which we figured out: sometimes too many threads on one CPU create high CPU load, and thus slower compute time because things are waiting. A more proper allocation should be set in the submit script (e.g. --cpus-per-task). We've been doing pretty well with CPU efficiency as we monitor users' allocations to make sure they're getting the most efficient resource reservations.

One thing I notice is that sometimes R has 48 threads but only one seems active. Looking at 'top' on a node that has 48 CPUs:

    10:27:46 up 52 days, 19:40, 1 user, load average: 11.99, 11.98, 12.03
    32862 hyunseu+ 20 0 2304464 190184 8500 R 100.0 0.1 767:51.76 R 48 0
    32919 hyunseu+ 20 0 2302516 186568 8484 R 100.0 0.1 767:59.15 R 48 6
    32932 hyunseu+ 20 0 2303616 187688 8488 R 100.0 0.1 767:59.41 R 48 5
    32947 hyunseu+ 20 0 2303508 188028 8484 R 100.0 0.1 767:59.97 R 48 7
    32950 hyunseu+ 20 0 2305800 189668 8456 R 100.0 0.1 767:59.73 R 48 2
    32964 hyunseu+ 20 0 2303304 187972 8484 R 100.0 0.1 767:59.70 R 48 1
    32980 hyunseu+ 20 0 2303396 187284 8500 R 100.0 0.1 767:58.84 R 48 4

The two far-right columns are "number of threads" and "last CPU used." Each of his R processes, launched using --array, has 48 threads; however, the CPU utilization is a nice 100% and the load average on the node is around 12, which is how many array jobs are running on that node (I didn't copy/paste all of his processes listed in 'top'). So it appears that one thread is running for each R process and things are proceeding nicely. But why do we see 48 threads for each R process, and are they truly unused? Would they see performance increases by correcting these to have a single thread each?

I've noticed this difference with various versions of R. R installed via apt-get in /usr/bin will have many threads like this example, but the R I build for the cluster will list a single thread in 'top' unless another package or certain method causes it to do otherwise. In this case, the user is using R-3.4.3/bin/Rscript.

Thanks!
mike
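The extra threads most likely come from a multithreaded math library (e.g. OpenBLAS or MKL) that the distro R is linked against, which spawns one worker thread per detected core even when only one is usable. A hedged way to pin those pools to the Slurm allocation in the submit script (the environment variables are the standard thread-count knobs for those libraries; `myscript.R` is a placeholder):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=1
# Sketch: cap BLAS/OpenMP worker threads at the number of cores
# Slurm actually granted, so idle threads are never created.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OPENBLAS_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
Rscript myscript.R
```

This would also explain the difference between the apt-get R (linked against a threaded BLAS) and a locally built R (typically linked against the single-threaded reference BLAS).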
Re: [slurm-users] detectCores() mess
Thanks for the responses. I think I didn't investigate deeply enough: it appears that although I saw many processes running and a very high load average, the cgroups are indeed allocating the correct number of cores to the jobs, and the extra threads simply wait to run on the same cores that were allocated. I guess that when this happens, the load average in 'top' can show an extremely elevated number because lots of processes are waiting to run, but in fact the node is still quite open, as there are plenty of available cores left for other jobs. Would this be an accurate interpretation of the scheduling and load I'm observing? Are there impacts on the performance of the node when it is in this state? Thanks everyone.

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Chris Samuel
Sent: Friday, December 8, 2017 6:46 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] detectCores() mess

On 9/12/17 4:54 am, Mike Cammilleri wrote:

> I thought cgroups (which we are using) would prevent some of this
> behavior on the nodes (we are constraining CPU and RAM) - I'd like
> there to be no I/O wait times if possible. I would like it if either
> linux or slurm could constrain a job from grabbing more cores than
> assigned at submit time. Is there something else I should be
> configuring to safeguard against this behavior? If SLURM assigns 1 cpu
> to the task then no matter what craziness is in the code, 1 is all
> they're getting. Possible?

That is exactly what cgroups does; a process within a cgroup that only has a single core available to it will only be able to use that one core. If it fires up (for example) 8 threads or processes then they will all run, but they will all be contending for that single core.

You can check the cgroup for a process with:

    cat /proc/$PID/cgroup

From that you should be able to find the cgroup in the cpuset controller and see how many cores are available to it.

You mention I/O wait times; that's going to be separate from the number of cores available to a code. Could you elaborate a little on what you are seeing there? There is some support for this in current kernels, but I don't know when that landed and whether that will be in the kernel available to you. Also, I don't remember seeing mention of support for that in Slurm.

https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt

Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
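Chris's check can be sketched end to end as follows. This assumes cgroup v1 (as used by Slurm at the time of this thread); `$PID` is a placeholder for the job process id:

```shell
# Sketch: which cpuset cgroup does this process belong to?
grep cpuset /proc/$PID/cgroup
# prints something like:  N:cpuset:/slurm/uid_1000/job_1234/step_0

# Using the path printed above, list the cores the job may run on:
cat /sys/fs/cgroup/cpuset/slurm/uid_1000/job_1234/step_0/cpuset.cpus
# e.g. "3" means only core 3, no matter how many threads the code spawns
```

If cpuset.cpus shows only the allocated cores, the containment is working; runaway threads just queue up on those cores and inflate the load average without touching the rest of the node.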
[slurm-users] detectCores() mess
Hi,

We have allowed some courses to use our Slurm cluster for teaching purposes, which of course leads to all kinds of exciting experiments: not always the most clever programming, but it certainly teaches me where we need to tighten up configurations. The default method of thinking for many students just starting out is to grab as much CPU as possible, not fully understanding cluster computing and batch scheduling.

One example I see often is students using the R parallel package and calling detectCores(), which of course returns all the cores Linux reports. They also did not specify --ntasks, so Slurm assigns 1 of course, but there is no check on the ballooning of R processes created with detectCores() and then whatever they're doing with that number. Now we have overloaded nodes.

I see that availableCores() is suggested as a more friendly method for shared resources like this, where it would return the number of cores that were assigned via Slurm. A student using the parallel package would then need to explicitly specify the number of cores in their submit file. This would be nice IF students voluntarily used availableCores() instead of detectCores(), but we know that's not really enforceable.

I thought cgroups (which we are using) would prevent some of this behavior on the nodes (we are constraining CPU and RAM). I'd like there to be no I/O wait times if possible. I would like it if either Linux or Slurm could constrain a job from grabbing more cores than assigned at submit time. Is there something else I should be configuring to safeguard against this behavior? If Slurm assigns 1 CPU to the task, then no matter what craziness is in the code, 1 is all they're getting. Possible?

Thanks for any insight!
--mike
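The difference between the two R calls can be demonstrated from the shell inside a one-CPU allocation. A sketch, assuming availableCores() comes from the parallelly package (the package name is an assumption; the original post does not say which package provides it):

```shell
# Sketch: run each probe inside a 1-CPU Slurm allocation and compare.
srun --ntasks=1 --cpus-per-task=1 nproc
# nproc respects the cpuset affinity, so it reports 1

srun --ntasks=1 --cpus-per-task=1 Rscript -e 'parallel::detectCores()'
# detectCores() reads the hardware core count, ignoring the cgroup

srun --ntasks=1 --cpus-per-task=1 Rscript -e 'parallelly::availableCores()'
# availableCores() consults SLURM_CPUS_PER_TASK and reports 1
```

So with ConstrainCores=yes in cgroup.conf the node is protected either way: a student who spawns detectCores() workers just makes them contend for the one allocated core, while availableCores() simply lets well-behaved code avoid the contention.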