[slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Mike Cammilleri
Hi everyone,

We have a single node with 8 GPUs. Users often pile up lots of pending jobs and 
end up using all 8 at the same time, so a user who just wants to run a short 
debug job on one GPU has to wait too long for a GPU to free up. Is there a way 
with gres.conf or a QOS to limit the number of GPUs in concurrent use across 
all users? Most jobs submitted are single jobs, so they request a GPU with 
--gres=gpu:1 but submit many of them (no array), and our gres.conf looks like 
the following:

Name=gpu File=/dev/nvidia0 #CPUs=0,1,2,3
Name=gpu File=/dev/nvidia1 #CPUs=4,5,6,7
Name=gpu File=/dev/nvidia2 #CPUs=8,9,10,11
Name=gpu File=/dev/nvidia3 #CPUs=12,13,14,15
Name=gpu File=/dev/nvidia4 #CPUs=16,17,18,19
Name=gpu File=/dev/nvidia5 #CPUs=20,21,22,23
Name=gpu File=/dev/nvidia6 #CPUs=24,25,26,27
Name=gpu File=/dev/nvidia7 #CPUs=28,29,30,31

I thought of insisting that they submit the jobs as an array and limit with %7, 
but maybe there's a more elegant solution using the config.
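
One approach that might work (a sketch only, untested against this setup; it 
assumes slurmdbd accounting with AccountingStorageEnforce including 'limits', 
and that batch jobs run under a QOS named 'normal' here -- the QOS name and the 
limit of 7 are placeholders):

# slurm.conf: track GPUs as a TRES so QOS limits can apply to them
AccountingStorageTRES=gres/gpu

# cap the GPUs held by all jobs under that QOS at 7, leaving one card
# free for short debug runs
sacctmgr modify qos normal set GrpTRES=gres/gpu=7

# or cap each individual user instead of the aggregate:
# sacctmgr modify qos normal set MaxTRESPerUser=gres/gpu=7

A short-MaxTime debug partition or QOS over the same node is another common way 
to keep one GPU reachable for quick test runs.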

Any tips appreciated.


Mike Cammilleri

Systems Administrator

Department of Statistics | UW-Madison

1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu


Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-06 Thread Mike Cammilleri
Thanks for this. We'll try the workaround script. It is not mission-critical, 
but our users have gotten accustomed to seeing these metrics at the end of each 
run and it's nice to have. We are currently doing this in a test VM environment, 
so by the time we actually do the upgrade to the cluster, perhaps the fix will 
be available by then.


Mike Cammilleri

Systems Administrator

Department of Statistics | UW-Madison

1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu



From: slurm-users  on behalf of Chris 
Samuel 
Sent: Tuesday, November 6, 2018 5:03 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Seff error with Slurm-18.08.1

On 6/11/18 7:49 pm, Baker D.J. wrote:

> The good news is that I am assured by SchedMD that the bug has been fixed
> in v18.08.3.

Looks like it's fixed in this commit.

commit 3d85c8f9240542d9e6dfb727244e75e449430aac
Author: Danny Auble 
Date:   Wed Oct 24 14:10:12 2018 -0600

 Handle symbol resolution errors in the 18.08 slurmdbd.

 Caused by b1ff43429f6426c when moving the slurmdbd agent internals.

 Bug 5882.


> Having said that, we will probably live with this issue
> rather than disrupt users with another upgrade so soon.

An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though,
should it?  We just flip a symlink and the users see the new binaries,
libraries, etc. immediately; we can then restart daemons as and when we
need to (in the right order, of course: slurmdbd, slurmctld and then
the slurmd's).
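
For anyone taking the same route, a rough sketch of that sequence (the symlink 
path, version directory and node list are placeholders; it assumes 
systemd-managed daemons and a parallel shell such as pdsh for the compute 
nodes):

ln -sfn /opt/slurm-18.08.3 /opt/slurm             # flip the symlink to the new build
systemctl restart slurmdbd                        # database daemon first
systemctl restart slurmctld                       # then the controller
pdsh -w 'node[01-08]' systemctl restart slurmd    # finally the slurmd's on the compute nodes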

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: [slurm-users] Seff error with Slurm-18.08.1

2018-11-05 Thread Mike Cammilleri
I'm also interested in this issue since I've come across the same error today. 
We built Slurm-18.08.1 with the contribs packages on Ubuntu Bionic and seff is 
also complaining with

$ /s/slurm/bin/seff 36
perl: error: plugin_load_from_file: 
dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): 
/s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: 
node_record_count
perl: error: Couldn't load specified plugin name for 
accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for 
accounting_storage/slurmdbd
perl: error: plugin_load_from_file: 
dlopen(/s/slurm/lib/slurm/accounting_storage_slurmdbd.so): 
/s/slurm/lib/slurm/accounting_storage_slurmdbd.so: undefined symbol: 
node_record_count
perl: error: Couldn't load specified plugin name for 
accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for 
accounting_storage/slurmdbd
Job not found.





Mike Cammilleri

Systems Administrator

Department of Statistics | UW-Madison

1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu



From: slurm-users  on behalf of Miguel 
A. Sánchez 
Sent: Tuesday, October 23, 2018 10:26 AM
To: slurm-us...@schedmd.com
Subject: [slurm-users] Seff error with Slurm-18.08.1

Hi all

I have updated my Slurm from version 17.11.0 to 18.08.1. With
the previous version, 17.11.0, the seff tool was working fine, but with the
18.08.1 version, when I try to run seff I receive the following error
message:

# ./seff 
perl: error: plugin_load_from_file:
dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so):
/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so:
undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for
accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for
accounting_storage/slurmdbd
perl: error: plugin_load_from_file:
dlopen(/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so):
/usr/local/slurm-18.08.2/lib/slurm/accounting_storage_slurmdbd.so:
undefined symbol: node_record_count
perl: error: Couldn't load specified plugin name for
accounting_storage/slurmdbd: Dlopen of plugin file failed
perl: error: cannot create accounting_storage context for
accounting_storage/slurmdbd
Job not found.
#

Both Slurm installations have been compiled from source on the same
computer, but only the seff compiled against the 17.11.0 version works
fine. To compile the seff tool, from the Slurm source tree:

cd contrib

make

make install

I think the problem is in the perlapi. Could it be a bug? Any idea how I can 
fix this problem? Thanks a lot.


--

Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)

Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanc...@upf.edu




Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Mike Cammilleri
Just an update: the cgroup.conf file could not be parsed when I added 
ConstrainKmemSpace=no. I guess this option is not compatible with our 
kernel/slurm versions on Ubuntu? Not sure. For now we took the lazy way out and 
rebooted nodes. Will try the kernel options or a full slurm update as time 
allows. 

-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Mike Cammilleri
Sent: Monday, September 10, 2018 9:49 AM
To: Slurm User Community List 
Subject: Re: [slurm-users] can't create memory group (cgroup)

Thanks everyone for your responses. It looks like the two suggestions were:

1. Add "cgroup_enable=memory swapaccount=1" to the kernel command line by adding 
it to the GRUB_CMDLINE_LINUX variable in /etc/default/grub
2. Add ConstrainKmemSpace=no in cgroup.conf

From this information I think option 2 is the least troublesome, so we'll give 
that a shot first. Changing the kernel options would be the second try, I 
suppose. Eventually we'll upgrade SLURM and OS versions, but you know... when 
things are functional and work is getting done, it's hard to justify during 
an academic semester.
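
For the record, a rough sketch of what each option might look like (the GRUB 
edit assumes a Debian/Ubuntu-style setup and appends to whatever is already in 
the variable; the cgroup.conf line only parses on Slurm 17.02 or later, which 
is presumably why it was rejected on the 16.05 install mentioned above):

# Option 1: /etc/default/grub, then regenerate the config and reboot the node
GRUB_CMDLINE_LINUX="... cgroup_enable=memory swapaccount=1"
update-grub

# Option 2: cgroup.conf (Slurm >= 17.02 only)
ConstrainKmemSpace=no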

--mike

-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Chris Samuel
Sent: Monday, September 10, 2018 6:49 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] can't create memory group (cgroup)

On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:

> One workaround is to reboot the node whenever this happens.  Another 
> is to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option 
> was added in slurm 17.02 and is not present in 16.05 that you're using).

Phew, we had to set ConstrainKmemSpace=no to avoid breaking Intel Omnipath so 
looks like we dodged a bullet there.  Nice work tracking it down!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC








Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Mike Cammilleri
Thanks everyone for your responses. It looks like the two suggestions were:

1. Add "cgroup_enable=memory swapaccount=1" to the kernel command line by adding 
it to the GRUB_CMDLINE_LINUX variable in /etc/default/grub
2. Add ConstrainKmemSpace=no in cgroup.conf

From this information I think option 2 is the least troublesome, so we'll give 
that a shot first. Changing the kernel options would be the second try, I 
suppose. Eventually we'll upgrade SLURM and OS versions, but you know... when 
things are functional and work is getting done, it's hard to justify during 
an academic semester.

--mike

-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Chris Samuel
Sent: Monday, September 10, 2018 6:49 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] can't create memory group (cgroup)

On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:

> One workaround is to reboot the node whenever this happens.  Another 
> is to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option 
> was added in slurm 17.02 and is not present in 16.05 that you're using).

Phew, we had to set ConstrainKmemSpace=no to avoid breaking Intel Omnipath so 
looks like we dodged a bullet there.  Nice work tracking it down!

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC







[slurm-users] can't create memory group (cgroup)

2018-09-07 Thread Mike Cammilleri
Hi everyone,

I'm getting this error lately for everyone's jobs, which results in memory not 
being constrained via the cgroups plugin.


slurmstepd: error: task/cgroup: unable to add task[pid=21681] to memory cg 
'(null)'
slurmstepd: error: jobacct_gather/cgroup: unable to instanciate user 3691 
memory cgroup

The result is that no uid_ directories are created under /sys/fs/cgroup/memory.


Here is our cgroup.conf file:

CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/cgroup"
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0

We are using jobacct_gather/cgroup
# ACCOUNTING
JobAcctGatherType=jobacct_gather/cgroup

The partition is configured like this
PartitionName=long Nodes=marzano[05-13] PriorityTier=30 Default=NO MaxTime=5-0 
State=UP OverSubscribe=FORCE:1

We are using slurm 16.05.6 on Ubuntu 14.04 LTS

Any ideas how to get cgroups going again?



Re: [slurm-users] Are these threads actually unused?

2018-02-13 Thread Mike Cammilleri
I should also mention that of course we are aware that R is a single-threaded 
application - but users can be doing all sorts of things within their R 
scripting. In this particular case the user is using the FLARE package, I 
believe. Often they are seeking to do embarrassingly parallel types of tasks.

-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Mike Cammilleri
Sent: Tuesday, February 13, 2018 10:31 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Are these threads actually unused?

I posted a similar question a couple of months ago regarding CPU 
utilization, which we figured out - too many threads on one CPU create high CPU 
load, and thus slower compute time, because things are waiting. 
A proper allocation should be set in the submit script (e.g. 
--cpus-per-task). We've been doing pretty well on CPU efficiency as we 
monitor users' allocations to make sure they're getting the most efficient 
resource reservations.

One thing I notice is that sometimes R has 48 threads but only one seems 
active. Looking at 'top' on a node that has 48 cpus:

10:27:46 up 52 days, 19:40,  1 user,  load average: 11.99, 11.98, 12.03

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND  nTH P
32862 hyunseu+  20   0 2304464 190184   8500 R 100.0  0.1 767:51.76 R        48  0
32919 hyunseu+  20   0 2302516 186568   8484 R 100.0  0.1 767:59.15 R        48  6
32932 hyunseu+  20   0 2303616 187688   8488 R 100.0  0.1 767:59.41 R        48  5
32947 hyunseu+  20   0 2303508 188028   8484 R 100.0  0.1 767:59.97 R        48  7
32950 hyunseu+  20   0 2305800 189668   8456 R 100.0  0.1 767:59.73 R        48  2
32964 hyunseu+  20   0 2303304 187972   8484 R 100.0  0.1 767:59.70 R        48  1
32980 hyunseu+  20   0 2303396 187284   8500 R 100.0  0.1 767:58.84 R        48  4

The two far-right columns are "number of threads" (nTH) and "last CPU used" 
(P). Each of his R processes, launched using --array, has 48 threads; however, 
CPU utilization is a steady 100% and the load average on the node is around 12, 
which matches the number of array jobs running on that node (I didn't copy/paste 
all of his processes from 'top'). So it appears that one thread is running for 
each R process and things are proceeding nicely - but why do we see 48 threads 
for each R process, and are they truly unused? Would he see a performance 
increase by correcting these to a single thread each?

I've noticed this difference with various versions of R. R installed via 
apt-get in /usr/bin will have many threads, as in this example, but the R I 
build for the cluster will list a single thread in 'top' unless another package 
or a particular method causes it to do otherwise. In this case, the user is 
using R-3.4.3/bin/Rscript.
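
If those extra threads turn out to be an OpenMP/OpenBLAS thread pool from the 
distribution-built R (an assumption here, not something verified for this job), 
pinning the pool size to the Slurm allocation in the submit script is one way 
to make 'top' match the request. A sketch, with a placeholder script name and 
array range:

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --array=1-12
# keep BLAS/OpenMP pools from spawning one thread per hardware core;
# the variable that matters depends on which BLAS this R build links against
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun Rscript analysis.R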

Thanks!
mike




[slurm-users] Are these threads actually unused?

2018-02-13 Thread Mike Cammilleri
I posted a similar question a couple of months ago regarding CPU 
utilization, which we figured out - too many threads on one CPU create high CPU 
load, and thus slower compute time, because things are waiting. 
A proper allocation should be set in the submit script (e.g. 
--cpus-per-task). We've been doing pretty well on CPU efficiency as we 
monitor users' allocations to make sure they're getting the most efficient 
resource reservations.

One thing I notice is that sometimes R has 48 threads but only one seems 
active. Looking at 'top' on a node that has 48 cpus:

10:27:46 up 52 days, 19:40,  1 user,  load average: 11.99, 11.98, 12.03

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND  nTH P
32862 hyunseu+  20   0 2304464 190184   8500 R 100.0  0.1 767:51.76 R        48  0
32919 hyunseu+  20   0 2302516 186568   8484 R 100.0  0.1 767:59.15 R        48  6
32932 hyunseu+  20   0 2303616 187688   8488 R 100.0  0.1 767:59.41 R        48  5
32947 hyunseu+  20   0 2303508 188028   8484 R 100.0  0.1 767:59.97 R        48  7
32950 hyunseu+  20   0 2305800 189668   8456 R 100.0  0.1 767:59.73 R        48  2
32964 hyunseu+  20   0 2303304 187972   8484 R 100.0  0.1 767:59.70 R        48  1
32980 hyunseu+  20   0 2303396 187284   8500 R 100.0  0.1 767:58.84 R        48  4

The two far-right columns are "number of threads" (nTH) and "last CPU used" 
(P). Each of his R processes, launched using --array, has 48 threads; however, 
CPU utilization is a steady 100% and the load average on the node is around 12, 
which matches the number of array jobs running on that node (I didn't copy/paste 
all of his processes from 'top'). So it appears that one thread is running for 
each R process and things are proceeding nicely - but why do we see 48 threads 
for each R process, and are they truly unused? Would he see a performance 
increase by correcting these to a single thread each?

I've noticed this difference with various versions of R. R installed via 
apt-get in /usr/bin will have many threads, as in this example, but the R I 
build for the cluster will list a single thread in 'top' unless another package 
or a particular method causes it to do otherwise. In this case, the user is 
using R-3.4.3/bin/Rscript.

Thanks!
mike



Re: [slurm-users] detectCores() mess

2017-12-11 Thread Mike Cammilleri
Thanks for the responses. I think I didn't investigate deeply enough - it 
appears that although I saw many processes running and a very high load average, 
the cgroups are indeed allocating the correct number of cores to the jobs, and 
the extra threads are simply waiting to run on the same cores/threads that were 
allocated.

I guess that when this happens, the load average in 'top' can show an extremely 
elevated number because lots of processes are waiting to run - but in fact the 
node is still quite open, as there are plenty of cores left for other jobs. 
Would this be an accurate interpretation of the scheduling and load I'm 
observing? Are there impacts on the node's performance when it is in this state?

Thanks everyone.


-Original Message-
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Chris Samuel
Sent: Friday, December 8, 2017 6:46 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] detectCores() mess

On 9/12/17 4:54 am, Mike Cammilleri wrote:

> I thought cgroups (which we are using) would prevent some of this 
> behavior on the nodes (we are constraining CPU and RAM) -I'd like 
> there to be no I/O wait times if possible. I would like it if either 
> linux or slurm could constrain a job from grabbing more cores than 
> assigned at submit time. Is there something else I should be 
> configuring to safeguard against this behavior? If SLURM assigns 1 cpu 
> to the task then no matter what craziness is in the code, 1 is all 
> they're getting. Possible?

That is exactly what cgroups does: a process within a cgroup that only has a 
single core available to it will only be able to use that one core.  If it 
fires up (for example) 8 threads or processes, then they will all run, but they 
will all be contending for that single core.

You can check the cgroup for a process with:

cat /proc/$PID/cgroup

From that you should be able to find the cgroup in the cpuset controller and 
see how many cores are available to it.
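
As a concrete sketch of that lookup (cgroup v1 layout under the default 
mountpoint; the uid, job and step values below are placeholders for whatever 
the first command reports):

cat /proc/$PID/cgroup | grep cpuset
# e.g. 6:cpuset:/slurm/uid_1000/job_12345/step_0
cat /sys/fs/cgroup/cpuset/slurm/uid_1000/job_12345/step_0/cpuset.cpus
# prints the cores the step is confined to, e.g. 0-3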

You mention I/O wait times; that's going to be separate from the number of 
cores available to a code. Could you elaborate a little on what you are seeing 
there?

There is some support for constraining I/O in current kernels, but I don't know 
when that landed or whether it will be in the kernel available to you.  Also, I 
don't remember seeing any mention of support for it in Slurm.

https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt

Best of luck,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



[slurm-users] detectCores() mess

2017-12-08 Thread Mike Cammilleri
Hi,

We have allowed some courses to use our Slurm cluster for teaching purposes, 
which of course leads to all kinds of exciting experiments - not always the 
most clever programming, but it certainly teaches me where we need to tighten 
up our configuration.

The default mindset for many students just starting out is to grab as much CPU 
as possible, not fully understanding cluster computing and batch scheduling. One 
example I see often is students using the R parallel package and calling 
detectCores(), which of course returns all the cores Linux reports. They also 
don't specify --ntasks, so Slurm assigns 1 - but there is no check on the 
ballooning of R processes created from the detectCores() count and whatever they 
then do with that number. Now we have overloaded nodes.

I see that availableCores() is suggested as a friendlier method for shared 
resources like this, since it returns the number of cores that were actually 
assigned via SLURM. A student using the parallel package would then need to 
explicitly specify the number of cores in their submit file. This would be 
nice IF students voluntarily used availableCores() instead of detectCores(), 
but we know that's not really enforceable.

I thought cgroups (which we are using) would prevent some of this behavior on 
the nodes (we are constraining CPU and RAM) - I'd like there to be no I/O wait 
times if possible. I would like it if either Linux or Slurm could constrain a 
job from grabbing more cores than were assigned at submit time. Is there 
something else I should be configuring to safeguard against this behavior? If 
SLURM assigns 1 CPU to the task, then no matter what craziness is in the code, 
1 is all they're getting. Possible?
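
For reference, ConstrainCores=yes in cgroup.conf only takes effect when 
slurm.conf uses a cgroup-aware task plugin; a minimal sketch of the relevant 
lines (presumably already in place here, given the follow-up reply earlier in 
this digest):

# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# cgroup.conf then does the actual confinement (ConstrainCores=yes, ConstrainRAMSpace=yes)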

Thanks for any insight!

--mike