Re: [slurm-users] cgroup limits not created for jobs

2020-07-26 Thread Christopher Samuel

On 7/26/20 12:21 pm, Paul Raines wrote:


> Thank you so much.  This also explains my GPU CUDA_VISIBLE_DEVICES missing
> problem in my previous post.


I'd missed that, but yes, that would do it.


> As a new SLURM admin, I am a bit surprised at this default behavior.
> Seems like a way for users to game the system by never running srun.


This is because by default salloc only requests a job allocation; it
expects you to use srun to launch an application on a compute node. But
yes, it is non-obvious (as evidenced by the number of "sinteractive" and
similar scripts out there that folks have written without realising the
SallocDefaultCommand config option exists - I wrote one back in 2013!).



> I suppose the only limit that is really being enforced at that point
> is walltime?


Well, the user isn't on the compute node, so there's really nothing else
to enforce.


> I guess I need to research srun and SallocDefaultCommand more, but is
> there some way to set a separate walltime limit covering the time a
> salloc has to run srun?  It is not clear whether one can write a
> SallocDefaultCommand that does "srun ..." and really covers all
> possibilities.


An srun inside a salloc (just as with an sbatch) should not be able to
exceed the time limit of the job allocation.
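
If you want an individual step to have its own, shorter limit you can (I
believe) also pass srun a time explicitly, e.g.:

srun --time=00:10:00 -n1 ./your_app

(./your_app standing in for whatever you run). But that only constrains
that step; it doesn't put a deadline on how long the allocation may sit
idle before the first srun.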


If it helps this is the SallocDefaultCommand we use for our GPU nodes:

srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 -G 0 --gpus-per-task=0 
--gpus-per-node=0 --gpus-per-socket=0  --pty --preserve-env --mpi=none 
-m block $SHELL


We have to give all those permutations of "no GPU GRES" because otherwise
this srun would consume the GPUs the salloc asked for, and then when the
user tries to "srun" their application across the nodes it will block, as
there won't be any GPUs available on this first node.


Of course, the downside is that the user can't see the GPUs until they
srun, which can confuse some people, but it's unavoidable for this use
case.
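
For reference, in slurm.conf that whole command goes in as the quoted
value of SallocDefaultCommand, so the entry ends up looking something
like:

SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 -G 0 --gpus-per-task=0 --gpus-per-node=0 --gpus-per-socket=0 --pty --preserve-env --mpi=none -m block $SHELL"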


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] cgroup limits not created for jobs

2020-07-26 Thread Paul Raines



On Sat, 25 Jul 2020 2:00am, Chris Samuel wrote:


> On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:
>
>> But when I run a job, I can find no evidence on the node where it runs
>> of any cgroup limits being set.
>>
>> Example job:
>>
>> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
>> salloc: Granted job allocation 17
>> mlscgpu1[0]:~$ echo $$
>> 137112
>> mlscgpu1[0]:~$
>
> You're not actually running inside a job at that point unless you've
> defined "SallocDefaultCommand" in your slurm.conf, and I'm guessing
> that's not the case there.  You can make salloc fire up an srun for you
> in the allocation using that option; see the docs here:
>
> https://slurm.schedmd.com/slurm.conf.html#OPT_SallocDefaultCommand



Thank you so much.  This also explains my GPU CUDA_VISIBLE_DEVICES missing
problem in my previous post.

As a new SLURM admin, I am a bit surprised at this default behavior.
Seems like a way for users to game the system by never running srun.

I suppose the only limit that is really being enforced at that point
is walltime?

I guess I need to research srun and SallocDefaultCommand more, but is
there some way to set a separate walltime limit covering the time a
salloc has to run srun?  It is not clear whether one can write a
SallocDefaultCommand that does "srun ..." and really covers all
possibilities.




Re: [slurm-users] cgroup limits not created for jobs

2020-07-25 Thread Chris Samuel
On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:

> But when I run a job, I can find no evidence on the node where it runs
> of any cgroup limits being set.
> 
> Example job:
> 
> mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
> salloc: Granted job allocation 17
> mlscgpu1[0]:~$ echo $$
> 137112
> mlscgpu1[0]:~$

You're not actually running inside a job at that point unless you've defined 
"SallocDefaultCommand" in your slurm.conf, and I'm guessing that's not the 
case there.  You can make salloc fire up an srun for you in the allocation 
using that option; see the docs here:

https://slurm.schedmd.com/slurm.conf.html#OPT_SallocDefaultCommand
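
A minimal sketch of what that could look like in slurm.conf (the exact
srun options are very much site-specific) would be something like:

SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"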

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA






[slurm-users] cgroup limits not created for jobs

2020-07-24 Thread Paul Raines



I am not seeing any cgroup limits being put in place on the nodes
when jobs run.  I have Slurm 20.02 running on CentOS 7.8.

In slurm.conf I have

ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/cgroup

and cgroup.conf has

CgroupAutomount=yes
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes

But when I run a job, I can find no evidence on the node where it runs
of any cgroup limits being set.

Example job:

mlscgpu1[0]:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
salloc: Granted job allocation 17
mlscgpu1[0]:~$ echo $$
137112
mlscgpu1[0]:~$

But in the cgroup fs

[root@mlscgpu1 slurm]# find /sys/fs/cgroup/ -name slurm 
[root@mlscgpu1 slurm]#
[root@mlscgpu1 slurm]# find /sys/fs/cgroup/ -name tasks -exec grep -l 137112 {} \;

/sys/fs/cgroup/pids/user.slice/tasks
/sys/fs/cgroup/memory/user.slice/tasks
/sys/fs/cgroup/cpuset/tasks
/sys/fs/cgroup/blkio/user.slice/tasks
/sys/fs/cgroup/cpu,cpuacct/user.slice/tasks
/sys/fs/cgroup/devices/user.slice/tasks
/sys/fs/cgroup/net_cls,net_prio/tasks
/sys/fs/cgroup/perf_event/tasks
/sys/fs/cgroup/hugetlb/tasks
/sys/fs/cgroup/freezer/tasks
/sys/fs/cgroup/systemd/user.slice/user-5829.slice/session-80624.scope/tasks
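
For comparison, if I understand the cgroup plugins correctly, I would have
expected to find per-job directories something like

/sys/fs/cgroup/memory/slurm/uid_5829/job_17
/sys/fs/cgroup/cpuset/slurm/uid_5829/job_17
/sys/fs/cgroup/freezer/slurm/uid_5829/job_17

but there is no slurm hierarchy anywhere under /sys/fs/cgroup.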


---
Paul Raines http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street, Charlestown, MA 02129 USA