Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-10 Thread Bjørn-Helge Mevik
Matthew BETTINGER  writes:

> Just curious if this option or oom setting (which we use) can leave
> the nodes in CG "completing" state.

I don't think so.  As far as I know, jobs go into completing state when
Slurm is cancelling them or when they exit on their own, and they stay in
that state until any epilogs have run.  In my experience, the most
typical reasons for jobs hanging in CG are disk system failures or other
failures that leave either the job processes or the epilog processes
hanging in "disk wait".

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-09 Thread Matthew BETTINGER
Just curious if this option or the oom setting (which we use) can leave the
nodes in CG "completing" state.  We get CG states quite often, and the only
way to clear them is to reboot the node.  I believe it occurs when the parent
process dies, gets killed, or becomes a zombie (Z state)?  Thanks.

MB

On 10/8/19, 6:11 AM, "slurm-users on behalf of Bjørn-Helge Mevik" wrote:

Marcus Boden  writes:

> you're looking for KillOnBadExit in the slurm.conf:
> KillOnBadExit

[...]

> this should terminate the job if a step or a process gets oom-killed.

That is a good tip!

But as I read the documentation (I haven't tested it), it will only kill
the job step itself, not the whole job.  Also, it only has an effect on
things started with srun, mpirun or similar.  However, in combination
with "set -o errexit", I believe most OOM kills would get the job itself
terminated.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-09 Thread Jean-mathieu CHANTREIN



- Original Message -

> Maybe I missed something else...


That's right.  Thanks to Bjørn-Helge, who helped me.

You must enable swap accounting (swapaccount) in the kernel, as shown here:
https://unix.stackexchange.com/questions/531480/what-does-swapaccount-1-in-grub-cmdline-linux-default-do
Apparently it is not necessary to set this option explicitly on RHEL 7, but
I'm on Debian.
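Roughly, on a Debian system that means something like the following (a
sketch based on the link above; adjust to your own GRUB configuration, and
note that older kernels may also need cgroup_enable=memory):

    # /etc/default/grub -- append swapaccount=1 to the existing options
    GRUB_CMDLINE_LINUX_DEFAULT="<existing options> swapaccount=1"

    # regenerate the GRUB configuration, then reboot the node
    update-grub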

Now everything works fine with this cgroup.conf configuration:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainSwapSpace=yes

Thanks.

Best regards,

Jean-Mathieu



Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Bjørn-Helge Mevik
Marcus Boden  writes:

> you're looking for KillOnBadExit in the slurm.conf:
> KillOnBadExit

[...]

> this should terminate the job if a step or a process gets oom-killed.

That is a good tip!

But as I read the documentation (I haven't tested it), it will only kill
the job step itself, not the whole job.  Also, it only has an effect on
things started with srun, mpirun or similar.  However, in combination
with "set -o errexit", I believe most OOM kills would get the job itself
terminated.

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Bjørn-Helge Mevik
Juergen Salk  writes:

> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes*

Yes, that is how the kernel OOM killer works.

This is why we always tell users to use "set -o errexit" in their job
scripts.  Then at least the job script exits as soon as one of its
processes is killed.
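For example, a job script along these lines (just a sketch; the program
names are placeholders) stops at the first step that gets OOM-killed
instead of running the remaining steps:

    #!/bin/bash
    #SBATCH --mem-per-cpu=2G

    # Abort the whole batch script as soon as any command returns a
    # non-zero exit status, e.g. because one of its processes was
    # OOM-killed.
    set -o errexit

    srun ./step1_preprocess   # if this step is OOM-killed, the script stops here
    srun ./step2_compute      # ...and the remaining steps are never started
    srun ./step3_postprocess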

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Jean-mathieu CHANTREIN
Hello, thanks for your answers,

> - Does it work if you remove the space in "TaskPlugin=task/affinity,
>  task/cgroup"? (Slurm can be quite picky when reading slurm.conf).

That was just a mistake I made when copying/pasting into my email, so there
is actually no space there in my slurm.conf.

> 
> - See in slurmd.log on the node(s) of the job whether cgroup actually gets
>  activated and starts limiting memory for the job, or whether there are any
>  errors related to cgroup.

Yes, here is an example:
Launching batch job 1605839 for UID 
[1605839.batch] task/cgroup: /slurm/uid_/job_1605839: alloc=200MB mem.limit=200MB memsw.limit=200MB
[1605839.batch] task/cgroup: /slurm/uid_/job_1605839/step_batch: alloc=200MB mem.limit=200MB memsw.limit=200MB

> 
> - While a job is running, look in the cgroup memory directory for the job
>  (typically /sys/fs/cgroup/memory/slurm/uid_/job_) on the compute
>  node.  Do the values there, for instance memory.limit_in_bytes and
>  memory.max_usage_in_bytes, make sense?

Yes, for the same job:
cat /sys/fs/cgroup/memory/slurm/uid_/job_1605839/memory.limit_in_bytes 
209715200
root@star190:~# cat /sys/fs/cgroup/memory/slurm/uid_/job_1605839/memory.max_usage_in_bytes
209715200

But:

cat /sys/fs/cgroup/memory/slurm/uid_/job_1605839/memory.usage_in_bytes 
209711104

is always under memory.max_usage_in_bytes.  I think this is because of
ConstrainRAMSpace=yes in cgroup.conf, and the process swaps (as it would
with ConstrainRAMSpace=no)...  I tried the configuration from Michael
Renfro's earlier email, but with ConstrainRAMSpace=no and
ConstrainSwapSpace=no, cgroups are not activated for the job at all
(nothing appears in slurmd.log or under /sys/fs/cgroup/memory/slurm/uid_/).
Setting MemLimitEnforce to no or yes seems to have no influence either...

Maybe I missed something else...
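As an aside for anyone debugging the same thing: whether swap is being
accounted for at all can be seen from the memory.memsw.* files of the job's
cgroup; if they are missing, the kernel was booted without swap accounting.
A sketch, with placeholder uid and job id:

    # on the compute node, while the job is running
    cd /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>
    cat memory.limit_in_bytes        # RAM limit set by Slurm
    cat memory.usage_in_bytes        # current RAM usage
    cat memory.memsw.limit_in_bytes  # RAM+swap limit, only with swap accounting
    cat memory.memsw.usage_in_bytes  # current RAM+swap usage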

Regards,

Jean-Mathieu
 
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo



Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Juergen Salk
> On 19-10-08 10:36, Juergen Salk wrote:
> > * Bjørn-Helge Mevik  [191008 08:34]:
> > > Jean-mathieu CHANTREIN  writes:
> > > 
> > > > I tried using, in slurm.conf 
> > > > TaskPlugin=task/affinity, task/cgroup 
> > > > SelectTypeParameters=CR_CPU_Memory 
> > > > MemLimitEnforce=yes 
> > > >
> > > > and in cgroup.conf: 
> > > > CgroupAutomount=yes 
> > > > ConstrainCores=yes 
> > > > ConstrainRAMSpace=yes 
> > > > ConstrainSwapSpace=yes 
> > > > MaxSwapPercent=10 
> > > > TaskAffinity=no 
> > > 
> > > We have a very similar setup, the biggest difference being that we have
> > > MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
> > > us, jobs are killed as they should. [...] 
> > 
> > that is interesting. We have a very similar setup as well. However, in
> > our Slurm test cluster I have noticed that it is not the *job* that
> > gets killed. Instead, the OOM killer terminates one (or more)
> > *processes* but keeps the job itself running in a potentially 
> > unhealthy state.
> > 
> > Is there a way to tell Slurm to terminate the whole job as soon as 
> > the first OOM kill event takes place during execution? 

* Marcus Boden  [191008 10:46]:
> 
> you're looking for KillOnBadExit in the slurm.conf:
> KillOnBadExit
>
> If set to 1, a step will be terminated immediately if any task
> is crashed or aborted, as indicated by a non-zero exit code.
> With the default value of 0, if one of the processes is crashed
> or aborted the other processes will continue to run while the
> crashed or aborted process waits. The user can override this
> configuration parameter by using srun's -K, --kill-on-bad-exit.
> 
> this should terminate the job if a step or a process gets oom-killed.

Hi Marcus,

thank you.  I had not considered `KillOnBadExit=1´ so far.

It seems this does indeed kill the current job step when it hits the
memory limit - but the job then happily proceeds with the next step.

I've also noticed that, in order to work as described above, this 
requires all the processes to be launched via srun from within the 
batch script. Right?

Admittedly, I am also somewhat wary of potential side effects of setting
`KillOnBadExit=1´ in a production environment that has to cope with all
sorts of batch scripts.  A non-zero exit code from some process may or may
not harm the batch job, whereas processes that get oom-killed most probably
affect the job as a whole.  Is `KillOnBadExit=1´ commonly used?

Thanks again.

Best regards
Jürgen

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471



Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Marcus Boden
Hi Jürgen,

you're looking for KillOnBadExit in the slurm.conf:
KillOnBadExit
If set to 1, a step will be terminated immediately if any task is crashed 
or aborted, as indicated by a non-zero exit code. With the default value of 0, 
if one of the processes is crashed or aborted the other processes will continue 
to run while the crashed or aborted process waits. The user can override this 
configuration parameter by using srun's -K, --kill-on-bad-exit.

this should terminate the job if a step or a process gets oom-killed.
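For reference, that is a single line in slurm.conf, and the same behaviour
can be requested per step with the srun option quoted above (./my_program
is just a placeholder):

    # slurm.conf (cluster-wide default)
    KillOnBadExit=1

    # or per step, without changing the cluster-wide default
    srun --kill-on-bad-exit=1 ./my_program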

Best,
Marcus

On 19-10-08 10:36, Juergen Salk wrote:
> * Bjørn-Helge Mevik  [191008 08:34]:
> > Jean-mathieu CHANTREIN  writes:
> > 
> > > I tried using, in slurm.conf 
> > > TaskPlugin=task/affinity, task/cgroup 
> > > SelectTypeParameters=CR_CPU_Memory 
> > > MemLimitEnforce=yes 
> > >
> > > and in cgroup.conf: 
> > > CgroupAutomount=yes 
> > > ConstrainCores=yes 
> > > ConstrainRAMSpace=yes 
> > > ConstrainSwapSpace=yes 
> > > MaxSwapPercent=10 
> > > TaskAffinity=no 
> > 
> > We have a very similar setup, the biggest difference being that we have
> > MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
> > us, jobs are killed as they should. [...] 
> 
> Hello Bjørn-Helge,
> 
> that is interesting. We have a very similar setup as well. However, in
> our Slurm test cluster I have noticed that it is not the *job* that
> gets killed. Instead, the OOM killer terminates one (or more)
> *processes* but keeps the job itself running in a potentially 
> unhealthy state.
> 
> Is there a way to tell Slurm to terminate the whole job as soon as 
> the first OOM kill event takes place during execution? 
> 
> Best regards
> Jürgen
> 
> -- 
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
> 

-- 
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience
Tel.:   +49 (0)551 201-2191
E-Mail: mbo...@gwdg.de
---
Gesellschaft fuer wissenschaftliche
Datenverarbeitung mbH Goettingen (GWDG)
Am Fassberg 11, 37077 Goettingen
URL:http://www.gwdg.de
E-Mail: g...@gwdg.de
Tel.:   +49 (0)551 201-1510
Fax:+49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender:
Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Goettingen
Registergericht: Goettingen
Handelsregister-Nr. B 598
---




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Juergen Salk
* Bjørn-Helge Mevik  [191008 08:34]:
> Jean-mathieu CHANTREIN  writes:
> 
> > I tried using, in slurm.conf 
> > TaskPlugin=task/affinity, task/cgroup 
> > SelectTypeParameters=CR_CPU_Memory 
> > MemLimitEnforce=yes 
> >
> > and in cgroup.conf: 
> > CgroupAutomount=yes 
> > ConstrainCores=yes 
> > ConstrainRAMSpace=yes 
> > ConstrainSwapSpace=yes 
> > MaxSwapPercent=10 
> > TaskAffinity=no 
> 
> We have a very similar setup, the biggest difference being that we have
> MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
> us, jobs are killed as they should. [...] 

Hello Bjørn-Helge,

that is interesting. We have a very similar setup as well. However, in
our Slurm test cluster I have noticed that it is not the *job* that
gets killed. Instead, the OOM killer terminates one (or more)
*processes* but keeps the job itself running in a potentially 
unhealthy state.

Is there a way to tell Slurm to terminate the whole job as soon as 
the first OOM kill event takes place during execution? 

Best regards
Jürgen

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471



Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Bjørn-Helge Mevik
Jean-mathieu CHANTREIN  writes:

> I tried using, in slurm.conf 
> TaskPlugin=task/affinity, task/cgroup 
> SelectTypeParameters=CR_CPU_Memory 
> MemLimitEnforce=yes 
>
> and in cgroup.conf: 
> CgroupAutomount=yes 
> ConstrainCores=yes 
> ConstrainRAMSpace=yes 
> ConstrainSwapSpace=yes 
> MaxSwapPercent=10 
> TaskAffinity=no 

We have a very similar setup, the biggest difference being that we have
MemLimitEnforce=no, and leave the killing to the kernel's cgroup.  For
us, jobs are killed as they should.  Here are a couple of things you
could check:

- Does it work if you remove the space in "TaskPlugin=task/affinity,
  task/cgroup"? (Slurm can be quite picky when reading slurm.conf).

- See in slurmd.log on the node(s) of the job whether cgroup actually gets
  activated and starts limiting memory for the job, or whether there are any
  errors related to cgroup.

- While a job is running, look in the cgroup memory directory for the job
  (typically /sys/fs/cgroup/memory/slurm/uid_/job_) on the compute
  node.  Do the values there, for instance memory.limit_in_bytes and
  memory.max_usage_in_bytes, make sense?
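For instance, something along these lines, with the real uid and job id
substituted:

    # on the compute node, while the job is running
    cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.limit_in_bytes
    cat /sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid>/memory.max_usage_in_bytes

    # compare with what the job actually requested
    scontrol show job <jobid> | grep -i mem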

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-07 Thread Renfro, Michael
Our cgroup settings are quite a bit different, and we don’t allow jobs to
swap, but the following works to limit memory here (I know, because I get
frequent emails from users who don’t change their jobs from the default
2 GB per CPU that we use):

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
CgroupReleaseAgentDir="/etc/slurm/cgroup"
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
ConstrainCores=yes       # Not the Slurm default
TaskAffinity=no          # Slurm default
ConstrainRAMSpace=no     # Slurm default
ConstrainSwapSpace=no    # Slurm default
ConstrainDevices=no      # Slurm default
AllowedRamSpace=100      # Slurm default
AllowedSwapSpace=0       # Slurm default
MaxRAMPercent=100        # Slurm default
MaxSwapPercent=100       # Slurm default
MinRAMSpace=30           # Slurm default
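One quick way to verify that the limit really is enforced on a node is to
submit a small test job that deliberately allocates more memory than it
requested (the Python one-liner below is just an illustration):

    # request 1 GB, then try to allocate about 2 GB; with working memory
    # enforcement the job should be killed (by the cgroup OOM killer or by
    # Slurm's own limit enforcement) rather than quietly swapping
    sbatch --mem=1G --wrap \
      "python3 -c 'b = bytearray(2 * 1024**3); import time; time.sleep(60)'"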

> On Oct 7, 2019, at 11:55 AM, Jean-mathieu CHANTREIN 
>  wrote:
> 
> Hello,
> 
> I tried using, in slurm.conf
> TaskPlugin=task/affinity, task/cgroup
> SelectTypeParameters=CR_CPU_Memory
> MemLimitEnforce=yes
> 
> and in cgroup.conf:
> CgroupAutomount=yes
> ConstrainCores=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> MaxSwapPercent=10
> TaskAffinity=no
> 
> But when the job reaches its limit, it alternates between the R and D
> states without being killed, even when it exceeds the 10% of swap allowed.
> 
> Do you have an idea how to do this?
> 
> Regards,
> 
> Jean-Mathieu



[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-07 Thread Jean-mathieu CHANTREIN
Hello, 

I tried using, in slurm.conf 
TaskPlugin=task/affinity, task/cgroup 
SelectTypeParameters=CR_CPU_Memory 
MemLimitEnforce=yes 

and in cgroup.conf: 
CgroupAutomount=yes 
ConstrainCores=yes 
ConstrainRAMSpace=yes 
ConstrainSwapSpace=yes 
MaxSwapPercent=10 
TaskAffinity=no 

But when the job reaches its limit, it alternates between the R and D states
without being killed, even when it exceeds the 10% of swap allowed.

Do you have an idea how to do this?

Regards, 

Jean-Mathieu