On 27 November 2013 15:49, Peter Clapham <[email protected]> wrote:

> The enforcement of the memory limit has to date either been via wrapping
> jobs on startup by the scheduler with ulimit or via a local daemon sending
> a kill command when it notices that the job or job component exceeded the
> initial set limits.
>
> Both the above approaches have limitations which can confuse users. The
> CGROUP approach seems to effectively take on the role of ulimits on
> steroids and allows for accurate memory tracking and enforcement. This
> ensures that the job output includes the actual memory usage when killed as
> well as ensuring that the job cannot break the set limits.
>
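The ulimit-wrapper approach described above can be sketched roughly like this (a minimal illustration, assuming a Linux shell; the `run_limited` name and the 1 GiB figure are made up for the example, and the limit applies per process, which is one source of the confusion mentioned):

```shell
# Sketch of a scheduler wrapping a job with ulimit. The limit is set in a
# subshell so it does not leak back into the scheduler's own shell, and it
# is inherited by the job and its children - but enforced per process, not
# per job, which is one of the limitations cgroups address.
run_limited() {
    (
        ulimit -v 1048576 || exit 125   # cap address space at 1 GiB (KiB units)
        exec "$@"                       # replace the subshell with the job
    )
}
```

A process that tries to allocate past the cap simply has its allocation fail, which many codes report confusingly rather than as "killed by the scheduler".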
I totally agree with what you say.
If you run jobs with cpusets, then you don't need to depend on ulimits or
on the batch scheduler getting round to killing out-of-limits jobs
- if the job gets 'too big for its boots', the OOM killer deals with it.
Also, collapsing the cpuset at the end of a job means that badly behaved
codes don't leave stray processes running behind.
(Yes, I know you can and should run scripts after a job finishes to deal
with these - and indeed I do.)
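The collapse step might look something like this (a hedged sketch, not anyone's actual epilogue script: it assumes a cgroup-v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset, a hypothetical job1234 cpuset name, and root privileges):

```shell
# Hypothetical job-epilogue fragment: kill anything still attached to the
# job's cpuset, then remove the (now empty) cpuset directory. The mount
# point and the cpuset name are assumptions for illustration; needs root.
CPUSET=/sys/fs/cgroup/cpuset/job1234
while read -r pid; do
    kill -9 "$pid" 2>/dev/null || true   # reap stray processes left by the job
done < "$CPUSET/tasks"
rmdir "$CPUSET"   # rmdir only succeeds once no tasks remain in the cpuset
```

Because the kernel tracks every task attached to the cpuset, this catches orphaned daemons and forgotten MPI ranks that a PID-based cleanup script can miss.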
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf