On 27 November 2013 15:49, Peter Clapham <[email protected]> wrote:
> The enforcement of the memory limit has to date either been via wrapping
> jobs on startup by the scheduler with ulimit, or via a local daemon sending
> a kill command when it notices that the job or a job component exceeded the
> initial set limits.
>
> Both the above approaches have limitations which can confuse users. The
> cgroup approach effectively takes on the role of ulimits on steroids and
> allows for accurate memory tracking and enforcement. This ensures that the
> job output includes the actual memory usage when killed, as well as
> ensuring that the job cannot break the set limits.

I totally agree with what you say. If you run jobs with cpusets, then you don't need to depend on ulimits or on the batch scheduler getting round to killing out-of-limits jobs: if the job gets too big for its boots, the OOM killer deals with it. Collapsing the cpuset at the end of a job also means you don't have stray processes left running by badly behaved codes. (Yes, I know you can and should run scripts after a job finishes to deal with these, and indeed I do.)
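For readers who haven't set this up: a minimal sketch of what the per-job cgroup/cpuset approach can look like from a scheduler prolog/epilog, assuming a 2013-era cgroups v1 hierarchy mounted under /sys/fs/cgroup. The job name "job1234", the 4 GiB cap, and the core range 0-3 are all illustrative, not taken from the original posts, and these commands need root.

```shell
# -- prolog: create the per-job cgroups (illustrative names/values) --

# Memory cgroup with a hard cap; exceeding it invokes the OOM killer
# on the job rather than on unrelated processes.
mkdir /sys/fs/cgroup/memory/job1234
echo $((4 * 1024 * 1024 * 1024)) > /sys/fs/cgroup/memory/job1234/memory.limit_in_bytes

# Matching cpuset pinning the job to specific cores and memory nodes.
mkdir /sys/fs/cgroup/cpuset/job1234
echo 0-3 > /sys/fs/cgroup/cpuset/job1234/cpuset.cpus
echo 0   > /sys/fs/cgroup/cpuset/job1234/cpuset.mems

# Move this shell into both cgroups, then launch the job; every process
# the job forks inherits membership, so limits apply to the whole tree.
echo $$ > /sys/fs/cgroup/memory/job1234/tasks
echo $$ > /sys/fs/cgroup/cpuset/job1234/tasks
./my_job    # hypothetical job binary

# -- epilog: accounting and cleanup --

# High-water mark of the job's memory use, for the job output/accounting.
cat /sys/fs/cgroup/memory/job1234/memory.max_usage_in_bytes

# Kill any stray processes still listed in the cgroup, then remove it
# ("collapsing" the cpuset, as described above).
for pid in $(cat /sys/fs/cgroup/cpuset/job1234/tasks); do kill -9 "$pid"; done
rmdir /sys/fs/cgroup/cpuset/job1234 /sys/fs/cgroup/memory/job1234
```

On cgroups v2 the file names differ (e.g. memory.max, memory.peak, cgroup.procs), but the per-job-directory pattern is the same.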
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
