[ 
https://issues.apache.org/jira/browse/MESOS-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839250#comment-13839250
 ] 

Ian Downes commented on MESOS-763:
----------------------------------

Additional details: A cgroup's page cache counts towards its total memory 
usage. As RSS + cache approaches the cgroup's memory limit, the kernel 
should start flushing the cache to keep the cgroup under the limit. However, 
we've observed (and can reproduce) sustained disk I/O leaving the kernel 
unable to flush pages fast enough, leading to the cgroup being OOM'ed.

The test can be as simple as a shell script that uses dd to append chunks 
from /dev/zero to a file at a fixed rate, e.g. 1 MB every 100 ms for 
approx. 10 MB/s.
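
For illustration only, here is a rough Python sketch of the same write pattern 
(the comment above suggests a shell script with dd; this is an equivalent 
under assumptions, not the actual test). The function name append_zeros, its 
parameters, and the file name are hypothetical. It appends zero-filled chunks 
at a fixed interval and deliberately never calls fsync, so the data sits in 
the page cache until the kernel writes it back.

```python
import os
import tempfile
import time


def append_zeros(path, chunk_bytes=1024 * 1024, interval_s=0.1, iterations=100):
    """Append zero-filled chunks to `path`, sleeping between writes.

    Defaults mirror the example rate: 1 MB every 100 ms, roughly 10 MB/s.
    Returns the total number of bytes written.
    """
    chunk = bytes(chunk_bytes)  # zero-filled buffer, like reading /dev/zero
    written = 0
    with open(path, "ab") as f:  # append mode, binary
        for _ in range(iterations):
            written += f.write(chunk)
            # Hand the data to the kernel without forcing it to disk;
            # skipping fsync keeps the pages in the page cache.
            f.flush()
            time.sleep(interval_s)
    return written


if __name__ == "__main__":
    # Hypothetical short run: about 20 MB over about 2 seconds.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "oom_io_test.dat")
        total = append_zeros(target, iterations=20)
        print(f"wrote {total} bytes to {target}")
```

To exercise the behaviour described above, the writer would have to run 
inside a cgroup whose memory limit is smaller than the total amount written, 
with the test then checking that the task was not OOM-killed.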

> Implement a OOM test that does file io
> --------------------------------------
>
>                 Key: MESOS-763
>                 URL: https://issues.apache.org/jira/browse/MESOS-763
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Vinod Kone
>             Fix For: 0.17.0
>
>
> The test should make sure the job is not killed if the file io done by a task 
> eats up page cache. The kernel should be able to purge cache before OOM gets 
> invoked.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
