Ian Downes created MESOS-2978:
---------------------------------

             Summary: Provide more debug information when OOMing a container
                 Key: MESOS-2978
                 URL: https://issues.apache.org/jira/browse/MESOS-2978
             Project: Mesos
          Issue Type: Improvement
          Components: isolation
    Affects Versions: 0.22.1
            Reporter: Ian Downes
            Priority: Minor


Currently, the cgroup memory isolator logs the contents of {{memory.stat}} when
it detects that a container has OOMed. This shows how the different types of
memory contributed to the OOM, but it says nothing about the memory usage of
individual processes.

We should log process (thread) information, e.g., something to the effect of:
{noformat}
[idownes@foobar]$ pwd
/sys/fs/cgroup/memory/mesos/XXXX
[idownes@foobar]$ cat tasks | xargs ps -o pid,tid,stat,time,rss,command -L -p
{noformat}

Unlike {{memory.stat}}, whose output is bounded, this output grows with the
number of threads, so measures should be taken to limit the amount logged.
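
As a rough illustration of bounding the output, the command above could be wrapped so that only a fixed number of lines is ever emitted. This is only a sketch, not how the isolator does or should implement it: the helper names {{limit_lines}} and {{log_cgroup_threads}} and the default cap of 50 lines are hypothetical, and it reads {{cgroup.procs}} (process IDs) rather than {{tasks}} on the assumption that {{ps -p}} expects process IDs and that {{-L}} then expands them into threads.

```shell
#!/bin/sh

# Hypothetical helper: copy at most $1 lines from stdin to stdout, then
# print a marker saying how many further lines were dropped.
limit_lines() {
    max="$1"
    awk -v max="$max" '
        NR <= max { print }
        END { if (NR > max) printf "... (%d more lines truncated)\n", NR - max }
    '
}

# Hypothetical helper: print per-thread details for one container cgroup.
#   $1: cgroup directory, e.g. /sys/fs/cgroup/memory/mesos/XXXX
#   $2: maximum number of lines to emit (assumed default: 50)
log_cgroup_threads() {
    cgroup_dir="$1"
    max="${2:-50}"
    # Join the process IDs into the comma-separated list that ps -p accepts.
    pids=$(tr '\n' ',' < "$cgroup_dir/cgroup.procs" | sed 's/,$//')
    # Best effort: the kernel may already have killed every process.
    [ -n "$pids" ] || return 0
    ps -o pid,tid,stat,time,rss,command -L -p "$pids" 2>/dev/null \
        | limit_lines "$max"
}
```

For example, {{log_cgroup_threads /sys/fs/cgroup/memory/mesos/XXXX 100}} would print the {{ps}} header plus at most 99 thread lines, followed by a truncation marker if more threads exist. Capping by line count is just one option; a byte limit would bound the log size more strictly when command lines are long.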

Note: the OOM notification from the kernel is asynchronous with respect to the
kernel's OOM handler killing processes, and Mesos in turn observes the
notification asynchronously. Logging is therefore best effort: it may omit
processes that the kernel has already killed, and it may not happen at all if
Mesos reacts first to the executor terminating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
