[ https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536807#comment-14536807 ]

Ian Babrou commented on MESOS-2713:
-----------------------------------

I'm not sure you can run Docker containers without cgroups at all. Anyway, a 
graceful fallback to the existing stats when cgroups are unavailable would be 
better.

Take a look:

web300 ~ # cat /sys/fs/cgroup/cpuacct/docker/944fe900f60595d37ce4db3c4c09c196be3b500c2d3e89dab59351da2c8b597d/cpuacct.stat
user 20964
system 1167
web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431194945.15193,
      "mem_rss_bytes": 408150016,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 1.46,
      "cpus_system_time_secs": 0.35,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: 
topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 
'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]
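
For reference, cpuacct.stat is reported in clock ticks (getconf CLK_TCK, 
typically 100 on Linux), so the cgroup above has accumulated roughly 209.6 s 
of user time and 11.7 s of system time, while the slave reports 1.46 s and 
0.35 s for the same container. A quick sketch of the conversion (same 
container ID as above):

awk -v hz=$(getconf CLK_TCK) '{ printf "%s %.2f s\n", $1, $2 / hz }' \
  /sys/fs/cgroup/cpuacct/docker/944fe900f60595d37ce4db3c4c09c196be3b500c2d3e89dab59351da2c8b597d/cpuacct.stat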

Now take another look; the user time decreases between two polls taken about 
a second apart:

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195057.42133,
      "mem_rss_bytes": 428085248,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 4.56,
      "cpus_system_time_secs": 0.43,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: 
topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 
'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]
web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195058.38549,
      "mem_rss_bytes": 335261696,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 0.73,
      "cpus_system_time_secs": 0.31,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: 
topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 
'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]
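
Between those two polls, cpus_user_time_secs drops from 4.56 to 0.73, i.e. 
the accumulated CPU time goes backwards by almost 4 seconds, which should be 
impossible for a real counter. A minimal way to watch for this from the 
endpoint itself (same host and port as above; jq and bc assumed to be 
installed):

a=$(curl -s http://web300:5051/monitor/statistics.json | jq '.[0].statistics.cpus_user_time_secs')
sleep 1
b=$(curl -s http://web300:5051/monitor/statistics.json | jq '.[0].statistics.cpus_user_time_secs')
echo "user time delta: $(echo "$b - $a" | bc)"  # negative here, even though the counter should only grow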

> Docker resource usage 
> ----------------------
>
>                 Key: MESOS-2713
>                 URL: https://issues.apache.org/jira/browse/MESOS-2713
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker, isolation
>    Affects Versions: 0.22.1
>            Reporter: Ian Babrou
>
> Looks like resource usage for Docker containers on slaves is not very 
> accurate (/monitor/statistics.json). For example, CPU usage is calculated by 
> traversing the process tree and summing up CPU times. The resulting numbers 
> are not even close to the real usage; CPU time can even decrease.
> What is the reason for this if you can use cgroup data directly? Reading the 
> cgroup location from the PID of a Docker container is pretty straightforward 
> (sketched after this quoted description).
> Another, similar question: what is the reason to set isolation to posix 
> instead of cgroups by default? It looks like it suffers from the same issues 
> as the Docker containerizer (incorrect stats). More docs on this topic would 
> be great.
> Posix isolation also leads to higher CPU usage from the mesos-slave process 
> (the higher usage is posix isolation): http://i.imgur.com/jepk5m6.png
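
On the point in the description about reading the cgroup location from the 
container's PID: /proc/<pid>/cgroup lists the paths directly, so a rough 
sketch of the lookup would be the following (12345 is a placeholder for the 
PID of the container's init process, e.g. from docker inspect --format 
'{{.State.Pid}}' <container>; the cpuacct hierarchy is assumed to be mounted 
at /sys/fs/cgroup/cpuacct as on the box above):

# lines in /proc/<pid>/cgroup look like "hierarchy-id:controllers:path"
pid=12345
cgpath=$(grep cpuacct /proc/$pid/cgroup | cut -d: -f3)
cat /sys/fs/cgroup/cpuacct${cgpath}/cpuacct.stat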



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
