[ https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536807#comment-14536807 ]
Ian Babrou commented on MESOS-2713:
-----------------------------------

I'm not sure if you could run docker containers without cgroups. Anyway, reading stats from cgroups and gracefully falling back to the existing stats when cgroups are unavailable would be better.

Take a look:

web300 ~ # cat /sys/fs/cgroup/cpuacct/docker/944fe900f60595d37ce4db3c4c09c196be3b500c2d3e89dab59351da2c8b597d/cpuacct.stat
user 20964
system 1167

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431194945.15193,
      "mem_rss_bytes": 408150016,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 1.46,
      "cpus_system_time_secs": 0.35,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

Now take another look. Between two requests made about a second apart, user time decreases from 4.56 to 0.73:

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195057.42133,
      "mem_rss_bytes": 428085248,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 4.56,
      "cpus_system_time_secs": 0.43,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195058.38549,
      "mem_rss_bytes": 335261696,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 0.73,
      "cpus_system_time_secs": 0.31,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

> Docker resource usage
> ----------------------
>
>                 Key: MESOS-2713
>                 URL: https://issues.apache.org/jira/browse/MESOS-2713
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker, isolation
>    Affects Versions: 0.22.1
>            Reporter: Ian Babrou
>
> Looks like resource usage for docker containers on slaves is not very
> accurate (/monitor/statistics.json). For example, cpu usage is calculated by
> traversing the process tree and summing up cpu times. Resulting numbers are
> not even close to real usage, CPU time can even decrease.
> What is the reason for this if you can use cgroup data directly? Reading
> cgroup location from pid of docker container is pretty straightforward.
> Another similar question: what is the reason to set isolation to posix
> instead of cgroups by default? Looks like it suffers from the same issues as
> docker containerizer (incorrect stats). More docs on this topic would be
> great.
> Posix isolation also leads to bigger CPU usage from mesos slave process
> (higher usage: posix isolation): http://i.imgur.com/jepk5m6.png
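
To make the comparison in the comment above concrete, here is a minimal Python sketch of reading CPU times straight from the cgroup. It is not how Mesos does it. The function names `parse_cpuacct_stat` and `read_docker_cpu_times` are hypothetical, the default cgroup root is taken from the path in the transcript, and the output keys simply reuse the field names from statistics.json. The kernel reports cpuacct.stat in USER_HZ ticks; the sketch asks the system for the tick rate, and the usage lines assume 100 ticks per second, which is common on Linux but not guaranteed.

```python
import os


def parse_cpuacct_stat(text, ticks_per_sec=None):
    """Convert the contents of a cpuacct.stat file into seconds.

    cpuacct.stat holds lines such as "user 20964" and "system 1167",
    counted in USER_HZ ticks. ticks_per_sec defaults to the value
    reported by the running system.
    """
    if ticks_per_sec is None:
        ticks_per_sec = os.sysconf("SC_CLK_TCK")
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            ticks[parts[0]] = int(parts[1])
    # Key names mirror statistics.json for easy comparison (an assumption
    # made for illustration, not a Mesos API).
    return {
        "cpus_user_time_secs": ticks["user"] / ticks_per_sec,
        "cpus_system_time_secs": ticks["system"] / ticks_per_sec,
    }


def read_docker_cpu_times(container_id,
                          cgroup_root="/sys/fs/cgroup/cpuacct/docker"):
    """Read cumulative CPU times for one docker container.

    cgroup_root matches the layout shown in the transcript; other hosts
    may mount the cpuacct hierarchy elsewhere.
    """
    path = os.path.join(cgroup_root, container_id, "cpuacct.stat")
    with open(path) as f:
        return parse_cpuacct_stat(f.read())


# Sample taken from the transcript, assuming 100 ticks per second.
sample = parse_cpuacct_stat("user 20964\nsystem 1167\n", ticks_per_sec=100)
print(sample)
```

Under the 100 ticks per second assumption, the cgroup sample corresponds to roughly 209.64 s of user time and 11.67 s of system time, far from the 1.46 s and 0.35 s reported by statistics.json. Because these counters are cumulative for the whole cgroup, they should not decrease between samples while the container keeps running.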