[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543743#comment-16543743 ]
Stephan Erb commented on MESOS-8418:
------------------------------------

I have attached a profile [^mesos-agent.stacks.gz] gathered on a host with:
* ~140 tasks, all running Docker images via the Mesos containerizer
* relevant isolators: {{cgroups/cpu,cgroups/mem,filesystem/linux,docker/runtime,...}}
* agent monitoring endpoints {{/slave(1)/monitor/statistics}} and {{/slave(1)/state}} are scraped every 15s; in total, a scrape takes between 5 and 13 seconds
* potentially related agent settings:
** --oversubscribed_resources_interval=45secs
** --qos_correction_interval_min=45secs
** --cgroups_cpu_enable_pids_and_tids_count

The profile confirms that mount table reads take a significant amount of time:

!mesos-agent-flamegraph.png|width=800,height=450!

> mesos-agent high cpu usage because of numerous /proc/mounts reads
> -----------------------------------------------------------------
>
>          Key: MESOS-8418
>          URL: https://issues.apache.org/jira/browse/MESOS-8418
>      Project: Mesos
>   Issue Type: Improvement
>   Components: agent, containerization
>     Reporter: Stéphane Cottin
>     Priority: Major
>       Labels: containerizer, performance
>  Attachments: mesos-agent-flamegraph.png, mesos-agent.stacks.gz
>
>
> /proc/mounts is read many, many times from src/(linux/fs|linux/cgroups|slave/slave).cpp.
> When using overlayfs, the /proc/mounts contents can become quite large. As an example, one of our Q/A single nodes running ~150 tasks has a 361-line / 201299-character /proc/mounts file.
> This 200 kB file is read on this node about 25 to 150 times per second. This is a (huge) waste of CPU and I/O time.
> Most of these calls are related to cgroups.
> Please consider these proposals:
> 1/ Is /proc/mounts mandatory for cgroups?
> We already have the cgroup subsystems list from /proc/cgroups. The only compelling information from /proc/mounts seems to be the root mount point, /sys/fs/cgroup/, which could be obtained by a single read on agent start.
> 2/ Use /proc/self/mountstats
> {noformat}
> wc /proc/self/mounts /proc/self/mountstats
>  361  2166 201299 /proc/self/mounts
>  361  2888  50200 /proc/self/mountstats
> {noformat}
> {noformat}
> grep cgroup /proc/self/mounts
> cgroup /sys/fs/cgroup tmpfs rw,relatime,mode=755 0 0
> cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
> cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
> cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
> cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
> cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
> cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
> cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
> cgroup /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0
> cgroup /sys/fs/cgroup/perf_event cgroup rw,relatime,perf_event 0 0
> cgroup /sys/fs/cgroup/net_prio cgroup rw,relatime,net_prio 0 0
> cgroup /sys/fs/cgroup/pids cgroup rw,relatime,pids 0 0
> {noformat}
> {noformat}
> grep cgroup /proc/self/mountstats
> device cgroup mounted on /sys/fs/cgroup with fstype tmpfs
> device cgroup mounted on /sys/fs/cgroup/cpuset with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/cpu with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/cpuacct with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/blkio with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/memory with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/devices with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/freezer with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/net_cls with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/perf_event with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/net_prio with fstype cgroup
> device cgroup mounted on /sys/fs/cgroup/pids with fstype cgroup
> {noformat}
> This file contains all the required information and is 4x smaller.
> 3/ Microcaching
> Caching cgroups data for just 1 second would be a huge performance improvement, but I'm not aware of the possible side effects.
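For proposal 2, here is a minimal sketch (not current Mesos code; names and error handling are illustrative only) of what reading the cgroup mount points out of {{/proc/self/mountstats}} could look like. It relies only on the {{device <dev> mounted on <mountpoint> with fstype <type>}} lines shown above. Running such a parse once at agent start would also cover proposal 1, since the hierarchy root under {{/sys/fs/cgroup}} falls out of the same data.

{code:cpp}
// Sketch only: extract cgroup hierarchy mount points from
// /proc/self/mountstats instead of re-reading /proc/self/mounts.
// Relies on the "device <dev> mounted on <mountpoint> with fstype <type>"
// format shown above; not part of the Mesos code base.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Returns the mount points whose fstype is "cgroup",
// e.g. "/sys/fs/cgroup/memory".
std::vector<std::string> cgroupMountPoints()
{
  std::vector<std::string> mountPoints;

  std::ifstream file("/proc/self/mountstats");
  std::string line;
  while (std::getline(file, line)) {
    std::istringstream in(line);
    std::string device, dev, mounted, on, mountPoint, with, fstype, type;
    if (in >> device >> dev >> mounted >> on >> mountPoint >> with >> fstype >> type &&
        device == "device" && type == "cgroup") {
      mountPoints.push_back(mountPoint);
    }
  }

  return mountPoints;
}

int main()
{
  for (const std::string& mountPoint : cgroupMountPoints()) {
    std::cout << mountPoint << std::endl;
  }
  return 0;
}
{code}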
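For proposal 3, the microcaching idea could be as small as a wrapper that re-reads the mount table at most once per second and serves all other callers from memory. Again a hedged sketch: {{CachedMountTable}} is a hypothetical name, not a Mesos API, and the side effect is bounded staleness (a mount appearing or disappearing may go unnoticed for up to the TTL).

{code:cpp}
// Sketch only: cache the raw mount table for one second so back-to-back
// cgroup operations do not each re-read a ~200 kB /proc/self/mounts.
// The class name and interface are hypothetical, not Mesos APIs.
#include <chrono>
#include <fstream>
#include <mutex>
#include <sstream>
#include <string>

class CachedMountTable
{
public:
  explicit CachedMountTable(
      std::chrono::milliseconds ttl = std::chrono::seconds(1))
    : ttl_(ttl) {}

  // Returns the (possibly cached) contents of /proc/self/mounts.
  std::string read()
  {
    std::lock_guard<std::mutex> lock(mutex_);

    const auto now = std::chrono::steady_clock::now();
    if (cached_.empty() || now - lastRead_ > ttl_) {
      std::ifstream file("/proc/self/mounts");
      std::ostringstream contents;
      contents << file.rdbuf();
      cached_ = contents.str();
      lastRead_ = now;
    }

    return cached_;
  }

private:
  const std::chrono::milliseconds ttl_;
  std::mutex mutex_;
  std::string cached_;
  std::chrono::steady_clock::time_point lastRead_;
};
{code}

In such a scheme, every call site that currently opens {{/proc/mounts}} would go through one shared instance instead, and the 1-second TTL bounds how stale the cached view can get.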