[ https://issues.apache.org/jira/browse/MESOS-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Garcia updated MESOS-5836:
-------------------------------
Description:

We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory cgroups are not cleaned up by the system. When the kernel's ID table fills up with 65536 cgroups, additional cgroups cannot be created because the ID space is exhausted, and ENOSPC is returned. This is a concern for the Mesos project because no further containers can be created by Mesos in this state (and Docker containers will silently fail to set up the memory isolator, resulting in rogue containers that are memory-unbound).

h3. Steps to reproduce:

*NOTE: Mesos is not required to reproduce this issue*
- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 16.04)
- ssh to the machine
- {{cat /proc/cgroups}} to determine the number of memory cgroups
- Run several Docker containers using the {{--memory}} or {{-m}} option to set a memory isolator, either in parallel or in series
- Stop all containers
- {{cat /proc/cgroups}} to review the number of memory cgroups and compare with the previous count
- Optional: Run 65,536 Docker containers using memory isolation and then try to launch a Mesos container

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations produce the following error in {{dmesg}}:
{code}
idr_remove called for id=65536 which is not allocated.
{code}
Subsequent attempts to create a cgroup directory fail:
{code}
/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used  Available  Use%  Mounted on
cgroup                 0     0          0     -  /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device
{code}
Subsequently launched Docker containers run without memory isolation; note that no entry for the container appears under {{/sys/fs/cgroup/memory}}:
{code}
/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d 10.1.13.1:9000/montana/busybox sleep 10000
...
/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229   example/busybox   "sleep 10000"   6 seconds ago   Up 4 seconds   suspicious_mahavira
/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $
{code}
The Mesos containerizer fails with {{No space left on device}}:
{code}
E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2' for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023 failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2': No space left on device
{code}

h3. Remediation

Once a system is found to be affected, the following command can be used to drop all page caches, which allows the kernel to reap the stale cgroups and return to normal operation:
{code}
echo 1 > /proc/sys/vm/drop_caches
{code}
We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] could fix this, but we have not yet tested it.
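The before/after {{cat /proc/cgroups}} comparison in the reproduction steps can be scripted. A minimal sketch, assuming the standard four-column {{/proc/cgroups}} layout ({{subsys_name hierarchy num_cgroups enabled}}); the helper name is ours, and the optional file argument exists only so the parsing is easy to exercise against a saved copy:

```shell
#!/bin/sh
# /proc/cgroups columns: subsys_name  hierarchy  num_cgroups  enabled
# Print the num_cgroups value for the memory controller. Pass a file
# path to parse a saved snapshot instead of the live /proc/cgroups.
count_memory_cgroups() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

before=$(count_memory_cgroups)
echo "memory cgroups in use: $before"
# ...run and stop memory-limited containers here, then compare:
# after=$(count_memory_cgroups); echo "leaked: $((after - before))"
```

On an affected kernel the count climbs toward 65536 as memory-limited containers are stopped, instead of returning to the baseline.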
> Cgroup Leakage in 4.2, 4.4, 4.5 kernels
> ---------------------------------------
>
>                 Key: MESOS-5836
>                 URL: https://issues.apache.org/jira/browse/MESOS-5836
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.1, 0.28.2, 1.0.0, 1.1.0
>            Reporter: John Garcia
>              Labels: mesosphere
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)