[ 
https://issues.apache.org/jira/browse/MESOS-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Garcia updated MESOS-5836:
-------------------------------
    Description: 
We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory 
cgroups are not cleaned up by the system. Once 65336 memory cgroups have 
accumulated, no further cgroups can be created because there are no IDs left 
for a new cgroup, and ENOSPC is returned. This is a concern for the Mesos 
project because Mesos cannot create any further containers in this state. We 
also tested Docker 1.8.3, which silently fails to create the memory cgroup, 
resulting in rogue containers that run with no memory limit.

h3. Steps to reproduce:
*NOTE: Mesos is not required to reproduce this issue*

- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 
16.04)
- SSH to the machine
- Run {{cat /proc/cgroups}} to record the number of memory cgroups
- Run several Docker containers using the {{--memory}} or {{-m}} option to set 
a memory limit, either in parallel or in series
- Stop all containers
- Run {{cat /proc/cgroups}} again and compare the number of memory cgroups to 
the previous run
- Optional: Run 65,336 Docker containers using memory isolation and then try to 
launch a Mesos container
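
The before/after comparison in the steps above can be scripted. The following 
is a minimal sketch and not part of the original report: the function name 
{{memcg_count}} is hypothetical, and it assumes the usual {{/proc/cgroups}} 
layout with the columns {{subsys_name hierarchy num_cgroups enabled}}.

```bash
# Print the num_cgroups value of the memory controller.
# The optional first argument overrides the file to read; this parameter is
# an assumption added so the function can be tried against a sample file.
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Example use around a batch of containers:
#   before=$(memcg_count)
#   ... run and stop containers started with -m ...
#   after=$(memcg_count)
#   echo "memory cgroups: before=$before after=$after"
```

On an affected kernel, the second value would be expected to stay higher than 
the first even after all containers have been stopped.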

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations will 
produce the following error in {{dmesg}}:
{code}idr_remove called for id=65536 which is not allocated.{code}
Subsequent attempts to create a cgroup directory will fail:
{code}/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device{code}
Docker containers launched afterwards will run without memory isolation; note 
that the {{find}} output below contains no path under 
{{/sys/fs/cgroup/memory}}:
{code}/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d example/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229        example/busybox     "sleep 10000"       6 seconds ago       Up 4 seconds                            suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ {code}
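
The {{find}} check above can be reduced to a yes/no answer per container. This 
is a rough sketch rather than anything from the original report: the function 
name {{has_memory_cgroup}} is hypothetical, and the cgroup mount point 
{{/sys/fs/cgroup}} is taken from the output above.

```bash
# Succeed (exit 0) if a cgroup directory whose name contains the given
# container ID exists under the memory controller; fail otherwise.
# The optional second argument overrides the cgroup root; this parameter is
# an assumption added so the function can be tried against a scratch tree.
has_memory_cgroup() {
    local id="$1" root="${2:-/sys/fs/cgroup}"
    [ -n "$(find "$root/memory" -type d -name "*${id}*" -print -quit 2>/dev/null)" ]
}

# Example:
#   has_memory_cgroup 849c66081229 || echo "container has no memory cgroup"
```

For the container shown above, such a check would be expected to fail, since 
only the other controllers list a matching {{.scope}} directory.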
The Mesos containerizer will fail with {{No space left on device}}:
{code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 'ef5419cf-9d00-425a-a9ee-a848d330bfb2' for executor 'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023 failed to start: Failed to prepare isolator: Failed to create directory '/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2': No space left on device
{code}

h3. Remediation

Once a system is found to be affected, running the following command as root 
drops the page cache, which allows the kernel to reap the old cgroups and 
return to normal operation.
{code}echo 1 > /proc/sys/vm/drop_caches{code}
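
To confirm that the workaround had an effect, the memory cgroup count can be 
read before and after the write. This is only a sketch under stated 
assumptions: the function name {{reap_memcgs}} and its two optional path 
arguments are hypothetical, and the {{sync}} call is an addition that is not in 
the original report (it flushes dirty pages first, because {{drop_caches}} 
only releases clean ones).

```bash
# Write 1 to the drop_caches control file and print the memory cgroup count
# before and after. Must run as root when used on the real /proc paths.
# Both path arguments are assumptions added so the function can be tried
# against scratch files.
reap_memcgs() {
    local cgroups="${1:-/proc/cgroups}" ctl="${2:-/proc/sys/vm/drop_caches}"
    local before after
    before=$(awk '$1 == "memory" { print $3 }' "$cgroups")
    sync                          # flush dirty pages so more cache is droppable
    echo 1 > "$ctl" || return 1   # same write as the command above
    after=$(awk '$1 == "memory" { print $3 }' "$cgroups")
    echo "memory cgroups: before=$before after=$after"
}
```

The count may take a moment to fall after the write, so rereading 
{{/proc/cgroups}} a little later can be more informative than the immediate 
"after" value.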

We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] 
may fix the issue, but we have not yet tested it.

  was:
We've noticed an issue with kernel versions 4.2, 4.4, and 4.5 where memory 
cgroups are not cleaned up by the system. When the register fills up with 65336 
cgroups, additional cgroups cannot be formed because there's no IDs for the new 
cgroup, and ENOSPC is returned. This is a concern for the Mesos project because 
no further containers can be created by Mesos in this state. We tested Docker 
1.8.3, and Docker 1.8.3 will silently fail to build the memory cgroup, 
resulting in rogue containers that are memory-unbound.

h3. Steps to reproduce:
*NOTE: Mesos is not required to reproduce this issue*

- Start a new instance using kernel 4.2, 4.4, or 4.5 (CoreOS 766-1010, Ubuntu 
16.04) 
- ssh to the machine
- {{cat /proc/cgroups}} to determine the number of memory cgroups
- Run several docker containers using the {{--memory}} or {{-m}} option to set 
a memory isolator, either in parallel or in series
- Stop all containers
- {{cat /proc/cgroups}} to review the number of memory cgroups and compare to 
previous run
- Optional: Run 65,336 docker containers using memory isolation and then try to 
launch a Mesos container

h3. Differential diagnosis:

When the cgroup limit is exceeded, subsequent container terminations will draw 
the following error in {{dmesg}}:
{code}idr_remove called for id=65536 which is not allocated.{code}
Subsequent efforts to create a cgroup folder will fail:
{code}/sys/fs/cgroup/memory/mesos $ df .
Filesystem     1K-blocks  Used Available Use% Mounted on
cgroup                 0     0         0    - /sys/fs/cgroup/memory
/sys/fs/cgroup/memory/mesos $ sudo mkdir foo
mkdir: cannot create directory 'foo': No space left on device{code}
Subsequently launched Docker containers will fail to utilize memory isolation: 
{code}/sys/fs/cgroup/memory/mesos $ docker run -m 32m -d 
10.1.13.1:9000/montana/busybox sleep 10000

...

/sys/fs/cgroup/memory/mesos $ docker ps | grep busybox
849c66081229        example/busybox                                             
            "sleep 10000"            6 seconds ago       Up 4 seconds           
                                                                         
suspicious_mahavira

/sys/fs/cgroup/memory/mesos $ find /sys/fs/cgroup -name "*849c66081229*"
/sys/fs/cgroup/blkio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/freezer/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/devices/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpu,cpuacct/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/cpuset/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/net_cls,net_prio/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/systemd/system.slice/docker-849c6608122989f1bc9ae39a5c70281228a304092baa0d73d9430ed94223f554.scope
/sys/fs/cgroup/memory/mesos $ {code}
Mesos containerizer will fail with {{No space left on device}}:
{code}E0707 20:17:29.091142 105665 slave.cpp:3802] Container 
'ef5419cf-9d00-425a-a9ee-a848d330bfb2' for executor 
'node-0_executor__42a4fafe-f64d-4b41-91d2-efc20a86a6a3' of framework 
d6ab251a-064a-46a0-a1c8-9ee559f3b44a-0023 failed to start: Failed to prepare 
isolator: Failed to create directory 
'/sys/fs/cgroup/memory/mesos/ef5419cf-9d00-425a-a9ee-a848d330bfb2': No space 
left on device
{code}

h3. Remediation

Once a system is found to be affected, the following command can be used to 
drop all page caches, which allows the system to reap all of the old cgroups 
and return to normal operation.
{code}echo 1 > /proc/sys/vm/drop_caches{code}

We suspect that [patch 9184539|https://patchwork.kernel.org/patch/9184539/] 
could fix it, but we have not yet tested.


> Cgroup Leakage in 4.2, 4.4, 4.5 kernels
> ---------------------------------------
>
>                 Key: MESOS-5836
>                 URL: https://issues.apache.org/jira/browse/MESOS-5836
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.1, 0.28.2, 1.0.0, 1.1.0
>            Reporter: John Garcia
>              Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
