> On Янв. 13, 2020, 3:21 д.п., Qian Zhang wrote: > > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp > > Line 459 (original), 480 (patched) > > <https://reviews.apache.org/r/71965/diff/1/?file=2199231#file2199231line480> > > > > I think now we need to do this for nested containers whose > > `share_cgroups` is false, because they may need to create cgroups under > > their own nested cgroups. > > Andrei Budnik wrote: > Do we allow creating cgroups under nested cgroups of those nested > containers? Since we do not use a `mesos` separator, the cgroups isolator may > pick up those underlying cgroups during recovery, which is not what expected. > > Qian Zhang wrote: > Actually we are not able to disallow that, it is up to the task itself, > if the task wants to create a child cgroup under its own cgroup, then I think > it can just do it. But yes I agree that may bring trouble to recovery, maybe > we should use a `mesos` separator so that the cgroups created by the tasks > will not affect our recovery, how do you think? > > Andrei Budnik wrote: > After enabling `mesos` separator between cgroups, launching a nested > container with `share_cgroup=false` fails with "No space left on device" > error. The root cause of this failure is described in the comment to our > function `cloneCpusetCpusMems` and in this SO article: > https://stackoverflow.com/questions/28348627/echo-tasks-gives-no-space-left-on-device-when-trying-to-use-cpuset > > > `CgroupsIsolatorProcess::prepare()` calls `cgroups::create()` which calls > `cloneCpusetCpusMems()` function. > > > ``` > // Copies the value of 'cpuset.cpus' and 'cpuset.mems' from a parent > // cgroup to a child cgroup so the child cgroup can actually run tasks > // (otherwise it gets the error 'Device or resource busy'). > ... > static Try<Nothing> cloneCpusetCpusMems( > ``` > > > When we call `cgroups::create()` for a nested cgroup, > `cloneCpusetCpusMems()` tries to copy those “cpuset” files from a parent > container that has a suffix `<path to parent container>/mesos` to a child > container with suffix `<path to parent container>/mesos/childContainerId`. > Since those "cpuset" files are not filled/initialized for the parent that > ends with `/mesos` suffix, `cloneCpusetCpusMems()` copies those empty files > to the child cgroup of `/mesos/childContainerId`. Therefore, the cgroups > isolator fails to assign a PID to the cgroup for the child container during > `isolate()` for the `cpuset` cgroup (for the reason described in the > beggining of this comment). > > I'm not sure about the best way to fix/workaround this issue. I decided > to get rid of a separator between cgroups. AFAIK, k8s does not use any > separator between containers within the POD. > > Do you happen to know any better approach to the problem described above? > > Qian Zhang wrote: > Good to know! > > It seems a bug of `cgroups::create()` to me, I mean if > `cgroups::create()` is used to create a cgroup recursively (like: > `mesos/childContainerId`), it should clone `cpuset.cpus` and `cpuset.mems` > recursively too (both `mesos` and `childContainerId`) rather than just clone > them for the leaf cgroup (`childContainerId`), right? > > If we do not want to change the code of `cgroups::create()`, then a > workaround could be calling `cgroups::create()` non-recursively, i.e., call > this function twice, one for `mesos` and another for > `mesos/childContainerId`. But I would suggest to fix `cgroups::create()` if > possible.
Updated the comment. - Andrei ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71965/#review219227 ----------------------------------------------------------- On Фев. 12, 2020, 3:47 п.п., Andrei Budnik wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71965/ > ----------------------------------------------------------- > > (Updated Фев. 12, 2020, 3:47 п.п.) > > > Review request for mesos, Greg Mann and Qian Zhang. > > > Bugs: MESOS-10076 > https://issues.apache.org/jira/browse/MESOS-10076 > > > Repository: mesos > > > Description > ------- > > This patch adds support for nested cgroups for nested containers. > Nested cgroups are created only for a nested container with explicitly > enabled `share_cgroups` flag. The cgroups isolator stores info about > nested cgroups in the `infos` class variable, which is used to > determine whether a nested container has its nested cgroup. > > > Diffs > ----- > > src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp > 4bd3d6dad37dee031660c15e957cc36f63e21fcb > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp > b12b73d8e0161d448075378765e77867521de04e > > > Diff: https://reviews.apache.org/r/71965/diff/2/ > > > Testing > ------- > > > Thanks, > > Andrei Budnik > >