> On June 12, 2016, 1:09 p.m., Qian Zhang wrote: > > src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp, lines 674-677 > > <https://reviews.apache.org/r/45363/diff/8/?file=1347662#file1347662line674> > > > > I see here we call `lambda::bind()`, but in the original `cgroups/mem` > > isolator, we call defer(): > > > > https://github.com/apache/mesos/blob/0.28.2/src/slave/containerizer/mesos/isolators/cgroups/mem.cpp#L605:L609 > > > > That means `MemorySubsystem::oomWaited()` will be executed in a > > separate thread. So my concern is there may be some multi-thread race > > condition issue for your way, e.g., what if `MemorySubsystem::oomWaited()` > > is triggered after the container exits (i.e, we have cleaned it up)? I > > think in the original `cgroups/mem` isolator, we have already considered > > this issue, please refer to this comment for details: > > > > https://github.com/apache/mesos/blob/0.28.2/src/slave/containerizer/mesos/isolators/cgroups/mem.cpp#L634:L637 > > haosdent huang wrote: > For > ``` > // It is likely that process exited is executed before this > // function (e.g. The kill and OOM events happen at the same > // time, and the process exit event arrives first.) Therefore, we > // should not report a fatal error here. > ``` > It is because eventfd would be triggered when hierarchy destruction. It > should be fixed in https://reviews.apache.org/r/46299/ > > haosdent huang wrote: > I try to avoid to make `Subsystem` extends from `Process` before, but I > think it should be necessary, let me update here. > > Qian Zhang wrote: > I think we need to handle this multi-thread issue anyway (even > https://reviews.apache.org/r/46299/ is fixed), because when container uses > the memory more than the limit, it will be killed (so > `CgroupsMemIsolatorProcess::cleanup()` will be called) and at the same time > OOM event will happen (so `CgroupsMemIsolatorProcess::oom()` will be called), > but the order can not be guaranteed, if `CgroupsMemIsolatorProcess::oom()` is > called after `CgroupsMemIsolatorProcess::cleanup()` which will delete `info`, > then we will not find `info` by the containerId. I think that's why we have > the following code in the `cgroups/mem` isolator. > ``` > if (!infos.contains(containerId)) { > // It is likely that process exited is executed before this > // function (e.g. The kill and OOM events happen at the same > // time, and the process exit event arrives first.) Therefore, we > // should not report a fatal error here. > LOG(INFO) << "OOM detected for an already terminated executor"; > return; > } > ```
Yes, need extends from `Process` and use `defer` to avoid threads competition. - haosdent ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45363/#review137168 ----------------------------------------------------------- On June 19, 2016, 10:31 a.m., haosdent huang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45363/ > ----------------------------------------------------------- > > (Updated June 19, 2016, 10:31 a.m.) > > > Review request for mesos, Gilbert Song, Guangya Liu, Ian Downes, Jie Yu, > Kevin Klues, and Qian Zhang. > > > Bugs: MESOS-5045 > https://issues.apache.org/jira/browse/MESOS-5045 > > > Repository: mesos > > > Description > ------- > > Add `MemorySubsystem` for cgroups unified isolator. > > > Diffs > ----- > > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp PRE-CREATION > src/slave/containerizer/mesos/isolators/cgroups/subsystem.hpp PRE-CREATION > src/slave/containerizer/mesos/isolators/cgroups/subsystem.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/45363/diff/ > > > Testing > ------- > > > Thanks, > > haosdent huang > >