Re: Review Request 69684: Fixed the FD leak if containerizer::_launch() failed or discarded.

Andrei Budnik Tue, 08 Jan 2019 06:29:24 -0800


> On Jan. 8, 2019, 8:22 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/containerizer.cpp
> > Lines 1372 (patched)
> > <https://reviews.apache.org/r/69684/diff/2/?file=2118319#file2118319line1372>
> >
> >     Two cases:
> >     
> >     1) if `prepare(containerId, None())` fails, this `.repair` works, since 
> > it would trigger `FDWrapper` to destructs itself.
> >     2) but if `_launch()` fails, `.repair` is no-op. it does not hurt but 
> > hard for contributors to understand since those FD close are implicit in 
> > destructors.
> >     
> >     If we do not care `_launch()` is failed or not, we just want to 
> > safeguard the `READY/FAILURE` cases for `prepare(containerId, None())`, 
> > should we move the .repair up (e.g., right after `prepare(containerId, 
> > None())`)?


I've moved `.repair` handler right after the callback that returns 
`ioSwitchboard->extractContainerIO(containerId)`.
> right after prepare(containerId, None())

In this case, when the future is discarded after calling `defer(self(), 
<prepare callback>)`, but before `prepare callback` is executed, then neither 
`prepare callback` nor `.repair callback` is called.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69684/#review211756
-----------------------------------------------------------


On Jan. 7, 2019, 3:27 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69684/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 3:27 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at `io::redirect`
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> This patch fixes the issue by handling discard of a `launch` future,
> so the container IO is cleaned up and therefore all FDs are closed.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp 
> a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
> 
> 
> Diff: https://reviews.apache.org/r/69684/diff/3/
> 
> 
> Testing
> -------
> 
> Manual testing:
> 
> Add `return Failure("Test!");` to `IOSwitchboard::_prepare()`: 
> https://github.com/apache/mesos/blob/7a378597d6a4359de0b1eabb563d4c12cdb4290e/src/slave/containerizer/mesos/io/switchboard.cpp#L657
> 
> 1) Compile without this patch applied
> 2) Run `src/mesos-tests --verbose 
> --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to a hanging IOSB (15 seconds timeout) after trying to 
> destroy IOSB on the HTTP error "Test!"
> 
> 1) Compile with this patch applied
> 2) Run `src/mesos-tests --verbose 
> --gtest_filter=ContentType/AgentAPITest.AttachContainerInputRepeat/0`
> 3) This test fails due to the HTTP error "Test!" and terminates immediately 
> (as expected)
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>

Re: Review Request 69684: Fixed the FD leak if containerizer::_launch() failed or discarded.

Reply via email to