> On Dec. 5, 2016, 9:11 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, lines 537-547
> > <https://reviews.apache.org/r/54355/diff/3/?file=1576209#file1576209line537>
> >
> >     It's likely that the io switchboard server has been forked, but the 
> > agent crashes before it was able to checkpoint the pid.
> >     
> >     If that happens, during recovery, we will not maintain Info for that 
> > container. As a result, we won't try to clean up the socket file that was 
> > potentially created?
> >     
> >     I think we probably need to create a directory for io switchboard 
> > related files (sock and pid files). Creating the directory indicates that 
> > the io switchboard server might or might not have been created. 
> > During recovery, if we find the directory exists, but the pid file does not 
> > exist, we should still create the Info with pid set to None(), and clean up 
> > the socket file in the 'cleanup' method.
> >     
> >     Thoughts?
> 
> Kevin Klues wrote:
>     That seems reasonable. What should we call the directory? As below?
>     ```
>     io_switchboard
>     |-pid
>     |-socket
>     ```

That said, note that the io-switchboard itself creates the socket file, so 
this can only happen if the agent restarts between successfully launching 
the io-switchboard and checkpointing its pid.
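
To make the proposed recovery behavior concrete, here is a rough, standalone 
sketch of it. This is not the patch itself: `Info`, `recoverContainer()` and 
`cleanup()` are hypothetical stand-ins, the `io_switchboard/{pid,socket}` 
layout is only the one floated above, and it uses plain C++17 
`std::filesystem` and `std::optional` rather than stout.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stand-in for the isolator's per-container state.
struct Info
{
  // Empty if the agent crashed before checkpointing the pid.
  std::optional<int> pid;

  // Where the io switchboard server would have created its socket.
  fs::path socket;
};

// Returns no Info if the `io_switchboard` directory is missing, i.e. the
// server was never launched for this container. If the directory exists
// but the pid file does not, we still return an Info (with no pid) so
// that `cleanup()` gets a chance to remove a possibly created socket.
std::optional<Info> recoverContainer(const fs::path& containerDir)
{
  const fs::path dir = containerDir / "io_switchboard";

  if (!fs::is_directory(dir)) {
    return std::nullopt;
  }

  Info info;
  info.socket = dir / "socket";

  // A missing or unreadable pid file leaves `info.pid` empty.
  std::ifstream in(dir / "pid");
  int pid = 0;
  if (in >> pid) {
    info.pid = pid;
  }

  return info;
}

// Removes the socket file if present. Safe to call when the pid is
// unknown or the socket was never created; errors are ignored here.
void cleanup(const Info& info)
{
  std::error_code ec;
  fs::remove(info.socket, ec);
}
```

The point of creating the directory before forking is that its existence 
alone is enough to tell recovery that a socket might be lying around, even 
when there is no pid to wait on.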


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54355/#review158038
-----------------------------------------------------------


On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54355/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:46 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6688
>     https://issues.apache.org/jira/browse/MESOS-6688
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added implementation of `recover()` to the IOSwitchboard isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp 
> 839665a22aca9b1c1c1cf4992406bc924ee2b065 
>   src/slave/containerizer/mesos/io/switchboard.cpp 
> d5211b98616e72a27ca6b472a5ee83505c227f22 
> 
> Diff: https://reviews.apache.org/r/54355/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> 
> Test added in follow-on patch.
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>
