> On Dec. 5, 2016, 9:11 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, lines 537-547
> > <https://reviews.apache.org/r/54355/diff/3/?file=1576209#file1576209line537>
> >
> >     It's possible that the io switchboard server has been forked, but the
> >     agent crashes before it is able to checkpoint the pid.
> >
> >     If that happens, during recovery, we will not maintain Info for that
> >     container. As a result, we won't try to clean up the socket file that
> >     may have been created?
> >
> >     I think we probably need to create a directory for io switchboard
> >     related files (sock and pid files). When we create the directory, it
> >     indicates that the io switchboard server might or might not have been
> >     created. During recovery, if we find that the directory exists but the
> >     pid file does not, we should still create the Info with pid set to
> >     None(), and clean up the socket file in the 'cleanup' method.
> >
> >     Thoughts?
>
> Kevin Klues wrote:
>     That seems reasonable, what should we call the directory? As below?
>     ```
>     io_switchboard
>     |-pid
>     -socket
>     ```
That said, note that the io-switchboard itself creates the sock file, so the only time this would ever happen is if an agent restarted between successfully launching the io-switchboard and checkpointing its pid.


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54355/#review158038
-----------------------------------------------------------


On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54355/
> -----------------------------------------------------------
>
> (Updated Dec. 5, 2016, 9:46 a.m.)
>
>
> Review request for mesos and Jie Yu.
>
>
> Bugs: MESOS-6688
>     https://issues.apache.org/jira/browse/MESOS-6688
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added implementation of `recover()` to the IOSwitchboard isolator.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/mesos/io/switchboard.hpp 839665a22aca9b1c1c1cf4992406bc924ee2b065
>   src/slave/containerizer/mesos/io/switchboard.cpp d5211b98616e72a27ca6b472a5ee83505c227f22
>
> Diff: https://reviews.apache.org/r/54355/diff/
>
>
> Testing
> -------
>
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
>
> Test added in follow-on patch.
>
>
> Thanks,
>
> Kevin Klues
>
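
For reference, here is a rough sketch of the recovery behavior discussed in this thread. It is only an illustration, not the patch itself: it assumes the `io_switchboard` directory layout with a `pid` file proposed in the reply, it uses plain C++17 `std::filesystem` rather than stout, and the names `Info`, `switchboardDir`, `recoverInfo`, and `cleanup` are hypothetical stand-ins for whatever the isolator ends up using.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>

#include <sys/types.h>  // pid_t (POSIX)

namespace fs = std::filesystem;

// Hypothetical stand-in for the isolator's per-container Info.
struct Info {
  // Empty when the agent died before the pid was checkpointed.
  std::optional<pid_t> pid;
};

// Assumed layout: <containerDir>/io_switchboard/{pid,socket}.
fs::path switchboardDir(const fs::path& containerDir) {
  return containerDir / "io_switchboard";
}

// Returns nothing if the directory was never created (the switchboard
// was never launched). Otherwise returns an Info, with the pid left
// empty if the pid file is missing or unreadable.
std::optional<Info> recoverInfo(const fs::path& containerDir) {
  const fs::path dir = switchboardDir(containerDir);

  std::error_code ec;
  if (!fs::is_directory(dir, ec)) {
    return std::nullopt;
  }

  Info info;
  std::ifstream in(dir / "pid");
  long value = 0;
  if (in >> value) {
    info.pid = static_cast<pid_t>(value);
  }
  return info;
}

// Removes the directory and everything in it (pid and socket files),
// whether or not a pid was ever checkpointed. Errors are ignored here
// to keep the sketch short.
void cleanup(const fs::path& containerDir) {
  std::error_code ec;
  fs::remove_all(switchboardDir(containerDir), ec);
}
```

The point of keying recovery on the directory rather than on the pid file is that the directory is created before the server is forked, so its presence alone is enough to know that a socket file may need to be removed later.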