> On Jan. 24, 2014, 7:47 a.m., Benjamin Hindman wrote: > > src/launcher/executor.cpp, line 298 > > <https://reviews.apache.org/r/17305/diff/1/?file=447761#file447761line298> > > > > I think it makes sense to have a "global" reaper just like we do with > > statistics. Having multiple libprocess processes be delaying every 1 second > > to call waitpid is a bit wasteful. > > Ben Mahler wrote: > That's true, at the same time we would be doing a 1 second no-op event > loop on all libprocess binaries, unless we added the ability to start/stop > the Reaper. > > I was thinking of a library-style API: > > namespace process { > Future<Option<int> > reap(pid_t); > } > > This would use a global Reaper under the hood, I'll be updating the diffs > with this cleaner interface. > > Ben Mahler wrote: > One side-effect of the global reaper that I discovered while testing is > that when running the reaping related tests in repetition, they gradually > become slower. This is because each test that advances time for the global > reaper to run it's wait() loop pushes the next wait() loop further into the > future, requiring more and more calls to Clock::advance + Clock::settle. > > Before these changes, since a Reaper was re-constructed across each test > and each component, we were immune to this. I'll leave a note about this in > the code. We can fix this if we were to implement the approach where we have > a thread per pid, each blocking on waitpid(). > > Benjamin Hindman wrote: > Why isn't resuming the clock sufficient for this?
Because the global ReaperProcess lives across the tests. >From its point of view, it is scheduling every 1 second. However, we're >locally pausing and advancing to force a schedule of the wait() function. Once >wait() is run, it will get re-scheduled 1 second further from when it was run >**in the advanced time-frame**. So the next test that wants to pause and >advance the clock, now must reach ~2 seconds into the advanced time-frame >rather than 1 second. Resuming would be sufficient if it made time appear to continue in the advanced time frame, but it will actually result in time going _back_ to what it would have been had the clock had never been paused in the first place. Sorry this is a bit tricky to explain clearly. Let me know if I'm missing something or if it's still not clear. - Ben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17305/#review32706 ----------------------------------------------------------- On Jan. 27, 2014, 11:25 p.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17305/ > ----------------------------------------------------------- > > (Updated Jan. 27, 2014, 11:25 p.m.) > > > Review request for mesos, Benjamin Hindman and Vinod Kone. > > > Bugs: MESOS-943 > https://issues.apache.org/jira/browse/MESOS-943 > > > Repository: mesos-git > > > Description > ------- > > This removes the Mesos Reaper in preference of using the libprocess Reaper, > which no longer reaps non-monitored processes. > > > Diffs > ----- > > src/Makefile.am d58b46e99e0a041cf2a26abe44bbd1504a9539c0 > src/launcher/executor.cpp b73ab479500a7347a38ba53acecfab9229f1080d > src/slave/cgroups_isolator.hpp e86062e9abaaa263c32c55e9dbfefd700f605886 > src/slave/cgroups_isolator.cpp 80155a3f0cd684a6aad9a37f61c5204d86032735 > src/slave/process_isolator.hpp 4ae093fe65775a2b9bec42071961dd58aa0c3d8b > src/slave/process_isolator.cpp 0bc698f04f7c8eaad166dc9d646e13310129dd01 > src/slave/reaper.hpp 9a31c754475ecbce5299d8f18f38253c542404e5 > src/slave/reaper.cpp 5eabbc3911584cf47c353bcf4ca660c47c2c17be > src/tests/environment.cpp 6edce4552ef9a12b7b58cefea97ebacc9224ab04 > src/tests/reaper_tests.cpp 608ec0eff4eaae115d75621937a39b22e3bdb068 > src/tests/slave_recovery_tests.cpp 5a4c4fc4f687a37409d1afbda4c0d07fcdc3a4c7 > > Diff: https://reviews.apache.org/r/17305/diff/ > > > Testing > ------- > > make check > > I've also added an orphan check in the testing environment tear down. > > > Thanks, > > Ben Mahler > >