----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30514/#review71021 -----------------------------------------------------------
Very nice! First high level pass, haven't looked at the tests yet. src/master/master.cpp <https://reviews.apache.org/r/30514/#comment116535> Short-circuiting with a pre-increment seems pretty tricky, how about: ``` if (pinged) { // No pong was received within the timeout! timeouts++; if (timeouts >= MAX_SLAVE_PING_TIMEOUTS) { shutdown(); } } ``` src/master/master.cpp <https://reviews.apache.org/r/30514/#comment116545> It looks like we only log when we're shutting down the slave, when there's no rate limiter? Have you considered having a single shutdown code path? This gets you a single dispatch, and a single log statement and it looks like it's less control flow? ``` void shutdown() { if (shuttingDown.isNone()) { Future<Nothing> future = Nothing(); if (limiter.isSome()) { LOG(INFO) << "Scheduling shutdown of slave " << slaveId << " due to health check timeout"; future = limiter.get()->acquire(); } shuttingDown = future.onAny(defer(self(), &Self::_shutdown)); ++metrics->slave_shutdowns_scheduled; } } void _shutdown() { CHECK_SOME(shuttingDown); const Future<Nothing>& future = shuttingDown.get(); CHECK(!future.isFailed()) << future.failure(); if (future.isReady()) { LOG(INFO) << "Shutting down slave " << slaveId << " due to health check timeout"; dispatch(master, &Master::shutdownSlave, slaveId, "health check timed out"); } else if (future.isDiscarded()) { // Shutdown was canceled. LOG(INFO) << "Canceling shutdown of slave " << slaveId << " since a pong is received!"; ++metrics->slave_shutdowns_canceled; } shuttingDown = None(); } ``` src/master/master.cpp <https://reviews.apache.org/r/30514/#comment116548> Whoops, indentation? - Ben Mahler On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30514/ > ----------------------------------------------------------- > > (Updated Feb. 4, 2015, 1:51 a.m.) > > > Review request for mesos, Ben Mahler, David Robinson, and Jie Yu. > > > Bugs: MESOS-1148 > https://issues.apache.org/jira/browse/MESOS-1148 > > > Repository: mesos > > > Description > ------- > > The algorithm is simple. All the slave observers share a rate limiter whose > rate is configured via command line. When a slave times out on health check, > a permit is acquired to shutdown the slave *but* the pings are continued to > be sent. If a pong arrives before the permit is satisifed, the shutdown is > cancelled. > > > Diffs > ----- > > src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d > src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee > src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b > src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 > src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 > > Diff: https://reviews.apache.org/r/30514/diff/ > > > Testing > ------- > > make check > > Ran the new tests 100 times > > > Thanks, > > Vinod Kone > >
