> On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 197-246
> > <https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197>
> >
> > It looks like we only log when we're shutting down the slave, when
> > there's no rate limiter?
> >
> > Have you considered having a single shutdown code path? This gets you a
> > single dispatch, and a single log statement and it looks like it's less
> > control flow?
> >
> > ```
> > void shutdown()
> > {
> > if (shuttingDown.isNone()) {
> > Future<Nothing> future = Nothing();
> >
> > if (limiter.isSome()) {
> > LOG(INFO) << "Scheduling shutdown of slave " << slaveId
> > << " due to health check timeout";
> >
> > future = limiter.get()->acquire();
> > }
> >
> > shuttingDown = future.onAny(defer(self(), &Self::_shutdown));
> >
> > ++metrics->slave_shutdowns_scheduled;
> > }
> > }
> >
> > void _shutdown()
> > {
> > CHECK_SOME(shuttingDown);
> >
> > const Future<Nothing>& future = shuttingDown.get();
> >
> > CHECK(!future.isFailed()) << future.failure();
> >
> > if (future.isReady()) {
> > LOG(INFO) << "Shutting down slave " << slaveId
> > << " due to health check timeout";
> >
> > dispatch(master,
> > &Master::shutdownSlave,
> > slaveId,
> > "health check timed out");
> > } else if (future.isDiscarded()) {
> > // Shutdown was canceled.
> > LOG(INFO) << "Canceling shutdown of slave " << slaveId
> > << " since a pong is received!";
> >
> > ++metrics->slave_shutdowns_canceled;
> > }
> >
> > shuttingDown = None();
> > }
> > ```
>
> Jie Yu wrote:
> Having both 'future' and 'shuttingDown' in 'shutdown' method looks
> confusing to me:) You need to comment about why you need that future. Or, I
> think having dup code is not a too bad thing in this case as the logic is
> more clear IMO.
>
> Jie Yu wrote:
> Another option to avoid dup code is:
>
> 1) s/shutdown/scheduleShutdown
> 2) let 'shutdown' be the actual shutdown (sending message)
>
> Ben Mahler wrote:
> Imagine it is s/future/limited/:
>
> ```
> Future<Nothing> limited = Nothing();
>
> if (limiter.isSome()) {
> limited = limiter.get()->acquire();
> }
>
> shuttingDown = limited.onAny(defer(self(), &Self::_shutdown));
> ```
>
> Seems pretty clear? Can you flush out the second option you mentioned in
> a code snippet? I didn't follow.
>
> Jie Yu wrote:
> Yeah, that reads much better.
hmm. i actually liked the explicitness of the earlier version. needed less
cognitive overhead. but also liked the proposal of always calling
"_shutdown()". here's my proposal:
```
void shutdown()
{
if (shuttingDown.isSome()) {
// Shutdown is already in progress.
return;
}
if (limiter.isSome()) {
// If a rate limit is specified schedule the shutdown according
// to the rate limit.
LOG(INFO) << "Scheduling shutdown of slave " << slaveId
<< " due to health check timeout";
shuttingDown = limiter.get()->acquire();
} else {
// If no rate limit is specified shutdown right away.
shuttingDown = Future<Nothing>(Nothing()); // Ready future.
}
shuttingDown.get()
.onAny(defer(self(), &Self::_shutdown));
}
```
what do you guys think?
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30514/#review71021
-----------------------------------------------------------
On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30514/
> -----------------------------------------------------------
>
> (Updated Feb. 4, 2015, 1:51 a.m.)
>
>
> Review request for mesos, Ben Mahler, David Robinson, and Jie Yu.
>
>
> Bugs: MESOS-1148
> https://issues.apache.org/jira/browse/MESOS-1148
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The algorithm is simple. All the slave observers share a rate limiter whose
> rate is configured via command line. When a slave times out on health check,
> a permit is acquired to shutdown the slave *but* the pings are continued to
> be sent. If a pong arrives before the permit is satisifed, the shutdown is
> cancelled.
>
>
> Diffs
> -----
>
> src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d
> src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee
> src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b
> src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58
> src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6
>
> Diff: https://reviews.apache.org/r/30514/diff/
>
>
> Testing
> -------
>
> make check
>
> Ran the new tests 100 times
>
>
> Thanks,
>
> Vinod Kone
>
>