> On July 15, 2019, 11:14 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 6255-6260 (patched)
> > <https://reviews.apache.org/r/71008/diff/4/?file=2154545#file2154545line6255>
> >
> > It seems we only do this check to make sure we can access the config
> > below which introduces quite some coupling. Is there a reason we couldn't
> > grab the config outside the lambda and capture it instead (i.e., do we want
> > to support mutable drain configs)? That would allow us to reduce coupling
> > between `Slave::draining` and `markGone`.
>
> Joseph Wu wrote:
> This check is specifically to guard against an interleaving of the
> `RemoveSlave` and `MarkAgentDrained` registry operations. There are a
> variety of ways to trigger the `RemoveSlave`, one of which is shutting down
> the agent (SIGUSR1).
>
> So imagine the following sequence of events:
> 1) Agent sends the master a `UnregisterSlaveMessage`.
> 2) Master starts the `RemoveSlave` operation.
> 3) Final terminal ACK arrives at the master, which causes master to call
> `checkAndTransitionDrainingAgent` and `MarkAgentDrained`.
> 4) `RemoveSlave` completes. Master clears memory of that agent.
> 5) `MarkAgentDrained` completes. Master no longer knows about that agent
> and hits this LOG line.

That chain of events seems pretty clear, but I was after something else: right now we seem to perform this check here just so we can access the config; `markGone` asserts that `slaves.markingGone.contains(slaveId)`, while here we ensure `slaves.draining.contains(slaveId)`. That seems like unnecessary and complicated coupling to me, which I'd prefer we not introduce.

In order to remove the need for checking `slaves.draining`, we could capture the drain config by value into the closure (which would effectively require that drain configs are immutable) and would then invoke `markGone` regardless of whether an agent is in `slaves.draining`. For your point (5) we should instead perform a precondition check with something more closely related, e.g., check whether the agent is present in `slaves.markingGone`.

- Benjamin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71008/#review216605
-----------------------------------------------------------

On July 15, 2019, 8:19 p.m., Joseph Wu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71008/
> -----------------------------------------------------------
>
> (Updated July 15, 2019, 8:19 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and
> Vinod Kone.
>
>
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
>
>
> Repository: mesos
>
>
> Description
> -------
>
> This adds logic in the master to detect when a DRAINING agent can
> be transitioned into a DRAINED state. When this happens, the new
> state is checkpointed into the registry and, if the agent is to be
> marked "gone", the master will remove the agent.
>
>
> Diffs
> -----
>
> src/master/http.cpp cd0f40cb7b966d6620e3fb49d4c08807185c9101
> src/master/master.hpp e8def83fe9bcee19772df9a9764852bc694c5247
> src/master/master.cpp 5247377c2e7e92b9843dd4c9d28f92ba679ad742
>
>
> Diff: https://reviews.apache.org/r/71008/diff/5/
>
>
> Testing
> -------
>
> See: https://reviews.apache.org/r/71069/
>
>
> Thanks,
>
> Joseph Wu
>
>
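
To make the capture-by-value suggestion in the reply above concrete, here is a minimal, self-contained sketch. It is not the Mesos code: `DrainConfig`, `Master`, `onDrained`, and `removeAgent` are hypothetical stand-ins, the registry operation is modelled as a plain callback, and it assumes the agent is added to `markingGone` before that operation starts. The only point is that the continuation reads its own copy of the config and checks `markingGone`, so it never has to look the agent up in `draining`.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for the drain configuration (not the Mesos type).
struct DrainConfig {
  bool markGone = false;
};

// Hypothetical, heavily simplified model of the master's bookkeeping.
struct Master {
  std::unordered_map<std::string, DrainConfig> draining;
  std::unordered_set<std::string> markingGone;
  std::unordered_set<std::string> gone;

  // Returns the continuation to run once the (hypothetical) registry
  // operation that marks the agent as drained has completed.
  std::function<void()> onDrained(const std::string& id) {
    // Copy the config now; this assumes drain configs are immutable.
    const DrainConfig config = draining.at(id);

    // Assumption: the agent is recorded as "marking gone" up front.
    if (config.markGone) {
      markingGone.insert(id);
    }

    return [this, id, config]() {
      if (!config.markGone) {
        return;
      }

      // Precondition check against `markingGone`, not `draining`.
      if (markingGone.count(id) == 0) {
        return; // Agent was removed while the operation was in flight.
      }

      markingGone.erase(id);
      gone.insert(id);
    };
  }

  // Models a concurrent removal that clears all memory of the agent.
  void removeAgent(const std::string& id) {
    draining.erase(id);
    markingGone.erase(id);
  }
};
```

In this sketch, calling `removeAgent` between `onDrained` and the returned continuation makes the continuation a no-op, which corresponds to the interleaving in steps 2 to 5 of the quoted sequence. Whether the real removal path also clears `slaves.markingGone` is an assumption here and would need to be checked against the patch.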