----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64930/#review194946 -----------------------------------------------------------
The structure of the new `markUnreachable()` seems to be `[recover slave info depending on bool parameter] -> common code path to apply registry operation -> [notification/cleanup depending on bool parameter]`. Intuitively I'd try to get rid of the bool parameter and only put the shared part into `markUnreachable()`, but on the other hand the code might get even more complicated when you attempt to do this, so it should be fine to leave the general structure as is. src/master/master.hpp Lines 510 (patched) <https://reviews.apache.org/r/64930/#comment274049> The description doesn't seem to be completely accurate, as it will also return false if the agent was already marked as unreachable. Also, it would probably be helpful to add a quick explanation of what the `duringFailover` parameter is supposed to change inside the function (i.e. true -> transition from recovered to unreachable, false -> transition from registered to unreachable) src/master/master.hpp Lines 512 (patched) <https://reviews.apache.org/r/64930/#comment274054> Whats actually the reasoning behind crashing the master instead of returning a failed future? src/master/master.cpp Line 2009 (original), 2011 (patched) <https://reviews.apache.org/r/64930/#comment274043> The comment needs to be updated. src/master/master.cpp Line 2024 (original), 2025 (patched) <https://reviews.apache.org/r/64930/#comment274039> Double space after 'within'. src/master/master.cpp Lines 8147 (patched) <https://reviews.apache.org/r/64930/#comment274041> Double space after `agent` (same for most other new log messages) src/master/master.cpp Line 8277 (original), 8218 (patched) <https://reviews.apache.org/r/64930/#comment274040> Do we need an `onDiscarded()` handler on an `undiscardable()` future? src/master/master.cpp Line 8304 (original), 8235 (patched) <https://reviews.apache.org/r/64930/#comment274050> After a similar a change I got the following comment from @dzhuk, which I'll relay here because I think it's justified: > I think `CHECK` should be used to validate invariants that we have from some other context. but error here is something that can happen during normal operation. and the original version would produce a meaningful message in log src/master/master.cpp Lines 8251 (patched) <https://reviews.apache.org/r/64930/#comment274042> Not sure what our coding style says about this, but we should think about preferring `.find()` over `.contains()` to avoid the double lookup. (which gcc cannot optimize away even at `-O3`, according to some quick local testing) - Benno Evers On Jan. 3, 2018, 10:48 p.m., Benjamin Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64930/ > ----------------------------------------------------------- > > (Updated Jan. 3, 2018, 10:48 p.m.) > > > Review request for mesos, Benno Evers and Vinod Kone. > > > Repository: mesos > > > Description > ------- > > The logic for marking an agent unreachable in the master had two > very similar code paths that differed slightly across failover > and steady state cases. This patch uses a single code path. > > Unfortunately, some slight differences were necessary, and a > failover boolean was introduced to accomodate them. Specifically, > the failover and steady state cases expect the agent to reside > in the recovered and registered lists, respectively. > > > Diffs > ----- > > src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 > src/master/master.cpp 03eb178fa1af7d55ae387e6cb42cdc8d721a2196 > src/tests/slave_tests.cpp fb49077d7cb81db450d9fa24f75dbd9c79ef645c > > > Diff: https://reviews.apache.org/r/64930/diff/1/ > > > Testing > ------- > > make check > > > Thanks, > > Benjamin Mahler > >