----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48387/#review136869 -----------------------------------------------------------
Thanks for taking this on. Looks pretty good minus a small bug that can lead to a crash (see review comments). src/scheduler/constants.hpp (line 28) <https://reviews.apache.org/r/48387/#comment201950> I won't call it a backoff since it's just a _delay_ i.e. a wait time before the scheduler library tries to establish a connection with the master. How about: `DEFAULT_CONNECTION_DELAY_MAX` src/scheduler/flags.hpp (lines 35 - 36) <https://reviews.apache.org/r/48387/#comment201962> How about: ```cpp The maximum amount of time to wait before trying to initiate a connection with the master. The library waits for a random amount of time between [0, b], where `b = connection_delay_max` before initiating a (re-)connection attempt with the master. ``` src/scheduler/flags.hpp (line 40) <https://reviews.apache.org/r/48387/#comment201951> s/connection_backoff_max/connection_delay_max src/scheduler/scheduler.cpp (line 73) <https://reviews.apache.org/r/48387/#comment201963> Let's reorder these as per comments from @haosdent. src/scheduler/scheduler.cpp (line 196) <https://reviews.apache.org/r/48387/#comment201974> Why do we need this variable? Can't we directly use `flags.connection_delay_max`? src/scheduler/scheduler.cpp (lines 475 - 476) <https://reviews.apache.org/r/48387/#comment201979> How about: ``` // Wait for a random duration between 0 and `flags.connection_delay_max` before (re-)connecting with the master. ``` src/scheduler/scheduler.cpp (line 480) <https://reviews.apache.org/r/48387/#comment201977> How about: ```cpp VLOG(1) << "Waiting for " << delay << " before initiating a (re-)connection attempt with the master"; ``` src/scheduler/scheduler.cpp (line 481) <https://reviews.apache.org/r/48387/#comment201985> hmmm, this can crash the library in its current form due to a failed invariant check: Consider this scenario: - A new master is detected. You introduce a random amount of delay in connecting with this master. - The new master fails over and a new master is detected before the `delay` can be invoked. - The `delay` for the second master is lesser and the library connects with the master. - The original `delay` now fires and we fail this `CHECK` invariant on: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L320 Here is how to get around this: Let's initialize `connectionId` here and not in `connect()` on L474. The signature of `connect()` would then change to something like: `void connect(const UUID& _connectionId)` This is how a modified `connect` would look like: ```cpp void connect(const UUID& _connectionId) { // It is possible that a new master was detected while we were waiting to establish a connection with the old master. if (connectionId != _connectionId) { VLOG(1) << "Ignoring connection attempt from stale connection"; return; } .... // existing content of `connect()` thereafter. ``` Makes sense? src/scheduler/scheduler.cpp (line 739) <https://reviews.apache.org/r/48387/#comment201975> Kill this. - Anand Mazumdar On June 8, 2016, 8:53 a.m., Jose Guilherme Vanz wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48387/ > ----------------------------------------------------------- > > (Updated June 8, 2016, 8:53 a.m.) > > > Review request for mesos, Anand Mazumdar, haosdent huang, and Vinod Kone. > > > Repository: mesos > > > Description > ------- > > The scheduler library has been updated to wait a little deley before > initiate a connection with the master. The maximum amount of time waited > by the scheduler is defined by a flag: CONNECTION_BACKOFF_MAX. After > the master has been detected the scheduler picks a random delay that > can be between 0 and the CONNECTION_BACKOFF_MAX value. MESOS-5359 > > > Diffs > ----- > > src/scheduler/constants.hpp PRE-CREATION > src/scheduler/flags.hpp PRE-CREATION > src/scheduler/scheduler.cpp c79837c93e7329dbfa22e4288b44237f33408ba9 > > Diff: https://reviews.apache.org/r/48387/diff/ > > > Testing > ------- > > > Thanks, > > Jose Guilherme Vanz > >