----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27315/#review59102 -----------------------------------------------------------
Ship it! Looks good, just once concern about bounding for large failover timeouts and one note about cleanup. src/sched/constants.hpp <https://reviews.apache.org/r/27315/#comment100418> newline here? src/sched/sched.cpp <https://reviews.apache.org/r/27315/#comment100428> What is the (s) conveying here? Could you just say, "similar to the slave"? src/sched/sched.cpp <https://reviews.apache.org/r/27315/#comment100698> How about the following? We can do this for the slave too to make it a bit easier (it's a bit hard to understand currently with 'duration' as a name): ``` void doReliableRegistration(Duration maxBackoff) { ... maxBackoff = std::min(maxBackoff, scheduler:: REGISTRATION_RETRY_INTERVAL_DEFAULT_MAX); if (framework.has_failover_timeout()) { ... // Don't approach the failover timeout too closely. maxBackoff = std::min(maxBackoff, timeout.get() / 10); } Duration delay = std::min( duration * ((double) ::random() / RAND_MAX), maxBackoff); VLOG(1) << "will retry registration in " << delay << " if necessary"; process::delay(delay, self(), &Self::doReliableRegistration, maxBackoff * 2); } ``` Notice that we don't need to store duration_ at all since it's the responsibility of doReliableRegistration to bound the maximum anyway! Could we do this cleanup in the slave too? src/sched/sched.cpp <https://reviews.apache.org/r/27315/#comment100699> How about s/duration/timeout/ ? src/sched/sched.cpp <https://reviews.apache.org/r/27315/#comment100700> We should bound this as well, to ensure that we don't backoff *too* much! - Ben Mahler On Oct. 29, 2014, 11:16 p.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/27315/ > ----------------------------------------------------------- > > (Updated Oct. 29, 2014, 11:16 p.m.) > > > Review request for mesos and Ben Mahler. > > > Bugs: MESOS-1903 > https://issues.apache.org/jira/browse/MESOS-1903 > > > Repository: mesos-git > > > Description > ------- > > Uses the same backoff (except no initial backoff) strategy used by the slave > during registration. > > > Diffs > ----- > > src/Makefile.am 374f284e1ac839fbcd8a28171b1ff4fbe8a17bd4 > src/sched/constants.hpp PRE-CREATION > src/sched/constants.cpp PRE-CREATION > src/sched/flags.hpp PRE-CREATION > src/sched/sched.cpp 0fb8c7bda75545389f8024489b3c76ae115111f4 > src/tests/fault_tolerance_tests.cpp > a18a41a3e34ff112e04e693447d757403e5013bd > > Diff: https://reviews.apache.org/r/27315/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Vinod Kone > >