> On Feb. 18, 2015, 7:38 p.m., Niklas Nielsen wrote:
> > Let's get tests wired up before committing this :)
> 
> Adam B wrote:
>     Sure thing. Adding tests in my subsequent patch where we will pass the
>     master's timeout values on to the slave. Will post that very soon.

Can you do it in one patch? This patch in isolation looks a bit dangerous per our conversation above.

Also, please carefully consider whether your approach will be safe to do in a single version. i.e. What happens when there are old slaves running against a new master? And vice versa.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29507/#review72992
-----------------------------------------------------------


On Feb. 19, 2015, 12:33 a.m., Adam B wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29507/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2015, 12:33 a.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2110
>     https://issues.apache.org/jira/browse/MESOS-2110
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added new --slave_ping_timeout and --max_slave_ping_timeouts flags
> to mesos-master to supplement the DEFAULT_SLAVE_PING_TIMEOUT (15secs)
> and DEFAULT_MAX_SLAVE_PING_TIMEOUTS (5).
> 
> These can be extended if slaves are expected/allowed to be down for
> longer than a minute or two.
> 
> Beware that this affects recovery from network timeouts as well as
> actual slave node/process failover.
> 
> 
> Diffs
> -----
> 
>   src/master/constants.hpp c386eab 
>   src/master/constants.cpp 9ee17e9 
>   src/master/flags.hpp 6c18a1a 
>   src/master/master.cpp f4b6463 
>   src/slave/constants.hpp 761cfaf 
>   src/slave/constants.cpp 83d9fc1 
>   src/slave/slave.cpp a8b2621 
>   src/tests/fault_tolerance_tests.cpp f927d4a 
>   src/tests/partition_tests.cpp fea7801 
>   src/tests/slave_recovery_tests.cpp 7e2e63d 
>   src/tests/slave_tests.cpp e7e2af6 
> 
> Diff: https://reviews.apache.org/r/29507/diff/
> 
> 
> Testing
> -------
> 
> Manually tested slave failover/shutdown with master using different
> --slave_ping_timeout and --max_slave_ping_timeouts.
> Ran unit tests with shorter non-default values for ping timeouts.
> 
> 
> Thanks,
> 
> Adam B
> 
>
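
For reference, the "minute or two" in the description appears to follow from multiplying the two values: a slave that misses every ping is given roughly timeout x max-timeouts before the master gives up on it. The sketch below only illustrates that arithmetic. `removalWindow` is a hypothetical helper name, not a function in the Mesos tree, and the assumption that the master removes a slave after exactly that many consecutive missed pings is mine, not something this patch states.

```cpp
#include <chrono>

// Hypothetical helper (not part of Mesos): approximate time an unresponsive
// slave is tolerated, assuming the master gives up after `maxPingTimeouts`
// consecutive pings that each go unanswered for `pingTimeout`.
constexpr std::chrono::seconds removalWindow(
    std::chrono::seconds pingTimeout,
    unsigned maxPingTimeouts)
{
  return pingTimeout * maxPingTimeouts;
}

// With the defaults quoted in the description (15secs and 5),
// the window is 75 seconds.
static_assert(
    removalWindow(std::chrono::seconds(15), 5) == std::chrono::seconds(75),
    "default window should be 75 seconds");
```

Under the same assumption, running the master with a 30 second timeout and 10 allowed timeouts would stretch the window to about five minutes, for network partitions as well as for real slave failures.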
