> On Feb. 27, 2015, 11:13 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1249
> > <https://reviews.apache.org/r/31515/diff/1/?file=879515#file879515line1249>
> >
> > why not just "string"?

This was to avoid a non-POD static. I've just gone with a non-static const string for simplicity.


> On Feb. 27, 2015, 11:13 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1256
> > <https://reviews.apache.org/r/31515/diff/1/?file=879515#file879515line1256>
> >
> > I don't think this should be considered towards the shutdown metric,
> > because this is just removal.
> >
> > s/slave_shutdowns_scheduled/slave_removals_scheduled/

Vinod and I ended up chatting quite a bit about this and about how it might change going forward. I wanted to share the details here:

* How would this look in the HTTP API? Would we use the same logic (and metrics) for slaves to re-register in time between steady-state vs. master failover?
* If we wanted to re-use the SlaveObserver (it's safer since it prevents slave removal if the slave still passes health checks), would we re-use the `"slave_shutdowns_*"` metrics? Probably. (We can't re-use SlaveObserver today because we need the slave's PID.)

Since we might re-use SlaveObserver for this, we'll keep the same metric names.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31515/#review74583
-----------------------------------------------------------


On Feb. 27, 2015, 2:58 a.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31515/
> -----------------------------------------------------------
>
> (Updated Feb. 27, 2015, 2:58 a.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-2392
>     https://issues.apache.org/jira/browse/MESOS-2392
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Much like we rate limit slave removals in the common path (MESOS-1148), we
> need to rate limit slave removals that occur during master recovery.
> When a master recovers and is using a strict registry, slaves that do not
> re-register within a timeout will be removed.
>
> Currently there is a safeguard in place to abort when too many slaves have
> not re-registered. However, in the case of a transient partition, we don't
> want to remove large sections of slaves without rate limiting.
>
>
> Diffs
> -----
>
>   src/master/master.hpp 8c44d6ed57ad1b94a17bef8142a5e6a15889a810
>   src/master/master.cpp 76e217d16c03e587ea4c0afca94c58b2212f0f93
>
> Diff: https://reviews.apache.org/r/31515/diff/
>
>
> Testing
> -------
>
> make check
>
> Added tests in subsequent review.
>
>
> Thanks,
>
> Ben Mahler
>
>
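
For readers skimming the thread, here is a rough sketch of the behaviour the description talks about: abort if too many slaves failed to re-register, otherwise space the removals out instead of doing them all at once. This is not the code under review. `RemovalPlan`, `planRecoveryRemovals`, and all of the parameters are hypothetical names made up for illustration, and the real patch may well structure this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types and names throughout; none of this is the Mesos API.
struct RemovalPlan {
  // True when the safeguard tripped and nothing should be removed.
  bool aborted;
  // (slave id, delay in seconds before that removal may proceed).
  std::vector<std::pair<std::string, double> > removals;
};

// `missing` holds the slaves that did not re-register within the timeout,
// `recovered` is the number of slaves recovered from the registry.
RemovalPlan planRecoveryRemovals(
    const std::vector<std::string>& missing,
    std::size_t recovered,
    double maxMissingFraction,
    double permitsPerSecond)
{
  assert(permitsPerSecond > 0.0);

  RemovalPlan plan;
  plan.aborted = false;

  if (recovered == 0) {
    return plan; // Nothing was recovered, so there is nothing to remove.
  }

  // Safeguard: too many slaves missing, so abort rather than remove.
  const double fraction = static_cast<double>(missing.size()) / recovered;
  if (fraction > maxMissingFraction) {
    plan.aborted = true;
    return plan;
  }

  // Rate limiting: the i-th removal waits for the i-th permit. The first
  // permit is assumed to be available immediately (delay 0).
  for (std::size_t i = 0; i < missing.size(); ++i) {
    plan.removals.push_back(
        std::make_pair(missing[i], i / permitsPerSecond));
  }

  return plan;
}
```

With 3 of 10 slaves missing, a 0.5 limit and one permit per second, this would schedule the removals at 0, 1 and 2 seconds. With 3 of 4 missing it would abort instead.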
