> On Feb. 27, 2015, 11:13 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1249
> > <https://reviews.apache.org/r/31515/diff/1/?file=879515#file879515line1249>
> >
> >     why not just "string"?

This was to avoid a non-POD static. I've just gone with a non-static const 
string for simplicity.
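
For anyone following along, the trade-off is roughly the following. This is 
only a sketch: the function names and the message text are hypothetical and 
are not the actual code in master.cpp.

```cpp
#include <string>

// Hypothetical names and message text, for illustration only.

// POD static: a character array has no constructor or destructor, so it
// carries no static initialization or destruction order concerns.
static const char REMOVAL_REASON[] = "slave did not re-register in time";

std::string reasonFromPodStatic()
{
  return std::string(REMOVAL_REASON);
}

// Non-static const string: constructed on every call and destroyed on
// return, so no non-POD object ever has static storage duration.
std::string reasonFromLocalString()
{
  const std::string reason = "slave did not re-register in time";
  return reason;
}
```

The second form costs a string construction per call, which should not 
matter on a path that runs this rarely.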


> On Feb. 27, 2015, 11:13 p.m., Vinod Kone wrote:
> > src/master/master.cpp, line 1256
> > <https://reviews.apache.org/r/31515/diff/1/?file=879515#file879515line1256>
> >
> >     I don't think this should be considered towards the shutdown metric, 
> > because this is just removal.
> >     
> >     s/slave_shutdowns_scheduled/slave_removals_scheduled/

Vinod and I ended up chatting quite a bit about how this might change going 
forward, and I wanted to share the details here:

* How would this look in the HTTP API? Would we use the same logic (and 
metrics) for requiring slaves to re-register in time in both steady state and 
master failover?
* If we wanted to re-use the SlaveObserver (it's safer since it prevents slave 
removal while the slave still responds to health checks), would we re-use the 
`"slave_shutdowns_*"` metrics? Probably. (We can't re-use SlaveObserver today 
because we need the slave's PID.)

Since we might re-use SlaveObserver for this, we'll keep the same metric names.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31515/#review74583
-----------------------------------------------------------


On Feb. 27, 2015, 2:58 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31515/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2015, 2:58 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-2392
>     https://issues.apache.org/jira/browse/MESOS-2392
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Much like we rate limit slave removals in the common path (MESOS-1148), we 
> need to rate limit slave removals that occur during master recovery. When a 
> master recovers and is using a strict registry, slaves that do not 
> re-register within a timeout will be removed.
> 
> Currently there is a safeguard in place to abort when too many slaves have 
> not re-registered. However, in the case of a transient partition, we don't 
> want to remove large sections of slaves without rate limiting.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8c44d6ed57ad1b94a17bef8142a5e6a15889a810 
>   src/master/master.cpp 76e217d16c03e587ea4c0afca94c58b2212f0f93 
> 
> Diff: https://reviews.apache.org/r/31515/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests in subsequent review.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
