[ 
https://issues.apache.org/jira/browse/MESOS-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039590#comment-14039590
 ] 

Timothy Chen commented on MESOS-1503:
-------------------------------------

I'm thinking about the design of the single slave observer, and here are my 
current thoughts:

1. Have a single SlavesObserver protobuf process that holds a map of all the 
slaves to be pinged, a map of promises for the current ping, and the current 
ping generation. 

It can have the following interface:

void registerSlave(SlaveID, UPID) : add a new slave to be health checked

void unregisterSlave(SlaveID) : remove a slave from health checking

void pingAllSlaves(): This sends out a ping to all registered slaves, creating 
and holding a promise for each slave ping. It then collects the futures from 
all the promises into one future and defers it to completePing(). It 
increments the generation id and sends it in the message body to all slaves.

void pong(UPID from, string body): response callback for a slave ping. body is 
the current ping generation id, which the slave simply echoes back from the 
ping body. We verify that the pong is for the current ping generation; if the 
pong was delayed and we receive an old one, we skip it. We also skip pongs 
from unregistered slaves.

void timeout(UPID from): The timeout for a slave ping, which just sets that 
slave's promise to false.

void completePing(): In the end we look at all the remaining futures, collect 
the failed ones, verify that each slave is still registered, and send them all 
to the master for termination. This gives us the opportunity to throttle or 
make further decisions based on all the failures at once. (We can also move 
this logic to the master; I don't really know what's best yet.)
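
To make the bookkeeping above concrete, here is a rough Python model of it. It 
is only a sketch under assumptions, not the libprocess implementation: the 
`send` callback is hypothetical, pongs and timeouts are keyed by slave id 
rather than UPID, and a plain dict of None/True/False stands in for the 
promises. A pong that arrives after its timeout is ignored here, which is one 
possible choice and not something decided above.

```python
class SlavesObserver:
    """Toy model of the single-observer bookkeeping; no real networking."""

    def __init__(self, send):
        self.send = send      # hypothetical callback: send(pid, generation)
        self.slaves = {}      # slave_id -> pid of every slave to be pinged
        self.pending = {}     # slave_id -> None (waiting), True (pong), False (timeout)
        self.generation = 0   # current ping generation

    def register_slave(self, slave_id, pid):
        self.slaves[slave_id] = pid

    def unregister_slave(self, slave_id):
        self.slaves.pop(slave_id, None)
        self.pending.pop(slave_id, None)

    def ping_all_slaves(self):
        # Start a new generation and create one pending entry per slave.
        self.generation += 1
        self.pending = {slave_id: None for slave_id in self.slaves}
        for pid in self.slaves.values():
            self.send(pid, self.generation)

    def pong(self, slave_id, generation):
        # Skip pongs for an old generation and pongs from unregistered slaves.
        if generation != self.generation or slave_id not in self.slaves:
            return
        if slave_id in self.pending and self.pending[slave_id] is None:
            self.pending[slave_id] = True

    def timeout(self, slave_id):
        # Only a ping that is still waiting can time out.
        if slave_id in self.pending and self.pending[slave_id] is None:
            self.pending[slave_id] = False

    def complete_ping(self):
        # Failed slaves that are still registered, to be handed to the master.
        return sorted(
            slave_id
            for slave_id, ok in self.pending.items()
            if ok is False and slave_id in self.slaves
        )
```

Because complete_ping() returns every failure of a generation as one list, the 
throttling decision can be made on that list as a whole, wherever the logic 
ends up living.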

One issue that comes to mind is that we're now sending all the pings at once, 
and I wonder if that can cause a burst of messages, especially with a large 
number of slaves. One option is to group slaves so they are pinged at 
different intervals, but that could be something for further in the future.
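
As a rough illustration of the grouping idea, slaves could be split into 
batches whose pings are spread evenly across one ping interval. The function 
names, the round-robin assignment and the even spacing are all assumptions 
made for this sketch; nothing here is decided.

```python
def ping_groups(slave_ids, num_groups):
    """Split slave ids into num_groups round-robin batches (num_groups >= 1)."""
    groups = [[] for _ in range(num_groups)]
    for i, slave_id in enumerate(sorted(slave_ids)):
        groups[i % num_groups].append(slave_id)
    return groups


def ping_schedule(slave_ids, num_groups, interval_secs):
    """Return (offset_secs, batch) pairs spread evenly over one ping interval."""
    step = interval_secs / num_groups
    batches = ping_groups(slave_ids, num_groups)
    return [(i * step, batch) for i, batch in enumerate(batches)]
```

With 2 groups and a 10 second interval, for example, half of the slaves would 
be pinged at offset 0 and the other half at offset 5, so each burst is about 
half the size of pinging everyone at once.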



> Improve slave health checking to prevent rapid widespread slave removals.
> -------------------------------------------------------------------------
>
>                 Key: MESOS-1503
>                 URL: https://issues.apache.org/jira/browse/MESOS-1503
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>            Assignee: Timothy Chen
>              Labels: reliability
>
> Per some discussions with [~tweingartner] and [~vinodkone].
> Currently the master uses a SlaveObserver for each registered slave. Each 
> SlaveObserver operates independently and makes decisions about whether the 
> slave is healthy.
> The independence of these observers means that in some very rare events (e.g. 
> masters are partitioned from 75% of slaves), the master can very rapidly 
> remove a large portion of the slaves in the cluster. Ideally such an event 
> could be deemed dangerous and throttled accordingly through a more 
> intelligent notion of overall cluster health.
> It may be nice to have a single observer that is responsible for health 
> checking all the slaves. This will allow us to make safer decisions as to 
> when to determine that slaves are unhealthy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
