[ 
https://issues.apache.org/jira/browse/MESOS-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037144#comment-15037144
 ] 

Klaus Ma commented on MESOS-4048:
---------------------------------

My understanding is that {{max_slave_ping_timeouts}} + {{slave_ping_timeout}} 
is used to trigger a TCP disconnected event, so the master then waits 
{{slave_reregister_timeout}} for the slave to re-register. If the master has 
already received a TCP disconnected event, it should not keep pinging the 
slave for {{max_slave_ping_timeouts}} + {{slave_ping_timeout}}.

{{max_slave_ping_timeouts}} + {{slave_ping_timeout}} is used to simulate 
TCP keepalive, which is not well supported on some operating systems.

> Consider unifying slave timeout behavior between steady state and master 
> failover
> ---------------------------------------------------------------------------------
>
>                 Key: MESOS-4048
>                 URL: https://issues.apache.org/jira/browse/MESOS-4048
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master, slave
>            Reporter: Neil Conway
>            Priority: Minor
>              Labels: mesosphere
>
> Currently, there are two timeouts that control what happens when an agent is 
> partitioned from the master:
> 1. {{max_slave_ping_timeouts}} + {{slave_ping_timeout}} controls how long the 
> master waits before declaring a slave to be dead in the "steady state"
> 2. {{slave_reregister_timeout}} controls how long the master waits for a 
> slave to reregister after master failover.
> It is unclear whether these two cases really merit being treated differently 
> -- it might be simpler for operators to configure a single timeout that 
> controls how long the master waits before declaring that a slave is dead.
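
To make the two cases in the quoted description concrete, here is a minimal 
sketch of the two effective windows. It assumes that the "+" above stands for 
the product of the two ping flags (the master tolerating 
{{max_slave_ping_timeouts}} consecutive missed pings, each with a 
{{slave_ping_timeout}} window). The class name, method names, and default 
values are illustrative assumptions, not taken from the issue; check them 
against your own master's flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterTimeouts:
    """Hypothetical container for the three master flags discussed above."""

    # Assumed values for illustration only; verify against your deployment.
    slave_ping_timeout_secs: float = 15.0
    max_slave_ping_timeouts: int = 5
    slave_reregister_timeout_secs: float = 600.0

    def steady_state_secs(self) -> float:
        # Case 1: assumed to be (missed pings allowed) x (per-ping timeout).
        return self.slave_ping_timeout_secs * self.max_slave_ping_timeouts

    def failover_secs(self) -> float:
        # Case 2: a single timeout applied after master failover.
        return self.slave_reregister_timeout_secs


timeouts = MasterTimeouts()
print("steady state:", timeouts.steady_state_secs(), "s")
print("after failover:", timeouts.failover_secs(), "s")
```

Under these assumed values the two windows differ by a factor of eight, which 
is the kind of gap an operator would have to reason about when tuning each 
case separately.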



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
