[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303509#comment-15303509 ]

Jay Guo commented on MESOS-5468:
--------------------------------

To reproduce:
* Start a master and an agent
* Run the long-lived-framework example
* Issue {{# iptables -A OUTPUT -p tcp -d <master-ip> --dport 5050 -j DROP}} on the
framework machine to emulate a network partition
* Wait until the master deactivates the framework
* Remove the iptables rule added above to emulate the network rejoining
* Check the logs of both long-lived-framework and the master. {{netstat -tpn}} also
shows a very large number of sockets in {{TIME_WAIT}}, which result from repeated
re-detection

> Add logic to long-lived-framework to handle HEARTBEAT timeout
> -------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently long-lived-framework does not handle HEARTBEAT timeouts. If the master
> tears down the framework without the framework being aware of it (e.g. during a
> network partition), the framework keeps waiting for an {{Event}} until it is
> reconnected.
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently the TCP socket is left open even after the
> framework has been deactivated. As a result, the framework sends invalid
> {{Call}}s to the master and triggers re-detection.
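A minimal sketch of the HEARTBEAT-timeout logic the issue asks for, assuming the framework records the arrival time of every {{Event}} and treats the connection as lost once several heartbeat intervals pass without one. {{HeartbeatWatchdog}}, {{onEvent}}, {{expired}} and the "three missed heartbeats" threshold are hypothetical names and values, not part of Mesos or long-lived-framework. In the v1 scheduler API the interval would come from the SUBSCRIBED event's {{heartbeat_interval_seconds}} field; 15 seconds below is only an example value.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Mesos or long-lived-framework): tracks
// the arrival time of the last Event received from the master and reports
// when too many heartbeat intervals have passed without one.
class HeartbeatWatchdog {
 public:
  using Clock = std::chrono::steady_clock;

  // `interval` would come from the SUBSCRIBED event; `maxMissed` is an
  // assumed tolerance for dropped or delayed HEARTBEATs.
  HeartbeatWatchdog(std::chrono::seconds interval, int maxMissed,
                    Clock::time_point start)
      : interval_(interval), maxMissed_(maxMissed), lastEvent_(start) {
    assert(interval.count() > 0 && maxMissed > 0);
  }

  // Call on every Event from the master (HEARTBEAT or otherwise):
  // any traffic shows the subscription is still alive.
  void onEvent(Clock::time_point now) { lastEvent_ = now; }

  // True once more than `maxMissed` intervals have elapsed since the last
  // Event; the caller should then disconnect and re-subscribe.
  bool expired(Clock::time_point now) const {
    return now - lastEvent_ > interval_ * maxMissed_;
  }

 private:
  std::chrono::seconds interval_;
  int maxMissed_;
  Clock::time_point lastEvent_;
};
```

With a 15-second interval and {{maxMissed = 3}}, the watchdog fires once more than 45 seconds pass without any {{Event}}. At that point the framework would close its connection and re-subscribe instead of waiting indefinitely. Passing the current time in explicitly keeps the check deterministic and easy to test.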



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
