[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303537#comment-15303537 ]
Anand Mazumdar commented on MESOS-5468:
---------------------------------------

[~guoger] I edited the JIRA description a bit. Let me know if it does not align with your observations.

Also, we do close the socket on the master's side when a framework disconnects or is torn down:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2795

Can you confirm whether you are not seeing this behavior on your end? If so, please share some steps to reproduce it.

> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently, long-lived-framework does not handle network partitions, i.e., it
> does not explicitly try to {{reconnect}} with the master after not receiving
> {{HEARTBEAT}} events for a prolonged amount of time. If the master
> disconnects a framework without the framework being aware of it (a one-way
> partition), the framework should explicitly issue a {{reconnect}} request via
> the scheduler library after a certain period of time.
>
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently, the TCP socket is left alive even after
> the framework has been deactivated. This results in the framework sending
> invalid {{Call}}s to the master, and in re-detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
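The heartbeat-based reconnect logic described in the issue could look roughly like the following. This is a minimal sketch, not the actual long-lived-framework code: `HeartbeatMonitor` is a hypothetical helper, the `reconnect` callback stands in for whatever reconnect call the scheduler library exposes, and the 15-second interval and 5-miss threshold in the usage note are illustrative assumptions. Time is passed in explicitly so the timeout logic can be exercised without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper (not part of the Mesos scheduler library): records when
// the last event (e.g. SUBSCRIBED or HEARTBEAT) arrived and triggers a
// reconnect once too much time has passed without one.
class HeartbeatMonitor {
public:
  using Clock = std::chrono::steady_clock;

  // `interval` is the expected heartbeat period; `missedAllowed` is how many
  // consecutive periods may elapse without an event before reconnecting.
  HeartbeatMonitor(Clock::duration interval, int missedAllowed,
                   std::function<void()> reconnect)
    : timeout_(interval * missedAllowed), reconnect_(std::move(reconnect)) {
    assert(missedAllowed > 0);
  }

  // Call whenever an event arrives from the master; (re-)arms the monitor.
  void onEvent(Clock::time_point now) {
    last_ = now;
    armed_ = true;
  }

  // Call periodically (e.g. from a timer). Returns true if it triggered a
  // reconnect. After firing, it stays disarmed until the next event, so a
  // single partition produces a single reconnect request.
  bool check(Clock::time_point now) {
    if (!armed_ || now - last_ < timeout_) {
      return false;
    }
    armed_ = false;
    reconnect_();
    return true;
  }

private:
  Clock::duration timeout_;
  std::function<void()> reconnect_;
  Clock::time_point last_{};
  bool armed_ = false;
};
```

With `HeartbeatMonitor m(std::chrono::seconds(15), 5, cb)`, an event at time t0 followed by no further events means `check` returns false until t0 + 75s, fires `cb` once at that point, and then stays quiet until the next `onEvent`. Disarming after a reconnect is a design choice meant to avoid flooding the master with repeated requests while the partition persists.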