[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303587#comment-15303587 ]
Jay Guo commented on MESOS-5468:
--------------------------------

See steps to reproduce in my first comment.

> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently long-lived-framework does not handle network partitions, i.e. it
> does not explicitly try to {{reconnect}} with the master after not receiving
> {{HEARTBEAT}} events for a prolonged amount of time. If the master
> disconnects a framework without the framework being aware of it (a one-way
> partition), the framework should explicitly issue a {{reconnect}} request via
> the scheduler library after a certain period of time.
>
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently the TCP socket is left open even after
> the framework has been deactivated. As a result, the framework sends invalid
> {{Call}} messages to the master and triggers re-detection.
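
The reconnect-on-missed-heartbeat behaviour described in the issue can be sketched roughly as follows. This is only an illustration in Python, not the long-lived-framework code: the class name HeartbeatWatchdog, the parameters interval_secs and max_missed, and the reconnect callback (a stand-in for whatever reconnect request the scheduler library exposes) are all assumptions made for this example.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks HEARTBEAT arrival and asks for a reconnect after prolonged silence.

    Hypothetical helper: `reconnect` stands in for the scheduler library's
    reconnect request, and the timeout policy (interval * max_missed) is an
    assumption, not something specified in the issue.
    """

    def __init__(self, interval_secs: float, max_missed: int,
                 reconnect: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Silence longer than this is treated as a possible one-way partition.
        self._timeout = interval_secs * max_missed
        self._reconnect = reconnect
        self._clock = clock
        self._last_seen = clock()

    def on_heartbeat(self) -> None:
        """Call this whenever a HEARTBEAT event is received."""
        self._last_seen = self._clock()

    def check(self) -> bool:
        """Call periodically; returns True if a reconnect was requested."""
        if self._clock() - self._last_seen < self._timeout:
            return False
        self._reconnect()
        # Restart the window so one outage does not cause a reconnect storm.
        self._last_seen = self._clock()
        return True
```

A framework would call on_heartbeat() from its event handler and check() from a periodic timer. The clock is injectable so that the timeout logic can be exercised without sleeping.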