[ https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969497#comment-14969497 ]
Megha commented on MESOS-3545:
------------------------------

Here's the first draft of the design for Persistent Tasks. Looking forward to feedback and comments.

https://docs.google.com/document/d/1l7goeISpYmCjM03l20lmjZ6_BMfdxBs31znEBRtzsuU/edit?usp=sharing

> Investigate restoring tasks/executors after machine reboot.
> -----------------------------------------------------------
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
> Issue Type: Improvement
> Components: slave
> Reporter: Benjamin Hindman
> Labels: mesosphere
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to
> force an agent to restart these tasks/executors _before_ after a machine
> reboot in the event that the machine is network partitioned away from the
> master (or the master has failed) but we'd like to get these services running
> again. Assuming the agent(s) running on the machine has not been disconnected
> from the master for longer than the master's agent re-registration timeout
> the agent should be able to re-register (i.e., after a network partition is
> resolved) without a problem. However, in the same way that a framework would
> be interested in knowing that its tasks/executors were restarted we'd want
> to send something like a TASK_RESTARTED status update.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)