[ https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592587#comment-15592587 ]
Megha commented on MESOS-3545:
------------------------------

Updated design doc for Restartable tasks. Looking forward to feedback and comments.
https://docs.google.com/document/d/1YS_EBUNLkzpSru0dwn_hPUIeTATiWckSaosXSIaHUCo/edit#heading=h.tlevdyt3yv0a

> Investigate restoring tasks/executors after machine reboot.
> -----------------------------------------------------------
>
>                 Key: MESOS-3545
>                 URL: https://issues.apache.org/jira/browse/MESOS-3545
>             Project: Mesos
>          Issue Type: Epic
>          Components: slave
>            Reporter: Benjamin Hindman
>            Assignee: Megha
>
> If a task/executor is restartable (see MESOS-3544), it might make sense to
> force an agent to restart these tasks/executors _before_ re-registering
> after a machine reboot, in the event that the machine is network partitioned
> away from the master (or the master has failed) but we'd like to get these
> services running again. Assuming the agent(s) running on the machine has not
> been disconnected from the master for longer than the master's agent
> re-registration timeout, the agent should be able to re-register (i.e.,
> after a network partition is resolved) without a problem. However, in the
> same way that a framework would be interested in knowing that its
> tasks/executors were restarted, we'd want to send something like a
> TASK_RESTARTED status update.
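
The behavior described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mesos code: the `Task` class, its `restartable` flag, and the `recover_after_reboot` function are hypothetical names, and TASK_RESTARTED is the status the issue proposes ("something like"), not an existing task state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    task_id: str
    restartable: bool  # hypothetical flag; see MESOS-3544


def recover_after_reboot(
    tasks: List[Task],
    disconnected_secs: float,
    reregistration_timeout_secs: float,
) -> Tuple[List[str], bool, List[Tuple[str, str]]]:
    """Sketch of the agent-side decision after a machine reboot.

    Returns (restarted task ids, whether re-registration is still
    possible, status updates to send once re-registered).
    """
    # Restartable tasks are brought back first, without waiting for the
    # master, which may be partitioned away or down.
    restarted = [t.task_id for t in tasks if t.restartable]

    # The agent can re-register only if it has not been disconnected for
    # longer than the master's agent re-registration timeout.
    can_reregister = disconnected_secs <= reregistration_timeout_secs

    # Frameworks learn about the restarts through a status update;
    # "TASK_RESTARTED" is the name proposed in the issue (an assumption).
    updates = (
        [(task_id, "TASK_RESTARTED") for task_id in restarted]
        if can_reregister
        else []
    )
    return restarted, can_reregister, updates
```

For example, with one restartable and one non-restartable task, a 30-second outage, and a 600-second timeout, only the restartable task is restarted and one update is queued for it. What should happen to restarted tasks when the timeout has already expired is left open here, as it is in the issue description.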