[ https://issues.apache.org/jira/browse/MESOS-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088571#comment-15088571 ]
Joseph Wu commented on MESOS-4306:
----------------------------------

For random outages, the {{/maintenance/status}} output won't change, since only the operator can trigger these changes.

When the framework goes to check the machine's status, the machine will either:
# Not show up, if it hasn't been scheduled for maintenance
# Show up as {{DRAINING}}, if it has been scheduled for maintenance, but not taken down by the operator yet.

> AGENT_DEAD Message
> ------------------
>
>                 Key: MESOS-4306
>                 URL: https://issues.apache.org/jira/browse/MESOS-4306
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Gabriel Hartmann
>
> Frameworks currently receive SLAVE_LOST messages when an Agent fails or is
> behind a network partition for some period of time. However, frameworks and
> indeed Mesos cannot differentiate between an Agent being temporarily or
> permanently lost.
> It would be good to have a message indicating that an Agent is lost and won't
> be returning. This would require human intervention, so an endpoint should be
> exposed to induce the sending of this message.
> This is particularly helpful for frameworks which are waiting for the return
> of persistent volumes. In the case of an Agent hosting significant data
> (multi-terabyte), the framework may be willing to wait a significant amount of
> time before repairing its replication factor (for example). Explicit
> human-provided information about the permanent state of Agents and therefore
> their resources would allow these kinds of frameworks to accelerate their
> recovery timelines.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
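The status check described in the comment above could be sketched roughly as follows. This is only an illustration, not Mesos code: the helper name {{maintenance_state}} and the sample hostnames are hypothetical, and it assumes the {{/maintenance/status}} response has already been fetched and parsed into a JSON object with {{draining_machines}} entries (each carrying an {{id}} with a {{hostname}}) and a {{down_machines}} list.

```python
import json


def maintenance_state(cluster_status, hostname):
    """Classify one machine from a parsed /maintenance/status response.

    Returns "DRAINING" or "DOWN" if the machine is listed, or None if it
    does not show up at all (i.e. it was never scheduled for maintenance).
    The response layout used here is an assumption, see the note above.
    """
    for entry in cluster_status.get("draining_machines", []):
        if entry.get("id", {}).get("hostname") == hostname:
            return "DRAINING"
    for machine in cluster_status.get("down_machines", []):
        if machine.get("hostname") == hostname:
            return "DOWN"
    return None


# Hypothetical sample response: one machine scheduled for maintenance,
# none taken down by the operator yet.
SAMPLE_STATUS = json.loads("""
{
  "draining_machines": [
    {"id": {"hostname": "agent-2", "ip": "10.0.0.2"}}
  ],
  "down_machines": []
}
""")

# A randomly failed agent that was never scheduled simply does not appear.
lost_unscheduled = maintenance_state(SAMPLE_STATUS, "agent-9")
# A randomly failed agent that was scheduled still reads as DRAINING.
lost_scheduled = maintenance_state(SAMPLE_STATUS, "agent-2")
```

In both cases the result looks the same before and after the outage, which is the point of the comment: the maintenance status alone does not tell a framework that an agent is permanently gone.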