[ https://issues.apache.org/jira/browse/MESOS-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Bannier reassigned MESOS-10018:
----------------------------------------

    Shepherd: Benno Evers
      Sprint: Foundations: RI-19 57
    Assignee: Benjamin Bannier

> Duplicate tasks if agent partitioned during maintenance down period
> -------------------------------------------------------------------
>
>                 Key: MESOS-10018
>                 URL: https://issues.apache.org/jira/browse/MESOS-10018
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Major
>
> When the master starts maintenance for a node it
> (1) sends a {{ShutdownMessage}} message to agent, and
> (2) removes the slave which transitions all tasks to {{TASK_LOST}} and moves
> them to the completed task set.
> If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., message
> dropped between (1) and (2), or agent process killed before the executor has
> shut down), the agent could come back with the lost task running. It would
> report the task on registration with the master, which would add it to the
> list of active tasks. With that the same task could be both completed and
> active.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
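
The sequence described in the issue can be illustrated with a toy model. This is only a sketch of the bookkeeping problem, not Mesos code: `ToyMaster`, `ToyAgent`, `run`, and all of their methods are hypothetical names invented for illustration, and the real master and agent logic is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent and the tasks it is running."""
    agent_id: str
    running: set = field(default_factory=set)

    def handle_shutdown(self):
        # A fully processed ShutdownMessage stops every task on the agent.
        self.running.clear()


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the master's task bookkeeping."""
    active: dict = field(default_factory=dict)     # agent id -> set of task ids
    completed: list = field(default_factory=list)  # (task id, state) pairs

    def start_maintenance(self, agent, deliver_shutdown):
        # Step (1): send the shutdown; it may never be processed by the agent.
        if deliver_shutdown:
            agent.handle_shutdown()
        # Step (2): remove the agent, marking its tasks TASK_LOST and
        # moving them to the completed set regardless of step (1).
        for task_id in sorted(self.active.pop(agent.agent_id, set())):
            self.completed.append((task_id, "TASK_LOST"))

    def register(self, agent):
        # On registration the agent reports whatever it still runs, and
        # the master adds those tasks to its active set.
        if agent.running:
            self.active[agent.agent_id] = set(agent.running)

    def duplicates(self):
        # Task ids that are simultaneously completed and active.
        done = {task_id for task_id, _ in self.completed}
        live = set().union(*self.active.values())
        return done & live


def run(deliver_shutdown):
    """Run the maintenance sequence once and return the duplicated tasks."""
    master = ToyMaster(active={"agent-1": {"task-1"}})
    agent = ToyAgent("agent-1", {"task-1"})
    master.start_maintenance(agent, deliver_shutdown)
    master.register(agent)
    return master.duplicates()
```

In this model, `run(deliver_shutdown=True)` leaves no task in both sets, while `run(deliver_shutdown=False)` leaves `task-1` both completed (as `TASK_LOST`) and active, which is the inconsistency the issue reports.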