Benjamin Bannier created MESOS-10018:
----------------------------------------

             Summary: Duplicate tasks if agent partitioned during maintenance down period
                 Key: MESOS-10018
                 URL: https://issues.apache.org/jira/browse/MESOS-10018
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Bannier


When the master starts maintenance for a node, it

(1) sends a {{ShutdownMessage}} to the agent, and
(2) removes the agent, which transitions all of its tasks to {{TASK_LOST}} and
moves them to the completed task set.

If the {{ShutdownMessage}} isn't fully processed on the agent (e.g., the message
is dropped between (1) and (2), or the agent process is killed before the
executor has shut down), the agent could come back with the lost task still
running. It would report the task when registering with the master, which
would add it to the list of active tasks. As a result, the same task could be
both completed and active at the same time.
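
To make the sequence concrete, here is a minimal toy model of the bookkeeping described above. It is only a sketch and not Mesos code: {{ToyMaster}}, its methods, and the IDs {{agent-1}} and {{task-1}} are hypothetical names chosen for illustration, and it assumes the master accepts tasks reported on registration without checking the completed set.

```python
# Toy model only: class and method names are hypothetical, not Mesos APIs.
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    # agent_id -> {task_id: state} for tasks the master considers active
    active: dict = field(default_factory=dict)
    # task_id -> terminal state for tasks the master considers completed
    completed: dict = field(default_factory=dict)

    def start_maintenance(self, agent_id):
        # Step (1): a ShutdownMessage would be sent here. This model assumes
        # the agent never fully processes it.
        # Step (2): remove the agent and mark all of its tasks TASK_LOST.
        for task_id in self.active.pop(agent_id, {}):
            self.completed[task_id] = "TASK_LOST"

    def register(self, agent_id, running_task_ids):
        # The agent reports the tasks it is still running. Assumption: they
        # are accepted as active without consulting the completed set.
        self.active[agent_id] = {t: "TASK_RUNNING" for t in running_task_ids}

    def duplicates(self):
        # Task IDs that are tracked as both active and completed.
        active_ids = {t for tasks in self.active.values() for t in tasks}
        return active_ids & set(self.completed)


master = ToyMaster()
master.register("agent-1", ["task-1"])   # agent registers with one running task
master.start_maintenance("agent-1")      # task-1 moves to completed as TASK_LOST
master.register("agent-1", ["task-1"])   # agent comes back, task still running
print(master.duplicates())               # {'task-1'}
```

In this model the task ends up in both sets because registration never consults the completed set. Whether the real master behaves this way in every code path is exactly what this issue reports, not something the sketch establishes.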




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
