[ https://issues.apache.org/jira/browse/MESOS-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139261#comment-16139261 ]
Benjamin Mahler commented on MESOS-7865:
----------------------------------------

{noformat}
commit 0b9c3dedb04e9bf2c3d1f1663cf9cd4f47cb674b
Author: Benjamin Mahler <bmah...@apache.org>
Date:   Thu Aug 10 18:34:15 2017 -0700

    Fixed a bug where the agent kills and still launches a task.

    The following race leads to the agent both killing and launching
    a task:

      (1) Slave::__run completes, task is now within
          Executor::queuedTasks.

      (2) Slave::killTask locates the executor based on the task ID
          residing in queuedTasks, calls Slave::statusUpdate() with
          TASK_KILLED.

      (3) Slave::___run assumes that killed tasks have been removed
          from Executor::queuedTasks, but this now occurs
          asynchronously in Slave::_statusUpdate. So, the agent still
          sees the queued task and delivers it and adds the task to
          Executor::launchedTasks.

      (4) Slave::_statusUpdate runs, removes the task from
          Executor::launchedTasks and adds it to
          Executor::terminatedTasks.

    The fix applied here is to synchronously transition queued tasks
    to a terminal state when statusUpdate is called. This can be done
    because for queued tasks, we do not need to retrieve the container
    status (the task never reached the container).

    Review: https://reviews.apache.org/r/61639
{noformat}

> Agent may process a kill task and still launch the task.
> --------------------------------------------------------
>
>                 Key: MESOS-7865
>                 URL: https://issues.apache.org/jira/browse/MESOS-7865
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> Based on the investigation of MESOS-7744, the agent has a race in which
> "queued" tasks can still be launched after the agent has processed a kill
> task for them.
> This race was introduced when {{Slave::statusUpdate}} was made
> asynchronous:
> (1) {{Slave::__run}} completes, task is now within {{Executor::queuedTasks}}
> (2) {{Slave::killTask}} locates the executor based on the task ID residing in
> queuedTasks, calls {{Slave::statusUpdate()}} with {{TASK_KILLED}}
> (3) {{Slave::___run}} assumes that killed tasks have been removed from
> {{Executor::queuedTasks}}, but this now occurs asynchronously in
> {{Slave::_statusUpdate}}. So, the agent still sees the queued task and
> delivers it and adds the task to {{Executor::launchedTasks}}.
> (4) {{Slave::_statusUpdate}} runs, removes the task from
> {{Executor::launchedTasks}} and adds it to {{Executor::terminatedTasks}}.
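
The interleaving and the fix described above can be sketched with a small single-threaded model. This is an illustrative simplification and not the Mesos source: the {{Agent}} class, the {{queueTask}}, {{deliverTask}} and {{runPending}} methods, the {{synchronousFix}} flag and the {{killedTaskWasLaunched}} helper are all hypothetical names, and the asynchronous {{Slave::_statusUpdate}} is assumed to behave like a deferred callback that runs after delivery.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-executor task bookkeeping.
struct Executor {
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
  std::set<std::string> terminatedTasks;
};

// Hypothetical, heavily simplified model of the agent (not the real Slave).
class Agent {
public:
  explicit Agent(bool synchronousFix) : synchronousFix_(synchronousFix) {}

  // Step (1): the task ends up in queuedTasks.
  void queueTask(const std::string& taskId) {
    executor.queuedTasks.insert(taskId);
  }

  // Step (2): a kill arrives while the task is still queued.
  void killTask(const std::string& taskId) {
    if (executor.queuedTasks.count(taskId) == 0) {
      return;
    }
    if (synchronousFix_) {
      // Post-fix behaviour: transition the queued task to a terminal
      // state immediately, so delivery can no longer see it.
      executor.queuedTasks.erase(taskId);
      executor.terminatedTasks.insert(taskId);
    } else {
      // Pre-fix behaviour: the bookkeeping is deferred (assumed here to
      // model the asynchronous status update path).
      pending_.push_back([this, taskId]() {
        executor.queuedTasks.erase(taskId);
        executor.launchedTasks.erase(taskId);
        executor.terminatedTasks.insert(taskId);
      });
    }
  }

  // Step (3): deliver a queued task. Returns true if it was launched.
  bool deliverTask(const std::string& taskId) {
    if (executor.queuedTasks.erase(taskId) == 0) {
      return false;  // Already killed, nothing to deliver.
    }
    executor.launchedTasks.insert(taskId);
    return true;
  }

  // Step (4): run the deferred status update callbacks.
  void runPending() {
    std::vector<std::function<void()>> callbacks;
    callbacks.swap(pending_);
    for (auto& callback : callbacks) {
      callback();
    }
  }

  Executor executor;

private:
  bool synchronousFix_;
  std::vector<std::function<void()>> pending_;
};

// Replays steps (1) through (4) and reports whether the killed task
// was nevertheless delivered to the executor.
bool killedTaskWasLaunched(bool synchronousFix) {
  Agent agent(synchronousFix);
  agent.queueTask("task-1");
  agent.killTask("task-1");
  bool launched = agent.deliverTask("task-1");
  agent.runPending();
  return launched;
}
```

In this model, {{killedTaskWasLaunched(false)}} replays the racy ordering and the killed task is delivered, while {{killedTaskWasLaunched(true)}} removes the task from the queue inside {{killTask}} so delivery finds nothing to launch. The synchronous transition is safe for queued tasks for the reason given in the commit message: the task never reached the container, so there is no container status to wait for.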