[ 
https://issues.apache.org/jira/browse/MESOS-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139261#comment-16139261
 ] 

Benjamin Mahler commented on MESOS-7865:
----------------------------------------

{noformat}
commit 0b9c3dedb04e9bf2c3d1f1663cf9cd4f47cb674b
Author: Benjamin Mahler <bmah...@apache.org>
Date:   Thu Aug 10 18:34:15 2017 -0700

    Fixed a bug where the agent kills and still launches a task.

    The following race leads to the agent both killing and launching a task:

      (1) Slave::__run completes, task is now within Executor::queuedTasks.
      (2) Slave::killTask locates the executor based on the task ID residing
          in queuedTasks, calls Slave::statusUpdate() with TASK_KILLED.
      (3) Slave::___run assumes that killed tasks have been removed from
          Executor::queuedTasks, but this now occurs asynchronously in
          Slave::_statusUpdate. So, the agent still sees the queued task
          and delivers it and adds the task to Executor::launchedTasks.
      (4) Slave::_statusUpdate runs, removes the task from
          Executor::launchedTasks and adds it to Executor::terminatedTasks.

    The fix applied here is to synchronously transition queued tasks to
    a terminal state when statusUpdate is called. This can be done because
    for queued tasks, we do not need to retrieve the container status (the
    task never reached the container).

    Review: https://reviews.apache.org/r/61639
{noformat}
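
The interleaving described in the commit message can be replayed in a small standalone model. This is only an illustrative sketch and not the Mesos code: ToyAgent, ToyExecutor, killTask, deliverQueued, drain, and replayRace are hypothetical names, and a plain deque of callbacks stands in for asynchronous dispatch. The syncTerminateQueued flag switches between the old behavior (removal from queuedTasks happens only in the deferred callback) and the approach the fix describes (queued tasks are moved to a terminal state immediately).

```cpp
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for the agent's per-executor bookkeeping (hypothetical type).
struct ToyExecutor {
  std::set<std::string> queuedTasks;
  std::set<std::string> launchedTasks;
  std::set<std::string> terminatedTasks;
};

// Toy stand-in for the agent (hypothetical type). Deferred callbacks in
// `pending` model work that runs asynchronously, i.e. "later".
struct ToyAgent {
  ToyExecutor executor;
  std::deque<std::function<void()>> pending;
  bool syncTerminateQueued;  // true models the behavior after the fix.
  int launches = 0;          // Number of times a task was delivered.

  explicit ToyAgent(bool sync) : syncTerminateQueued(sync) {}

  // Step (1): the task is now waiting in queuedTasks.
  void queueTask(const std::string& id) { executor.queuedTasks.insert(id); }

  // Step (2): a kill arrives for a task that is still queued.
  void killTask(const std::string& id) {
    if (executor.queuedTasks.count(id) == 0) return;
    if (syncTerminateQueued) {
      // Fix: transition the queued task to a terminal state right away.
      executor.queuedTasks.erase(id);
      executor.terminatedTasks.insert(id);
    }
    // The remaining status update handling still runs asynchronously.
    pending.push_back([this, id]() {
      executor.queuedTasks.erase(id);
      executor.launchedTasks.erase(id);
      executor.terminatedTasks.insert(id);
    });
  }

  // Step (3): deliver the task if it is still queued.
  void deliverQueued(const std::string& id) {
    if (executor.queuedTasks.erase(id) == 0) return;  // Already removed.
    executor.launchedTasks.insert(id);
    ++launches;
  }

  // Step (4): run the deferred work.
  void drain() {
    while (!pending.empty()) {
      auto callback = std::move(pending.front());
      pending.pop_front();
      callback();
    }
  }
};

// Replays steps (1) through (4) and returns how often the task was launched.
int replayRace(bool syncTerminateQueued) {
  ToyAgent agent(syncTerminateQueued);
  agent.queueTask("t1");
  agent.killTask("t1");
  agent.deliverQueued("t1");
  agent.drain();
  return agent.launches;
}
```

In this model, replayRace(false) delivers the killed task once before the deferred update cleans it up, while replayRace(true) never delivers it, because the task has already left queuedTasks by the time delivery runs.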

> Agent may process a kill task and still launch the task.
> --------------------------------------------------------
>
>                 Key: MESOS-7865
>                 URL: https://issues.apache.org/jira/browse/MESOS-7865
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> Based on the investigation of MESOS-7744, the agent has a race in which
> "queued" tasks can still be launched after the agent has processed a kill
> task request for them. This race was introduced when {{Slave::statusUpdate}}
> was made asynchronous:
> (1) {{Slave::__run}} completes, task is now within {{Executor::queuedTasks}}
> (2) {{Slave::killTask}} locates the executor based on the task ID residing in 
> queuedTasks, calls {{Slave::statusUpdate()}} with {{TASK_KILLED}}
> (3) {{Slave::___run}} assumes that killed tasks have been removed from 
> {{Executor::queuedTasks}}, but this now occurs asynchronously in 
> {{Slave::_statusUpdate}}. So, the agent still sees the queued task and 
> delivers it and adds the task to {{Executor::launchedTasks}}.
> (4) {{Slave::_statusUpdate}} runs, removes the task from 
> {{Executor::launchedTasks}} and adds it to {{Executor::terminatedTasks}}.



