Greg Mann created MESOS-8574:
--------------------------------

             Summary: Docker executor makes no progress when 'docker inspect' hangs
                 Key: MESOS-8574
                 URL: https://issues.apache.org/jira/browse/MESOS-8574
             Project: Mesos
          Issue Type: Improvement
          Components: docker, executor
    Affects Versions: 1.5.0
            Reporter: Greg Mann


In the Docker executor, many calls later in the executor's lifecycle are gated 
on an initial {{docker inspect}} call returning: 
https://github.com/apache/mesos/blob/bc6b61bca37752689cffa40a14c53ad89f24e8fc/src/docker/executor.cpp#L223

If that first call to {{docker inspect}} never returns, the executor becomes 
stuck in a state where it makes no progress and cannot be killed.

It's tempting to have the executor simply terminate itself after a timeout, but 
we must be careful of the case in which the executor's Docker container is 
running successfully while the Docker daemon is unresponsive. In that case, we 
do not want to send TASK_FAILED or TASK_KILLED, because the task itself is 
still running.
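
To make the distinction concrete, here is a minimal sketch in Python (not the executor's C++ code, and not a proposed patch). It treats a timed-out command as inconclusive rather than as a failure. The names InspectResult, run_with_timeout, and is_conclusive are hypothetical and exist only for this illustration. The sketch assumes the hanging call is an external command that can be bounded with a timeout.

```python
import enum
import subprocess
import sys


class InspectResult(enum.Enum):
    """Hypothetical outcome of one bounded command invocation."""
    OK = "ok"                # the command returned with exit code 0
    ERROR = "error"          # the command returned with a non-zero exit code
    TIMED_OUT = "timed_out"  # the command did not return within the timeout


def run_with_timeout(argv, timeout_secs):
    """Run argv, giving up after timeout_secs seconds.

    Returns a (InspectResult, output_bytes) pair.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_secs,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return InspectResult.TIMED_OUT, b""
    if proc.returncode != 0:
        return InspectResult.ERROR, proc.stderr
    return InspectResult.OK, proc.stdout


def is_conclusive(result):
    """A timeout only shows that the command did not answer.

    It says nothing about whether the container is running, so it should
    not by itself lead to a terminal status such as TASK_FAILED.
    """
    return result is not InspectResult.TIMED_OUT


if __name__ == "__main__":
    # Stand-in for a hanging command: a child process that sleeps.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    result, _ = run_with_timeout(hanging, 0.5)
    print(result.name, "conclusive:", is_conclusive(result))
```

In a real setting the argument list would be something like ["docker", "inspect", "<container name>"]. What the executor should do after an inconclusive result (retry, or check the container by other means) is the open question of this ticket and is deliberately left out of the sketch.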



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
