Andrei Budnik created MESOS-9230:
------------------------------------

             Summary: Docker executor may get stuck in an infinite loop when 
`docker run` hangs.
                 Key: MESOS-9230
                 URL: https://issues.apache.org/jira/browse/MESOS-9230
             Project: Mesos
          Issue Type: Bug
          Components: docker, executor
    Affects Versions: 1.6.0, 1.5.1, 1.4.2, 1.2.3
            Reporter: Andrei Budnik


This issue occurs when the Docker daemon is very slow or unresponsive.

Observed behaviour of the Docker executor:
 # The agent launches the Docker executor, which calls `docker run` to launch a 
container.
 # `docker inspect` hangs each time it is called, so the Docker executor 
[retries in a 
loop|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L244-L275]
 without success.
 # After 5 minutes, a framework (Marathon) sends the first `killTask` message, 
which 
[interrupts|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L543-L550]
 the previous `docker inspect` loop.
 # Then, `killTask()` launches the very first `docker stop`, which hangs.
 # The framework sends the second `killTask()` after 20 seconds, which 
[interrupts|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L599-L607]
 the first `docker stop` command.
 # The framework continues to send `killTask()` every 20 seconds, but `docker 
stop` always immediately returns an error: "Error response from daemon: No such 
container: mesos-some-UID".

Since `docker run` 
[hangs|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L242],
 the `reaped()` 
[callback|https://github.com/apache/mesos/blob/master/src/docker/executor.cpp#L664-L693]
 is never called. Thus, the Docker executor gets stuck in an infinite `docker 
stop` loop.
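
The control flow described above can be sketched as a small standalone 
model. This is not Mesos code: `ExecutorModel`, its fields, and 
`killsUntilTerminated()` are hypothetical names introduced only for 
illustration, and the model assumes that `reaped()` is the only path by which 
the executor terminates.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of the behaviour described in this issue.
// None of these types exist in Mesos; they only illustrate the control flow.
struct ExecutorModel {
  bool runHangs = false;        // Assumption: `docker run` never returns.
  bool containerExists = true;  // Assumption: the daemon knows the container.
  bool terminated = false;
  std::vector<std::string> errors;

  // Stands in for the `reaped()` callback: in this model it is the only
  // place where the executor is marked as terminated.
  void reaped() { terminated = true; }

  // Stands in for one `killTask()` that issues a single `docker stop`.
  void killTask() {
    if (terminated) {
      return;
    }
    if (!containerExists) {
      errors.push_back("No such container");
      return;  // `docker stop` failed; nothing triggers reaped().
    }
    containerExists = false;  // The container stops...
    if (!runHangs) {
      reaped();  // ...and a `docker run` that is not hanging returns.
    }
  }
};

// Sends up to `maxKills` kill requests and returns how many were needed to
// terminate the executor, or -1 if it is still running afterwards.
int killsUntilTerminated(ExecutorModel& executor, int maxKills) {
  assert(maxKills >= 0);
  for (int i = 1; i <= maxKills; ++i) {
    executor.killTask();
    if (executor.terminated) {
      return i;
    }
  }
  return -1;
}
```

With `runHangs = true` and `containerExists = false`, which corresponds to the 
situation reported here, `killsUntilTerminated()` returns -1 for any 
`maxKills`: every `killTask()` only records another "No such container" 
error, and `reaped()` is never reached.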



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
