Greg Mann created MESOS-8574:
--------------------------------

Summary: Docker executor makes no progress when 'docker inspect' hangs
Key: MESOS-8574
URL: https://issues.apache.org/jira/browse/MESOS-8574
Project: Mesos
Issue Type: Improvement
Components: docker, executor
Affects Versions: 1.5.0
Reporter: Greg Mann
In the Docker executor, many operations later in the executor's lifecycle wait for an initial {{docker inspect}} call to return:

https://github.com/apache/mesos/blob/bc6b61bca37752689cffa40a14c53ad89f24e8fc/src/docker/executor.cpp#L223

If that first {{docker inspect}} call never returns, the executor gets stuck: it makes no progress and cannot be killed.

It's tempting to have the executor simply commit suicide after a timeout, but we must be careful about the case in which the executor's Docker container is actually running successfully while the Docker daemon is unresponsive. In that case, we must not send TASK_FAILED or TASK_KILLED, because the task's container is in fact running.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
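A minimal sketch of the cautious timeout behaviour described in this issue, written in Python for brevity. The real executor is C++ and this is not its code. The names `Action`, `inspect_with_timeout`, and `decide` are hypothetical. The `inspect` callable stands in for a `docker inspect` call and is assumed to return a dict whose `Running` key is a simplified stand-in for the `State.Running` field of the real output. The key point is that a timeout maps to "state unknown, retry" rather than to a terminal task status.

```python
import queue
import threading
from enum import Enum


class Action(Enum):
    """Hypothetical next steps for an executor after one inspect attempt."""

    PROCEED = "proceed"  # inspect answered: container is running
    REPORT_FAILURE = "report_failure"  # inspect answered: container is not running
    RETRY_INSPECT = "retry_inspect"  # inspect did not answer: state is unknown


_TIMED_OUT = object()  # sentinel: the inspect call did not return in time


def inspect_with_timeout(inspect, timeout_secs):
    """Run a possibly hanging `inspect` callable on a daemon thread.

    Returns the callable's result, or _TIMED_OUT if it has not returned
    within `timeout_secs`. A hung call is abandoned rather than killed;
    because the worker is a daemon thread, it cannot keep the process alive.
    """
    results = queue.Queue(maxsize=1)
    worker = threading.Thread(target=lambda: results.put(inspect()), daemon=True)
    worker.start()
    try:
        return results.get(timeout=timeout_secs)
    except queue.Empty:
        return _TIMED_OUT


def decide(inspect, timeout_secs):
    """Map one inspect attempt to an Action without guessing on a timeout."""
    state = inspect_with_timeout(inspect, timeout_secs)
    if state is _TIMED_OUT:
        # The daemon is unresponsive, but the container may still be running,
        # so a timeout alone must not lead to TASK_FAILED or TASK_KILLED.
        return Action.RETRY_INSPECT
    # Hypothetical simplification of State.Running from `docker inspect` output.
    return Action.PROCEED if state.get("Running") else Action.REPORT_FAILURE
```

The issue does not say what the executor should do after repeated timeouts, so the sketch leaves that policy (retry limits, escalation) to the caller. Note also that an `inspect` callable that raises is treated here as a timeout, which a real implementation would likely want to distinguish.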