Greg Mann created MESOS-8538:
--------------------------------

             Summary: Consider adding a timeout to Docker executor task launch
                 Key: MESOS-8538
                 URL: https://issues.apache.org/jira/browse/MESOS-8538
             Project: Mesos
          Issue Type: Improvement
            Reporter: Greg Mann

To be more resilient to an unresponsive Docker daemon on an agent, the Docker executor could apply a timeout to its task launches. If its initial {{docker inspect}} call fails to return within this timeout, the executor could terminate itself.

However, we must be careful to clean up properly in such a case. For example, if the executor's {{docker run}} command succeeded but {{docker inspect}} then failed to return, we would want to be sure that the Docker containerizer destroys the running container. Furthermore, a timeout could lead to a state where the executor terminates and a TASK_FAILED is forwarded to the master, but the task container continues to run on the agent until the daemon becomes responsive again. If a launch timeout is implemented, care should be taken to avoid such inconsistent states.
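
As a rough illustration of the idea, and not the executor's actual C++ code, the sketch below runs a command with a deadline and requests cleanup before reporting a failed launch. The names {{completes_within}}, {{launch_with_timeout}} and {{destroy_container}} are hypothetical. In a real setting the command would be something like {{["docker", "inspect", container_name]}}, and the cleanup hook would stand in for whatever the Docker containerizer does to destroy the container. Note that a cleanup that itself talks to the unresponsive daemon may also hang, which is exactly the inconsistency this ticket warns about.

```python
import subprocess
import sys


def completes_within(command, timeout_seconds):
    """Run `command`; return True if it exits before the deadline, else False.

    The exit code is ignored on purpose: this only checks responsiveness.
    """
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        return False


def launch_with_timeout(inspect_command, timeout_seconds, destroy_container):
    """Return True if the inspect call came back in time.

    On timeout, call the (hypothetical) cleanup hook first, so that a
    failure is only reported after cleanup has been requested.
    """
    if completes_within(inspect_command, timeout_seconds):
        return True
    destroy_container()
    return False


if __name__ == "__main__":
    # Stand-in for a hung `docker inspect`: a child that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    ok = launch_with_timeout(hung, 1, lambda: print("cleanup requested"))
    print("launch confirmed:", ok)
```

The cleanup hook is passed in as a callable so the ordering (clean up, then report failure) can be exercised without a Docker daemon.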