[jira] [Updated] (MESOS-6743) Docker executor hangs forever if `docker stop` fails.

Alexander Rukletsov (JIRA) Thu, 10 Aug 2017 15:27:14 -0700

     [ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexander Rukletsov updated MESOS-6743:
---------------------------------------
    Fix Version/s: 1.1.3

{noformat}
Commit: 4d2afc50c88afff1c197720fa507637def4d2f20 [4d2afc5]
Author: Andrei Budnik [email protected]
Date: 10 August 2017 at 18:52:51 GMT+2
Committer: Alexander Rukletsov [email protected]
Commit Date: 10 August 2017 at 22:46:35 GMT+2

Added logging in docker executor on docker stop failure.

Review: https://reviews.apache.org/r/61435/
{noformat}
{noformat}
Commit: 06dcbd7b7c876a1f90934a679e2514d012df4d37 [06dcbd7]
Author: Andrei Budnik [email protected]
Date: 10 August 2017 at 18:53:03 GMT+2
Committer: Alexander Rukletsov [email protected]
Commit Date: 10 August 2017 at 22:46:35 GMT+2

Enabled retries for killTask in docker executor.

Previously, after docker stop command failure, docker executor
neither allowed a scheduler to retry killTask command, nor retried
killTask when task kill was triggered by a failed health check.

Review: https://reviews.apache.org/r/61530/
{noformat}
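
For illustration only, here is a minimal sketch of the behavior the second commit message describes. It is not the actual Mesos change (the real executor is C++). The class name {{KillState}} and the {{stop}} callable are hypothetical: the kill flag is cleared when the stop command fails, so that a later killTask from the scheduler, or one triggered by a failed health check, can try again.

```python
from typing import Callable


class KillState:
    """Hypothetical model of the executor's kill bookkeeping."""

    def __init__(self, stop: Callable[[], bool]) -> None:
        # `stop` stands in for running `docker stop`; it returns True on success.
        self._stop = stop
        self.killed = False
        self.attempts = 0

    def kill_task(self) -> bool:
        # Ignore duplicate kills only while a previous kill is still in effect.
        if self.killed:
            return True
        self.killed = True
        self.attempts += 1
        if not self._stop():
            # Stop failed: unmark the task so that the next killTask retries.
            self.killed = False
            return False
        return True
```

With this shape, a second {{kill_task()}} call after a failed stop runs the stop command again instead of being swallowed as a duplicate.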

> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>
>                 Key: MESOS-6743
>                 URL: https://issues.apache.org/jira/browse/MESOS-6743
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0, 1.2.1, 1.3.0
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>            Priority: Critical
>              Labels: mesosphere, reliability
>             Fix For: 1.1.3
>
>
> If {{docker stop}} finishes with an error status, the executor should catch 
> this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry 
> and what to do if {{docker stop}} continues to fail.
> 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill.
> However, in this case it is unclear what status updates we should send:
> {{TASK_KILLING}} for every kill retry, an extra update when we fail to kill
> a task, or a specific reason set in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is 
> killed or notify the framework and the operator that the container may still 
> be running.
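
As a rough sketch of option 1 under stated assumptions, the retry could be bounded as below. The attempt limit and the delay are made-up defaults, since the description deliberately leaves both open, and {{stop_with_retries}} is a hypothetical helper rather than anything in Mesos. When all attempts fail, the caller still has to choose a fallback such as option 3.

```python
import subprocess
import time
from typing import Callable


def run_docker_stop(container: str) -> int:
    # Runs the real `docker stop <container>` CLI command and returns its exit status.
    return subprocess.run(["docker", "stop", container]).returncode


def stop_with_retries(
    container: str,
    max_attempts: int = 3,       # assumed limit; the issue leaves this open
    delay_seconds: float = 1.0,  # assumed pause between attempts
    run: Callable[[str], int] = run_docker_stop,
) -> bool:
    """Return True once a stop attempt exits with 0, False after giving up."""
    for attempt in range(1, max_attempts + 1):
        if run(container) == 0:
            return True
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    # All attempts failed; the caller decides what to do next, for example
    # reporting that the container may still be running (option 3).
    return False
```

The {{run}} parameter exists only so the loop can be exercised without a Docker daemon.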



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
