[ https://issues.apache.org/jira/browse/MESOS-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-6743:
---------------------------------------
    Fix Version/s: 1.1.3

{noformat}
Commit: 4d2afc50c88afff1c197720fa507637def4d2f20 [4d2afc5]
Author: Andrei Budnik abud...@mesosphere.com
Date: 10 August 2017 at 18:52:51 GMT+2
Committer: Alexander Rukletsov al...@apache.org
Commit Date: 10 August 2017 at 22:46:35 GMT+2

    Added logging in docker executor on docker stop failure.

    Review: https://reviews.apache.org/r/61435/
{noformat}
{noformat}
Commit: 06dcbd7b7c876a1f90934a679e2514d012df4d37 [06dcbd7]
Author: Andrei Budnik abud...@mesosphere.com
Date: 10 August 2017 at 18:53:03 GMT+2
Committer: Alexander Rukletsov al...@apache.org
Commit Date: 10 August 2017 at 22:46:35 GMT+2

    Enabled retries for killTask in docker executor.

    Previously, after docker stop command failure, docker executor
    neither allowed a scheduler to retry killTask command, nor retried
    killTask when task kill was triggered by a failed health check.

    Review: https://reviews.apache.org/r/61530/
{noformat}

> Docker executor hangs forever if `docker stop` fails.
> -----------------------------------------------------
>
>                 Key: MESOS-6743
>                 URL: https://issues.apache.org/jira/browse/MESOS-6743
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 1.0.1, 1.1.0, 1.2.1, 1.3.0
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>            Priority: Critical
>              Labels: mesosphere, reliability
>             Fix For: 1.1.3
>
>
> If {{docker stop}} finishes with an error status, the executor should catch
> this and react instead of indefinitely waiting for {{reaped}} to return.
> An interesting question is _how_ to react. Here are possible solutions.
> 1. Retry {{docker stop}}. In this case it is unclear how many times to retry
> and what to do if {{docker stop}} continues to fail.
> 2. Unmark task as {{killed}}. This will allow frameworks to retry the kill.
> However, in this case it is unclear what status updates we should send:
> {{TASK_KILLING}} for every kill retry? an extra update when we failed to kill
> a task? or set a specific reason in {{TASK_KILLING}}?
> 3. Clean up and exit. In this case we should make sure the task container is
> killed or notify the framework and the operator that the container may still
> be running.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
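
For readers following the discussion, option 2 above (unmark the task as killed so that the kill can be retried) can be sketched roughly as follows. This is a toy model in Python, not the Mesos docker executor code: the class `ExecutorSketch`, its `killed` flag, the `stop_attempts` counter, and the `stop_container` callable are all hypothetical names introduced for illustration, and the real executor is asynchronous C++.

```python
import logging

logger = logging.getLogger("docker_executor_sketch")


class ExecutorSketch:
    """Toy model of a kill path that stays retryable after a failed stop."""

    def __init__(self, stop_container):
        # stop_container is a hypothetical callable standing in for
        # `docker stop`; it returns an exit status (0 means success).
        self._stop_container = stop_container
        self.killed = False
        self.stop_attempts = 0

    def kill_task(self):
        """Return True if the task is stopped, False if the stop failed."""
        if self.killed:
            # A kill already succeeded; ignore duplicate requests.
            return True

        self.killed = True
        self.stop_attempts += 1
        status = self._stop_container()

        if status != 0:
            # Log the failure and clear the flag so that a scheduler
            # (or a failed health check) can trigger the kill again.
            logger.warning("stop failed with status %d; kill may be retried", status)
            self.killed = False
            return False

        return True
```

In this sketch a caller would simply invoke `kill_task()` again after a `False` result. Which status updates to send on each retry, the open question raised in the issue, is deliberately left out.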