[ https://issues.apache.org/jira/browse/MESOS-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813390#comment-16813390 ]
Andrei Budnik commented on MESOS-9709:
--------------------------------------
It's a Linux kernel bug: [https://github.com/lxc/lxc/issues/2141]
> Docker executor can become stuck terminating
> --------------------------------------------
>
> Key: MESOS-9709
> URL: https://issues.apache.org/jira/browse/MESOS-9709
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.8.0
> Reporter: Greg Mann
> Priority: Major
> Labels: containerization, mesosphere
> Attachments: docker-executor-stuck.txt
>
>
> See the attached agent log. The executor container ID is
> {{d2bfec33-f6bd-44ee-9345-b5710780bb59}}, and the executor ID contains the
> string {{819f7ef7-4f42-11e9-a566-72ec67496045}}.
> After launching the executor, we see
> {code}
> Mar 29 18:23:36 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: I0329 18:23:36.967316 10257 slave.cpp:3550] Launching container d2bfec33-f6bd-44ee-9345-b5710780bb59 for executor 'datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339' of framework a221eeb3-b9c0-4e92-ae20-1e1d4af25321-0000
> Mar 29 18:23:36 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: I0329 18:23:36.968968 10253 docker.cpp:1161] No container info found, skipping launch
> {code}
> I'm not sure why the container info was not set. Once the executor
> reregistration timeout elapses, the agent attempts to terminate the executor,
> but the termination does not appear to succeed. The scheduler keeps trying to
> kill the task, but we repeatedly see
> {code}
> Mar 29 18:35:19 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: W0329 18:35:19.855063 10253 slave.cpp:3823] Ignoring kill task datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339 because the executor 'datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339' of framework a221eeb3-b9c0-4e92-ae20-1e1d4af25321-0000 is terminating
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)