[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209541#comment-16209541 ]
Andrei Budnik commented on MESOS-7506:
--------------------------------------

All failing tests show the same error message in the logs, for example:
{{E0922 00:38:40.509032 31034 slave.cpp:5398] Termination of executor '1' of framework 83bd1613-70d9-4c3e-b490-4aa60dd26e22-0000 failed: Failed to kill all processes in the container: Timed out after 1mins}}

The container termination future is triggered by [MesosContainerizerProcess::___destroy|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/slave/containerizer/mesos/containerizer.cpp#L2361].
The agent subscribes to this future by calling [containerizer->wait()|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/slave/slave.cpp#L5280].
Triggering this future leads to a call to {{Slave::executorTerminated}}, which sends a {{TASK_FAILED}} status update.

A typical test (e.g. {{SlaveTest.ShutdownUnregisteredExecutor}}) waits for
{code}
// Ensure that the slave times out and kills the executor.
Future<Nothing> destroyExecutor =
  FUTURE_DISPATCH(_, &MesosContainerizerProcess::destroy);
{code}
After that, the test waits for the {{TASK_FAILED}} status update. So the test itself completes successfully and the slave's destructor is called, [which fails|https://github.com/apache/mesos/blob/b361801f2c78043459199dab3e0defe9a0b4c1aa/src/tests/cluster.cpp#L580], because {{MesosContainerizerProcess::___destroy}} doesn't erase the container from the hashmap.

> Multiple tests leave orphan containers.
> ---------------------------------------
>
>                 Key: MESOS-7506
>                 URL: https://issues.apache.org/jira/browse/MESOS-7506
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>         Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>              Labels: containerizer, flaky-test, mesosphere
>
> I've observed a number of flaky tests that leave orphan containers upon
> cleanup.
> A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
> Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)