[ https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036465#comment-15036465 ]
Joseph Wu commented on MESOS-4047: ---------------------------------- Note: {{MesosContainerizerSlaveRecoveryTest.ResourceStatistics}} has similar logic for restarting the agent, re-registering an executor, and [calling {{MesosContainerizer::usage}}|https://github.com/apache/mesos/blob/master/src/tests/slave_recovery_tests.cpp#L3267]. But this test is stable. The flaky test waits on: {code} Future<Nothing> _recover = FUTURE_DISPATCH(_, &Slave::_recover); Future<SlaveReregisteredMessage> slaveReregisteredMessage = FUTURE_PROTOBUF(SlaveReregisteredMessage(), _, _); {code} Whereas the stable test waits on: {code} // Set up so we can wait until the new slave updates the container's // resources (this occurs after the executor has re-registered). Future<Nothing> update = FUTURE_DISPATCH(_, &MesosContainerizerProcess::update); {code} > MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky > ----------------------------------------------------------- > > Key: MESOS-4047 > URL: https://issues.apache.org/jira/browse/MESOS-4047 > Project: Mesos > Issue Type: Bug > Components: test > Affects Versions: 0.26.0 > Environment: Ubuntu 14, gcc 4.8.4 > Reporter: Joseph Wu > Labels: flaky, flaky-test > > {code:title=Output from passed test} > [----------] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:14.319327 5062 exec.cpp:134] Version: 0.27.0 > I1202 11:09:14.333317 5079 exec.cpp:208] Executor registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Registered executor on ubuntu > Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 5085 > I1202 11:09:14.391739 5077 exec.cpp:254] Received reconnect request from > slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > I1202 11:09:14.398598 5082 exec.cpp:231] Executor re-registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Re-registered executor on ubuntu > Shutting down > Sending SIGTERM to process tree at pid 5085 > Killing the following process trees: > [ > -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done > \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp > ] > [ OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms) > {code} > {code:title=Output from failed test} > [----------] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:15.509950 5109 exec.cpp:134] Version: 0.27.0 > I1202 11:09:15.568183 5123 exec.cpp:208] Executor registered on slave > 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > Registered executor on ubuntu > Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6 > Forked command at 5132 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > I1202 11:09:15.665498 5129 exec.cpp:254] Received reconnect request from > slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > I1202 11:09:15.670995 5123 exec.cpp:381] Executor asked to shutdown > Shutting down > Sending SIGTERM to process tree at pid 5132 > ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure > (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913 > *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are > using GNU date *** > {code} > Notice that in the failed test, the executor is asked to shutdown when it > tries to reconnect to the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)