[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496966#comment-15496966 ]
Vinod Kone commented on MESOS-6180:
-----------------------------------

Looking at `CGROUPS_ROOT_PidNamespaceForward` the TASK_LOST is expected because the test doesn't wait for TASK_RUNNING update before terminating the agent.

{quote}
  Future<Message> registerExecutorMessage =
    FUTURE_MESSAGE(Eq(RegisterExecutorMessage().GetTypeName()), _, _);

  driver.launchTasks(offers1.get()[0].id(), {task1});

  AWAIT_READY(registerExecutorMessage);

  Future<hashset<ContainerID>> containers = containerizer->containers();
  AWAIT_READY(containers);
  EXPECT_EQ(1u, containers.get().size());

  ContainerID containerId = *(containers.get().begin());

  // Stop the slave.
  slave.get()->terminate();
{quote}

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>
>                 Key: MESOS-6180
>                 URL: https://issues.apache.org/jira/browse/MESOS-6180
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>            Assignee: haosdent
>              Labels: mesosphere, tests
>         Attachments: CGROUPS_ROOT_PidNamespaceBackward.log,
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log
>
>
> Following the merging of a large patch chain, it was noticed on our internal
> CI that several tests had become flaky, with a similar pattern in the
> failures: the tests fail early when a future times out. Often, this occurs
> when a test cluster is being spun up and one of the offer futures times out.
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple
> of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
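
The ordering issue described in the comment on `CGROUPS_ROOT_PidNamespaceForward` can be illustrated with a small, single-threaded toy model. This is only a sketch and not Mesos code: `ToyAgent`, `deliverRunning`, `terminateWithoutWaiting` and `terminateAfterRunning` are hypothetical names, and the rule that a task with no reported update becomes TASK_LOST on termination is a simplifying assumption made for the illustration.

```cpp
// Task states, named after the Mesos states mentioned in the comment.
enum class TaskState { NONE, TASK_RUNNING, TASK_LOST };

// Toy stand-in for an agent (hypothetical, not the Mesos API). The first
// event to reach it decides which status update is seen for the task.
class ToyAgent {
public:
  // The executor's TASK_RUNNING update reaches the agent.
  void deliverRunning() { settle(TaskState::TASK_RUNNING); }

  // The agent is terminated; a task with no update yet is modelled as lost.
  void terminate() { settle(TaskState::TASK_LOST); }

  // First status update recorded for the task (NONE if nothing happened).
  TaskState firstUpdate() const { return first_; }

private:
  void settle(TaskState state) {
    if (first_ == TaskState::NONE) {
      first_ = state;  // Later events do not overwrite the first update.
    }
  }

  TaskState first_ = TaskState::NONE;
};

// Mirrors the quoted test: terminate right after executor registration,
// without waiting for a status update.
TaskState terminateWithoutWaiting() {
  ToyAgent agent;
  agent.terminate();
  agent.deliverRunning();  // Arrives too late to change the outcome.
  return agent.firstUpdate();
}

// Lets the TASK_RUNNING update arrive before terminating the agent.
TaskState terminateAfterRunning() {
  ToyAgent agent;
  agent.deliverRunning();
  agent.terminate();  // No effect on the update already recorded.
  return agent.firstUpdate();
}
```

In this model `terminateWithoutWaiting()` yields TASK_LOST and `terminateAfterRunning()` yields TASK_RUNNING, which is the distinction the comment draws: a test that wants to observe TASK_RUNNING has to wait for that update before calling `terminate()`.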