[ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508838#comment-15508838 ]
Greg Mann commented on MESOS-6180:
----------------------------------

Another common error seen when this issue manifests is:
{code}
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
{code}
See the file {{RoleTest.ImplicitRoleRegister.txt}} for the full test log.

[~haosd...@gmail.com], there is a review [here|https://reviews.apache.org/r/41665/] proposing the {{in_memory}} registry for tests. I'm currently trying to figure out whether this is a legitimate bug or simply the result of an unreasonable load put on the machine.

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>
>                 Key: MESOS-6180
>                 URL: https://issues.apache.org/jira/browse/MESOS-6180
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>            Assignee: haosdent
>              Labels: mesosphere, tests
>         Attachments: CGROUPS_ROOT_PidNamespaceBackward.log,
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log,
> RoleTest.ImplicitRoleRegister.txt,
> flaky-containerizer-pid-namespace-backward.txt,
> flaky-containerizer-pid-namespace-forward.txt
>
>
> Following the merging of a large patch chain, it was noticed on our internal
> CI that several tests had become flaky, with a similar pattern in the
> failures: the tests fail early when a future times out. Often, this occurs
> when a test cluster is being spun up and one of the offer futures times out.
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple
> of these.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)