[
https://issues.apache.org/jira/browse/MESOS-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897907#comment-16897907
]
Chun-Hung Hsiao commented on MESOS-9920:
----------------------------------------
A possible "fix" would be:
{noformat}
--- a/src/tests/slave_tests.cpp
+++ b/src/tests/slave_tests.cpp
@@ -11609,7 +11609,10 @@ TEST_F(
   Future<UpdateSlaveMessage> updateSlaveMessage =
     FUTURE_PROTOBUF(UpdateSlaveMessage(), _, _);
 
+  // Set the executor reregister timeout to a value greater than the default so
+  // the test is less likely to fail on slow CI machines. See MESOS-9920.
   slave::Flags slaveFlags = CreateSlaveFlags();
+  slaveFlags.executor_reregistration_timeout = process::TEST_AWAIT_TIMEOUT;
 
   // Use the same process ID so the executor can resubscribe.
   string processId = process::ID::generate("slave");
{noformat}
However, this would make the test very slow, so it needs more thought.
Advancing the clock after the executor subscribes would not reproduce
MESOS-9711, so that is not an option either.
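To make the timing race concrete, here is a minimal sketch of the decision the
agent effectively makes once the reregistration window closes. It is not Mesos
code: `Outcome`, `outcome`, `DEFAULT_TIMEOUT` and `RAISED_TIMEOUT` are
hypothetical names, the 2.5-second delay is a made-up value for a slow machine,
and the 15-second figure is only an assumed stand-in for
`process::TEST_AWAIT_TIMEOUT`. Treating a delay exactly equal to the timeout as
"in time" is also an assumption.

```cpp
#include <chrono>

// Hypothetical model, not Mesos code: what happens to an executor once the
// agent's reregistration window has closed.
enum class Outcome { Reregistered, ShutDown };

// Returns the outcome for an executor that needs `delay` to reregister after
// the clock is resumed, given the agent's reregistration `timeout`.
// Assumption: a delay equal to the timeout still counts as in time.
constexpr Outcome outcome(
    std::chrono::milliseconds delay,
    std::chrono::milliseconds timeout)
{
  return delay <= timeout ? Outcome::Reregistered : Outcome::ShutDown;
}

// The 2-second default mentioned in the issue description.
constexpr std::chrono::seconds DEFAULT_TIMEOUT{2};

// Assumed value for illustration only; the real TEST_AWAIT_TIMEOUT may differ.
constexpr std::chrono::seconds RAISED_TIMEOUT{15};

// Hypothetical slow machine: the executor needs 2.5 seconds to reregister.
constexpr std::chrono::milliseconds SLOW_DELAY{2500};

// With the default timeout the executor is told to shut down (the flake).
static_assert(
    outcome(SLOW_DELAY, DEFAULT_TIMEOUT) == Outcome::ShutDown,
    "slow executor misses the default window");

// With the raised timeout the same executor reregisters in time.
static_assert(
    outcome(SLOW_DELAY, RAISED_TIMEOUT) == Outcome::Reregistered,
    "slow executor fits in the raised window");
```

The checks run at compile time, so the snippet needs no test harness. It only
models the comparison between the delay and the timeout; it says nothing about
how long the test itself would take with the larger value.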
> Test `SlaveTest.AgentFailoverHTTPExecutorUsingResourceProviderResources` is
> flaky.
> ----------------------------------------------------------------------------------
>
> Key: MESOS-9920
> URL: https://issues.apache.org/jira/browse/MESOS-9920
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Chun-Hung Hsiao
> Priority: Major
> Labels: flaky-test
> Attachments: consoleText.txt
>
>
> The test is flaky because the default executor reregistration timeout is 2
> seconds, which is too short on a slow machine: once the clock is resumed in
> the test, if the executor does not reregister within 2 seconds, the agent
> will ask the executor to shut itself down.
> Full log attached.