[jira] [Commented] (MESOS-9920) Test `SlaveTest.AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.

Chun-Hung Hsiao (JIRA) Thu, 01 Aug 2019 02:18:09 -0700


    [ 
https://issues.apache.org/jira/browse/MESOS-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897907#comment-16897907
 ]


Chun-Hung Hsiao commented on MESOS-9920:
----------------------------------------

A possible "fix" would be:
{noformat}
--- a/src/tests/slave_tests.cpp
+++ b/src/tests/slave_tests.cpp
@@ -11609,7 +11609,10 @@ TEST_F(
   Future<UpdateSlaveMessage> updateSlaveMessage =
     FUTURE_PROTOBUF(UpdateSlaveMessage(), _, _);

+  // Set the executor reregister timeout to a value greater than the default so
+  // the test is less likely to fail on slow CI machines. See MESOS-9920.
   slave::Flags slaveFlags = CreateSlaveFlags();
+  slaveFlags.executor_reregistration_timeout = process::TEST_AWAIT_TIMEOUT;

   // Use the same process ID so the executor can resubscribe.
   string processId = process::ID::generate("slave");
{noformat}
But this would make the test super slow, so it needs more thinking.
Advancing the clock after the executor subscribes won't reproduce MESOS-9711, so that is not an option.

> Test `SlaveTest.AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.
> ----------------------------------------------------------------------------------
>
>                 Key: MESOS-9920
>                 URL: https://issues.apache.org/jira/browse/MESOS-9920
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>              Labels: flaky-test
>         Attachments: consoleText.txt
>
>
> The test is flaky because the default executor reregistration timeout is 2
> seconds, which is too short on a slow machine: once the clock is resumed in
> the test, if the executor does not reregister within 2 seconds, the agent
> asks the executor to shut itself down.
> Full log attached.
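A toy model of the race described above, as a C++ sketch. All names here are hypothetical and this is not Mesos code. It assumes the agent compares the executor's reregistration delay, measured from the moment the test resumes the clock, against the reregistration timeout, and it assumes a delay exactly equal to the timeout still counts as "in time".

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model, not the Mesos agent implementation.
using Duration = std::chrono::milliseconds;

enum class ExecutorOutcome { Reregistered, ShutDown };

// `reregisterDelay`: real time the executor needs to reregister with the
// restarted agent, counted from when the test resumes the clock.
// `reregistrationTimeout`: how long the agent waits before asking a
// recovered executor that has not reregistered to shut itself down.
ExecutorOutcome outcomeAfterRecovery(Duration reregisterDelay,
                                     Duration reregistrationTimeout) {
  assert(reregistrationTimeout > Duration::zero());

  // Assumption: reaching the deadline exactly is treated as in time.
  return reregisterDelay <= reregistrationTimeout
      ? ExecutorOutcome::Reregistered
      : ExecutorOutcome::ShutDown;
}
```

With the 2-second default mentioned in the description, a slow CI machine that needs, say, 2.5 seconds to reregister the executor ends in a shutdown, which is the flaky failure. Raising the timeout, as the patch in the comment does, only widens the window; it does not remove the dependence on real time.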



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
