[ https://issues.apache.org/jira/browse/MESOS-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506706#comment-16506706 ]
Chun-Hung Hsiao commented on MESOS-6231:
----------------------------------------

Observed a 1-hr hang: [^consoleText.txt]

> Scheduler driver metrics can hang Metrics() in tests
> ----------------------------------------------------
>
> Key: MESOS-6231
> URL: https://issues.apache.org/jira/browse/MESOS-6231
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Neil Conway
> Priority: Major
> Labels: mesosphere
> Attachments: consoleText.txt
>
>
> * {{SchedulerProcess}} has a field, {{metrics}}, whose constructor registers
> two metrics, {{event_queue_messages}} and {{event_queue_dispatches}}.
> * These metrics are implemented by {{defer}}'ing a message to
> {{SchedulerProcess}}.
> * If {{MesosSchedulerDriver}} is started and then stopped (but not
> destructed), {{SchedulerProcess}} is terminated but not destroyed.
> Hence, if a scheduler driver is started and then stopped, fetching the metric
> will hang. This means a test case that fetches {{Metrics()}} after stopping a
> scheduler driver will hang.
> For example, the following patch will hang
> {{SlaveTest.MetricsSlaveLaunchErrors}}.
> {noformat}
> diff --git a/src/tests/slave_tests.cpp b/src/tests/slave_tests.cpp
> index 3471314..f323bb9 100644
> --- a/src/tests/slave_tests.cpp
> +++ b/src/tests/slave_tests.cpp
> @@ -1408,12 +1408,12 @@ TEST_F(SlaveTest, MetricsSlaveLaunchErrors)
>    AWAIT_READY(failureUpdate);
>    ASSERT_EQ(TASK_FAILED, failureUpdate.get().state());
> +  driver.stop();
> +  driver.join();
> +
>    // After failure injection, metrics should report a single failure.
>    snapshot = Metrics();
>    EXPECT_EQ(1, snapshot.values["slave/container_launch_errors"]);
> -
> -  driver.stop();
> -  driver.join();
>  }
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)