[ https://issues.apache.org/jira/browse/HIVE-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562282#comment-15562282 ]
Aihua Xu commented on HIVE-14839:
---------------------------------

+1. The patch looks good to me.

> Improve the stability of TestSessionManagerMetrics
> --------------------------------------------------
>
>                 Key: HIVE-14839
>                 URL: https://issues.apache.org/jira/browse/HIVE-14839
>             Project: Hive
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: 2.1.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Minor
>         Attachments: HIVE-14839.patch
>
>
> TestSessionManagerMetrics fails occasionally with the following error:
> {noformat}
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
>   at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> Failed tests:
>   TestSessionManagerMetrics.testThreadPoolMetrics:98 expected:<[0]> but was:<[1]>
> {noformat}
> This test starts four background threads with a "wait" call in their run
> method. The threads use the common "barrier" object as a lock.
> The expected behaviour is that two threads will be in the async pool (because
> hive.server2.async.exec.threads is set to 2) and the other two threads
> will be waiting in the queue. This condition is checked like this:
> {noformat}
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 2);
> {noformat}
>
> Then notifyAll is called on the lock object, so the two threads in the pool
> should "wake up" and complete, and the other two threads should move from the
> queue to the pool. This is checked like this in the test:
> {noformat}
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 0);
> {noformat}
>
> There are two cases which can cause this test to fail:
> # The notifyAll call happens before both threads in the pool are up and
> running and in the "wait" phase.
> In this case the thread which was not up in time misses the notification and
> stays stuck in the pool, so the other two threads cannot both move from the
> queue to the pool.
> # After the notifyAll call, the threads in the pool "wake up" with some
> delay. If they have not yet completed and been removed from the pool by the
> time the metrics are checked, the other two threads have not yet moved from
> the queue to the pool. The check then fails, since the queue is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
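
The two failure cases described in the issue can be illustrated with a small, self-contained Java sketch. This is not the code from HIVE-14839.patch: the class BarrierRaceSketch, the generation counter, and the helpers waitingTask, releaseAll and awaitCondition are hypothetical names, and a plain ThreadPoolExecutor stands in for the HiveServer2 async pool. It assumes one possible way to address each case: a CountDownLatch counted down while the task holds the barrier's monitor (case 1), and polling with a timeout instead of a single immediate check (case 2).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; a plain ThreadPoolExecutor stands in for the async pool.
class BarrierRaceSketch {
    private static final Object barrier = new Object();
    private static int generation = 0; // guarded by barrier

    // A task that blocks on the shared barrier until the next releaseAll().
    static Runnable waitingTask(CountDownLatch waiting) {
        return () -> {
            synchronized (barrier) {
                int seen = generation;
                // Counted down while holding the monitor: releaseAll() cannot
                // acquire it until wait() below has released it.
                waiting.countDown();
                while (seen == generation) { // also guards against spurious wakeups
                    try {
                        barrier.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        };
    }

    // Wakes every task currently waiting on the barrier.
    static void releaseAll() {
        synchronized (barrier) {
            generation++;
            barrier.notifyAll();
        }
    }

    // Polls a condition until it holds or the timeout expires (failure case 2).
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    // Returns {poolSizeBefore, queueSizeBefore, poolSizeAfter, queueSizeAfter}.
    static int[] runScenario() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch firstPair = new CountDownLatch(2);
        CountDownLatch secondPair = new CountDownLatch(2);
        try {
            pool.execute(waitingTask(firstPair));
            pool.execute(waitingTask(firstPair));
            pool.execute(waitingTask(secondPair));
            pool.execute(waitingTask(secondPair));

            // Failure case 1: do not notify until both pooled tasks are waiting.
            if (!firstPair.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("first pair never reached wait()");
            }
            int poolBefore = pool.getPoolSize();
            int queueBefore = pool.getQueue().size();

            releaseAll();

            // Failure case 2: give the queued tasks time to move into the pool.
            if (!awaitCondition(() -> pool.getQueue().isEmpty(), 5000)
                    || !secondPair.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queued tasks never started");
            }
            int poolAfter = pool.getPoolSize();
            int queueAfter = pool.getQueue().size();

            releaseAll(); // let the second pair finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return new int[] {poolBefore, queueBefore, poolAfter, queueAfter};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            pool.shutdownNow(); // harmless if the pool has already terminated
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(runScenario()));
    }
}
```

In this sketch the four values returned by runScenario correspond to the two pairs of pool-size and queue-size checks quoted in the issue. The 5 second timeouts and the 10 ms poll interval are arbitrary choices for illustration.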