Marta Kuczora created HIVE-14839:
------------------------------------
Summary: Improve the stability of TestSessionManagerMetrics
Key: HIVE-14839
URL: https://issues.apache.org/jira/browse/HIVE-14839
Project: Hive
Issue Type: Bug
Components: Test
Affects Versions: 2.1.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
Priority: Minor
TestSessionManagerMetrics fails occasionally with the following error:
{noformat}
org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
at org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
Failed tests:
TestSessionManagerMetrics.testThreadPoolMetrics:98 expected:<[0]> but was:<[1]>
{noformat}
This test starts four background threads, each with a "wait" call in its run
method. The threads all use the common "barrier" object as the lock.
The expected behaviour is that two threads will be in the async pool (because
hive.server2.async.exec.threads is set to 2) and the other two threads will
be waiting in the queue. This condition is checked like this:
{noformat}
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE,
MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE,
MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 2);
{noformat}
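The expected 2/2 split can be reproduced outside Hive. The following is only a rough sketch: it assumes a plain java.util.concurrent.ThreadPoolExecutor with an unbounded LinkedBlockingQueue as a stand-in for the HiveServer2 async pool, and the class and method names (PoolQueueSketch, submitBlockedTasks) are made up for illustration. The tasks block on a CountDownLatch rather than on "wait" to keep the sketch short.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the async pool; not Hive code.
class PoolQueueSketch {

  /** Returns {poolSize, queueSize} observed while every submitted task is still blocked. */
  static int[] submitBlockedTasks(int poolThreads, int taskCount) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        poolThreads, poolThreads, 10, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < taskCount; i++) {
        pool.execute(() -> {
          try {
            release.await(); // block until the caller lets the tasks finish
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // The first poolThreads tasks occupy worker threads, the rest sit in the queue.
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } finally {
      release.countDown(); // unblock all tasks so the pool can shut down
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int[] sizes = submitBlockedTasks(2, 4);
    System.out.println("pool=" + sizes[0] + " queue=" + sizes[1]);
  }
}
```

With two pool threads and four tasks, the first two tasks are handed directly to worker threads and the remaining two are queued, which is the state the two gauge checks expect.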
Then a notifyAll is called on the lock object, so the two threads in the pool
should "wake up" and complete and the other two threads should go from the
queue to the pool. This is checked like this in the test:
{noformat}
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE,
MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE,
MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 0);
{noformat}
There are two scenarios which can cause this test to fail:
# The notifyAll call happens before both threads in the pool are up and running
and in the "wait" phase.
In this case the thread which is not up in time misses the notification and gets
stuck in the pool, so the other two threads cannot move from the queue to the pool.
# After the notifyAll call, the threads in the pool "wake up" with some delay.
By the time the metrics are checked, they have not yet completed and been removed
from the pool, and the other two threads have not yet moved from the queue to the
pool. Therefore the check fails, since the queue is not empty.
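One possible way to remove both races is sketched below. It is only an illustration, not necessarily the fix that will go into Hive: a guard flag checked in a loop around "wait" makes an early notifyAll harmless, a CountDownLatch confirms that both pool threads are running, and the final check polls with a timeout instead of asserting once. All names (StableWaitSketch, waitFor, released, started) are hypothetical, and a plain ThreadPoolExecutor stands in for the async pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the actual TestSessionManagerMetrics code.
class StableWaitSketch {
  private final Object barrier = new Object();
  private boolean released = false; // guarded by barrier

  /** Polls the condition until it holds or the timeout expires. */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10);
    }
    return true;
  }

  /** Returns true if the queue drained after the release, false on timeout. */
  boolean run() throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 2, 10, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(2); // the two threads that fit in the pool
    Runnable task = () -> {
      started.countDown();
      synchronized (barrier) {
        // The flag means a notifyAll sent before this thread waits is not lost.
        while (!released) {
          try {
            barrier.wait();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      }
    };
    try {
      for (int i = 0; i < 4; i++) {
        pool.execute(task);
      }
      // Scenario 1: do not notify before both pool threads are running.
      if (!started.await(5, TimeUnit.SECONDS)) {
        return false;
      }
      synchronized (barrier) {
        released = true;
        barrier.notifyAll();
      }
      // Scenario 2: give the pool time to pick up the queued tasks.
      return waitFor(() -> pool.getQueue().isEmpty(), 5000);
    } finally {
      synchronized (barrier) {
        released = true;
        barrier.notifyAll();
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("queue drained: " + new StableWaitSketch().run());
  }
}
```

In the real test the polling would wrap the verifyMetricsJson checks rather than a direct look at the queue; how best to do that depends on how the metrics JSON is refreshed, which this sketch does not model.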