[ https://issues.apache.org/jira/browse/IMPALA-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884120#comment-17884120 ]
ASF subversion and git services commented on IMPALA-13388:
----------------------------------------------------------

Commit d2cd9b51a03dbd8b2e485ee446bf7530656ab214 in impala's branch refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2cd9b51a ]

IMPALA-13388: fix unit-tests of Statestore HA for UBSAN builds

Sometimes in UBSAN builds, the Statestore HA unit tests failed because Thrift RPCs timed out while receiving. The standby statestored failed to send heartbeats to its subscribers, so failover was not triggered. The Thrift RPC failures still happened after increasing the TCP timeout for Thrift RPCs between statestored and its subscribers.

This patch adds a metric for the number of subscribers that received heartbeats from statestored in a monitoring period. In UBSAN builds, the Statestore HA unit tests are skipped if statestored failed to send heartbeats to more than half of its subscribers. In other builds, the tests throw an exception with an error message reporting the Thrift RPC failure if statestored failed to send heartbeats to more than half of its subscribers.

Also fixed a bug where the value returned by SecondsSinceHeartbeat() was compared with a time value in milliseconds.

Filed follow-up JIRA IMPALA-13399 to track the root cause.

Testing:
- Ran test_statestored_ha.py 100 times in a UBSAN build with no failed cases; 4 of the 100 iterations had skipped test cases.
- Verified that the issue did not happen in an ASAN build by running test_statestored_ha.py 100 times.
- Passed core tests.

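
The skip-or-fail rule and the unit fix described above could be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names, the exception class, the "run"/"skip" return values, and the direction of the staleness comparison are all assumptions made for the example.

```python
# Hypothetical sketch; every name here is illustrative and not taken from
# Impala's test code or from the statestored sources.


class HeartbeatRpcFailure(Exception):
    """Raised on non-UBSAN builds when too many heartbeats were not delivered."""


def too_many_heartbeats_failed(num_subscribers, num_received):
    """True if heartbeats failed for more than half of the subscribers."""
    failed = num_subscribers - num_received
    # Integer arithmetic avoids rounding problems with odd subscriber counts.
    return failed * 2 > num_subscribers


def check_heartbeat_delivery(num_subscribers, num_received, is_ubsan_build):
    """Return "run" or "skip"; raise HeartbeatRpcFailure on other builds."""
    if not too_many_heartbeats_failed(num_subscribers, num_received):
        return "run"
    if is_ubsan_build:
        # UBSAN builds: skip the test case instead of reporting a failure.
        return "skip"
    raise HeartbeatRpcFailure(
        "Thrift RPC failure: only %d of %d subscribers received heartbeats"
        % (num_received, num_subscribers))


def heartbeat_is_stale(seconds_since_heartbeat, timeout_ms):
    """Compare in a single unit: convert seconds to milliseconds first."""
    return seconds_since_heartbeat * 1000 > timeout_ms
```

With these assumed helpers, a test would call check_heartbeat_delivery() with the value of the new metric before asserting on failover, and skip itself when the result is "skip".
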
Change-Id: Ie59d1e93c635411723f7044da52e4ab19c7d2fac
Reviewed-on: http://gerrit.cloudera.org:8080/21820
Reviewed-by: Riza Suminto <riza.sumi...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> TestStatestoredHA.test_statestored_auto_failover failed in UBSAN ARM
> --------------------------------------------------------------------
>
>                 Key: IMPALA-13388
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13388
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Test
>            Reporter: Riza Suminto
>            Assignee: Wenzhe Zhou
>            Priority: Major
>              Labels: flaky
>             Fix For: Impala 4.5.0
>
>
> TestStatestoredHA.test_statestored_auto_failover failed in UBSAN ARM environment. Here is the stack trace:
> {code:java}
> Stacktrace
> custom_cluster/test_statestored_ha.py:340: in test_statestored_auto_failover
>     self.__test_statestored_auto_failover()
> custom_cluster/test_statestored_ha.py:259: in __test_statestored_auto_failover
>     "statestore.active-status", expected_value=True, timeout=120)
> common/impala_service.py:144: in wait_for_metric_value
>     self.__metric_timeout_assert(metric_name, expected_value, timeout)
> common/impala_service.py:213: in __metric_timeout_assert
>     assert 0, assert_string
> E   AssertionError: Metric statestore.active-status did not reach value True in 120s.
> E   Dumping debug webpages in JSON format...
> E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/memz.json
> E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/metrics.json
> E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/queries.json
> E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/sessions.json
> E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/threadz.json
> E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20240917_00:07:51/json/rpcz.json
> E   Dumping minidumps for impalads/catalogds...
> E   Dumped minidump for Impalad PID 3729004
> E   Dumped minidump for Impalad PID 3729007
> E   Dumped minidump for Impalad PID 3729011
> E   Dumped minidump for Catalogd PID 3728915
> {code}
> Maybe 120 seconds timeout is not enough in UBSAN ARM environment?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)