Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20657


Change subject: IMPALA-12525: Fix flaky test test_statestored_manual_failover
......................................................................

IMPALA-12525: Fix flaky test test_statestored_manual_failover

In test_statestored_manual_failover, statestore service failover is
sometimes not triggered when the network of the active statestored is
disabled after a manually forced failover.
During the test, the network of the active statestored could be
disabled before all subscribers had re-registered with the restarted
statestored. As a result, some subscribers did not receive the
notification of the active statestored change, so they could not
correctly report connection states for the requests from the standby
statestored.

This patch makes the following changes:
1) Updated the test case test_statestored_manual_failover to disable
the network of the active statestored only after all subscribers have
re-registered with the restarted statestored.

2) Defined a new mutex active_lock_ in class StatestoreStub to protect
is_active_, since the mutex lock_ could be held for a long time if the
subscriber loses the connection with statestored and enters recovery
mode.

3) Found one case that was not handled on statestore subscribers. The
subscribers could be started before both statestore instances are
ready to accept registration requests. This caused impalad to hit a
DCHECK. Changed the code to handle this case in this patch.
Added test cases that inject a real delay in statestored startup and
verify that impalads and catalogd are able to tolerate this delay.

4) Updated the address of the active catalogd in the statestored
metrics after statestore service failover.
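
A minimal sketch of the locking idea in change 2), for illustration
only. StubSketch, WithLockHeld and DemoReadWhileLockHeld are
hypothetical names and are not taken from the patch; only lock_,
active_lock_ and is_active_ come from the description above. The point
is that a reader of is_active_ takes only active_lock_, so it does not
wait on a long-held lock_.

```cpp
#include <mutex>

// Hypothetical stand-in for StatestoreStub. This is not the Impala class;
// it only models the two mutexes and the flag named in the description.
class StubSketch {
 public:
  // Reads the flag under active_lock_ only, never under lock_.
  bool IsActive() {
    std::lock_guard<std::mutex> l(active_lock_);
    return is_active_;
  }

  void SetActive(bool active) {
    std::lock_guard<std::mutex> l(active_lock_);
    is_active_ = active;
  }

  // Models a long hold of lock_ (for example during recovery mode) by
  // running fn while lock_ is held.
  template <typename Fn>
  void WithLockHeld(Fn fn) {
    std::lock_guard<std::mutex> l(lock_);
    fn();
  }

 private:
  std::mutex lock_;         // may be held for a long time
  std::mutex active_lock_;  // guards is_active_ only
  bool is_active_ = false;
};

// Returns the value of is_active_ observed while lock_ is held.
// If IsActive() also needed lock_, this call could not make progress.
bool DemoReadWhileLockHeld() {
  StubSketch stub;
  stub.SetActive(true);
  bool observed = false;
  stub.WithLockHeld([&] { observed = stub.IsActive(); });
  return observed;
}
```

In the real subscriber the holder of lock_ and the reader of is_active_
would normally be different threads; the single-threaded demo above only
shows that the two locks are independent.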

Testing:
 - Repeatedly ran test_statestored_manual_failover on Jenkins hundreds
   of times.
 - Repeatedly ran test_statestored_manual_failover on a local machine
   a thousand times without failure.
 - Passed core tests

Change-Id: If03bf09d22a2875d2c1eec8a4f62eeefc5d855dc
---
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore.cc
M tests/custom_cluster/test_statestored_ha.py
4 files changed, 145 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/20657/4
--
To view, visit http://gerrit.cloudera.org:8080/20657
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If03bf09d22a2875d2c1eec8a4f62eeefc5d855dc
Gerrit-Change-Number: 20657
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
