Wenzhe Zhou has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/20372 )
Change subject: IMPALA-12156: Support High Availability for Statestore
......................................................................

IMPALA-12156: Support High Availability for Statestore

To support statestore HA, we allow two statestored instances in an
Active-Passive HA pair to be added to an Impala cluster.
We add preemptive behavior for statestored: when HA is enabled, the
statestored with the higher priority becomes active and its paired
statestored becomes standby. The active statestored acts as the owner
of the Impala cluster and provides statestore service to the cluster
members.

To enable statestore HA for a cluster, both statestoreds in the HA pair
and all subscribers must be started with the startup flag
"enable_statestored_ha" set to true.

This patch makes the following changes:
- Define a new service for Statestore HA.
- Statestored negotiates its HA role with its peer statestored
  instance on startup.
- Create an HA monitor thread:
  The active statestored sends heartbeats to the standby statestored.
  The standby statestored monitors the connection states between its
  peer and the subscribers.
- The standby statestored sends heartbeats to subscribers, requesting
  the connection state between the active statestored and each
  subscriber. The standby statestored saves these connection states
  for use as a failure detector.
- When the standby statestored loses its connection with the active
  statestored, it checks the saved connection states for the active
  statestored and takes over the active role if a majority of
  subscribers have lost their connections with the active statestored.
- The new active statestored sends an RPC notification to all
  subscribers announcing the new active statestored and the active
  catalogd elected by the new active statestored.
- The new active statestored starts sending heartbeats to its peer
  when it receives a handshake from its peer.
- The active statestored enters recovery mode if it loses its
  connections with its peer statestored and all subscribers. It keeps
  sending HA handshakes to its peer until it receives a response.
- All subscribers (impalad/catalogd/admissiond) register with both
  statestoreds.
- Subscribers report their connection state in response to requests
  from the standby statestored.
- Subscribers switch to the new active statestored when they receive
  an RPC notification from the new active statestored.
- Only the active statestored sends topic update messages to
  subscribers.
- Add a new option "enable_statestored_ha" to the script
  bin/start-impala-cluster.py for starting an Impala mini-cluster with
  statestored HA enabled.
- Add a new Thrift API to the statestore service to disable the
  network for statestored. It is used only by unit tests to simulate
  network failure. For safety, it works only when the debug action is
  set in the startup flags.

Testing:
- Added end-to-end unit tests for statestored HA.
- Passed core tests.

Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/catalog/catalog-server.cc
M be/src/common/global-flags.cc
M be/src/rpc/thrift-server-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admissiond-env.cc
M be/src/statestore/statestore-service-client-wrapper.h
M be/src/statestore/statestore-subscriber-catalog.cc
M be/src/statestore/statestore-subscriber-catalog.h
M be/src/statestore/statestore-subscriber-client-wrapper.h
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-subscriber.h
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M be/src/statestore/statestored-main.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M common/thrift/metrics.json
M tests/common/impala_cluster.py
M tests/common/impala_service.py
A tests/custom_cluster/test_statestored_ha.py
23 files changed, 2,538 insertions(+), 112 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/20372/12
--
To view, visit http://gerrit.cloudera.org:8080/20372
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd2c814bbad5c04c1d50c2edaa5b910c82a6fd87
Gerrit-Change-Number: 20372
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Yida Wu <wydbaggio...@gmail.com>
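
Illustrative note on the takeover rule in the commit message: the standby statestored takes over only after it has lost its own connection to the active statestored and a majority of subscribers report that they have lost theirs too. The Python sketch below is one possible reading of that rule, not the patch's C++ implementation. The names `StandbyView` and `should_take_over` are hypothetical, and two behaviors are assumptions the commit message does not state: "majority" is treated as a strict majority, and a standby with no subscriber reports stays standby.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StandbyView:
    """Hypothetical model of the state a standby statestored tracks."""
    # True while the standby can still reach the active statestored.
    peer_reachable: bool
    # Subscriber id -> True if that subscriber reports a working
    # connection to the active statestored.
    subscriber_sees_active: Dict[str, bool] = field(default_factory=dict)


def should_take_over(view: StandbyView) -> bool:
    """Return True when the standby should assume the active role."""
    if view.peer_reachable:
        # The active statestored is still reachable; nothing to do.
        return False
    total = len(view.subscriber_sees_active)
    if total == 0:
        # Assumption: without any subscriber reports, stay standby.
        return False
    lost = sum(1 for ok in view.subscriber_sees_active.values() if not ok)
    # Assumption: "majority" means strictly more than half.
    return lost > total / 2
```

Under these assumptions, a standby that cannot reach its peer while two of three subscribers report a lost connection would take over, whereas one of three would not. This keeps a network partition that isolates only the standby from triggering a failover.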