Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14111

to look at the new patch set (#3).

Change subject: KUDU-2069 pt 1: add a maintenance mode
......................................................................

KUDU-2069 pt 1: add a maintenance mode

When tablet server T is put in maintenance mode, replicas will not be
placed onto T, and failures of T will not be considered when determining
whether a given tablet is under-replicated.

This patch adds this mode with the following changes:
- A new master-side endpoint that enters maintenance mode is added:
  - It plumbs in-memory maintenance states through the TSManager and the
    TSDescriptors.
  - It also writes a new kind of entry in the master system catalog for
    maintenance states (for now, there's only maintenance mode, but this
    could be useful for decommissioning).
- When a master becomes leader, it scans the on-disk state and rebuilds
  the in-memory maintenance state.
- When determining whether a replica needs to be added, we may now
  consider a "whitelist" of UUIDs that can be in a bad state while not
  counting towards being under-replicated.
- When determining where to place new replicas, tablet servers in
  maintenance mode are not considered.
- The same master-side endpoint is used to exit maintenance mode.
- To ensure that replicas that actually need to be replicated get
  replicated upon finishing maintenance mode, when a tablet server is
  removed from maintenance mode, the master will mark all tablet servers
  as needing a full tablet report, triggering re-processing of tablet
  reports.

This patch only introduces the master endpoints and the underlying
behavior. A later patch will introduce a way to set maintenance mode via
CLI.

I considered implementing maintenance mode by blocking master->tserver
RPCs, but opted to use this approach since it seems more intuitive for
the stopping of replica movement to exist in placement logic, (i.e.
what servers are available to host new replicas and what replicas need
to be replaced), rather than the placement mechanism, (i.e. the handful
of RPCs that would need to be considered).

Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/quorum_util-test.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/consensus/quorum_util.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/maintenance_mode-itest.cc
M src/kudu/master/CMakeLists.txt
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
A src/kudu/master/maintenance_state-test.cc
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/master/ts_descriptor-test.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
20 files changed, 1,090 insertions(+), 35 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/14111/3
--
To view, visit http://gerrit.cloudera.org:8080/14111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
Gerrit-Change-Number: 14111
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)