Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14111

to look at the new patch set (#3).

Change subject: KUDU-2069 pt 1: add a maintenance mode
......................................................................

KUDU-2069 pt 1: add a maintenance mode

When tablet server T is put in maintenance mode, replicas will not be
placed onto T, and failures of T will not be considered when determining
whether a given tablet is under-replicated.

This patch adds this mode with the following changes:
- A new master-side endpoint is added to enter maintenance mode:
  - It plumbs in-memory maintenance states through the TSManager and the
    TSDescriptors.
  - It also writes a new kind of entry in the master system catalog for
    maintenance states (for now, there's only maintenance mode, but this
    could be useful for decommissioning).
    - When a master becomes leader, it scans the on-disk state and
      rebuilds the in-memory maintenance state.
- When determining whether a replica needs to be added, we may now
  consider a "whitelist" of UUIDs that can be in a bad state while not
  counting towards being under-replicated.
- When determining where to place new replicas, tablet servers in
  maintenance mode are not considered.
- The same master-side endpoint is used to exit maintenance mode.
  - To ensure that tablets left under-replicated during maintenance are
    re-replicated once it ends, when a tablet server exits maintenance
    mode, the master marks all tablet servers as needing a full tablet
    report, triggering re-processing of tablet reports.

This patch only introduces the master endpoints and the underlying
behavior. A later patch will introduce a way to set maintenance mode via
CLI.

I considered implementing maintenance mode by blocking master->tserver
RPCs, but opted for this approach since it seems more intuitive for the
stopping of replica movement to live in the placement logic (i.e. which
servers are available to host new replicas and which replicas need to
be replaced) rather than in the placement mechanism (i.e. the handful
of RPCs that would need to be considered).

Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
---
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/quorum_util-test.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/consensus/quorum_util.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/maintenance_mode-itest.cc
M src/kudu/master/CMakeLists.txt
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
A src/kudu/master/maintenance_state-test.cc
M src/kudu/master/master.proto
M src/kudu/master/master_service.cc
M src/kudu/master/master_service.h
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/master/ts_descriptor-test.cc
M src/kudu/master/ts_descriptor.cc
M src/kudu/master/ts_descriptor.h
M src/kudu/master/ts_manager.cc
M src/kudu/master/ts_manager.h
20 files changed, 1,090 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/14111/3
--
To view, visit http://gerrit.cloudera.org:8080/14111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia857668b87560cdd451c2e7f90d72f8217ba5a4b
Gerrit-Change-Number: 14111
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
