Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6170 to look at the new patch set (#26).

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................

[catalog_manager] categorization of rw operation failures

This changelist introduces categorization of the system catalog's read and write operation failures that happen in the leader's post-election callback. There are two categories of errors: fatal and non-fatal.

If an operation against the system catalog fails between terms of the catalog leadership, the error is considered non-fatal. In case of a non-fatal error the leader post-election task bails out: the catalog is no longer the leader at the original term, and the task should be executed by the new leader upon ElectedAsLeaderCb.

If an operation against the system catalog fails within the same term of the catalog leadership, the error is considered fatal and causes the master process to crash. This is to avoid possible inconsistency when working with the tables/tablets metadata, the IPKI certificate authority information, and TSKs (Token Signing Keys).

Any failure of a read or write operation against the system catalog that happens during the catalog's shutdown is ignored, and the leader post-election task bails out upon detecting such a failure.

The same policy applies to other errors (i.e. those not specific to read and write operations against the system catalog) which might happen while working with the IPKI certificate authority information and TokenSigner. The rationale is the same as for handling the system catalog operation failures: in case of an error, the leader has no consistent information to work with, whereas a non-leader does not use the information affected by the failure at all and can safely ignore the error.

Added a test to verify that the master server does not crash if a change of leadership is detected while trying to persist a newly generated TSK (Token Signing Key).
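
For reviewers, the policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this patch: the names CatalogState, FailureAction and CategorizeFailure are hypothetical, and the sketch assumes the post-election task records the leadership term it started at and compares it with the current term when an operation fails.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not taken from catalog_manager.cc.
enum class FailureAction {
  kIgnore,   // catalog is shutting down: ignore the error, task bails out
  kBailOut,  // non-fatal: leadership changed, the new leader reruns the task
  kCrash,    // fatal: still the leader at the original term
};

// Assumed snapshot of the catalog's state at the time the failure is observed.
struct CatalogState {
  bool shutting_down;
  bool is_leader;
  int64_t current_term;
};

// Decide how to react to a failed operation, given the term at which the
// leader post-election task started.
FailureAction CategorizeFailure(const CatalogState& state,
                                int64_t term_at_start) {
  // Leadership terms only move forward, so the recorded term cannot be
  // ahead of the current one.
  assert(term_at_start <= state.current_term);

  if (state.shutting_down) {
    return FailureAction::kIgnore;
  }
  if (!state.is_leader || state.current_term != term_at_start) {
    // No longer the leader at the original term.
    return FailureAction::kBailOut;
  }
  // Same term, still the leader: the metadata may be inconsistent.
  return FailureAction::kCrash;
}
```

In this sketch the shutdown check comes first, so a failure seen while shutting down is never escalated to a crash, even if the term is unchanged.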

Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/catalog_manager_tsk-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master-test.cc
M src/kudu/master/master_service.cc
M src/kudu/master/sys_catalog-test.cc
7 files changed, 486 insertions(+), 160 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/6170/26
--
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 26
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>