Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/6134

to look at the new patch set (#4).

Change subject: [catalog manager] fixed deadlock on catalog shutdown
......................................................................

[catalog manager] fixed deadlock on catalog shutdown

Fixed a deadlock on system catalog manager shutdown in the case of a
multi-master Kudu cluster. Prior to the fix, the leader master often hung
in its 'elected-as-a-leader' callback while trying to write into the
system table. It was waiting for the system table operations to complete,
but those were retried indefinitely because the system catalog table's
Raft quorum was not available (the other masters had been shut down).

Prior to the fix, the deadlock happened pretty often while running the
master_MasterReplicationTest.TestCycleThroughAllMasters scenario in
master_replication-itest (DEBUG build). The bug also manifested itself in
other tests that use a multi-master Kudu mini-cluster.

The problem manifested itself in the following way: after outputting
something like

  I0224 18:25:16.760793 1964126208 raft_consensus.cc:1569] T 00000000000000000000000000000000 P bd5cf976e19f4843b81cd02f14c6c87a [term 1 FOLLOWER]: Raft consensus shutting down.
  I0224 18:25:16.760815 1964126208 raft_consensus.cc:1585] T 00000000000000000000000000000000 P bd5cf976e19f4843b81cd02f14c6c87a [term 1 FOLLOWER]: Raft consensus is shut down!
  I0224 18:25:16.773479 1964126208 master.cc:214] Master@127.0.0.1:11011 shutdown complete.
  I0224 18:25:16.774673 1964126208 master.cc:210] Master@127.0.0.1:11012 shutting down...

the test continued to run indefinitely, spitting out messages like

  W0224 18:25:21.246805 62234624 consensus_peers.cc:357] T 00000000000000000000000000000000 P 51eb32e67c014327b965ae3e6f4993e1 -> Peer 14cb97657cb4407fab1ce3e097d7a71b (127.0.0.1:11010): Couldn't send request to peer 14cb97657cb4407fab1ce3e097d7a71b for tablet 00000000000000000000000000000000. Status: Network error: Client connection negotiation failed: client connection to 127.0.0.1:11010: connect: Connection refused (error 61). Retrying in the next heartbeat period. Already tried 14 times.

Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
---
M src/kudu/master/catalog_manager.cc
1 file changed, 49 insertions(+), 7 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/6134/4
--
To view, visit http://gerrit.cloudera.org:8080/6134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
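
For readers unfamiliar with the failure mode described in the commit message (a leader blocked on system-table writes that are retried forever once the Raft quorum is gone), the following is a minimal, self-contained sketch of the general remedy: checking a shutdown flag inside the retry loop. It is not the code from catalog_manager.cc, which this message does not show. The names WriteOutcome, RetryWriteUntilDoneOrShutdown, attempt_write and shutting_down are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <functional>

// Hypothetical result of trying to get a system-table write replicated.
enum class WriteOutcome { kCompleted, kAbortedByShutdown };

// attempt_write is a hypothetical stand-in for one try at replicating a
// write; it returns true once the write has succeeded. With the other
// masters down, it would keep returning false.
inline WriteOutcome RetryWriteUntilDoneOrShutdown(
    const std::function<bool()>& attempt_write,
    const std::atomic<bool>& shutting_down) {
  // The shutdown flag is checked before every attempt, so a quorum that
  // never comes back cannot keep the caller in this loop forever.
  while (!shutting_down.load()) {
    if (attempt_write()) {
      return WriteOutcome::kCompleted;
    }
  }
  return WriteOutcome::kAbortedByShutdown;
}
```

Without the shutdown check, the loop has no exit while the write keeps failing, which matches the "retried indefinitely" behavior reported above. A real implementation would presumably also wait between attempts rather than spin.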