Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8764 )
Change subject: KUDU-1097: add test for replacement replica after cluster restart ...................................................................... Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/8764/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8764/1//COMMIT_MSG@7 PS1, Line 7: > nit: add test for Done http://gerrit.cloudera.org:8080/#/c/8764/1//COMMIT_MSG@12 PS1, Line 12: promoted > are promoted Done http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc: http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1512 PS1, Line 1512: > nit: the Done http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1515 PS1, Line 1515: promoted > nit: are promoted Done http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1524 PS1, Line 1524: WhileReplacing > naming nit: how about: WithNonVoter, since replacing is an overloaded term Good idea! Done. http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1535 PS1, Line 1535: // Need some extra tablet servers to ensure the catalog manager does not : // replace almost all the replicas, even if it could. : FLAGS_num_tablet_servers = 2 * kReplicasNum + 1; > I wonder if instead of doing this, we could just shut down the master after Yes, we could. However, I wanted to make sure the catalog manager is not about to replace everything it could. For the current scenario, it should not try to do so since only the leader replica and the newly added non-voter are on-line while tablet copy is in progress. http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1607 PS1, Line 1607: const auto it = tablet_servers_.find(new_replica_uuid); : ASSERT_NE(tablet_servers_.end(), it); : TServerDetails* ts_new_replica = it->second; > nit: can replace this with: Done http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1618 PS1, Line 1618: e.first == tablet_id_ > nit: perhaps make this a DCHECK or even assert on it outside the loop? This Done http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1655 PS1, Line 1655: //NO_FATALS(WaitForReplicasAndUpdateLocations(table_->name())); > ? It's a left-over from one of the previous versions; removed. http://gerrit.cloudera.org:8080/#/c/8764/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1670 PS1, Line 1670: EXPECT_TRUE(IsRaftConfigVoter(new_replica_uuid, cstate.committed_config())) > I'm a bit concerned about all those extra replicas and stuff. Is this test Yes, it's more or less reliable at least if running with --stress-cpu-threads=16 using dist-test: http://dist-test.cloudera.org//job?job_id=aserbin.1513135277.111085 -- To view, visit http://gerrit.cloudera.org:8080/8764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia3760d0e454c9215c0f88bd8f1bed68999935a1c Gerrit-Change-Number: 8764 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Comment-Date: Wed, 13 Dec 2017 03:28:22 +0000 Gerrit-HasComments: Yes