[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

Andrew Wong (Code Review) Sat, 28 Sep 2019 23:27:05 -0700

Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )


Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc
File src/kudu/consensus/quorum_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@645
PS5, Line 645:   // Even if the replica to replace is meant to be ignored on 
failure, we
             :   // should honor the replacement and try to add a replica.
             :   for (char health : { '+', '-', '?' }) {
             :     RaftConfigPB config;
             :     AddPeer(&config, "A", V, health, {{"REPLACE", true}});
             :     AddPeer(&config, "B", V, '+');
             :     AddPeer(&config, "C", V, '+');
             :     EXPECT_TRUE(ShouldAddReplica(config, 3, { "A" }));
             :   }
             :   {
             :     RaftConfigPB config;
             :     AddPeer(&config, "A", V, '+', {{"REPLACE", true}});
             :     AddPeer(&config, "B", V, '-');
             :     AddPeer(&config, "C", V, '+');
             :     // Ignoring failures shouldn't impede our ability to add a 
replica when the
             :
> nit: maybe, put this under for () cycle to iterate over the health status o
Done


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@675
PS5, Line 675: p
> Does it make sense to add coverage for other health statuses as well (unkno
Done


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@684
PS5, Line 684:       EXPECT_FALSE(ShouldAddReplica(config, 5, { "A", "B" }));
             :       // But when there is a failure that isn't suppose
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util.cc
File src/kudu/consensus/quorum_util.cc:

http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util.cc@455
PS5, Line 455:     if (VLOG_IS_ON(1) && ignore_failure_for_underreplication) {
             :       VLOG(1) << Substitute("ignoring $0 if failed", peer_uuid);
             :     }
> Does it deserve VLOG(1) it's not important at all?
Done


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@269
PS5, Line 269:   ASSERT_OK(ChangeTServerState(maintenance_uuid, 
TServerStateChangePB::EXIT_MAINTENANCE_MODE));
> Thank you for adding this scenario.
Ack


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@379
PS5, Line 379:   void SetUp() override {
             :     SKIP_IF_SLOW_NOT_ALLOWED();
             :     NO_FATALS(MaintenanceModeITest::SetUp());
             :     NO_FATALS(SetUpCluster(5));
> Could it happen that replicas at the server fell behind WAL GC threshold si
Good point. I've separated the concern by disabling log GC and testing for 
FAILED_UNRECOVERABLE sepcifically using disk failures.


http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@385
PS5, Line 385:     SKIP_IF_SLOW_NOT_ALLOWED();
> Does it make sense to verify the number of written rows after stopping the
Done



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 29 Sep 2019 06:18:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

Reply via email to