Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16936 )

Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master
......................................................................


Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@346
PS1, Line 346:       if (master != HostPort()) {
             :         *req.mutable_rpc_addr() = HostPortToPB(master);
             :       }
What does it mean for this to be called with an unset 'master' field? Would be 
good to add a comment.

Ah, made my way down to the test that uses this. Would it make sense to instead 
provide a garbage HostPort value from HostPort::ParseString?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@474
PS1, Line 474: TransferMasterLeadershipAndVerify
nit: "verify" is a bit overloaded, especially when in a file that can 
misconstrue it with using the ClusterVerifier. How about calling this 
TransferMasterLeadership and calling TransferMasterLeadership 
TransferMasterLeadershipAsync? Or better yet, have them be the same method, but 
with an optional bool parameter that tells the method to wait for success.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@787
PS1, Line 787: // When an ExternalMiniCluster is restarted after removal of a 
master then one of the
             :   // remaining masters can get reassigned to the same wal dir 
which was previously assigned
             :   // to the removed master. This causes problems during 
verification, so we always try to
             :   // remove the last master in terms of index for test purposes.
Does this problem go away if, rather than creating a new mini cluster, we just 
restart the old one after the call to RemoveMaster()?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@805
PS1, Line 805:   // We need at least 1 operation for the leader in current term 
before initiating ChangeConfig.
I thought this happens by replicating a no-op when first becoming leader. Or is 
this referring to something else?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@814
PS1, Line 814:    ASSERT_EVENTUALLY([&] {
             :       VerifyDeadMasterInSpecifiedState(master_to_remove_uuid, 
consensus::HealthReportPB::UNKNOWN);
             :     });
It seems possible we'd trigger a test failure if, by luck, the gaps in 
ASSERT_EVENTUALLY missed this state entirely. It'd be more robust to just 
assert we get to FAILED, given it's a terminal state.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@834
PS1, Line 834: ASSERT_FALSE(ContainsKey(actual_master_hps, master_to_remove));
nit: while I appreciate the dedicated checking, especially when it comes to 
longer tests like these, we can probably do without assertions that are easily 
verified by reading a handful of lines of nearby, relatively trivial code. In 
this case, this is redundant with the L833 check since we've removed 
`master_to_remove` from `expected_master_hps`.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@853
PS1, Line 853:   cluster_.reset();
Does

 cluster_->Shutdown();
 ASSERT_OK(cluster_->Restart());

not work? I think it'd be much less surprising to further users of the EMC who 
may want to dynamically update the masters.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@930
PS1, Line 930:  /* Omitting "--master_support_change_config" */
nit: maybe just set it to false now so we don't have to update this test when 
we enable it by default?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@995
PS1, Line 995: It's possible there is a leadership change between querying for 
leader master and
             :   // sending the add master request to non-leader master and 
hence using ASSERT_EVENTUALLY.
Is this not a concern in other tests? Would setting 
--leader_failure_max_missed_heartbeat_periods to something very high help avoid 
this case?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@1009
PS1, Line 1009:     
ASSERT_OK(cluster_->master_proxy(non_leader_master_idx)->RemoveMaster(req, 
&resp, &rpc));
What if this succeeds because we happened to elect non_leader_master_idx as 
leader? Wouldn't the call to VerifyNumMastersAndGetAddresses never succeed? Or 
wouldn't we be unable to successfully remove another Master?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc
File src/kudu/master/master.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master.cc@532
PS1, Line 532: Substitute
nit: nothing being substituted? Maybe add the leader's UUID?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master_service.cc
File src/kudu/master/master_service.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/master_service.cc@277
PS1, Line 277:   if (!FLAGS_master_support_change_config) {
             :     rpc->RespondFailure(Status::NotSupported("Removing master is 
not supported"));
             :     return;
             :   }
Hrm, can you remind me why it's insufficient to rely on the SupportsFeature 
flags? A comment here and above would be nice.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.h
File src/kudu/mini-cluster/external_mini_cluster.h:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.h@483
PS1, Line 483:   // Removes the master from the ExternalMiniCluster when the 
master has been removed
             :   // dynamically after bringing up the ExternalMiniCluster.
nit: just reading this, a few things aren't immediately clear:
- What do I, as a user of this function, need to do to the cluster before 
calling this? Exactly when is "when the master has been removed"?
- After calling this, what differences in behavior should I expect for further 
clients?


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.cc
File src/kudu/mini-cluster/external_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.cc@964
PS1, Line 964: ) {
nit: Why not ++it here? Seems more idiomatic.


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.cc@966
PS1, Line 966: m
nit: if this is only used once, maybe just inline it as 
(*it)->bound_rpc_hostport()


http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/mini-cluster/external_mini_cluster.cc@972
PS1, Line 972:
             :   r
nit: drop the extra line



--
To view, visit http://gerrit.cloudera.org:8080/16936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I76c03b8850faef60b65f85184c0a4db7cc6759ee
Gerrit-Change-Number: 16936
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Jan 2021 01:32:10 +0000
Gerrit-HasComments: Yes

Reply via email to