Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11251 )

Change subject: KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 13:

(13 comments)

PeerMessageQueue::DetermineIfCaughtUp is called for every peer on update, and 
one condition under which it exits early during a leadership transfer period is 
when that period has a designated successor and the peer isn't it. But if 
there's no designated successor the function continues, so if there's no 
designated successor the first peer to be identified as caught up will be 
transferred leadership.

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@16
PS13, Line 16: beings
> begins
Done


http://gerrit.cloudera.org:8080/#/c/11251/13//COMMIT_MSG@42
PS13, Line 42: Still WIP
> Still still WIP?
Negatory.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_peers.cc@501
PS13, Line 501:                                  const rpc::ResponseCallback& 
/*callback*/) {
              :   
controller->set_timeout(MonoDelta::FromMilliseconds(FLAGS_consensus_rpc_timeout_ms));
              :   consensus_proxy_->RunLeaderElection(*request, response, 
controller);
> we are mixing async and sync calls here but ignoring the output from the sy
Yes it's meant to be sync. The result of the call is handled by the caller, who 
owns the parameter.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.h@556
PS13, Line 556: TODO(unknown)
> TODO(todd)
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1071
PS13, Line 1071: DVLOG
> how about just VLOG
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/consensus_queue.cc@1451
PS13, Line 1451:   WARN_NOT_OK(raft_pool_observers_token_->SubmitClosure(
> add: DCHECK(queue_lock_.is_locked());
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@598
PS13, Line 598: new_leader_uuid.get()
> nit: it's more idiomatic to use *new_leader_uuid with optionals vs. get(),
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@601
PS13, Line 601:       return Status::OK();
> Is there a use case where we want to support this? Otherwise it seems like
I don't think it's illegal- the tool moves leadership to a requested peer. If 
that peer is already leader, that's good, not an error. It also makes coding 
tests harder because the "oops I was already leader so I gave you an error but 
everything is actually fine" case would need to be handled.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@608
PS13, Line 608:       
resp->mutable_error()->set_code(TabletServerErrorPB::UNKNOWN_ERROR);
              :       StatusToPB(Status::InvalidArgument(msg), 
resp->mutable_error()->mutable_status());
              :       // We return OK so that the tablet service won't 
overwrite the error code.
              :       return Status::OK();
> why not just return Status::InvalidArgument in this case and let TabletServ
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@614
PS13, Line 614:   BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
> Is there something that prevents us from clobbering an existing leadership
Nope, the new one replaces the old one. I guess it might be nice to only allow 
one to occur at once (like your code change below suggests).


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@622
PS13, Line 622: transfer_period_timer_->Snooze()
> What is the use case for this?
None once we only allow one LT at a time.


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/consensus/raft_consensus.cc@631
PS13, Line 631:   leader_transfer_in_progress_.Store(true, kMemOrderAcquire);
> how about:
Done


http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11251/13/src/kudu/tools/kudu-admin-test.cc@1448
PS13, Line 1448: TEST_F(AdminCliTest, 
TestSimultaneousLeaderTransferAndAbruptStepdown) {
> This test could use an explanation -- I'm a little confused about the purpo
Explanation added.



--
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 13
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Oct 2018 17:20:53 +0000
Gerrit-HasComments: Yes

Reply via email to