Mike Percy has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
......................................................................


Patch Set 4:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

Line 188: Status RaftConsensus::Init() {
> I think the DCHECK on state here was useful, even if it needs to be broaden
Done


PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid :
              :                                                                 
    GetLatestOpIdFromLog();
> could we simplify by always using 'max()' of the entry from the metadata an
I don't think so; see my response to the same question in tablet_service.cc

However, I do think we should use the latest opid in the log if Consensus is 
currently running, since it is guaranteed to be correct in that case.


Line 1497:     LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned";
> any way to improve this with a bit more info?
I'll add the tombstone_last_logged_opid here but I can't think of a reason to 
log more than that because we already log the details whenever we vote.


Line 2004:       state_ = new_state;
> given this is in every case, move it down below the switch?
Done


Line 2513: Status RaftConsensus::CheckSafeToVoteUnlocked(optional<OpId> 
tombstone_last_logged_opid) const {
> warning: the parameter 'tombstone_last_logged_opid' is copied for each invo
Done


PS4, Line 2515: (!tombstone_last_logged_opid
> not 100% following why this portion of the condition is necessary
if we are voting while tombstoned then we don't have to be running


PS4, Line 2520: state_ == kShutdown
> should we instead be checking the expected state (ie kStopped)? aren't ther
We can tombstoned vote in kInit and kStopped state, even in kRunning actually 
due to a race between the check in TabletService and execution in 
RaftConsensus. It's not possible to expose RaftConsensus in kNew state due to 
code in TabletReplica however it would be better to encapsulate that guarantee 
within this class so I'll add a Create() factory method.


Line 2634:   return kMinimumTerm;
> under what circumstances do we hit this case? It seems like cmeta_ should b
Good catch; this was left over from a previous attempt on this patch and is not 
possible with the current rev. However I agree with you that a factory method 
for RaftConsensus would be preferable here.


Line 2685:   if (cmeta_) {
> same
removed


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

Line 237:                    const int64_t candidate_term,
> warning: parameter 'candidate_term' is const-qualified in the function decl
Done


PS4, Line 239:                    boost::optional<bool> ignore_live_leader,
             :                    boost::optional<bool> is_pre_election,
> what's the point of making these optional? don't they have well-defined def
I thought it was nice to pass them as optional so that if you don't specify 
them you get the default behavior as specified in the proto file.


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/tombstoned_voting-stress-test.cc
File src/kudu/integration-tests/tombstoned_voting-stress-test.cc:

Line 38: using kudu::consensus::MakeOpId;
> warning: using decl 'MakeOpId' is unused [misc-unused-using-decls]
Done


Line 39: using kudu::consensus::LeaderStepDownResponsePB;
> warning: using decl 'LeaderStepDownResponsePB' is unused [misc-unused-using
Done


Line 41: using kudu::consensus::RaftConsensus;
> warning: using decl 'RaftConsensus' is unused [misc-unused-using-decls]
Done


Line 42: using kudu::consensus::RaftPeerPB;
> warning: using decl 'RaftPeerPB' is unused [misc-unused-using-decls]
Done


Line 47: using kudu::tablet::TabletStatePB;
> warning: using decl 'TabletStatePB' is unused [misc-unused-using-decls]
Done


Line 48: using kudu::tserver::TabletServerErrorPB;
> warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl
Done


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto
File src/kudu/tablet/metadata.proto:

Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica.
> what effect will these changes have if there is a rolling upgrade with an o
Based on a grep through the code, the master only knows about tablet::RUNNING 
and tablet::FAILED so it shouldn't even notice.


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

Line 216:   void SetPreFlushCallback(StatusClosure callback) { 
pre_flush_callback_ = callback; }
> warning: parameter 'callback' is passed by value and only copied once; cons
Done


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

Line 133:   set_state(INITIALIZED);
> maybe moot based on the suggestion of a factory method, but if a method lik
Done


Line 247:   // TODO: KUDU-183: Keep track of the pending tasks and send an 
"abort" message.
> warning: missing username/bug in TODO [google-readability-todo]
Done


Line 272:     // Release mem tracker resources.
> is this comment stale?
Maybe stale? Either way it seems pretty irrelevant. I'll remove it.


PS4, Line 407:   LOG_WITH_PREFIX(INFO) << "Fetched status for addr " << &state_
             :                         << ": " << 
SecureDebugString(*status_pb_out);
> leftover debug print?
yep. removed


Line 427:   CHECK_EQ(INITIALIZED, state_);
> redundant with the DCHECK in set_state
Done


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

PS4, Line 75:   TabletReplica(scoped_refptr<TabletMetadata> meta,
            :                 
scoped_refptr<consensus::ConsensusMetadataManager> cmeta_manager,
            :                 consensus::RaftPeerPB local_peer_pb,
            :                 ThreadPool* apply_pool,
            :                 Callback<void(const std::string& reason)> 
mark_dirty_clbk);
            : 
            :   // Initializes RaftConsensus.
            :   Status Init(ThreadPool* raft_pool);
> instead of having this split "construct and then init" how about a factory 
The problem with a monolithic construction and initialization of a 
TabletReplica is that it prevents replicas in certain failure states from being 
visible to the master or the web UI.

For example, replicas that fail to Init() at startup time due to a missing or 
corrupted cmeta should still be reported on the web UI and to the master -- at 
least, I'm pretty sure that's what we want. I suppose an alternative would be 
to automatically tombstone a tablet that failed to start up but that seems 
awfully aggressive, especially in the case where there is only 1 replica per 
tablet.


PS4, Line 96: tombstoned 
> voting in general, right? ie what would happen if you called Stop() but the
The log would be closed so we wouldn't be able to vote on a 
stopped-but-not-tombstoned replica.


PS4, Line 169:   // If any peers in the consensus configuration lack permanent 
uuids, get them via an
             :   // RPC call and update.
             :   // TODO: move this to raft_consensus.h.
             :   Status UpdatePermanentUuids();
> can you remove this while you're here? seems to be dead
Done


Line 277:   // Wait until the TabletReplica is fully in STOPPED state.
> or SHUTDOWN
Done


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 136: using kudu::tablet::TabletStatusPB;
> warning: using decl 'TabletStatusPB' is unused [misc-unused-using-decls]
Done


Line 947:   tablet::TabletDataState data_state = 
replica->tablet_metadata()->tablet_data_state();
> aren't all these state checks racy? after you grab this here, there's nothi
Yeah, it's racy, but it's safe because we're checking after we have a reference 
to the replica, and the replica can only have one consensus object, which will 
be shut down if the tablet starts copying again, for example.

I'll add a block comment here to explain; LMK what you think.

I plan to move the last-logged opid from TabletMetadata to ConsensusMetadata in 
a follow-up change.


PS4, Line 966:     if (OpIdBiggerThan(tmp_last_logged_opid, MinimumOpId())) {
> when would this equal MinimumOpId? if it's never been tombstoned?
> when would this equal MinimumOpId? if it's never been tombstoned?

That's right.

> Would it be safe to just pass this in always as a non-optional, and then in 
> RequestVote always just take max(log_->GetLastLoggedOpId(), 
> tombstone_last_logged_opid())?

I don't think so. The case I am worried about is if a tablet fails to start up, 
so it doesn't have a log running, and then it gets tombstoned by the master. In 
that case we will not currently store the last-logged opid at DeleteTablet() 
time. So if we don't know what our last-logged opid was (even though we did 
actually have one) then I don't think it's safe for us to simply start voting 
based on a last-logged opid of (0,0).

Additionally, just using max() seems wrong in case there was a log truncation 
involved. We want to use our latest opid state.


http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tserver-path-handlers.cc
File src/kudu/tserver/tserver-path-handlers.cc:

Line 460:   if (!consensus || !consensus->IsRunning()) {
> instead of this, can you make Consensus::DumpStatusHtml dump something reas
good catch


-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to