Mike Percy has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
......................................................................


Patch Set 6:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS6, Line 1485: && state_ == kRunning
> why is this condition necesary? if we've just been tombstoned it still make
Fair point. This isn't required so I removed it.


PS6, Line 1509: CheckSafeToVoteUnlocked
> the duplicate logic here with CheckSafeToVoteUnlocked is a little unnerving
good idea. done


PS6, Line 1514: DCHECK(tombstone_last_logged_opid) << LogPrefixUnlocked() << 
"expected last-logged opid";
              :     local_last_logged_opid = *tombstone_last_logged_opid;
> if this DCHECK fails, would we crash anyway trying to dereference the boost
Done


PS6, Line 1519:     LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned 
based on last-logged opid "
              :                                    << local_last_logged_opid;
> move this to the else block above and thus avoid the extra if?
Done


PS6, Line 2538: const optional<OpId>& tombstone_last_logged_opid
> whats the point of passing an opid here if you're only checking for its pre
I guess the phrasing was a bit indirect; I meant "CheckSafeToVoteUnlocked() 
guarantees that tombstoned_last_logged_opid is set". However I took Todd's 
advice and inlined this with some other logic in RequestVote().


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS6, Line 124: static Status Create(ConsensusOptions options,
             :                        RaftPeerPB local_peer_pb,
             :                        scoped_refptr<ConsensusMetadataManager> 
cmeta_manager,
             :                        ThreadPool* raft_pool,
             :                        std::shared_ptr<RaftConsensus>* 
consensus_out);
> didn't you delete this a while back?
Yeah, that was part of a bunch of refactoring that paid off some tech debt that 
had accumulated while adding features since the early days of Consensus. While 
I was refactoring I thought that for tombstoned voting we would now need to 
handle the case where we couldn't load cmeta but I think we can handle that at 
a higher level (TabletReplica or TSTabletManager) so putting this factory 
method back as a thin wrapper around { new; Init() } to avoid having to handle 
a non-ininitialized cmeta in RaftConsensus is reasonable, I think.


Line 130:   RaftConsensus(ConsensusOptions options,
> can the ctor be made private? (may need ALLOW_MAKE_SHARED)
Done


PS6, Line 271: term
> I think CurrentTerm or Term would be better since this takes lock_ and that
Done


PS6, Line 292: Tombstoned voting
> agreed. Maybe better here to just say that it synchronously transitions to 
I filled in some details in RequestVote, SetStateUnlocked(), and here.


Line 297:   void Shutdown();
> same
Done


Line 339:   enum State {
> it would be nice to have a state transition diagram of sorts here. Or alter
Done


PS6, Line 340:     // RaftConsensus has been freshly constructed.
             :     kNew,
> is this state ever externally exposed, considering we now have the factory 
Done


PS6, Line 343:     // RaftConsensus has been initialized.
             :     kInitialized,
> in terms of behavior, how is this state different than Stopped? eg let's sa
kInitialized is very similar to kStopped. The difference is whether a tablet 
was tombstoned when it first came up (kInitialized) or whether it was 
tombstoned after having been running already (kStopped).

I didn't document that, though, I only documented that voting is allowed in 
kInitialized and kStopped states in the RequestVote() doc comment.


PS6, Line 354: RaftConsensus has been stopped.
> maybe explain what this means in the context of the other states? something
Done


PS6, Line 391:   // Initializes the RaftConsensus object. This should be called 
before
             :   // publishing this object to any thread other than the thread 
that invoked
             :   // the constructor.
             :   Status Init();
> since the constructor is only invoked by the factory method, maybe not nece
Done


PS6, Line 647:   // existence of a known last logged opid.
             :   Status CheckSafeToVoteUnlocked(const boost::optional<OpId>& 
tombstone_last_logged_opid)
             :  
> similar to David's comment on the implementation: I think it would make mor
inlined this based on a comment in the .cc file


Line 783:   AtomicBool stopped_;
> it seems it was already like this, but what's the distinction between this 
Added a comment. Also, this should actually be shutdown_, not stopped_, so 
reverted the name change.


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc
File src/kudu/integration-tests/tombstoned_voting-itest.cc:

Line 48: 
> Replicated masters can have tombstoned replicas too, is that right? Does th
Hmm. This should be very rare but could be possible. I'll look into it.


PS6, Line 80: call Stop() on
> you mean 'manually tombstone' right? I was confused by reading this comment
Ah, right.


PS6, Line 195:   // Ask TS1 for a vote that should be denied (old last-logged 
opid).
             :   s = itest::RequestVote(ts1_ets, tablet_id, "B", current_term, 
old_opid,
             :                          /*ignore_live_leader=*/ true, 
/*is_pre_election=*/ false, kTimeout);
> can you add a test for same candidate same term but with an opid that is gr
Maybe I'm misunderstanding. It sounds like you're asking for a test of 
re-voting for (A, current_term, last_logged_opid + 1) but in that case we will 
return an automatic "yes" because we already voted for "A" this term due to 
this code path in raft_consensus.cc:

  // We already voted this term.
  if (request->candidate_term() == CurrentTermUnlocked() &&
      HasVotedCurrentTermUnlocked()) {

    // Already voted for the same candidate in the current term.
    if (GetVotedForCurrentTermUnlocked() == request->candidate_uuid()) {
      return RequestVoteRespondVoteAlreadyGranted(request, response);
    }

Also note that that a bunch of election scenarios are tested in 
leader_election-test.cc and this test isn't really supposed to exhaustively 
test all possible voting scenarios.


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-stress-test.cc
File src/kudu/integration-tests/tombstoned_voting-stress-test.cc:

PS6, Line 146: LOG(WARNING)
> Why warning?  Is it something non-regular or non-expected?  Would INFO be m
I just liked that it highlights the lines so it's easier to follow along. But 
no worries, I changed it to INFO.


PS6, Line 148: LOG(FATAL)
> nit: would FAIL() be more idiomatic in this case (given it's a test)?
Can't do that; this runs on a separate thread.


PS6, Line 159: WARNING
> INFO?  But I might be missing something here.
Done


PS6, Line 197: 2
> nit: maybe cluster_->num_tablet_servers() ?
Done


PS6, Line 228: voter_thread
> What happens to the thread if the test fails in one of the ASSERT_OK() belo
good catch; done


PS6, Line 232: SleepFor(MonoDelta::FromMilliseconds(500));
> Could you add some comments explaining why this delay is necessary?  What w
I wanted to give us some time in this state. Since this is supposed to be a 
stress test and run in slow mode only it seemed appropriate to give a 
reasonable amount of time for looping to take place.

If we only gave it 50ms I don't think there would be a lot of activity during 
that time in the thread running the votes because we fsync votes to disk.

Added a comment.


PS6, Line 240: SleepFor(MonoDelta::FromMilliseconds(500));
> ditto
added a comment


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

PS6, Line 142:   consensus_ = std::move(consensus);
             :   set_state(INITIALIZED);
             :  
> i'm surprised these lines don't need to take lock_? is this method always c
yes, it's always called before publishing


Line 315:     case STOPPED:
> hrm, no case for STOPPING?
I didn't think it was really important but I'll add one. Done


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

PS6, Line 81:   // Initializes RaftConsensus.
            :   Status Init(ThreadPool* raft_pool);
> left a similar question in the implementation: is it expected that this be 
Yes, it should be called before publishing. I'll add a comment.


PS6, Line 96: tombstoned voting
> I think it's better to generalize this a bit. Is this _only_ used in the ca
Yeah I suppose calling out tombstoned voting here is overly specific. I changed 
this to note that RaftConsensus also gets transitioned to a stopped state. 
RaftConsensus has its own documentation about what that means.


Line 102:   // Shut down this tablet replica permanently.
> does this require that Stop be called first? again would be nice to have so
Done


PS6, Line 272:  // Wait until the TabletReplica is fully in STOPPED or SHUTDOWN 
state.
> does this have some requirements (does the replica need to be in "stopping"
It doesn't have any such requirements for correctness, but if the replica isn't 
being stopped then this call will hang.


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS6, Line 979: replica->tablet_metadata()->tombstone_last_logged_opid
> maybe cleaner for it to just return an optional<OpId> itself rather than th
Done, in a separate patch.


http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS6, Line 525: tombtone
> not from this patch, but typo
Done


Line 663:   Status s = DeleteTabletData(replica->tablet_metadata(), 
cmeta_manager_, delete_type,
> per discussion on Slack, we may need to modify this so that, if there was n
Done, in a separate patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Edward Fancher <e...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to