Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting ......................................................................
Patch Set 6: (36 comments) http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS6, Line 1485: && state_ == kRunning > why is this condition necesary? if we've just been tombstoned it still make Fair point. This isn't required so I removed it. PS6, Line 1509: CheckSafeToVoteUnlocked > the duplicate logic here with CheckSafeToVoteUnlocked is a little unnerving good idea. done PS6, Line 1514: DCHECK(tombstone_last_logged_opid) << LogPrefixUnlocked() << "expected last-logged opid"; : local_last_logged_opid = *tombstone_last_logged_opid; > if this DCHECK fails, would we crash anyway trying to dereference the boost Done PS6, Line 1519: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned based on last-logged opid " : << local_last_logged_opid; > move this to the else block above and thus avoid the extra if? Done PS6, Line 2538: const optional<OpId>& tombstone_last_logged_opid > whats the point of passing an opid here if you're only checking for its pre I guess the phrasing was a bit indirect; I meant "CheckSafeToVoteUnlocked() guarantees that tombstoned_last_logged_opid is set". However I took Todd's advice and inlined this with some other logic in RequestVote(). http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: PS6, Line 124: static Status Create(ConsensusOptions options, : RaftPeerPB local_peer_pb, : scoped_refptr<ConsensusMetadataManager> cmeta_manager, : ThreadPool* raft_pool, : std::shared_ptr<RaftConsensus>* consensus_out); > didn't you delete this a while back? Yeah, that was part of a bunch of refactoring that paid off some tech debt that had accumulated while adding features since the early days of Consensus. While I was refactoring I thought that for tombstoned voting we would now need to handle the case where we couldn't load cmeta but I think we can handle that at a higher level (TabletReplica or TSTabletManager) so putting this factory method back as a thin wrapper around { new; Init() } to avoid having to handle a non-ininitialized cmeta in RaftConsensus is reasonable, I think. Line 130: RaftConsensus(ConsensusOptions options, > can the ctor be made private? (may need ALLOW_MAKE_SHARED) Done PS6, Line 271: term > I think CurrentTerm or Term would be better since this takes lock_ and that Done PS6, Line 292: Tombstoned voting > agreed. Maybe better here to just say that it synchronously transitions to I filled in some details in RequestVote, SetStateUnlocked(), and here. Line 297: void Shutdown(); > same Done Line 339: enum State { > it would be nice to have a state transition diagram of sorts here. Or alter Done PS6, Line 340: // RaftConsensus has been freshly constructed. : kNew, > is this state ever externally exposed, considering we now have the factory Done PS6, Line 343: // RaftConsensus has been initialized. : kInitialized, > in terms of behavior, how is this state different than Stopped? eg let's sa kInitialized is very similar to kStopped. The difference is whether a tablet was tombstoned when it first came up (kInitialized) or whether it was tombstoned after having been running already (kStopped). I didn't document that, though, I only documented that voting is allowed in kInitialized and kStopped states in the RequestVote() doc comment. PS6, Line 354: RaftConsensus has been stopped. > maybe explain what this means in the context of the other states? something Done PS6, Line 391: // Initializes the RaftConsensus object. This should be called before : // publishing this object to any thread other than the thread that invoked : // the constructor. : Status Init(); > since the constructor is only invoked by the factory method, maybe not nece Done PS6, Line 647: // existence of a known last logged opid. : Status CheckSafeToVoteUnlocked(const boost::optional<OpId>& tombstone_last_logged_opid) : > similar to David's comment on the implementation: I think it would make mor inlined this based on a comment in the .cc file Line 783: AtomicBool stopped_; > it seems it was already like this, but what's the distinction between this Added a comment. Also, this should actually be shutdown_, not stopped_, so reverted the name change. http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 48: > Replicated masters can have tombstoned replicas too, is that right? Does th Hmm. This should be very rare but could be possible. I'll look into it. PS6, Line 80: call Stop() on > you mean 'manually tombstone' right? I was confused by reading this comment Ah, right. PS6, Line 195: // Ask TS1 for a vote that should be denied (old last-logged opid). : s = itest::RequestVote(ts1_ets, tablet_id, "B", current_term, old_opid, : /*ignore_live_leader=*/ true, /*is_pre_election=*/ false, kTimeout); > can you add a test for same candidate same term but with an opid that is gr Maybe I'm misunderstanding. It sounds like you're asking for a test of re-voting for (A, current_term, last_logged_opid + 1) but in that case we will return an automatic "yes" because we already voted for "A" this term due to this code path in raft_consensus.cc: // We already voted this term. if (request->candidate_term() == CurrentTermUnlocked() && HasVotedCurrentTermUnlocked()) { // Already voted for the same candidate in the current term. if (GetVotedForCurrentTermUnlocked() == request->candidate_uuid()) { return RequestVoteRespondVoteAlreadyGranted(request, response); } Also note that that a bunch of election scenarios are tested in leader_election-test.cc and this test isn't really supposed to exhaustively test all possible voting scenarios. http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS6, Line 146: LOG(WARNING) > Why warning? Is it something non-regular or non-expected? Would INFO be m I just liked that it highlights the lines so it's easier to follow along. But no worries, I changed it to INFO. PS6, Line 148: LOG(FATAL) > nit: would FAIL() be more idiomatic in this case (given it's a test)? Can't do that; this runs on a separate thread. PS6, Line 159: WARNING > INFO? But I might be missing something here. Done PS6, Line 197: 2 > nit: maybe cluster_->num_tablet_servers() ? Done PS6, Line 228: voter_thread > What happens to the thread if the test fails in one of the ASSERT_OK() belo good catch; done PS6, Line 232: SleepFor(MonoDelta::FromMilliseconds(500)); > Could you add some comments explaining why this delay is necessary? What w I wanted to give us some time in this state. Since this is supposed to be a stress test and run in slow mode only it seemed appropriate to give a reasonable amount of time for looping to take place. If we only gave it 50ms I don't think there would be a lot of activity during that time in the thread running the votes because we fsync votes to disk. Added a comment. PS6, Line 240: SleepFor(MonoDelta::FromMilliseconds(500)); > ditto added a comment http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS6, Line 142: consensus_ = std::move(consensus); : set_state(INITIALIZED); : > i'm surprised these lines don't need to take lock_? is this method always c yes, it's always called before publishing Line 315: case STOPPED: > hrm, no case for STOPPING? I didn't think it was really important but I'll add one. Done http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS6, Line 81: // Initializes RaftConsensus. : Status Init(ThreadPool* raft_pool); > left a similar question in the implementation: is it expected that this be Yes, it should be called before publishing. I'll add a comment. PS6, Line 96: tombstoned voting > I think it's better to generalize this a bit. Is this _only_ used in the ca Yeah I suppose calling out tombstoned voting here is overly specific. I changed this to note that RaftConsensus also gets transitioned to a stopped state. RaftConsensus has its own documentation about what that means. Line 102: // Shut down this tablet replica permanently. > does this require that Stop be called first? again would be nice to have so Done PS6, Line 272: // Wait until the TabletReplica is fully in STOPPED or SHUTDOWN state. > does this have some requirements (does the replica need to be in "stopping" It doesn't have any such requirements for correctness, but if the replica isn't being stopped then this call will hang. http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: PS6, Line 979: replica->tablet_metadata()->tombstone_last_logged_opid > maybe cleaner for it to just return an optional<OpId> itself rather than th Done, in a separate patch. http://gerrit.cloudera.org:8080/#/c/6960/6/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS6, Line 525: tombtone > not from this patch, but typo Done Line 663: Status s = DeleteTabletData(replica->tablet_metadata(), cmeta_manager_, delete_type, > per discussion on Slack, we may need to modify this so that, if there was n Done, in a separate patch. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Edward Fancher <e...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes