Mike Percy has posted comments on this change.

Change subject: KUDU-871. Support tombstoned voting
......................................................................


Patch Set 10:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS10, Line 1452: boost::
> nit: "using boost::optional" is already specified
Done


PS10, Line 1494: local_last_logged_opid = queue_->GetLastOpIdInLog();
> What if tombstone_last_logged_opid was provided as well?  Is it some some o
It's OK. I added a comment to explain.


Line 1856:     CHECK_EQ(kStopping, state_) << State_Name(state_);
> this is already CHECKed in SetStateUnlocked, no?
True, removed


Line 1863:       cmeta_->set_leader_uuid("");
> ClearLeaderUnlocked() ?
Done


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

PS10, Line 351:    |                    ^
              :   //                    |                    |
              :   //                    +--------------------+
> nit: prefer the slimmer version at L424
Done


PS10, Line 372: // Raft is in the process of stopping and will not accept 
writes.
              :     kStopping,
> nit: is it possible to request a vote if in this state?  It seems it is, bu
yes; done


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/cluster_itest_util.h
File src/kudu/integration-tests/cluster_itest_util.h:

PS10, Line 236: // Request the given replica to vote.
> nit: could you document at least first 5 parameters of this method?
It's just a thin wrapper around an RPC call. I added a comment referring to its 
definition.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/delete_tablet-itest.cc
File src/kudu/integration-tests/delete_tablet-itest.cc:

Line 95:   // Tablet bootstrap failure should result in a SHUTDOWN tablet with 
an error
> per discussion on slack, let's revert this part of this patch since it isn'
Done


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS10, Line 3044: back come 
> nit: come back
removed this change per convo w/ Todd on slack and here in the comments


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc
File src/kudu/integration-tests/tombstoned_voting-itest.cc:

Line 18: #include "kudu/consensus/metadata.pb.h"
> nit: from IWYU
Done, thank you


PS10, Line 46: using std::string;
             : using std::vector;
> nit: include these?
Done


PS10, Line 167: TS1
> nit: the tablet?  Or it's possible to tombstone a tablet server now?
added a comment to explain


PS10, Line 172: TServerDetails* ts1_ets
> nit: consider moving this closer to the call site where it's used.
Done


Line 173:   scoped_refptr<TabletReplica> replica;
> nit: does it make sense to verify the scenario below works without errors e
Makes sense. Done.


PS10, Line 181: "A"
> Could you add a comment explaining that the actual candidate UUID is not cr
Done


PS10, Line 249: ts0_replica->consensus()->StepDown(&resp)
> nit: should it be some selective error handling based on the code set in th
I'm not sure why else it would fail. This is a pretty controlled environment. 
If a disk fails, all the tests will fail on that Jenkins run so we don't really 
care about that.


PS10, Line 253: ts0
> nit: 'TS0' since it has been referred as that already.
Is your comment parser case-sensitive? :) Done


PS10, Line 256: FOLLOWER
> Could it change to something like 'LEARNER' in the middle and bring some fl
Nope.


PS10, Line 263: error (nee failed
> per other comment, back out this change
Done


PS10, Line 294: Shut each TS
> down
Done


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc
File src/kudu/integration-tests/tombstoned_voting-stress-test.cc:

Line 18: #include <atomic>
> nit: from IWYU report:
thanks!


PS10, Line 44: using std::atomic;
             : using std::string;
             : using std::thread;
             : using std::unique_lock;
             : using std::vector;
> nit: includes?
Done


PS10, Line 57:       : num_workers_(1),
> this test seems to be implemented for multiple workers but only has one wor
I initially wanted >1 worker but had trouble making it deterministic enough to 
assert on things I wanted to assert on, so I dropped it to 1 worker. When I 
tried to simplify the code for a single worker it didn't really seem better, so 
I kept the more general implementation.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 2489:     LOG(WARNING) << "Tablet " << tablet->ToString() << " has failed 
on TS "
> wouldn't it be natural to have tablet::SHUTDOWN check here instead?  Or it'
I thought it was a little unwise to have a crash based on remote state 
reporting in, but I reverted this change per other comments related to FAILED, 
and since it's only a DCHECK and not a CHECK it's generally not a big deal to 
leave it in so I have left it like it was before.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

PS10, Line 347: // TODO(mpercy): Set to failed if Init() fails?
> Does this still apply?
Without automatic re-replication of the master the utility of doing much of 
anything is limited. Anyway, thanks for the catch, I'll do that.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

PS10, Line 357: Has no meaning for non-tombstoned tablets
> Is this to say that this will be boost::none for non-tombstoned tablets?
It will be boost::none for TABLET_DATA_READY tablets.


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

PS10, Line 304: LOG(DFATAL)
> The rest of the CHECKs would work for non-debug builds as well.  What is th
changed to FATAL


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

PS10, Line 199: SetError
> nit: sort of brought this up on Slack, I think the naming for this could be
reverted per other convo


http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tserver/tablet_copy_client.cc
File src/kudu/tserver/tablet_copy_client.cc:

PS10, Line 298: RETURN_NOT_OK(WriteConsensusMetadata());
> This change isn't noted in the commit message. Mind documenting it, or at l
Added a comment here and a bullet point to the commit message.


-- 
To view, visit http://gerrit.cloudera.org:8080/6960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Edward Fancher <e...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to