Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8709 )
Change subject: KUDU-2230: the leader is always a viable voter ...................................................................... Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/8709/5/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: http://gerrit.cloudera.org:8080/#/c/8709/5/src/kudu/consensus/consensus_queue.cc@428 PS5, Line 428: // Only consider a peer to be a viable voter if... : // ...its last exchange was successful : viable &= peer->last_exchange_status == PeerStatus::OK; : : // ...the peer is up to date with the latest majority. : // : // This indicates that it's actively participating in majorities and likely to : // replicate a config change immediately when we propose it. : viable &= peer->last_received.index() >= queue_state_.majority_replicated_index; : : // ...we have communicated successfully with it recently. : // : // This handles the case where the tablet has had no recent writes and therefore : // even a replica that is down would have participated in the latest majority. : auto unreachable_time = now - peer->last_communication_time; : viable &= unreachable_time.ToMilliseconds() < FLAGS_consensus_rpc_timeout_ms; > My concern is that we had to duplicate the logic of incrementing remaining_ updated http://gerrit.cloudera.org:8080/#/c/8709/5/src/kudu/integration-tests/tablet_replacement-itest.cc File src/kudu/integration-tests/tablet_replacement-itest.cc: http://gerrit.cloudera.org:8080/#/c/8709/5/src/kudu/integration-tests/tablet_replacement-itest.cc@337 PS5, Line 337: Substitute("--consensus_rpc_timeout_ms=$0", kTimeout.ToMilliseconds()), > why do you think this would become flaky with a lower consensus rpc timeout http://dist-test.cloudera.org//job?job_id=aserbin.1512158682.78553 http://gerrit.cloudera.org:8080/#/c/8709/5/src/kudu/integration-tests/tablet_replacement-itest.cc@399 PS5, Line 399: SleepFor(kTimeout); > Why is it necessary to sleep here at all? we already slept at line 383. Sho That's because we want RPC timeout to pass at this point, and the RPC timeout is set to higher value than 2 * kUnavailableSec. -- To view, visit http://gerrit.cloudera.org:8080/8709 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id200ec8e562e2b21c8eef09e7b38a8d85d23239c Gerrit-Change-Number: 8709 Gerrit-PatchSet: 5 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Fri, 01 Dec 2017 20:21:09 +0000 Gerrit-HasComments: Yes
