Dinesh Bhat has posted comments on this change.

Change subject: [tools] Implement a manual leader_step_down for a tablet
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4533/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

Line 218:       tablet_id_
> But why not get the leader info from the consensus info too? Why go to two 
One reason I was hesitant to use the same routine for leadership info was 
because GetConsensus collects the details from all the tservers, and I wasn't 
sure whether one of them could end up returning a stale info depending on when 
the RPC hits them. In the code below at L234, probability of that RPC hitting 
when the leader failover is in transition is very likely. If anything, this was 
mainly due to not knowing the codebase well and staying on safer side(go to 
master for leader info).


PS1, Line 230:             "leader_failure_max_missed_heartbeat_periods",
             :             &max_missed_hb_periods_str));
> Now I see what you mean; the number of attempts in AssertEventually is neve
Well we don't want to include the <do leader step down> under AssertEventually, 
because it's perfectly possible that even after specified timeout, we never 
elected a leader which is different than current_leader. ASSERT_NE(new_leader, 
current_leader) could yield us incorrect results. I will still debug what I 
observed is just conincidence or there is more to it than just meet the eye. 
However, the ++attempts kinda ensures that eventually new_leader emerges on the 
other side.


http://gerrit.cloudera.org:8080/#/c/4533/3/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS3, Line 221:     // Grab the default settings for heartbeat timeouts.
             :     string hb_timeout_str;
             :     
ASSERT_TRUE(google::GetCommandLineOption("raft_heartbeat_interval_ms",
             :                                              &hb_timeout_str));
             :     int32 hb_timeout;
             :     ASSERT_TRUE(safe_strto32(hb_timeout_str, &hb_timeout));
             :     string max_missed_hb_periods_str;
             :     ASSERT_TRUE(
             :         google::GetCommandLineOption(
             :             "leader_failure_max_missed_heartbeat_periods",
             :             &max_missed_hb_periods_str));
             :     double max_missed_hb_periods;
             :     ASSERT_TRUE(safe_strtod(max_missed_hb_periods_str,
             :                             &max_missed_hb_periods));
> You don't need to go through all this trouble (and you certainly don't need
Good point, forgot about DECLARE, done.


-- 
To view, visit http://gerrit.cloudera.org:8080/4533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia046a28a2008f4f5d1e955f57752a32a1ddc5ab8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to