Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: WIP KUDU-2780: create thread for auto-rebalancing
......................................................................


Patch Set 3:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@58
PS3, Line 58: if (cluster_)
nit: it's a good practice to use scope braces for 'if ()' clause even it's a 
single-line statement (sprawls to multi-line in this case).


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@85
PS3, Line 85:   int kNumMasters = 3;
            :   int kNumTservers = 3;
            :   int kNumTablets = 4;
nit: these are constants, right?  Add const/constexpr if so.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@96
PS3, Line 96: 10
10+ seconds run-time for a test makes it a 'slow' test.  Add a notion of 
running this test only when KUDU_ALLOW_SLOW_TESTS=1 is set; you can see how 
it's done in other tests.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@101
PS3, Line 101:     if (i==leader_idx) ASSERT_NE(0, 
number_of_loop_iterations(i));
             :     else ASSERT_EQ(0, moves_scheduled_this_round(i));
Would this test fail if master leadership changes while the test is running?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@104
PS3, Line 104:
nit:  an extra space


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@111
PS3, Line 111:   int kNumMasters = 3;
             :   int kNumTservers = 3;
             :   int kNumTablets = 3;
Add const/constexpr.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@132
PS3, Line 132:   auto iterations = number_of_loop_iterations(new_leader_idx);
             :   ASSERT_LT(0, iterations);
Might it happen that the rebalancing thread hasn't yet after master leader 
elected?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@137
PS3, Line 137: TEST_F(AutoRebalancerTest, NoReplicaMovesIfBalanced) {
Could it be possible to implement these scenarios in form of unit testing of 
some particular methods/function of the rebalancer?  That way it would give 
much more control over the replica placement, etc.  Also, it would be easier 
and faster to run.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@139
PS3, Line 139: kNumTservers, kNumTablets;
These are not constants, but 'kXxx' is used only for constants.  Change the 
naming.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@214
PS3, Line 214: auto-rebalancing flag is paused
nit: how does one pause a flag?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@241
PS3, Line 241:   ASSERT_GE(moves_scheduled_before, moves_scheduled_after);
If there shouldn't be any moves, then why ASSERT_GE(), not ASSERT_EQ()?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@243
PS3, Line 243:   FLAGS_auto_rebalancing_stop_flag = orig_flag;
I don't think this is necessary -- KuduTest does that automatically using 
google::FlagSaver


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.h@81
PS3, Line 81:   // Variables for testing.
            :   int number_of_loop_iterations;
            :   int moves_scheduled_this_round;
If this is for testing, move then under 'private' section and move those tests 
friends of this class.

Another approach might be introducing metrics for these readings: look around 
for METRIC_DEFINE_gauge_<type> in other components for examples.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@120
PS3, Line 120: DEFINE_uint32(auto_rebalancing_max_moves_per_iteration, 5,
             :               "Maximum number of total replica moves to perform 
during one "
             :               "iteration of the auto-rebalancing loop.");
If I'm not mistaken, in the original rebalancer code constant 5 is used as a 
factor multiplied by maximum move operations per tserver, and that's where that 
'magic' constant came from.

The minimum number of operations per server was 2, so effectively this 
parameter was scaling along with number of nodes in the cluster, which is 
'natural'.  I think it would be nice to have some sort of relative constraint 
of the same sort for the auto-rebalancer as well, otherwise it would be 
necessary to tweak this constant on every cluster to make it relevant with 
regard to the size of the cluster.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/catalog_manager.h@769
PS3, Line 769:   // Functions for auto-rebalancer testing.
             :   int auto_rebalancer_iterations() const;
             :   int auto_rebalancer_moves() const;
             :
             :   master::TSManager* ts_manager() const;
If these are for testing-only purposes, a better place for these is in the 
'private' section.



--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 3
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 16 Sep 2019 15:45:10 +0000
Gerrit-HasComments: Yes

Reply via email to