Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 11:

(69 comments)

I didn't have time to through everything thoroughly but I'm posting some 
comments so far. They are mostly wording in comments.

My one overall suggestion is to try to break things up a little more. Some of 
the core functions are beefy and it's hard to review them because there is a 
lot to keep track of.

I'll try and produce more feedback on the trickier sections later tonight.

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12
PS11, Line 12: <cluster_rpc_endpoints>
It's more common just to say <master addresses>.


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18
PS11, Line 18: the flash cluster
"the flash cluster" won't mean much to most Kudu community members. Maybe just 
quote how many nodes and how much data the cluster had.


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19
PS11, Line 19: CDH5.12.x
Mention this vendor's Kudu is based on upstream 1.4.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87
PS11, Line 87: rebalance_algo.cc
             :   rebalance.cc
(Puts Adar hat on) nit: Need to swap these two for strict ascii sort order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92
PS11, Line 92: kudu_common
             :   ksck
nit: Order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173
PS11, Line 173: ADD_KUDU_TEST(rebalance_algo-test)
              : ADD_KUDU_TEST(rebalance-test)
nit: Swap for correct order.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: at
Extra word.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: the rebalancing
nit: the rebalancer


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323
PS11, Line 1323: public ::testing::WithParamInterface<bool>
Can we can reuse the enum class Kudu1097?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330
PS11, Line 1330: constexpr int kRepFactor = 3;
This doesn't look necessary.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338
PS11, Line 1338: FLAGS_num_tablet_servers = kNumTservers;
Can set to 3 directly if we shut down the first tablet server rather than a 
random one.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339
PS11, Line 1339: FLAGS_num_replicas = kRepFactor;
Isn't this redundant?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343
PS11, Line 1343: HostPort ts_host_port;
               :   {
               :     Random r(SeedRandom());
               :     auto idx = r.Next() % kNumTservers;
               :     auto* ts = cluster_->tablet_server(idx);
               :     ts_host_port = ts->bound_rpc_hostport();
               :     ts->Shutdown();
               :   }
It's fine to shut down the TS at idx = 0 instead of picking a random one.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365
PS11, Line 1365: A test to verify that rebalancing works for both 3-4-3 and 
3-2-3 replica
Can you mention the test is also covering different replication factors?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382
PS11, Line 1382: int
Why not auto?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412
PS11, Line 1412: 3
kRepFactor + 1


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427
PS11, Line 1427: Substitute("--table_num_replicas=$0", kRepFactor),
This shouldn't be needed because the table already exists.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477
PS11, Line 1477: NO_FATALS(cluster_->AssertNoCrashes());
Can we run the ClusterVerifier as well?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50
PS11, Line 50: (effectively Kudu cluster endpoints)
Seems unnecessary.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53
PS11, Line 53: Tables to balance (names).
nit: Names of tables to balance.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57
PS11, Line 57: at
nit: "on", or maybe "involving".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58:
the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: at
nit: "on a" or "involving a".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58: mean
means a


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59
PS11, Line 59: is at the
is located on the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65
PS11, Line 65: // Whether to move replicas of tablets with replication factor 
of one.
Why is this a special case?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69
PS11, Line 69: // Information on particular tablet replica to move. The 
high-level
             :   // specification of rebalancing moves described by istance of 
TableReplicaMove
             :   // might be represented by instance of ReplicaMoveInfo, and in 
most cases
             :   // there will be multiple alternatives represented by 
ReplicaMoveInfo to
             :   // correspond TableReplicaMove.
I'd simplify a bit and say something like "Represents a concrete move of a 
replica from one tablet server to another. Formed logically from a 
TableReplicaMove by specifying a tablet for the move.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74
PS11, Line 74: ReplicaMoveInfo
I think it'd be better just to call it ReplicaMove.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83
PS11, Line 83: TIME_IS_UP,
nit: Does it make sense just to call it TIMED_OUT, a more typical descriptor?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89
PS11, Line 89: // Constructor takes as a parameter configuration for the 
rebalancer.
nit: redundant.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95
PS11, Line 95: when
nit: once


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96
PS11, Line 96: an error occurred
if an error occurs


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98
PS11, Line 98: Run
Can 'status' be null?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101
PS11, Line 101: Builer
Builder


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108
PS11, Line 108: moves_in_progress
Note what this parameter is for.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115
PS11, Line 115: currently
              :   // pending replica moves. Those might left in such state from 
the previous
              :   // rebalancing step, and they will be completed eventually
              :   // (either succeeding or failing).
Just to clean it up a bit: "on pending moves, which may be scheduled or running 
after from the previous step ends."


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: the
remove


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120: info
infos


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121
PS11, Line 121: where the container
Replace with "which".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129
PS11, Line 129: If no suitable replicas are found,
              :   // 'tablet_ids' will be empty
Does this imply a NotFound or other non-OK status will be returned as well?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117
PS11, Line 117: l
Use L instead since this looks too much like 1 in a lot of fonts.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128
PS11, Line 128: "Replica Count", "Value"
I'd make this "Statistic" and "Value".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129
PS11, Line 129: Minimum", to_string(min_replica_count)});
              :     summary.AddRow({"Maximum", to_string(max_replica_count)});
              :     summary.AddRow({"Average
Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Count".


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143
PS11, Line 143: {"Server ID", "Replica Count", "Server RPC end-point"}
I think "UUID", "Address", "Replica Count" is a better order.

Do you think we should sort somehow? By count?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181
PS11, Line 181: "Table ID", "Replica Skew", "Replicas Total", "Table Name"
I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes more 
sense.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@184
PS11, Line 184: const auto it = table_info.find(table_id);
Shouldn't it be impossible to have a table in one map but not the other?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204
PS11, Line 204: Run
This function is hard to digest. Any change you could break it up?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275
PS11, Line 275: unordered_map<string, set<size_t>> src_op_indices;
              :     unordered_map<string, set<size_t>> dst_op_indices;
Should these be part of the instance's state, with the update functions as 
private methods?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292
PS11, Line 292: #ifndef NDEBUG
              :     // This is to enforce the invariants which the algorithm 
below relies upon.
              :     // In short, it can re-order operations suggested by the 
high-level
              :     // algorithm. However, since RunStep() outputs only one 
move operation per
              :     // tablet in every batch and there is at most one replica 
of a tablet
              :     // per server, that's perfectly safe to do. The code below 
verifies that
              :     // the contract between this code and RunStep() is in force.
              :     unordered_set<string> tablets;
              :     for (const auto& move_info : replica_move_infos) {
              :       const auto& tablet_id = move_info.tablet_uuid;
              :       if (!tablets.emplace(tablet_id).second) {
              :         LOG(DFATAL) << Substitute(
              :             "$0: more than one move operations for tablet", 
tablet_id);
              :       }
              :     }
              : #endif // #ifndef NDEBUG
Just in the interest of shortening this function, can this be a small helper we 
can DCHECK or something.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311
PS11, Line 311:     vector<size_t> op_indices_to_schedule;
              :       for (auto it = ts_per_op_count.begin(); 
op_indices_to_schedule.empty() &&
              :            it != ts_per_op_count.end() &&
              :            it->first < config_.max_moves_per_server; ++it) {
              :         const auto& uuid_0 = it->second;
              :
              :         auto it_1 = it;
              :         ++it_1;
              :         for (; op_indices_to_schedule.empty() &&
              :              it_1 != ts_per_op_count.end() &&
              :              it_1->first < config_.max_moves_per_server; 
++it_1) {
              :           const auto& uuid_1 = it_1->second;
              :
              :           // Check for available operations where uuid_0, 
uuid_1 would be
              :           // source or destination servers correspondingly.
              :           {
              :             const auto it_src = src_op_indices.find(uuid_0);
              :             const auto it_dst = dst_op_indices.find(uuid_1);
              :             if (it_src != src_op_indices.end() &&
              :                 it_dst != dst_op_indices.end()) {
              :               set_intersection(it_src->second.begin(), 
it_src->second.end(),
              :                                it_dst->second.begin(), 
it_dst->second.end(),
              :                                
back_inserter(op_indices_to_schedule));
              :             }
              :           }
              :           {
              :             const auto it_src = src_op_indices.find(uuid_1);
              :             const auto it_dst = dst_op_indices.find(uuid_0);
              :             if (it_src != src_op_indices.end() &&
              :                 it_dst != dst_op_indices.end()) {
              :               set_intersection(it_src->second.begin(), 
it_src->second.end(),
              :                                it_dst->second.begin(), 
it_dst->second.end(),
              :                                
back_inserter(op_indices_to_schedule));
              :             }
              :           }
              :         }
This section also feels like it could be factored out into its own function.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395
PS11, Line 395: // Check for the moves in progress.
Ditto.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473
PS11, Line 473: WRONG_SERVER_UUID
What about UNAVAILABLE servers?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74
PS11, Line 74: at
on


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75
PS11, Line 75: ;
:, delete "where"


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125
PS11, Line 125: master_addresses,
              :                          table_filters
If these are const, can we pass-by-value and move from them?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130
PS11, Line 130: std::move(cfg));
I think this is redundant as we move from 'cfg' in the initializer list portion 
of the constructor.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133
PS11, Line 133: ignore_result
Why not RETURN_NOT_OK?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149
PS11, Line 149: DCHECK(false
Did you mean to add a message here too? Or take this out and return a bad 
status based on 'msg_result_status'?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200
PS11, Line 200: by
for each


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

PS11:
Would you mind adding one sentence comments to these functions to explain what 
they do. Also it's nice if you explain what the output parameters are and 
whether they are allowed to be null.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52
PS11, Line 52: bool* is_3_4_3_replication = nullptr
This seems like a funny parameter to have. Does it detect if the cluster is 
using 3-4-3 replication? It's strange to do that when retrieving the state of a 
single tablet.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66
PS11, Line 66: Completed
nit: complete.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68
PS11, Line 68: client::sp::shared_ptr<client::KuduClient>
Why not a cref?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72
PS11, Line 72: completed
nit: complete.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83
PS11, Line 83: GetLastCommittedOpId
Looks like this function also exists as GetLastOpIdForReplica in 
cluster_itest_util.{h,cc}. Can we reuse that?


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145
PS11, Line 145: client->default_admin_operation_timeout()
This is the only reason we need the client, so I think we should pass in the 
timeout as a parameter instead.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162
PS11, Line 162: GetTablet
Aside: I had to look up this method because it surprised me that we take 
ownership of the KuduTablet based on the name of the method.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180
PS11, Line 180: ,
and the


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184
PS11, Line 184: hapens
happens



--
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 11
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 30 May 2018 00:25:36 +0000
Gerrit-HasComments: Yes

Reply via email to