Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )
Change subject: [tools] rebalancer in the kudu CLI tool ...................................................................... Patch Set 11: (69 comments) I didn't have time to through everything thoroughly but I'm posting some comments so far. They are mostly wording in comments. My one overall suggestion is to try to break things up a little more. Some of the core functions are beefy and it's hard to review them because there is a lot to keep track of. I'll try and produce more feedback on the trickier sections later tonight. http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12 PS11, Line 12: <cluster_rpc_endpoints> It's more common just to say <master addresses>. http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18 PS11, Line 18: the flash cluster "the flash cluster" won't mean much to most Kudu community members. Maybe just quote how many nodes and how much data the cluster had. http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19 PS11, Line 19: CDH5.12.x Mention this vendor's Kudu is based on upstream 1.4. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt File src/kudu/tools/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87 PS11, Line 87: rebalance_algo.cc : rebalance.cc (Puts Adar hat on) nit: Need to swap these two for strict ascii sort order. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92 PS11, Line 92: kudu_common : ksck nit: Order. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173 PS11, Line 173: ADD_KUDU_TEST(rebalance_algo-test) : ADD_KUDU_TEST(rebalance-test) nit: Swap for correct order. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320 PS11, Line 1320: at Extra word. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320 PS11, Line 1320: the rebalancing nit: the rebalancer http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323 PS11, Line 1323: public ::testing::WithParamInterface<bool> Can we can reuse the enum class Kudu1097? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330 PS11, Line 1330: constexpr int kRepFactor = 3; This doesn't look necessary. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338 PS11, Line 1338: FLAGS_num_tablet_servers = kNumTservers; Can set to 3 directly if we shut down the first tablet server rather than a random one. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339 PS11, Line 1339: FLAGS_num_replicas = kRepFactor; Isn't this redundant? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343 PS11, Line 1343: HostPort ts_host_port; : { : Random r(SeedRandom()); : auto idx = r.Next() % kNumTservers; : auto* ts = cluster_->tablet_server(idx); : ts_host_port = ts->bound_rpc_hostport(); : ts->Shutdown(); : } It's fine to shut down the TS at idx = 0 instead of picking a random one. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365 PS11, Line 1365: A test to verify that rebalancing works for both 3-4-3 and 3-2-3 replica Can you mention the test is also covering different replication factors? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382 PS11, Line 1382: int Why not auto? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412 PS11, Line 1412: 3 kRepFactor + 1 http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427 PS11, Line 1427: Substitute("--table_num_replicas=$0", kRepFactor), This shouldn't be needed because the table already exists. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477 PS11, Line 1477: NO_FATALS(cluster_->AssertNoCrashes()); Can we run the ClusterVerifier as well? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h File src/kudu/tools/rebalance.h: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50 PS11, Line 50: (effectively Kudu cluster endpoints) Seems unnecessary. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53 PS11, Line 53: Tables to balance (names). nit: Names of tables to balance. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57 PS11, Line 57: at nit: "on", or maybe "involving". http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: the http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: at nit: "on a" or "involving a". http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: mean means a http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59 PS11, Line 59: is at the is located on the http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65 PS11, Line 65: // Whether to move replicas of tablets with replication factor of one. Why is this a special case? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69 PS11, Line 69: // Information on particular tablet replica to move. The high-level : // specification of rebalancing moves described by istance of TableReplicaMove : // might be represented by instance of ReplicaMoveInfo, and in most cases : // there will be multiple alternatives represented by ReplicaMoveInfo to : // correspond TableReplicaMove. I'd simplify a bit and say something like "Represents a concrete move of a replica from one tablet server to another. Formed logically from a TableReplicaMove by specifying a tablet for the move. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74 PS11, Line 74: ReplicaMoveInfo I think it'd be better just to call it ReplicaMove. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83 PS11, Line 83: TIME_IS_UP, nit: Does it make sense just to call it TIMED_OUT, a more typical descriptor? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89 PS11, Line 89: // Constructor takes as a parameter configuration for the rebalancer. nit: redundant. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95 PS11, Line 95: when nit: once http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96 PS11, Line 96: an error occurred if an error occurs http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98 PS11, Line 98: Run Can 'status' be null? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101 PS11, Line 101: Builer Builder http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108 PS11, Line 108: moves_in_progress Note what this parameter is for. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115 PS11, Line 115: currently : // pending replica moves. Those might left in such state from the previous : // rebalancing step, and they will be completed eventually : // (either succeeding or failing). Just to clean it up a bit: "on pending moves, which may be scheduled or running after from the previous step ends." http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120 PS11, Line 120: the remove http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120 PS11, Line 120: info infos http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121 PS11, Line 121: where the container Replace with "which". http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129 PS11, Line 129: If no suitable replicas are found, : // 'tablet_ids' will be empty Does this imply a NotFound or other non-OK status will be returned as well? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc File src/kudu/tools/rebalance.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117 PS11, Line 117: l Use L instead since this looks too much like 1 in a lot of fonts. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128 PS11, Line 128: "Replica Count", "Value" I'd make this "Statistic" and "Value". http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129 PS11, Line 129: Minimum", to_string(min_replica_count)}); : summary.AddRow({"Maximum", to_string(max_replica_count)}); : summary.AddRow({"Average Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Count". http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143 PS11, Line 143: {"Server ID", "Replica Count", "Server RPC end-point"} I think "UUID", "Address", "Replica Count" is a better order. Do you think we should sort somehow? By count? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181 PS11, Line 181: "Table ID", "Replica Skew", "Replicas Total", "Table Name" I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes more sense. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@184 PS11, Line 184: const auto it = table_info.find(table_id); Shouldn't it be impossible to have a table in one map but not the other? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204 PS11, Line 204: Run This function is hard to digest. Any change you could break it up? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275 PS11, Line 275: unordered_map<string, set<size_t>> src_op_indices; : unordered_map<string, set<size_t>> dst_op_indices; Should these be part of the instance's state, with the update functions as private methods? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292 PS11, Line 292: #ifndef NDEBUG : // This is to enforce the invariants which the algorithm below relies upon. : // In short, it can re-order operations suggested by the high-level : // algorithm. However, since RunStep() outputs only one move operation per : // tablet in every batch and there is at most one replica of a tablet : // per server, that's perfectly safe to do. The code below verifies that : // the contract between this code and RunStep() is in force. : unordered_set<string> tablets; : for (const auto& move_info : replica_move_infos) { : const auto& tablet_id = move_info.tablet_uuid; : if (!tablets.emplace(tablet_id).second) { : LOG(DFATAL) << Substitute( : "$0: more than one move operations for tablet", tablet_id); : } : } : #endif // #ifndef NDEBUG Just in the interest of shortening this function, can this be a small helper we can DCHECK or something. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311 PS11, Line 311: vector<size_t> op_indices_to_schedule; : for (auto it = ts_per_op_count.begin(); op_indices_to_schedule.empty() && : it != ts_per_op_count.end() && : it->first < config_.max_moves_per_server; ++it) { : const auto& uuid_0 = it->second; : : auto it_1 = it; : ++it_1; : for (; op_indices_to_schedule.empty() && : it_1 != ts_per_op_count.end() && : it_1->first < config_.max_moves_per_server; ++it_1) { : const auto& uuid_1 = it_1->second; : : // Check for available operations where uuid_0, uuid_1 would be : // source or destination servers correspondingly. : { : const auto it_src = src_op_indices.find(uuid_0); : const auto it_dst = dst_op_indices.find(uuid_1); : if (it_src != src_op_indices.end() && : it_dst != dst_op_indices.end()) { : set_intersection(it_src->second.begin(), it_src->second.end(), : it_dst->second.begin(), it_dst->second.end(), : back_inserter(op_indices_to_schedule)); : } : } : { : const auto it_src = src_op_indices.find(uuid_1); : const auto it_dst = dst_op_indices.find(uuid_0); : if (it_src != src_op_indices.end() && : it_dst != dst_op_indices.end()) { : set_intersection(it_src->second.begin(), it_src->second.end(), : it_dst->second.begin(), it_dst->second.end(), : back_inserter(op_indices_to_schedule)); : } : } : } This section also feels like it could be factored out into its own function. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395 PS11, Line 395: // Check for the moves in progress. Ditto. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473 PS11, Line 473: WRONG_SERVER_UUID What about UNAVAILABLE servers? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc File src/kudu/tools/tool_action_cluster.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74 PS11, Line 74: at on http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75 PS11, Line 75: ; :, delete "where" http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125 PS11, Line 125: master_addresses, : table_filters If these are const, can we pass-by-value and move from them? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130 PS11, Line 130: std::move(cfg)); I think this is redundant as we move from 'cfg' in the initializer list portion of the constructor. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133 PS11, Line 133: ignore_result Why not RETURN_NOT_OK? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149 PS11, Line 149: DCHECK(false Did you mean to add a message here too? Or take this out and return a bad status based on 'msg_result_status'? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200 PS11, Line 200: by for each http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h File src/kudu/tools/tool_replica_util.h: PS11: Would you mind adding one sentence comments to these functions to explain what they do. Also it's nice if you explain what the output parameters are and whether they are allowed to be null. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52 PS11, Line 52: bool* is_3_4_3_replication = nullptr This seems like a funny parameter to have. Does it detect if the cluster is using 3-4-3 replication? It's strange to do that when retrieving the state of a single tablet. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66 PS11, Line 66: Completed nit: complete. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68 PS11, Line 68: client::sp::shared_ptr<client::KuduClient> Why not a cref? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72 PS11, Line 72: completed nit: complete. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc File src/kudu/tools/tool_replica_util.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83 PS11, Line 83: GetLastCommittedOpId Looks like this function also exists as GetLastOpIdForReplica in cluster_itest_util.{h,cc}. Can we reuse that? http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145 PS11, Line 145: client->default_admin_operation_timeout() This is the only reason we need the client, so I think we should pass in the timeout as a parameter instead. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162 PS11, Line 162: GetTablet Aside: I had to look up this method because it surprised me that we take ownership of the KuduTablet based on the name of the method. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180 PS11, Line 180: , and the http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184 PS11, Line 184: hapens happens -- To view, visit http://gerrit.cloudera.org:8080/10399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238 Gerrit-Change-Number: 10399 Gerrit-PatchSet: 11 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 30 May 2018 00:25:36 +0000 Gerrit-HasComments: Yes
