Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10399 )
Change subject: [tools] rebalancer in the kudu CLI tool ...................................................................... Patch Set 13: (68 comments) http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12 PS11, Line 12: <master_addresses> > It's more common just to say <master addresses>. Done http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18 PS11, Line 18: the 12-node clust > "the flash cluster" won't mean much to most Kudu community members. Maybe j Done http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19 PS11, Line 19: (3-4-3 r > Mention this vendor's Kudu is based on upstream 1.4. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt File src/kudu/tools/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87 PS11, Line 87: rebalance.cc : rebalance_al > (Puts Adar hat on) nit: Need to swap these two for strict ascii sort order. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92 PS11, Line 92: ksck : kudu > nit: Order. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173 PS11, Line 173: ADD_KUDU_TEST(rebalance-test) : ADD_KUDU_TEST(rebalance_algo- > nit: Swap for correct order. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320 PS11, Line 1320: > Extra word. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320 PS11, Line 1320: ceStartCriteria > nit: the rebalancer Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323 PS11, Line 1323: > Can we can reuse the enum class Kudu1097? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330 PS11, Line 1330: }; > This doesn't look necessary. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338 PS11, Line 1338: // Shutdown one of the tablet servers. > Can set to 3 directly if we shut down the first tablet server rather than a Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339 PS11, Line 1339: HostPort ts_host_port; > Isn't this redundant? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343 PS11, Line 1343: ts_host_port = ts->bound_rpc_hostport(); : ts->Shutdown(); : } : : string err; : Status s = RunKuduTool({ : "cluster", : > It's fine to shut down the TS at idx = 0 instead of picking a random one. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365 PS11, Line 1365: public ::testing::WithParamInterface<std::tuple<int, Kudu1097>> { > Can you mention the test is also covering different replication factors? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382 PS11, Line 1382: xpr > Why not auto? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412 PS11, Line 1412: S > kRepFactor + 1 Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427 PS11, Line 1427: r (auto i = 0; i < kNumTables; ++i) { > This shouldn't be needed because the table already exists. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477 PS11, Line 1477: for (auto& workload : workloads) { > Can we run the ClusterVerifier as well? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h File src/kudu/tools/rebalance.h: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50 PS11, Line 50: > Seems unnecessary. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53 PS11, Line 53: fig(std::vector<std::strin > nit: Names of tables to balance. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57 PS11, Line 57: > nit: "on", or maybe "involving". Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: > the Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: > means a Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58 PS11, Line 58: > nit: "on a" or "involving a". Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59 PS11, Line 59: dpoints. > is located on the Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65 PS11, Line 65: > Why is this a special case? That's because the code in rebalancer.cc has special provisions to avoid moving RF1 replicas. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69 PS11, Line 69: size_t max_moves_per_server; : : // Maximum run time, in seconds. : int64_t max_run_time_sec; : > I'd simplify a bit and say something like "Represents a concrete move of a Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74 PS11, Line 74: ether to move r > I think it'd be better just to call it ReplicaMove. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83 PS11, Line 83: std::string > nit: Does it make sense just to call it TIMED_OUT, a more typical descripto Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89 PS11, Line 89: TIMED_OUT, > nit: redundant. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95 PS11, Line 95: > nit: once Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96 PS11, Line 96: > if an error occurs Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98 PS11, Line 98: Pri > Can 'status' be null? Nope, it cannot. I added corresponding comment. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101 PS11, Line 101: ed or > Builder Done. It's gone. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108 PS11, Line 108: > Note what this parameter is for. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115 PS11, Line 115: eter. The 'cbi' : // output parameter cannot be null. : Status KsckResultsToClusterBalanceInfo( : const KsckResults& ksck_info, > Just to clean it up a bit: "on pending moves, which may be scheduled or run Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120 PS11, Line 120: > infos Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120 PS11, Line 120: > remove Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121 PS11, Line 121: > Replace with "which". Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129 PS11, Line 129: into 'replica_moves', : // which will be empty if no > Does this imply a NotFound or other non-OK status will be returned as well? Nope, in that case this method happily returns Status::OK(). Non-OK statuses are reserved for errors. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc File src/kudu/tools/rebalance.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117 PS11, Line 117: > Use L instead since this looks too much like 1 in a lot of fonts. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128 PS11, Line 128: plica_count = servers_lo > I'd make this "Statistic" and "Value". Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129 PS11, Line 129: replica_count = servers_load_info.rbegin()->first; : const double ave_replica_count = : 1.0 * total_replic > Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Coun Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143 PS11, Line 143: y : tserver_summaries) { > I think "UUID", "Address", "Replica Count" is a better order. They are sorted by count, I think. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181 PS11, Line 181: > I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes mor I would prefer the 'Table name' to be in the end of the list, otherwise the width of columns varies from run to run, and I don't like that. Let me know if you feel strongly against that. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204 PS11, Line 204: > This function is hard to digest. Any change you could break it up? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275 PS11, Line 275: const auto& tablet_id = info.tablet_uuid; : const auto& src_ts_uuid = info.ts_uuid_from; > Should these be part of the instance's state, with the update functions as That's doable, yes. Initially, everything was an assorted collection of functions. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292 PS11, Line 292: // Continue to the next round, populating op_indices_to_schedule with : // appropriate moves and scheduling the operations. : continue; : } : : DCHECK(!s.ok()); : // The source replica is not found in the tablet's consensus config : // or the tablet does not exit anymore. The replica might already : // moved because of some other concurrent activity, e.g. : // re-replication, another rebalancing session in progress, etc. : LOG(INFO) << Substitute("tablet $0: $1 -> $2 move ignored: $3", : tablet_id, src_ts_uuid, dst_ts_uuid, : s.ToString()); : UpdateOnMoveScheduled(src_op_indices_, op_idx, info.ts_uuid_from, false); : UpdateOnMoveScheduled(dst_op_indices_, op_idx, info.ts_uuid_to, false); : // Re-evaluate t > Just in the interest of shortening this function, can this be a small helpe Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311 PS11, Line 311: // Check for the moves in progress. : clear_scheduled_moves = UpdateMovesInProgress(deadline, &timed_out); : if (re_evaluate_cluster_state || clear_scheduled_moves) { : break; : } : // Sleep a bit before going next cycle. : SleepFor(MonoDelta::FromMilliseconds(200)); : } : } : : *result_status = timed_out ? RunStatus::TIMED_OUT : : RunStatus::CLUSTER_IS_BALANCED; : if (moves_count) { : *moves_count = attempted_moves_count_; : } : : return Status::OK(); : } : : // Transform the information on the cluster returned by ksck into : // ClusterBalanceInfo that could be consumed by the rebalancing algorithm, : // taking into account pending replica movement operations. The pending : // operations are evaluated against the state of the cluster in accordance with : // the ksck results, and if the replica movement operations are still in : // progress, then they are interpreted as successfully completed. The idea is to : // prevent the algorithm outputting the same moves again while some of the : // moves recommended at prior steps are still in progress. : Status Rebalancer::KsckResultsToClusterBalanceInfo( : const KsckResults& ksck_info, : const MovesInProgress& pending_moves, : ClusterBalanceInfo* cbi) const { : DCHECK(cbi); : : // tserver UUID --> total replica count of all table's tablets at the server : typedef unordered_map<string, int32_t> TableReplicasAtServer; : > This section also feels like it could be factored out into its own function Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395 PS11, Line 395: LOG(INFO) << msg; > Ditto. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473 PS11, Line 473: > What about UNAVAILABLE servers? Good catch, indeed. The reason it's not handled is that initially there was an idea that this code would be not run against a tablet server which is down, however such servers might appear in the course of the run. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc File src/kudu/tools/tool_action_cluster.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74 PS11, Line 74: on > on Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75 PS11, Line 75: : > :, delete "where" Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125 PS11, Line 125: ebalancer::Config( : std::move(master_addresses), > If these are const, can we pass-by-value and move from them? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130 PS11, Line 130: ve_single_replic > I think this is redundant as we move from 'cfg' in the initializer list por Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133 PS11, Line 133: RETURN_NOT_OK > Why not RETURN_NOT_OK? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149 PS11, Line 149: msg_result_s > Did you mean to add a message here too? Or take this out and return a bad s Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200 PS11, Line 200: le > for each Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h File src/kudu/tools/tool_replica_util.h: PS11: > Would you mind adding one sentence comments to these functions to explain w Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52 PS11, Line 52: const MonoDelta& timeout, > This seems like a funny parameter to have. Does it detect if the cluster is Yes, this is what it is -- I just moved it from some other place. Probably, we could improve the signature of this function in a separate changelist. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66 PS11, Line 66: ica is a > nit: complete. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68 PS11, Line 68: us GetTabletLeader( > Why not a cref? Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72 PS11, Line 72: leader_h > nit: complete. Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc File src/kudu/tools/tool_replica_util.cc: http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83 PS11, Line 83: GetLastCommittedOpId > Looks like this function also exists as GetLastOpIdForReplica in cluster_it That one uses TServerDetails as a parameter. Probably, it's possible to make GetLastOpIdForReplica() to use this function. I think it's better to address that separately. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145 PS11, Line 145: > This is the only reason we need the client, so I think we should pass in th Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162 PS11, Line 162: GetTablet > Aside: I had to look up this method because it surprised me that we take ow Yep, that's the part of the public API so I don't think it's easy to fix that. http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180 PS11, Line 180: > and the Done http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184 PS11, Line 184: starts > happens Done -- To view, visit http://gerrit.cloudera.org:8080/10399 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238 Gerrit-Change-Number: 10399 Gerrit-PatchSet: 13 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Thu, 31 May 2018 05:02:02 +0000 Gerrit-HasComments: Yes
