Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10399 )

Change subject: [tools] rebalancer in the kudu CLI tool
......................................................................


Patch Set 13:

(68 comments)

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@12
PS11, Line 12: <master_addresses>
> It's more common just to say <master addresses>.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@18
PS11, Line 18: the 12-node clust
> "the flash cluster" won't mean much to most Kudu community members. Maybe j
Done


http://gerrit.cloudera.org:8080/#/c/10399/11//COMMIT_MSG@19
PS11, Line 19:  (3-4-3 r
> Mention this vendor's Kudu is based on upstream 1.4.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@87
PS11, Line 87: rebalance.cc
             :   rebalance_al
> (Puts Adar hat on) nit: Need to swap these two for strict ascii sort order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@92
PS11, Line 92: ksck
             :   kudu
> nit: Order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/CMakeLists.txt@173
PS11, Line 173: ADD_KUDU_TEST(rebalance-test)
              : ADD_KUDU_TEST(rebalance_algo-
> nit: Swap for correct order.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320:
> Extra word.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1320
PS11, Line 1320: ceStartCriteria
> nit: the rebalancer
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1323
PS11, Line 1323:
> Can we can reuse the enum class Kudu1097?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1330
PS11, Line 1330: };
> This doesn't look necessary.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1338
PS11, Line 1338: // Shutdown one of the tablet servers.
> Can set to 3 directly if we shut down the first tablet server rather than a
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1339
PS11, Line 1339: HostPort ts_host_port;
> Isn't this redundant?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1343
PS11, Line 1343:   ts_host_port = ts->bound_rpc_hostport();
               :     ts->Shutdown();
               :   }
               :
               :   string err;
               :   Status s = RunKuduTool({
               :     "cluster",
               :
> It's fine to shut down the TS at idx = 0 instead of picking a random one.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1365
PS11, Line 1365:  public ::testing::WithParamInterface<std::tuple<int, 
Kudu1097>> {
> Can you mention the test is also covering different replication factors?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1382
PS11, Line 1382: xpr
> Why not auto?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1412
PS11, Line 1412: S
> kRepFactor + 1
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1427
PS11, Line 1427: r (auto i = 0; i < kNumTables; ++i) {
> This shouldn't be needed because the table already exists.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/kudu-admin-test.cc@1477
PS11, Line 1477: for (auto& workload : workloads) {
> Can we run the ClusterVerifier as well?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h
File src/kudu/tools/rebalance.h:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@50
PS11, Line 50:
> Seems unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@53
PS11, Line 53: fig(std::vector<std::strin
> nit: Names of tables to balance.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@57
PS11, Line 57:
> nit: "on", or maybe "involving".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58:
> the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58:
> means a
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@58
PS11, Line 58:
> nit: "on a" or "involving a".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@59
PS11, Line 59: dpoints.
> is located on the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@65
PS11, Line 65:
> Why is this a special case?
That's because the code in rebalancer.cc has special provisions to avoid moving 
RF1 replicas.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@69
PS11, Line 69:   size_t max_moves_per_server;
             :
             :     // Maximum run time, in seconds.
             :     int64_t max_run_time_sec;
             :
> I'd simplify a bit and say something like "Represents a concrete move of a
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@74
PS11, Line 74: ether to move r
> I think it'd be better just to call it ReplicaMove.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@83
PS11, Line 83: std::string
> nit: Does it make sense just to call it TIMED_OUT, a more typical descripto
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@89
PS11, Line 89:   TIMED_OUT,
> nit: redundant.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@95
PS11, Line 95:
> nit: once
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@96
PS11, Line 96:
> if an error occurs
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@98
PS11, Line 98: Pri
> Can 'status' be null?
Nope, it cannot.  I added corresponding comment.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@101
PS11, Line 101: ed or
> Builder
Done. It's gone.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@108
PS11, Line 108:
> Note what this parameter is for.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@115
PS11, Line 115: eter. The 'cbi'
              :   // output parameter cannot be null.
              :   Status KsckResultsToClusterBalanceInfo(
              :       const KsckResults& ksck_info,
> Just to clean it up a bit: "on pending moves, which may be scheduled or run
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120:
> infos
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@120
PS11, Line 120:
> remove
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@121
PS11, Line 121:
> Replace with "which".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.h@129
PS11, Line 129:  into 'replica_moves',
              :   // which will be empty if no
> Does this imply a NotFound or other non-OK status will be returned as well?
Nope, in that case this method happily returns Status::OK().  Non-OK statuses 
are reserved for errors.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc
File src/kudu/tools/rebalance.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@117
PS11, Line 117:
> Use L instead since this looks too much like 1 in a lot of fonts.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@128
PS11, Line 128: plica_count = servers_lo
> I'd make this "Statistic" and "Value".
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@129
PS11, Line 129: replica_count = servers_load_info.rbegin()->first;
              :       const double ave_replica_count =
              :           1.0 * total_replic
> Then add "Replica Count" to each statistic name, e.g. "Maximum Replica Coun
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@143
PS11, Line 143: y : tserver_summaries) {
> I think "UUID", "Address", "Replica Count" is a better order.
They are sorted by count, I think.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@181
PS11, Line 181:
> I think "Table Id", "Table Name", "Replica Count", "Replica Skew" makes mor
I would prefer the 'Table name' to be in the end of the list, otherwise the 
width of columns varies from run to run, and I don't like that.  Let me know if 
you feel strongly against that.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@204
PS11, Line 204:
> This function is hard to digest. Any change you could break it up?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@275
PS11, Line 275:     const auto& tablet_id = info.tablet_uuid;
              :         const auto& src_ts_uuid = info.ts_uuid_from;
> Should these be part of the instance's state, with the update functions as
That's doable, yes.  Initially, everything was an assorted collection of 
functions.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@292
PS11, Line 292:           // Continue to the next round, populating 
op_indices_to_schedule with
              :           // appropriate moves and scheduling the operations.
              :           continue;
              :         }
              :
              :         DCHECK(!s.ok());
              :         // The source replica is not found in the tablet's 
consensus config
              :         // or the tablet does not exit anymore. The replica 
might already
              :         // moved because of some other concurrent activity, e.g.
              :         // re-replication, another rebalancing session in 
progress, etc.
              :         LOG(INFO) << Substitute("tablet $0: $1 -> $2 move 
ignored: $3",
              :                                 tablet_id, src_ts_uuid, 
dst_ts_uuid,
              :                                 s.ToString());
              :         UpdateOnMoveScheduled(src_op_indices_, op_idx, 
info.ts_uuid_from, false);
              :         UpdateOnMoveScheduled(dst_op_indices_, op_idx, 
info.ts_uuid_to, false);
              :         // Re-evaluate t
> Just in the interest of shortening this function, can this be a small helpe
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@311
PS11, Line 311:     // Check for the moves in progress.
              :       clear_scheduled_moves = UpdateMovesInProgress(deadline, 
&timed_out);
              :       if (re_evaluate_cluster_state || clear_scheduled_moves) {
              :         break;
              :       }
              :       // Sleep a bit before going next cycle.
              :       SleepFor(MonoDelta::FromMilliseconds(200));
              :     }
              :   }
              :
              :   *result_status = timed_out ? RunStatus::TIMED_OUT
              :                              : RunStatus::CLUSTER_IS_BALANCED;
              :   if (moves_count) {
              :     *moves_count = attempted_moves_count_;
              :   }
              :
              :   return Status::OK();
              : }
              :
              : // Transform the information on the cluster returned by ksck 
into
              : // ClusterBalanceInfo that could be consumed by the rebalancing 
algorithm,
              : // taking into account pending replica movement operations. The 
pending
              : // operations are evaluated against the state of the cluster in 
accordance with
              : // the ksck results, and if the replica movement operations are 
still in
              : // progress, then they are interpreted as successfully 
completed. The idea is to
              : // prevent the algorithm outputting the same moves again while 
some of the
              : // moves recommended at prior steps are still in progress.
              : Status Rebalancer::KsckResultsToClusterBalanceInfo(
              :     const KsckResults& ksck_info,
              :     const MovesInProgress& pending_moves,
              :     ClusterBalanceInfo* cbi) const {
              :   DCHECK(cbi);
              :
              :   // tserver UUID --> total replica count of all table's 
tablets at the server
              :   typedef unordered_map<string, int32_t> TableReplicasAtServer;
              :
> This section also feels like it could be factored out into its own function
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@395
PS11, Line 395:   LOG(INFO) << msg;
> Ditto.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/rebalance.cc@473
PS11, Line 473: 
> What about UNAVAILABLE servers?
Good catch, indeed.  The reason it's not handled is that initially there was an 
idea that this code would be not run against a tablet server which is down, 
however such servers might appear in the course of the run.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@74
PS11, Line 74: on
> on
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@75
PS11, Line 75: :
> :, delete "where"
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@125
PS11, Line 125: ebalancer::Config(
              :       std::move(master_addresses),
> If these are const, can we pass-by-value and move from them?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@130
PS11, Line 130: ve_single_replic
> I think this is redundant as we move from 'cfg' in the initializer list por
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@133
PS11, Line 133: RETURN_NOT_OK
> Why not RETURN_NOT_OK?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@149
PS11, Line 149: msg_result_s
> Did you mean to add a message here too? Or take this out and return a bad s
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_action_cluster.cc@200
PS11, Line 200: le
> for each
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

PS11:
> Would you mind adding one sentence comments to these functions to explain w
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@52
PS11, Line 52: const MonoDelta& timeout,
> This seems like a funny parameter to have. Does it detect if the cluster is
Yes, this is what it is -- I just moved it from some other place.  Probably, we 
could improve the signature of this function in a separate changelist.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@66
PS11, Line 66: ica is a
> nit: complete.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@68
PS11, Line 68: us GetTabletLeader(
> Why not a cref?
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.h@72
PS11, Line 72:  leader_h
> nit: complete.
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc
File src/kudu/tools/tool_replica_util.cc:

http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@83
PS11, Line 83: GetLastCommittedOpId
> Looks like this function also exists as GetLastOpIdForReplica in cluster_it
That one uses TServerDetails as a parameter.  Probably, it's possible to make 
GetLastOpIdForReplica() to use this function. I think it's better to address 
that separately.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@145
PS11, Line 145:
> This is the only reason we need the client, so I think we should pass in th
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@162
PS11, Line 162: GetTablet
> Aside: I had to look up this method because it surprised me that we take ow
Yep, that's the part of the public API so I don't think it's easy to fix that.


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@180
PS11, Line 180:
> and the
Done


http://gerrit.cloudera.org:8080/#/c/10399/11/src/kudu/tools/tool_replica_util.cc@184
PS11, Line 184: starts
> happens
Done



--
To view, visit http://gerrit.cloudera.org:8080/10399
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I269ea1dcb0b528ad9f03308bac6b8769e2141238
Gerrit-Change-Number: 10399
Gerrit-PatchSet: 13
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 31 May 2018 05:02:02 +0000
Gerrit-HasComments: Yes

Reply via email to