Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14177 )
Change subject: WIP KUDU-2780: create thread for auto-rebalancing ...................................................................... Patch Set 5: (28 comments) http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@214 PS3, Line 214: F=2 is a balanced cluster. > How do you allow a flag to be changed via the CLI? I'm not sure I understand. My question was about the comment. I'm not sure what 'flag is paused' means. Maybe, you could re-phrase the comment a bit, so it's more readable? http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@241 PS3, Line 241: int kNumMasters = 1; > if the flag to pause auto-rebalancing is set, the loop is still active (ie Ah, OK. Could you add this comment into the code, please? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer-test.cc@111 PS5, Line 111: } : else { Join the lines for better readability. http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.h File src/kudu/master/auto_rebalancer.h: http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.h@70 PS5, Line 70: // Variables for testing. : int number_of_loop_iterations; : int moves_scheduled_this_round; Is it possible to make this variables private and declare the test that accesses these a friend of this class? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.h@85 PS5, Line 85: const boost::optional<std::string>& location Please document the semantics of the 'location' parameter -- it's not self-evident. http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.h@133 PS5, Line 133: nit: an extra line http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@139 PS5, Line 139: nit: should be 4 spaces for this sort of 'continuation indent' http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@147 PS5, Line 147: 300 Do we want to make this parameter configurable or the hard-coded value is good enough for all cases? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@177 PS5, Line 177: DCHECK What's the reason of using DCHECK() here but using CHECK() for thread_ validation? Maybe, unify and use DCHECK() in both places to protect against programming errors? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@212 PS5, Line 212: Status s; : s = BuildClusterRawInfo(boost::none, &raw_info_); nit: why not just Status s = ...; ? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@433 PS5, Line 433: nit: extra line http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@460 PS5, Line 460: nit: extra line http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@462 PS5, Line 462: if (policy_fixing) What is the actual difference between policy_fixing clause and the !policy_fixing when dst_ts_uuid is empty in the latter? I guess my question is why to separate those two cases? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@491 PS5, Line 491: MonoDelta::FromSeconds(60) Where does this timeout comes from? Should it be configurable? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@500 PS5, Line 500: nit: extra line http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@513 PS5, Line 513: HostPort hp; : RETURN_NOT_OK(hp.ParseString(leader_uuid, leader_hp.port())); : vector<Sockaddr> resolved; : RETURN_NOT_OK(hp.ResolveAddresses(&resolved)); : proxy.reset(new ConsensusServiceProxy(messenger_, resolved[0], hp.host())); At lines 528-533 TSDescriptor for dst_ts_uuid is fetched. TSDescriptor has GetConsensusProxy() method that allows to get ConsensusProxy. Is it possible to reuse that (in most cases, TS address would be already resolved if GetConsensusProxy has been already called). http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@564 PS5, Line 564: vector<ServerHealthSummary> tserver_summaries; : vector<TableSummary> table_summaries; : vector<TabletSummary> tablet_summaries; Since the code below uses emplace_back()/push_back() to fill this vectors, does it make sense to call reserve() in the beginning to avoid re-allocations? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@577 PS5, Line 577: (ts->last_address()) Are the outer parentheses really necessary? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@577 PS5, Line 577: summary.ts_location = (ts->last_address()).ToString(); How come address (i.e. HostPort) became a location for a tablet server? This seems like a typo to me. http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@594 PS5, Line 594: SysTablesEntryPB table_data = table->metadata().state().pb; Is it possible to get a constant reference to the SysTablesEntryPB here? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@601 PS5, Line 601: What about TableMetadataLock when retrieving information on all its tablets? Is it necessary to hold it here? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@615 PS5, Line 615: ReplicaTypeFilter::ANY_REPLICA Are we really interested in non-voters as well while rebalancing? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@622 PS5, Line 622: TSInfoPB ts_info Why not to use a reference here? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@626 PS5, Line 626: rep.ts_address = Substitute("$0:$1", addr.host(), addr.port()); What about filling in 'is_leader' field for ReplicaSummary? http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@627 PS5, Line 627: replicas.push_back(rep); What about filling in the 'consensus_state' field for ReplicaSummary? As I can see BuildTabletsPlacementInfo() in placement_policy_util.cc uses that information. http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@641 PS5, Line 641: raw_info->table_summaries = std::move(table_summaries); nit: for readability, add return Status::OK() and move the code out of the 'else' scope below. http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@680 PS5, Line 680: namespace { nit: add a line before 'namespace' http://gerrit.cloudera.org:8080/#/c/14177/5/src/kudu/master/auto_rebalancer.cc@682 PS5, Line 682: nit: extra line -- To view, visit http://gerrit.cloudera.org:8080/14177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4 Gerrit-Change-Number: 14177 Gerrit-PatchSet: 5 Gerrit-Owner: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 23 Sep 2019 05:49:46 +0000 Gerrit-HasComments: Yes
