Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14177 )
Change subject: KUDU-2780: create thread for auto-rebalancing ...................................................................... Patch Set 11: (18 comments) Haven't made it all the way through; mostly nits, but there are a couple real questions worth considering (namely, what happens if tservers are down?) http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@146 PS11, Line 146: 300 Is this used by the rebalancer thread? http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@249 PS11, Line 249: lock_guard<simple_spinlock> l(lock_for_test_); : moves_scheduled_this_round_for_test_ = 0; nit: might be worth just using an atomic<int> or something http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@252 PS11, Line 252: SleepFor(MonoDelta::FromSeconds(FLAGS_auto_rebalancing_interval_seconds)); Don't need this anymore, right? We'll sleep up top? Same below. http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@298 PS11, Line 298: // One location: use greedy rebalancing algorithm to find moves. : if (ts_id_by_location.size() == 1) { : rebalance::TwoDimensionalGreedyAlgo algo; : RETURN_NOT_OK(GetMovesUsingRebalancingAlgo(raw_info, &algo, CrossLocations::NO, replica_moves)); : *has_location_violations = false; : } else { : nit: could you write this avoiding the extra scope layering? It's easier on the eyes that way, e.g if (ts_id_by_location.size() == 1) { // do stuff return Status::OK(); } // Logic to consider location placement policy... http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@317 PS11, Line 317: // If no placement policy violations are found, do load rebalancing. : if (replica_moves->empty()) { nit: could you write this avoiding the extra scope layering? It's easier on the eyes that way, e.g if (!replica_moves->empty()) { return Status::OK(); } // No placement policy violations found, do other stuff... http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@358 PS11, Line 358: ion==Cro nit: spacing http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@375 PS11, Line 375: on==CrossLoca nit: spacing http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@384 PS11, Line 384: std::random_device rd; : std::mt19937 random_generator(rd()); How about we have a single random_generator associated with the task and reuse it? http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@405 PS11, Line 405: ANY_REPLICA Why is this distinction over VOTER_REPLICA important? http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@498 PS11, Line 498: ts_manager_->GetDescriptorsAvailableForPlacement(&descriptors); What happens when not all of the tablet servers have registered with the master yet? Or more generally, when there exist tablets with replicas on tablet servers that haven't registered yet? Will this eventually return an error? Or will it try to move things that it maybe shouldn't? Could probably use a test. http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@517 PS11, Line 517: CatalogManager::ScopedLeaderSharedLock l(catalog_manager_); Check the status here. http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@550 PS11, Line 550: l nit: could you use a different variable name so this doesn't get conflated with the TabletMetadataLock? http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@572 PS11, Line 572: ConsensusConfigType::COMMITTED, : cstatepb.current_term(), : cstatepb.committed_config().opid_index(), : cstatepb.leader_uuid(), : voters, : non_voters); nit: too many spaces http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@739 PS11, Line 739: sort(locations.begin(), locations.end()); Why is this sorting important? http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/master/auto_rebalancer.cc@748 PS11, Line 748: nit: extra space http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/rebalance/rebalancer.h File src/kudu/rebalance/rebalancer.h: http://gerrit.cloudera.org:8080/#/c/14177/11/src/kudu/rebalance/rebalancer.h@160 PS11, Line 160: // Filter move operations in 'replica_moves': remove all operations that would : // involve moving replicas of tablets which are in 'scheduled_moves'. The : // 'replica_moves' cannot be null. : static void FilterMoves(const MovesInProgress& scheduled_moves, : std::vector<ReplicaMove>* replica_moves); Did this need to be made public? It isn't used by auto_rebalancer or rebalancer_tool AFAICT. http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc File src/kudu/tools/rebalancer_tool.cc: http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc@894 PS9, Line 894: * 5 Not your fault, but any idea what this is? http://gerrit.cloudera.org:8080/#/c/14177/9/src/kudu/tools/rebalancer_tool.cc@924 PS9, Line 924: // This will return Status::NotFound if no replica can be moved. : // In that case, we just continue through the loop. nit: maybe wrap with ignore_result() then -- To view, visit http://gerrit.cloudera.org:8080/14177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4 Gerrit-Change-Number: 14177 Gerrit-PatchSet: 11 Gerrit-Owner: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 04 Oct 2019 19:36:36 +0000 Gerrit-HasComments: Yes
