Marton Greber has posted comments on this change. ( http://gerrit.cloudera.org:8080/24085 )
Change subject: KUDU-3729 Prefer follower moves in auto-rebalancing ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer-test.cc File src/kudu/master/auto_rebalancer-test.cc: http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer-test.cc@269 PS3, Line 269: static Status BuildClusterInfoForTest( BuildClusterInfoForTest always passes an empty MovesInProgress. The new ts_with_followers_by_table_and_tag population sits inside the do_count_replica guard, so a replica that is the source of an in-progress move should be excluded from the follower map. Could we add a case here that passes a fake in-progress move and checks the source tserver is absent from the map for that {table_id, tag}? http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer-test.cc@1181 PS3, Line 1181: TEST_F(AutoRebalancerTest, TestRebalancingAvoidsLeaderMoves) { ts_with_followers_by_table_and_tag is populated in BuildClusterInfo but there's no test that verifies this from a live cluster. Could we add a case using BuildClusterRawInfoForTest + BuildClusterInfoForTest that checks the map is non-empty after data is loaded? http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer-test.cc@1204 PS3, Line 1204: CheckNoLeaderMovesScheduled The test's name and its CheckNoLeaderMovesScheduled() call are mismatched. CheckNoLeaderMovesScheduled() verifies that the leader-rebalancing thread (a separate feature) has 0 moves scheduled. http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer.cc File src/kudu/master/auto_rebalancer.cc: http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer.cc@151 PS3, Line 151: auto_rebalancing_avoid_leader_moves The flag=false branch has zero integration coverage. The counterpart to the existing TestRebalancingAvoidsLeaderMoves simply sets FLAGS_auto_rebalancing_avoid_leader_moves = false, adds a tserver, and asserts moves are still scheduled. Importantly it documents that disabling the flag doesn't break rebalancing http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/master/auto_rebalancer.cc@158 PS3, Line 158: TAG_FLAG(auto_rebalancing_avoid_leader_moves, runtime); nit: add a new line here. http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/rebalance/rebalance_algo-test.cc File src/kudu/rebalance/rebalance_algo-test.cc: http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/rebalance/rebalance_algo-test.cc@342 PS3, Line 342: TEST(RebalanceAlgoUnitTest, PreferFollowerMovesByTableAndTag) { The existing test only covers the non_leader_move branch (source has a follower -> pick it). The leader_fallback branch (all tables have only leaders on the source) is not tested at all. If ts_with_followers_by_table_and_tag is empty, with prefer_follower_moves=true, the algorithm must still produce a move. http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/rebalance/rebalance_algo.cc File src/kudu/rebalance/rebalance_algo.cc: http://gerrit.cloudera.org:8080/#/c/24085/3/src/kudu/rebalance/rebalance_algo.cc@375 PS3, Line 375: break; Can you please add a comment about why the early break on non_leader_move is safe? -- To view, visit http://gerrit.cloudera.org:8080/24085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0b56ae5ef8db5fe15ab25df1d1cebe84fd3b8f2c Gerrit-Change-Number: 24085 Gerrit-PatchSet: 3 Gerrit-Owner: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Comment-Date: Wed, 25 Mar 2026 14:33:30 +0000 Gerrit-HasComments: Yes
