Hao Hao has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15905 )
Change subject: KUDU-3113: fix auto-rebalancer move execution ...................................................................... KUDU-3113: fix auto-rebalancer move execution NOTE: this wasn't clean, on account of 8528ac70245efb6d7fec2d18dfa17e45b80421a9. When executing moves, the auto-rebalancer would try to resolve the leader's address by passing its UUID instead of its host. This fixes it to use an appropriate host. This includes some light cleanup, and updates auto_rebalancer-test to verify the moves lead to the copying of bytes on tablet servers. The following flakiness is also addressed: - NoRebalancingIfReplicasRecovering would sometimes schedule some moves before shutting down the tablet server, and we'd time out waiting to iterate without scheduling moves. I adjusted the ordering of the shutdown so the rebalancer doesn't get a chance to schedule moves. - Rarely, TestHandlingFailedTservers would see a different error than expected when checking for failed sent RPCs. I updated the test to expect a couple of messages. I looped auto_rebalancer-test in DEBUG mode and it pased 1000/1000 times, compared to failing 4/10 times with the change to actually execute moves. I also validated this on a real cluster: - First, I enabled auto-rebalancing on the master. - I put a tablet server into maintenance mode. - I then moved all replicas off the tablet server using the rebalancer tool's --move_replicas_from_ignored_tservers option. - I verified that even with the significant skew, since one of the tablet servers was in maintenance mode (i.e. unavailable for placement), the master didn't automatically move any replicas. - Once I took the tablet server out of maintenance mode, moves were scheduled to repopulate it. - Steady state was reached with a cluster skew of 1. Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f Reviewed-on: http://gerrit.cloudera.org:8080/15850 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <aser...@cloudera.com> Reviewed-on: http://gerrit.cloudera.org:8080/15905 --- M src/kudu/master/auto_rebalancer-test.cc M src/kudu/master/auto_rebalancer.cc 2 files changed, 183 insertions(+), 111 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/15905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.12.x Gerrit-MessageType: merged Gerrit-Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f Gerrit-Change-Number: 15905 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120)