Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15850 )
Change subject: KUDU-3113: fix auto-rebalancer move execution ...................................................................... KUDU-3113: fix auto-rebalancer move execution When executing moves, the auto-rebalancer would try to resolve the leader's address by passing its UUID instead of its host. This fixes it to use an appropriate host. This includes some light cleanup, and updates auto_rebalancer-test to verify the moves lead to the copying of bytes on tablet servers. The following flakiness is also addressed: - NoRebalancingIfReplicasRecovering would sometimes schedule some moves before shutting down the tablet server, and we'd time out waiting to iterate without scheduling moves. I adjusted the ordering of the shutdown so the rebalancer doesn't get a chance to schedule moves. - Rarely, TestHandlingFailedTservers would see a different error than expected when checking for failed sent RPCs. I updated the test to expect a couple of messages. I looped auto_rebalancer-test in DEBUG mode and it pased 1000/1000 times, compared to failing 4/10 times with the change to actually execute moves. I also validated this on a real cluster: - First, I enabled auto-rebalancing on the master. - I put a tablet server into maintenance mode. - I then moved all replicas off the tablet server using the rebalancer tool's --move_replicas_from_ignored_tservers option. - I verified that even with the significant skew, since one of the tablet servers was in maintenance mode (i.e. unavailable for placement), the master didn't automatically move any replicas. - Once I took the tablet server out of maintenance mode, moves were scheduled to repopulate it. - Steady state was reached with a cluster skew of 1. Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f Reviewed-on: http://gerrit.cloudera.org:8080/15850 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]> --- M src/kudu/master/auto_rebalancer-test.cc M src/kudu/master/auto_rebalancer.cc 2 files changed, 183 insertions(+), 109 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/15850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f Gerrit-Change-Number: 15850 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Hannah Nguyen <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120)
