Hello Alexey Serbin, Kudu Jenkins, Hannah Nguyen,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15850
to look at the new patch set (#4).
Change subject: KUDU-3113: fix auto-rebalancer move execution
......................................................................
KUDU-3113: fix auto-rebalancer move execution
When executing moves, the auto-rebalancer would try to resolve the
leader's address by passing the leader's UUID instead of its host. This
patch fixes it to resolve the leader's host instead.
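The following minimal C++ sketch illustrates the bug and the fix. It assumes a hypothetical UUID-to-address registry. `HostPort`, `TServerRegistry`, `LeaderAddressBuggy`, and `LeaderAddress` are illustrative names and are not Kudu's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a host/port pair (not Kudu's HostPort class).
struct HostPort {
  std::string host;
  uint16_t port;
};

// Hypothetical registry mapping a tablet server UUID to its RPC address.
using TServerRegistry = std::map<std::string, HostPort>;

// Buggy shape: the UUID itself is used as the hostname, so any later
// hostname resolution of something like "f3a1c2..." will fail.
HostPort LeaderAddressBuggy(const std::string& leader_uuid, uint16_t port) {
  return HostPort{leader_uuid, port};
}

// Fixed shape: look up the leader's registered address by UUID and return
// that host/port for resolution. Returns std::nullopt for unknown UUIDs.
std::optional<HostPort> LeaderAddress(const TServerRegistry& registry,
                                      const std::string& leader_uuid) {
  auto it = registry.find(leader_uuid);
  if (it == registry.end()) {
    return std::nullopt;
  }
  return it->second;
}
```

Returning `std::optional` forces the caller to handle a leader that is missing from the registry, instead of silently building an unresolvable address.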
This patch also includes some light cleanup and updates
auto_rebalancer-test to verify that the moves result in bytes being
copied on the tablet servers.
The following flakiness is also addressed:
- In NoRebalancingIfReplicasRecovering, the rebalancer would sometimes
  schedule moves before the test shut down the tablet server, and the
  test would then time out waiting for an iteration that scheduled no
  moves. I adjusted the ordering of the shutdown so the rebalancer
  doesn't get a chance to schedule moves.
- Rarely, TestHandlingFailedTservers would see a different error than
  expected when checking for failed RPCs. I updated the test to accept
  any of a couple of expected messages.
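The following C++ sketch shows one way a test check can tolerate more than one failure message. `ContainsAnyOf` is a hypothetical helper, not necessarily how auto_rebalancer-test does it, and the messages in the usage are invented for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns true if `status_msg` contains any of the
// acceptable substrings. A check against a single exact message becomes
// tolerant of whichever failure the RPC layer happens to report.
bool ContainsAnyOf(const std::string& status_msg,
                   const std::vector<std::string>& expected) {
  for (const auto& e : expected) {
    if (status_msg.find(e) != std::string::npos) {
      return true;
    }
  }
  return false;
}
```

Substring matching keeps the check loose enough to absorb variation in message prefixes, while still rejecting unrelated errors.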
I looped auto_rebalancer-test in DEBUG mode and it passed 1000/1000
times, compared to failing 4/10 times with only the change to actually
execute moves.
I also validated this on a real cluster:
- First, I enabled auto-rebalancing on the master.
- I put a tablet server into maintenance mode.
- I then moved all replicas off the tablet server using the rebalancer
tool's --move_replicas_from_ignored_tservers option.
- I verified that, despite the significant skew, the master didn't
  automatically move any replicas, since one of the tablet servers was
  in maintenance mode (i.e. unavailable for placement).
- Once I took the tablet server out of maintenance mode, moves were
scheduled to repopulate it.
- Steady state was reached with a cluster skew of 1.
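The behavior observed above can be sketched in C++ under two assumptions. First, cluster skew is taken to be the maximum minus the minimum replica count across tablet servers. Second, rebalancing is treated as done once that skew is at most 1. `TServerState`, `ClusterSkew`, and `ShouldScheduleMoves` are hypothetical names, not the auto-rebalancer's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-tablet-server state keyed by UUID.
struct TServerState {
  int replica_count;
  bool in_maintenance;
};

// Assumed definition: max replica count minus min replica count.
int ClusterSkew(const std::map<std::string, TServerState>& tservers) {
  if (tservers.empty()) return 0;
  int lo = tservers.begin()->second.replica_count;
  int hi = lo;
  for (const auto& entry : tservers) {
    lo = std::min(lo, entry.second.replica_count);
    hi = std::max(hi, entry.second.replica_count);
  }
  return hi - lo;
}

// Hypothetical gate matching the observed behavior: schedule nothing while
// any tablet server is in maintenance mode; otherwise move replicas only
// while the skew exceeds 1.
bool ShouldScheduleMoves(const std::map<std::string, TServerState>& tservers) {
  for (const auto& entry : tservers) {
    if (entry.second.in_maintenance) return false;
  }
  return ClusterSkew(tservers) > 1;
}
```

For example, with replica counts {10, 10, 0} and the empty server in maintenance mode, the skew is 10 but no moves are scheduled. Once maintenance mode ends, moves resume until counts such as {7, 7, 6} leave a skew of 1.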
Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f
---
M src/kudu/master/auto_rebalancer-test.cc
M src/kudu/master/auto_rebalancer.cc
2 files changed, 181 insertions(+), 107 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/15850/4
--
To view, visit http://gerrit.cloudera.org:8080/15850
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If658997dc9bcb709c27d981db56cf2db13ba235f
Gerrit-Change-Number: 15850
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Hannah Nguyen
Gerrit-Reviewer: Kudu Jenkins (120)