Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1387. Fix a case where the scanner tight-loops and then sleeps too long
......................................................................
KUDU-1387. Fix a case where the scanner tight-loops and then sleeps too long

This prevents the following issue:
- the leader is down and an election has not yet been triggered
- the scanner tries to hit the leader, gets 'connection refused', marks the leader as down, and goes back to the scanner retry loop
- in the tablet lookup path, RemoteTablet::HasLeader() returns false because the leader is known to be down. This causes the client to fetch new locations.
- fetching new locations marks the server as up again. This logic is dubious, but will be more complicated to address.
- because the server is now seen as "up" again, we just retry on the same server.

The patch fixes the scanner code so that, when a tablet server is down, it is added to the scan's blacklist in addition to being marked as down client-wide. This makes the scanner code realize that all eligible servers are blacklisted and trigger a sleep and backoff before retrying.

Without this patch, linked_list-test timed out a few percent of the time in RELEASE builds. With the patch, it passed 200/200 times.

I also noticed that an existing test in client-test was triggering the tight retries, but didn't have any assertion to detect the problematic number of RPCs.
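To make the retry sequence above concrete, here is a minimal C++ model of the loop, assuming a tablet with a single replica whose leader keeps refusing connections. It is not the code in src/kudu/client/scanner-internal.cc: every name (ClientState, ScanState, PickReplica, SimulateRetries, "ts-1") is hypothetical, the backoff constants are made up, and clearing the blacklist after a backoff is an assumption of this sketch. The `blacklist_on_down` flag switches between the pre-patch behaviour and the fix described above.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the KUDU-1387 retry loop; not Kudu's real client API.
struct ClientState {
  std::set<std::string> down_servers;  // client-wide "server is down" marks
};

struct ScanState {
  std::set<std::string> blacklist;  // servers this scan will not retry
};

struct Outcome {
  int rpcs = 0;            // RPCs sent to a tablet server
  int backoffs = 0;        // passes where the loop slept instead of retrying
  int total_sleep_ms = 0;  // simulated sleep time (nothing actually sleeps)
};

// Returns the first replica that is neither down client-wide nor
// blacklisted for this scan, or "" if every replica is excluded.
std::string PickReplica(const std::vector<std::string>& replicas,
                        const ClientState& client, const ScanState& scan) {
  for (const std::string& r : replicas) {
    if (client.down_servers.count(r) == 0 && scan.blacklist.count(r) == 0) {
      return r;
    }
  }
  return "";
}

// Runs `iterations` passes of the retry loop against a single-replica tablet
// whose leader ("ts-1") always refuses connections.
// blacklist_on_down == false models the pre-patch code; true models the fix.
Outcome SimulateRetries(bool blacklist_on_down, int iterations) {
  const std::vector<std::string> replicas = {"ts-1"};
  ClientState client;
  ScanState scan;
  Outcome out;
  for (int i = 0; i < iterations; ++i) {
    // If the leader is known to be down, the lookup fetches new locations,
    // which (dubiously) marks the servers as up again.
    client.down_servers.clear();

    std::string target = PickReplica(replicas, client, scan);
    if (target.empty()) {
      // Every eligible server is blacklisted: back off exponentially.
      // The 10 ms base and 1000 ms cap are assumptions for this sketch.
      int delay_ms = std::min(10 << std::min(out.backoffs, 10), 1000);
      out.total_sleep_ms += delay_ms;
      ++out.backoffs;
      // Assumption: the blacklist is cleared after sleeping so that the
      // next pass may try the server again.
      scan.blacklist.clear();
      continue;
    }

    ++out.rpcs;  // the RPC fails with "connection refused"
    client.down_servers.insert(target);
    if (blacklist_on_down) {
      scan.blacklist.insert(target);  // the fix: also blacklist for this scan
    }
  }
  return out;
}
```

With ten passes, `SimulateRetries(false, 10)` sends an RPC on every pass and never sleeps, which is the tight loop; `SimulateRetries(true, 10)` alternates between one RPC and one backoff, ending with 5 RPCs and 310 ms of simulated sleep.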
Change-Id: I3cb3afa81cd6f75756c328b6ffe23a385f4b172d
Reviewed-on: http://gerrit.cloudera.org:8080/2699
Reviewed-by: Adar Dembo
Tested-by: Kudu Jenkins
(cherry picked from commit 563313d15e922db4255736ed1423bb418bbcd6fd)
Reviewed-on: http://gerrit.cloudera.org:8080/2709
Reviewed-by: Jean-Daniel Cryans
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
2 files changed, 34 insertions(+), 9 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/2709

Gerrit-MessageType: merged
Gerrit-Change-Id: I3cb3afa81cd6f75756c328b6ffe23a385f4b172d
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-0.8.x
Gerrit-Owner: Todd Lipcon
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon
