Hello Adar Dembo,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/3326

to review the following change.

Change subject: WIP: KUDU-1466: improve error message when writes fail at TS
......................................................................

WIP: KUDU-1466: improve error message when writes fail at TS

Currently, when we hit certain types of tablet server errors,
we fall back to re-requesting locations from the master. If
the timing of the errors lines up right, the last request to the
master may have a very short timeout, in which case we
misreport the write as failing due to a timeout on the
GetTableLocations() RPC, rather than due to the actual error on the
tablet server.

Injecting a bit of latency into GetTableLocations() reproduces the
issue reliably in ClientTest.TestFailedDnsResolution, which is
already quite flaky in TSAN builds due to this issue.

This is a WIP patch showing one potential way to solve it: have the
location picker keep track of the "best" error seen so far. However,
it may actually be better to do this in the retriable RPC.
Posting in order to get some comments on the best approach.
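
To make the "best error" idea concrete for discussion, here is a
minimal standalone sketch. It is not the code in this patch and does
not use kudu::Status: SimpleStatus, Informativeness(), and
BestErrorTracker are hypothetical names, and the ranking (a timeout is
less informative than any other error) is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; this is NOT kudu::Status.
enum class Code { kOk, kTimedOut, kNetworkError };

struct SimpleStatus {
  SimpleStatus() : code(Code::kOk) {}
  SimpleStatus(Code c, std::string m) : code(c), message(std::move(m)) {}
  bool ok() const { return code == Code::kOk; }

  Code code;
  std::string message;
};

// Ranks how useful an error is to the caller. Assumption: a timeout on a
// follow-up lookup says less than the error that triggered the lookup.
inline int Informativeness(const SimpleStatus& s) {
  switch (s.code) {
    case Code::kOk:           return 0;
    case Code::kTimedOut:     return 1;
    case Code::kNetworkError: return 2;
  }
  return 0;
}

// Remembers the most informative error recorded so far. On ties, the
// earliest recorded error is kept.
class BestErrorTracker {
 public:
  void Record(SimpleStatus s) {
    if (Informativeness(s) > Informativeness(best_)) {
      best_ = std::move(s);
    }
  }
  const SimpleStatus& best() const { return best_; }

 private:
  SimpleStatus best_;  // Starts out as OK, meaning "no error seen".
};

// Example: the tablet server error arrives first, then the master lookup
// times out. The tracker still reports the tablet server error.
inline std::string ExampleReportedMessage() {
  BestErrorTracker tracker;
  tracker.Record({Code::kNetworkError, "tablet server address not resolved"});
  tracker.Record({Code::kTimedOut, "GetTableLocations() timed out"});
  assert(!tracker.best().ok());
  return tracker.best().message;
}
```

The same tracker could live in either the location picker or the
retriable RPC; the sketch takes no position on which is better.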

Change-Id: I5f1de8159e515cbb5f52fdc440d71370437c1af2
---
M src/kudu/client/client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/rpc/retriable_rpc.h
4 files changed, 31 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/3326/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f1de8159e515cbb5f52fdc440d71370437c1af2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
