Hello Adar Dembo,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/3326

to review the following change.

Change subject: WIP: KUDU-1466: improve error message when writes fail at TS
......................................................................

WIP: KUDU-1466: improve error message when writes fail at TS

Currently, when we hit certain types of tablet server errors, we fall
back to re-requesting locations from the master. If the timing of the
errors lines up right, the last request to the master may have a very
short timeout, in which case we misreport the write as failing due to
a timeout on the GetTableLocations() RPC rather than due to the actual
error on the tablet server.

Injecting a bit of latency into GetTableLocations() reproduces the
issue reliably in ClientTest.TestFailedDnsResolution, which is already
quite flaky in TSAN builds due to this issue.

This is a WIP patch showing one potential way to solve it: have the
location picker keep track of the "best" error seen so far. But perhaps
it's actually better for this to be done in the retriable RPC. Posting
in order to get some comments on the best approach.

Change-Id: I5f1de8159e515cbb5f52fdc440d71370437c1af2
---
M src/kudu/client/client-test.cc
M src/kudu/client/meta_cache.cc
M src/kudu/client/meta_cache.h
M src/kudu/rpc/retriable_rpc.h
4 files changed, 31 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/3326/1

--
To view, visit http://gerrit.cloudera.org:8080/3326

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f1de8159e515cbb5f52fdc440d71370437c1af2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
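
For discussion, the "best error seen so far" idea from the commit message could look roughly like the sketch below. This is not the code in the patch: SimpleStatus, ErrorKind, Informativeness() and BestErrorTracker are hypothetical names made up for illustration, the ranking of errors is an assumption, and the message strings are invented rather than real Kudu output. The point is only that a late, short-deadline lookup timeout should not replace an earlier, more specific tablet server error.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an error status; NOT Kudu's actual Status class.
enum class ErrorKind {
  kOk = 0,
  kTimedOut,      // e.g. a short-deadline master lookup timing out
  kNetworkError,  // e.g. a failure reported while contacting a tablet server
};

struct SimpleStatus {
  SimpleStatus() = default;
  SimpleStatus(ErrorKind k, std::string m) : kind(k), message(std::move(m)) {}

  bool ok() const { return kind == ErrorKind::kOk; }

  ErrorKind kind = ErrorKind::kOk;
  std::string message;
};

// Assumed ranking: a higher value means the error tells the user more
// about the root cause. A generic timeout ranks below a concrete error.
inline int Informativeness(ErrorKind kind) {
  switch (kind) {
    case ErrorKind::kOk:           return 0;
    case ErrorKind::kTimedOut:     return 1;
    case ErrorKind::kNetworkError: return 2;
  }
  return 0;
}

// Remembers the most informative error recorded across retries.
class BestErrorTracker {
 public:
  // Records an attempt's outcome. OK statuses are ignored; on a tie in
  // rank, the earlier error is kept.
  void Record(SimpleStatus s) {
    if (s.ok()) return;
    if (best_.ok() ||
        Informativeness(s.kind) > Informativeness(best_.kind)) {
      best_ = std::move(s);
    }
  }

  // Returns the most informative error so far, or an OK status if no
  // error has been recorded.
  const SimpleStatus& best() const { return best_; }

 private:
  SimpleStatus best_;
};

// Replays the scenario from the commit message: a tablet server error
// followed by a timed-out location lookup. Messages are illustrative.
inline SimpleStatus ExampleFinalError() {
  BestErrorTracker tracker;
  tracker.Record({ErrorKind::kNetworkError, "tablet server unreachable"});
  tracker.Record({ErrorKind::kTimedOut, "location lookup timed out"});
  assert(!tracker.best().ok());
  return tracker.best();
}
```

In this sketch ExampleFinalError() returns the network error rather than the later timeout. The same tracker could in principle live in either the location picker or the retriable RPC, which is the open question raised above.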