Adar Dembo has posted comments on this change.

Change subject: Refactor retry handling logic for writes
......................................................................


Patch Set 7:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/2970/7//COMMIT_MSG
Commit Message:

Line 10: th new methods: WriteRpc::AnalyzeResponse() WriteRpc::RetryIfNeeded().
Nit: "th" new methods? the? three?


http://gerrit.cloudera.org:8080/#/c/2970/7/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 48: #include "kudu/rpc/replicated_rpc.h"
Nit: before rpc.h.


Line 213: can be used to choose the action to take
Nit: the enum doesn't enable a "choice" per se; it explicitly tells the caller 
which action to take.


Line 403: void WriteRpc::LookupTabletCb(const Status& status) {
Shouldn't this also call into Analyze/Retry? In theory it should; in practice 
it's tricky because the OK case behaves a bit differently than in the usual 
Analyze/Retry call: we empty followers_ and retry.


Line 444:     if (result.status.IsRemoteError()) {
I don't think this affects control flow but should improve readability: could 
you move L444-452 out of the if statement beginning at L442?

This helps separate the two different things going on here, the first being 
"prefer early failures over controller failures" and the second being "check 
for a retriable TOO_BUSY controller failure". It also helps emphasize that the 
probing for this failure condition is the same kind of probing that we do below 
for other conditions.


Line 445:       const ErrorStatusPB* err = 
mutable_retrier()->controller().error_response();
So part of this refactor (maybe not in this patch) will include the removal of 
RpcRetrier, right?


Line 508:       DCHECK(found) << "Tablet " << tablet_->tablet_id() << ": Unable 
to mark replica "
Nit: can you rewrite this line using strings::Substitute() so it's more clear?


Line 528:     default: {
Nit: why the new scope?


Line 537: 
Two things:
1. Extract result.status into a local variable. You can use a shorter variable 
name plus I think it's more clear than overwrite what you got from 
AnalyzeResponse().
2. Add a comment saying that from here on out, the RPC either succeeded or 
experienced a fatal failure.


http://gerrit.cloudera.org:8080/#/c/2970/7/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

Line 78:     s = Status::NetworkError("No addresses for " + hp.ToString());
I know you asked me about this in chat, but could you also mention it in the 
commit description?


http://gerrit.cloudera.org:8080/#/c/2970/7/src/kudu/rpc/replicated_rpc.h
File src/kudu/rpc/replicated_rpc.h:

Line 22: // Result status of a replicated Rpc.
What is a "replicated" RPC? Why is it useful to constrain this object to only 
"replicated" RPCs?

If you mean that the RPC's contents are replicated server-side into the WAL, 
that seems like an implementation detail that RPC code shouldn't care about.


Line 23: struct ReplicatedRpcStatus {
At first I expected that this and ScanRpcStatus overlapped, but now I see that, 
though they're structured in the same way, the Result enum is mostly different.

But there are some similarities (i.e. OK, TOO_BUSY, NON_RETRIABLE_ERROR, 
possibly some deadline stuff that's missing here). How do you intend for the 
two to interplay in the future? Given the commonalities between them I'd expect 
something...


-- 
To view, visit http://gerrit.cloudera.org:8080/2970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e0d491f902191c88c58e3d627106cc1be1bb3cc
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to