David Ribeiro Alves has posted comments on this change.

Change subject: Add a design doc for rpc retry/failover semantics
......................................................................


Patch Set 4:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/2642/4/docs/design-docs/rpc-retry-and-failover.md
File docs/design-docs/rpc-retry-and-failover.md:

Line 20: RPCss are not idempotent i.e. if handled twice they will lead to 
incorrect state
> nit: s/RPCss/RPCs/
Done


Line 22: is guaranteed to execute it **Exactly Once**.
> nit: no need to capitalize exactly once
Based on other comments I had already capitalized other occurrences to match this one, so I'm punting on changing all of them.


Line 28: 1. Doesn't need Exactly Once - Operations that don't mutate server 
state.
> nit: instead of capitalizing Exactly Once here and below, we can have it re
see my comment above


Line 34: 3. Needs Exactly Once but has an alternative - Operations like in 2 
but that already
> Missing: Needs exactly once but an alternative mechanism is required (e.g. 
can you come up with a case where this would happen?


Line 37: 4. Require Exactly Once for correctness - Operations that don't 
currently have E.O. implemented
> nit: s/Require/Requires/ to maintain singular tense in this list
Done


Line 83: - No Op - A passthrough, no-op component that just delegates to what 
handling we already
> nit: This doesn't render properly in HTML. Push to a personal GitHub branch
Done


Line 93: Option 1 is reasonably well presented in [1], so the discussion will 
mostly focus on options
> Please provide a two-paragraph summary of the relevant part of the paper. I
My feeling was that it wasn't necessary to know the specifics of the design 
described in the paper as long as we knew _what_ it does in general, which I 
explain above.


Line 119: each client request is
        : recorded by the consensus log almost as is so it wouldn't be 
problematic to additionally store
        : the client id and request seq no. so when a write is "consensus 
committed" all future handlers
        : of that write (future leaders) will automatically be able to identify 
the client and request
> Yes, but they don't currently store the responses in the log. I don't see a
My idea was to make this a general design doc meant to drive discussion on the general direction of the replay cache (encapsulation, layers, milestones, etc.), not on the specifics of each possible implementation.

Milestone 1 includes some more detail on how this would work for Write(), which 
I think answers your question.
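To make the discussion concrete, here is a minimal sketch (hypothetical names, not Kudu's actual classes) of a replay cache keyed by (client id, request seq no), the same pair stored alongside each write in the consensus log. The first handler records the response; any future handler of the same request, e.g. a new leader that replayed the log, answers from the cache instead of re-executing.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical replay cache, keyed by the (client id, request seq no) pair
// that would be recorded with each write in the consensus log.
class ReplayCache {
 public:
  // Returns true and fills 'response' if this request was already handled.
  bool Lookup(int64_t client_id, int64_t seq_no, std::string* response) const {
    auto it = cache_.find({client_id, seq_no});
    if (it == cache_.end()) return false;
    *response = it->second;
    return true;
  }

  // Records the response once the write is consensus committed.
  void Record(int64_t client_id, int64_t seq_no, std::string response) {
    cache_[{client_id, seq_no}] = std::move(response);
  }

 private:
  std::map<std::pair<int64_t, int64_t>, std::string> cache_;
};
```

A retried Write() that hits any replica which has replayed the log would then be answered from the cache rather than applied twice.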


Line 123: This would, of course, limit exactly once semantics on server 
failover to RPCs who end up as
        : consensus rounds
> I'd like to see more discussion here about how this is handled across leade
See my comment above about the detail and my answer above regarding what we 
store and when we store it.

+1 on noting the blocking thing in M1 below, which I did.

I didn't go too deep into anchoring, but the gist is: you're right, we need to anchor the WAL for as long as we keep the corresponding entries in the cache. Note that this is not because we need to read anything from it at runtime (see M1: we don't cache requests, and results don't actually go in the WAL), but because we need to make sure we can abide by the cache retention policy on reboot, and thus need to be able to repopulate the cache. Will make a note of this too.
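The anchoring rule could be derived from the cache itself; this is a sketch under assumed names (not the actual log anchoring API): the WAL may only be GCed up to the earliest op index whose cache entry is still live, so a restarting replica can replay that suffix and rebuild the cache.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>

// Hypothetical: 'live_entries' maps request seq no -> op index of its
// consensus round in the WAL. Everything from the returned index onward must
// be retained so a restart can replay the tail and repopulate the cache.
int64_t EarliestAnchoredIndex(const std::map<int64_t, int64_t>& live_entries) {
  int64_t min_index = std::numeric_limits<int64_t>::max();
  for (const auto& entry : live_entries) {
    min_index = std::min(min_index, entry.second);
  }
  return min_index;  // max() means no entries are live, nothing is anchored
}
```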


Line 178: Retry handling on the client side and retry rendez-vous logic will be 
implemented at the
> Here is something tricky related to plumbing the concept of exactly-once se
You're essentially describing the problem that distributed transactions solve :) Basically, the caller is only guaranteed that the client will try all the individual writes and that, if successful, each will execute exactly once; no guarantee is made for the set as a whole.
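In other words, the client-side contract could look roughly like this (a sketch with invented names): each individual write is retried under the same sequence number so the server-side replay cache can deduplicate it, but the batch as a whole is not atomic.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical: 'send_rpc' returns true on success, false on a retriable
// error (timeout, failover). The same seq no is reused on every attempt so
// the server can recognize the retry and execute the write at most once.
bool SendWithRetries(const std::function<bool(int64_t seq_no)>& send_rpc,
                     int64_t seq_no, int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (send_rpc(seq_no)) return true;  // executed exactly once server-side
  }
  return false;  // this write failed; other writes in the set may have landed
}
```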


Line 235: These operations are called by the master and already handle at least 
once. They might benefit
> Anything the (multi-) master calls might be cat 5, since the replay cache i
The previous reviewers' argument on this one was that these operations are already idempotent and thus don't need special attention. I just mentioned that they might benefit from a common client-side retry logic.


Line 279: [1](web.stanford.edu/~ouster/cgi-bin/papers/rifl.pdf "Implementing 
Linearizability at Large Scale and Low Latency")
> nit: This doesn't render well as HTML. How about:
Done. Thanks


-- 
To view, visit http://gerrit.cloudera.org:8080/2642
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2aa40486153b39724e1c9bd09c626b829274c6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
