Adar Dembo has posted comments on this change.

Change subject: WIP: Add a design doc for rpc retry/failover semantics
......................................................................


Patch Set 1:

(48 comments)

I don't have that much context on this, but I tried to focus on the 
implications for multi-master.

http://gerrit.cloudera.org:8080/#/c/2642/1//COMMIT_MSG
Commit Message:

Line 7: WIP: Add a design doc for rpc retry/failover semantics
Please add it to the README.md file in the design-docs directory.


http://gerrit.cloudera.org:8080/#/c/2642/1/docs/design-docs/rpc-retry-and-failover.md
File docs/design-docs/rpc-retry-and-failover.md:

Did we ever talk about how "rough" these design docs can be, as a group? I left 
many stylistic comments to improve readability, but I'll understand if I've 
incorrectly gauged where the "bar" should be on that sort of stuff.


Line 20: RPC's are not idempotent i.e. if handled twice they will lead to 
incorrect state
Nit: RPCs (that's what you used on L17).


Line 22: is guaranteed to execute it *Exactly One* times.
Nit: to go well with the list below, change to "execute it *Exactly Once*."

Also, I'd use bold here (**foo**) instead of italics.


Line 24: Kudu currently has 4 RPC service interfaces, Tablet, TabletAdmin, 
Master and Consensus.
Let's refer to them more like how the .proto files refer to them: TabletServer, 
TabletServerAdmin, Master, and Consensus. 

There's also RemoteBootstrap, which should probably be included.

And there's Generic, though I'm not sure if it needs to be included.


Line 29: volatile
What does volatile mean in this context?


Line 33: 3- Needs Exactly Once but has currently an alternative - Operations 
like in 2 but that already
Drop "currently", otherwise it sounds like CAS is a temporary thing that you'd 
rather replace with "proper" Exactly Once.


Line 27: 1- Doesn't need Exactly Once - Operations that don't mutate server 
state.
       : 
       : 2- Would benefit from Exactly Once - Operations that only mutate 
volatile state or mutate
       : state that is not replicated across servers and that would benefit 
from exactly once for
       : generalized error handling, but that don't strictly need it.
       : 
       : 3- Needs Exactly Once but has currently an alternative - Operations 
like in 2 but that already
       : have some alternative error handling mechanism like CAS.
       : 
       : 4- Require Exactly Once for correctness - Operations that don't 
currently have E.O. implemented
       : and require it for correctness.
Nit: this list didn't format well in markdown. Use "1.", "2.", etc.

Also, surround "Exactly Once" with stars like you did before.


Line 47: replicated rpcs, we'll use the same terminology to avoid redefining 
the concepts:
Nit: RPCs


Line 49: - Identification - Each RPC must have an unique identifier.
Nit: a, not an


Line 50: - Completion record - Each RPC must have a durable record of its 
completion.
Like "Exactly Once", let's refer to Completion Record with capital letters? 
Elsewhere in the doc too.


Line 51: - Retry rendez-vous - When an RPC is retried it must find the 
completion record of the
Nit: rendezvous


Line 55: ## Design options
Having read through this list, I'm in favor of #1, but I probably don't fully 
grasp its inherent complexity.


Line 60: 1- Completely encapsulated at the RPC layer - This the option that [1] 
presents at length,
Nit: This is the option...


Line 61: allowing to "bolt-on" these properties almost independently of what 
the underlying rpc's
Nit: RPCs


Line 63: replicated RPCs we'd need to come up with a generic, durable, 
Completion record mechanism.
Nit: Completion Record. Or completion record. But it's weird to capitalize just 
the C.


Line 65: it's not totally clear what happens when an RPC mutates multiple 
server "objects" or
Could you give an example of such an RPC, so that I understand what you mean by 
"server objects"?


Line 66: what these server objects really are.
Also not a markdown-friendly list.


Line 70: each different RPC. For specific rpc's like Write() this option seems 
to map to existing
Nit: RPCs


Line 72: replay would mean that is would be relatively easy to implement a 
retry rendez-vous
Nit: would mean that it

Also, rendezvous


Line 73: mechanism. However this option lack in generality and it would likely 
require duplicate
Nit: lacks


Line 79: rendez
       : vous 
Nit: there's a space inside rendezvous and the sentence just trails off.


Line 84: Option 1 is reasonably well presented in the paper, so the discussion 
will mostly focus on options
Nit: replace "the paper" with "[1]".


Line 90: - Identification - Although the clients don't currently have an ID, a 
natural place to issue
Do clients need to persist the ID somehow? Or is it OK for a client to get a 
new ID when it restarts?

Also, when are client IDs garbage collected by the master? Maybe that's 
described below...


Line 92: to fully guarantee unique id's though, we'd have to store client ids 
in the master tablet.
Nit: IDs. Elsewhere in the doc too.


Line 96: Note on client leases: [1] Discusses client leases at length and why 
they are a useful feature
Nit: discusses


Line 99: is what identifies each single RPC uniquely. On top of a lease 
issueing mechanism this system
Nit: issuing


Line 104: client IDs issued by the master or quite some time. Moreover it's 
something that we can
Nit: issued by the master for quite some time?


Line 106: 
From here down, we're basically trying to squeeze as much mileage as possible 
out of the WAL (to solve the remaining problems). It may be clearer to state 
that explicitly up front.


Line 107: - Completion record - Each client request is recorded by the 
consensus log almost as is so
That's only true for Write requests, not every request. What of master state 
changes, like CreateTable?

Oh, I see, you touch on that in the last sentence of this paragraph. Maybe 
reword a bit so it's clear up here too?


Line 113: - Retry rendez-vous - For this element we'd need to create a new 
component, the Replay Cache,
Nit: rendezvous. Elsewhere too.


Line 125: with log (wal) garbage collection, with the added complexity that we 
should then remove
Nit: WAL


Line 132: interfaces, implemeted by other components, from which to fetch/store 
completion records
Nit: implemented


Line 138: Milestone 1 - Define interfaces for Completion Record storage and 
handling and implement
Nit: use some markdown to make the milestones stand out a little more. Maybe 
subheadings?


Line 140:  
Nit: trailing whitespace in the doc. Look for red blocks in gerrit.

Also: issuing, not issueing. Elsewhere too.


Line 143: cat. 4 operation), which is arguably the most important one. We'll 
define a
Would be good to explicitly state why it's most important (i.e. most common 
state change, probably?)


Line 145: 1 - Start new operations if this is a new RPC.
Nit: not a markdown-friendly list.


Line 146: 2 - For RPCs that have previsouly completed it will be able to fetch 
existing completion
Nit: previously


Line 154: wal. Note that this component must cache not only the RPC "result" 
(i.e. SUCCESS/ABORTED
Nit: WAL. Elsewhere too.


Line 163: Client ID issueing will be done by the master, when a client connects 
to the master for
Would be nice to see more detail on how the master persists client IDs.


Line 167: starting step.
As long as it's configurable by gflag, I agree this is a reasonable first step.


Line 175: in Introdction) we would benefit from having an RPC layer retry 
mechanism, for the sender,
Nit: the introduction


Line 181: CompletionRecordStore. This local version would not having a 
durability mechanism and would
Nit: have, not having


Line 184: help immensely in consoldating retry logic that we have spread all 
over. 
Presumably there'd be a CompletionRecordStore for the master as well? If it's 
local, does that mean that an ill-timed master leader election (i.e. one just 
after a failed client request) could lead to non-E.O. semantics?


Line 189: leases are stored and garbage collection is performed with the help 
with a "cluster clock".
Would be good to explain in more detail what this "cluster clock" is.


Line 212: We're currently observing errors with these operations, so they 
should be classified as cat. 2.
Isn't it kind of like AlterSchema though? Viewed from the tserver's perspective I 
agree that they're category 2, e.g. the following sequence:
1. Sender 1 sends CreateTablet("foo") to TS.
2. The request never reaches the TS.
3. Sender 2 sends CreateTablet("foo") to TS.
4. Tablet foo is created.
5. Sender 1 retries CreateTablet("foo") to TS.
6. TS responds with ALREADY_PRESENT, but sender 1 doesn't know whether foo was 
already present to begin with, or whether step 1 created it.

However, there is no sender 1 and sender 2: there is only one leader master. 
Even in the case of a leader master election, sender 1 should not retry because 
it is no longer the leader.
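To make the ambiguity concrete, here's a toy sketch (hypothetical names, not 
Kudu code) of the sequence above: without a Completion Record tying a retry to 
its original RPC, the server can only report current state, so ALREADY_PRESENT 
is indistinguishable from "your own first attempt succeeded".

```python
# Toy model of a tserver's tablet map; no RPC identification or
# Completion Record, so retries are handled like fresh requests.
class TabletServer:
    def __init__(self):
        self.tablets = set()

    def create_tablet(self, name):
        if name in self.tablets:
            return "ALREADY_PRESENT"
        self.tablets.add(name)
        return "OK"

ts = TabletServer()

# Sender 1's request reaches the TS, but the response is lost in transit.
first_status = ts.create_tablet("foo")   # server-side effect happens
assert first_status == "OK"

# Sender 1 retries; the server has no record linking this retry to the
# original RPC, so it just reports the current state.
retry_status = ts.create_tablet("foo")
assert retry_status == "ALREADY_PRESENT"

# The retry would get the same status if a *different* sender had created
# "foo" first, so sender 1 cannot distinguish the two histories.
```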


Line 248: Due to a central table map, these operations are already executed 
exactly once, and thus
But there's still ambiguity if multiple clients try to create/alter/delete the 
same table and one ends up retrying. For example, a retrying CreateTable() 
client that receives ALREADY_PRESENT doesn't know who created the table. Was it 
the original RPC? Was it another client altogether, in which case the schema 
may not match?

Moreover, a retried CreateTable() may end up recreating a table that was just 
deleted by a second client, in between the two RPCs sent by the first client.
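A toy sketch (hypothetical names, not Kudu code) of that second interleaving: 
the retried CreateTable "succeeds" and silently undoes another client's delete.

```python
# Toy model of the master's table map; without retry rendezvous, a
# retried CreateTable is processed as if it were a brand-new request.
class Master:
    def __init__(self):
        self.tables = {}

    def create_table(self, name, schema):
        if name in self.tables:
            return "ALREADY_PRESENT"
        self.tables[name] = schema
        return "OK"

    def delete_table(self, name):
        self.tables.pop(name, None)
        return "OK"

m = Master()

m.create_table("t", "v1")            # client 1's original RPC; response lost
m.delete_table("t")                  # client 2 deletes the table
status = m.create_table("t", "v1")   # client 1's retry re-creates it

assert status == "OK"                # the retry "succeeds"...
assert "t" in m.tables               # ...resurrecting the deleted table
```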


Line 253: [1] - Implementing Linearizability at Large Scale and Low Latency
        : web.stanford.edu/~ouster/cgi-bin/papers/rifl.pdf
The markdown-approved way of linking is: 
[I'm an inline-style link with title](https://www.google.com "Google's Homepage")


-- 
To view, visit http://gerrit.cloudera.org:8080/2642
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2aa40486153b39724e1c9bd09c626b829274c6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
