Mike Percy has posted comments on this change.

Change subject: Add a design doc for rpc retry/failover semantics
......................................................................

Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/2642/4/docs/design-docs/rpc-retry-and-failover.md
File docs/design-docs/rpc-retry-and-failover.md:

Line 119: each client request is
:     recorded by the consensus log almost as is so it wouldn't be problematic to additionally store
:     the client id and request seq no. so when a write is "consensus committed" all future handlers
:     of that write (future leaders) will automatically be able to identify the client and request
> Did you see the last rev? I've updated milestone 1 to make even clearer tha

Thanks, I was reading via email, so I missed your response to the general comments because it came in a 2nd reply. The design is starting to become clearer to me.

I think something else that is missing is a list of errors that the replay cache will *not* handle. One example: if the seqno is auto-assigned by the client library, then in the case of a partial timeout, how can we manually retry inserts with the same seqno? Likewise, say the cluster is rebooted. How can we maintain cache consistency across reboots when we are not durably storing the result of an operation?

Maybe neither of these situations is handled. If so, let's make sure to call it out in the design as a tradeoff.

I think we are also missing specific design goals. For example, which errors specifically are we trying to handle? The most important case is writes, but the corner cases on writes are many. I think we want to handle the following:

1. The leader commits and then crashes; we retry on the new leader and get the cached result.
2. We get a client timeout, retry, and get the operation result on the same leader if it completed.
   (however, this second use case is complicated by the seqno issue above)

-- 
To view, visit http://gerrit.cloudera.org:8080/2642
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2aa40486153b39724e1c9bd09c626b829274c6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
