Josh Slocum created ZOOKEEPER-4024:
--------------------------------------

             Summary: ZooKeeper Client Fault Tolerance Extensions
                 Key: ZOOKEEPER-4024
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4024
             Project: ZooKeeper
          Issue Type: New Feature
            Reporter: Josh Slocum


Tl;dr My team at Indeed has developed ZooKeeper functionality to handle 
stateful retrying of connectionloss for write operations, and we wanted to 
reach out to discuss if this is something the Curator team may be interested in 
incorporating.
We initially reached out to the Zookeeper team 
(https://issues.apache.org/jira/browse/ZOOKEEPER-3927) but were redirected to 
Curator as the better place to contribute them. The changes could be relatively 
easily added as additional parameters and/or extensions of the existing retry 
behavior in Curator's write operations.
 
Hi Curator Devs,

My team uses zookeeper extensively as part of a distributed key-value store 
we've built at Indeed (think HBase replacement). Due to our deployment setup 
co-locating our database daemons with our large hadoop cluster, and the 
network-intensive nature of a lot of our compute jobs, we were experiencing a 
large amount of transient ConnectionLoss issues. This was especially 
problematic on important write operations, such as the creation deletion of 
distributed locks/leases or updating distributed state in the cluster. 

We saw that some existing zookeeper client wrappers handled retrying in the 
presence of ConnectionLoss, but all of the ones we looked at (including 
Curator) didn't allow for retrying writes wiith all of the proper state. 
Consider the case of retrying a create. If the initial create had succeeded on 
the server, but the client got connectionloss, the client would get a 
NodeExists exception on the retried request, even though the znode was created. 
This resulted in many issues. For the distributed lock/lease example, to other 
nodes, it looked like the calling node had been successful acquiring the 
"lock", and to the calling node, it appeared that it was not able to acquire 
the "lock", which results in a deadlock.

Curator has parameters that can modify the behavior upon retry, but those were 
not sufficient. For example, create() has orSetData(), and delete() has 
guaranteed().

To solve this, we implemented a set of "connection-loss tolerant primitives" 
for the main types of write operations. They handle a connection loss by 
retrying the operation in a loop, but upon error cases in the retry, inspect 
the current state to see if it matches the case where a previous round that got 
connectionloss actually succeeded.

* createRetriable(String path, byte[] data)
* setDataRetriable(String path, byte[] newData, int currentVersion)
* deleteRetriable(String path, int currentVersion)
* compareAndDeleteRetriable(String path, byte[] currentData, int currentVersion)

For example, in createRetriable, it will retry the create again on connection 
loss. If the retried call gets a NodeExists exception, it will check to see if 
(getData(path) == data and dataVersion == 0). If it does, it assumes the first 
create succeeded and returns success, otherwise it propagates the NodeExists 
exception.

These primitives have allowed us to program our ZooKeeper layer as if 
ConnectionLoss isn't a transient state we have to worry about, since they have 
essentially the same guarantees as the non-retriable functions in the zookeeper 
api do (with a slight difference in semantics).

Because these behaviors could be relatively easily added to Curator as 
additional parameters to the existing mechanisms, and (to my knowledge) aren't 
implemented anywhere else, we think it could be a useful contribution to the 
Curator project. If this isn't something that Curator is interested in 
incorporating, Indeed may also consider open sourcing it as a standalone 
library.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to