Just published the post!
https://medium.com/@jmslocum16/idempotent-fault-tolerant-writes-in-curator-5-2-0-eadf7c12c814

Thanks all!

On Mon, Aug 2, 2021 at 10:35 AM Enrico Olivelli <[email protected]> wrote:

> Josh
> Great work!
>
> Looking forward to your post
>
>
>
> Enrico
>
> On Mon, Aug 2, 2021, 17:31 Josh Slocum <[email protected]> wrote:
>
> > Thanks to Jordan for the helpful comments and suggestions on the blog
> post!
> > I'll wait another day in case anyone else has any other comments,
> otherwise
> > I'll move forward with getting the post published.
> >
> > Thanks All,
> > Josh
> >
> > On Wed, Jul 28, 2021 at 12:29 PM Josh Slocum <[email protected]>
> wrote:
> >
> > > Hi All,
> > >
> > > Per my discussion with Enrico, I wrote up a Google Doc with a Draft of
> a
> > > blog post for idempotent writes in Curator 5.2.0:
> > >
> > >
> >
> https://docs.google.com/document/d/1867mkraRzat3fueYm5BR2ihzma4f_zgywqDH0msQLu0/edit?usp=sharing
> > >
> > > If anyone has comments or suggestions, feel free to comment them on the
> > > Google Doc.
> > >
> > > Thanks,
> > > Josh
> > >
> > > On Tue, Jul 27, 2021 at 9:58 AM Enrico Olivelli (Jira) <
> [email protected]>
> > > wrote:
> > >
> > >>
> > >>     [
> > >>
> >
> https://issues.apache.org/jira/browse/CURATOR-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388123#comment-17388123
> > >> ]
> > >>
> > >> Enrico Olivelli commented on CURATOR-584:
> > >> -----------------------------------------
> > >>
> > >> Great!
> > >>
> > >> I usually use Medium [https://medium.com/@eolivelli] to post my
> blogs.
> > >> If you want to write up a Google Doc I will be happy to review it and
> > >> help you get it published (or co-authored)
> > >>
> > >> you can reach out to me directly at my apache address
> [email protected]
> > ,
> > >> or better we can interact on [[email protected]|mailto:
> > >> [email protected]] ([
> https://curator.apache.org/mailing-lists.html
> > >> )]
> > >>
> > >> > Curator Client Fault Tolerance Extensions
> > >> > -----------------------------------------
> > >> >
> > >> >                 Key: CURATOR-584
> > >> >                 URL:
> > https://issues.apache.org/jira/browse/CURATOR-584
> > >> >             Project: Apache Curator
> > >> >          Issue Type: Improvement
> > >> >            Reporter: Josh Slocum
> > >> >            Assignee: Enrico Olivelli
> > >> >            Priority: Minor
> > >> >             Fix For: 5.2.0
> > >> >
> > >> >          Time Spent: 7h 40m
> > >> >  Remaining Estimate: 0h
> > >> >
> > >> > Tl;dr My team at Indeed has developed ZooKeeper functionality to
> > handle
> > >> stateful retrying of connectionloss for write operations, and we
> wanted
> > to
> > >> reach out to discuss if this is something the Curator team may be
> > >> interested in incorporating.
> > >> > We initially reached out to the Zookeeper team (
> > >> https://issues.apache.org/jira/browse/ZOOKEEPER-3927) but were
> > >> redirected to Curator as the better place to contribute them. The
> > changes
> > >> could be relatively easily added as additional parameters and/or
> > extensions
> > >> of the existing retry behavior in Curator's write operations.
> > >> >
> > >> > Hi Curator Devs,
> > >> > My team uses zookeeper extensively as part of a distributed
> key-value
> > >> store we've built at Indeed (think HBase replacement). Due to our
> > >> deployment setup co-locating our database daemons with our large
> hadoop
> > >> cluster, and the network-intensive nature of a lot of our compute
> jobs,
> > we
> > >> were experiencing a large amount of transient ConnectionLoss issues.
> > This
> > >> was especially problematic on important write operations, such as the
> > >> creation/deletion of distributed locks/leases or updating distributed
> > state
> > >> in the cluster.
> > >> > We saw that some existing zookeeper client wrappers handled retrying
> > in
> > >> the presence of ConnectionLoss, but all of the ones we looked at
> > (including
> > >> Curator) didn't allow for retrying writes with all of the proper
> state.
> > >> Consider the case of retrying a create. If the initial create had
> > succeeded
> > >> on the server, but the client got connectionloss, the client would
> get a
> > >> NodeExists exception on the retried request, even though the znode was
> > >> created. This resulted in many issues. For the distributed lock/lease
> > >> example, to other nodes, it looked like the calling node had been
> > >> successful acquiring the "lock", and to the calling node, it appeared
> > that
> > >> it was not able to acquire the "lock", which results in a deadlock.
> > >> > Curator has parameters that can modify the behavior upon retry, but
> > >> those were not sufficient. For example, create() has orSetData(), and
> > >> delete() has guaranteed().
> > >> > To solve this, we implemented a set of "connection-loss tolerant
> > >> primitives" for the main types of write operations. They handle a
> > >> connection loss by retrying the operation in a loop, but upon error
> > cases
> > >> in the retry, inspect the current state to see if it matches the case
> > where
> > >> a previous round that got connectionloss actually succeeded.
> > >> > * createRetriable(String path, byte[] data)
> > >> > * setDataRetriable(String path, byte[] newData, int currentVersion)
> > >> > * deleteRetriable(String path, int currentVersion)
> > >> > * compareAndDeleteRetriable(String path, byte[] currentData, int
> > >> currentVersion)
> > >> > For example, in createRetriable, it will retry the create again on
> > >> connection loss. If the retried call gets a NodeExists exception, it
> > will
> > >> check to see if (getData(path) == data and dataVersion == 0). If it
> > does,
> > >> it assumes the first create succeeded and returns success, otherwise
> it
> > >> propagates the NodeExists exception.
> > >> > These primitives have allowed us to program our ZooKeeper layer as
> if
> > >> ConnectionLoss isn't a transient state we have to worry about, since
> > they
> > >> have essentially the same guarantees as the non-retriable functions in
> > the
> > >> zookeeper api do (with a slight difference in semantics).
> > >> > Because these behaviors could be relatively easily added to Curator
> as
> > >> additional parameters to the existing mechanisms, and (to my
> knowledge)
> > >> aren't implemented anywhere else, we think it could be a useful
> > >> contribution to the Curator project. If this isn't something that
> > Curator
> > >> is interested in incorporating, Indeed may also consider open sourcing
> > it
> > >> as a standalone library.
> > >>
> > >>
> > >>
> > >> --
> > >> This message was sent by Atlassian Jira
> > >> (v8.3.4#803005)
> > >>
> > >
> >
>
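
For readers skimming the thread, the createRetriable behavior described in the quoted CURATOR-584 issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not Curator's API or Indeed's implementation: FakeStore, the two exception classes, and the maxAttempts parameter are hypothetical stand-ins for a real ZooKeeper client, and the fake server simply drops a reply after applying the write to simulate connection loss.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CreateRetriableSketch {

    // Hypothetical stand-ins for ZooKeeper's ConnectionLoss and NodeExists errors.
    static class ConnectionLossException extends Exception {}
    static class NodeExistsException extends Exception {}

    // A stored znode: its data plus a data version that is 0 right after creation.
    static class Node {
        final byte[] data;
        final int version;
        Node(byte[] data, int version) { this.data = data; this.version = version; }
    }

    // Hypothetical in-memory server that can lose the reply AFTER applying a create.
    static class FakeStore {
        final Map<String, Node> nodes = new HashMap<>();
        int repliesToDrop = 0;

        void create(String path, byte[] data)
                throws ConnectionLossException, NodeExistsException {
            if (nodes.containsKey(path)) {
                throw new NodeExistsException();
            }
            nodes.put(path, new Node(data.clone(), 0));
            if (repliesToDrop > 0) {
                repliesToDrop--;
                throw new ConnectionLossException(); // write applied, reply lost
            }
        }

        // Assumption: reads never fail here; a real client would retry them too.
        Node get(String path) { return nodes.get(path); }
    }

    // Retry create on connection loss. If a retried call hits NodeExists, treat it
    // as success only when the node holds our data at data version 0.
    static void createRetriable(FakeStore store, String path, byte[] data, int maxAttempts)
            throws ConnectionLossException, NodeExistsException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConnectionLossException lastLoss = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                store.create(path, data);
                return;
            } catch (ConnectionLossException e) {
                lastLoss = e; // outcome unknown, so try again
            } catch (NodeExistsException e) {
                Node existing = store.get(path);
                boolean earlierAttemptMayHaveSucceeded = lastLoss != null;
                if (earlierAttemptMayHaveSucceeded
                        && existing != null
                        && existing.version == 0
                        && Arrays.equals(existing.data, data)) {
                    return; // assume our earlier create went through
                }
                throw e; // someone else created the node
            }
        }
        throw lastLoss; // every attempt ended in connection loss
    }

    public static void main(String[] args) throws Exception {
        FakeStore store = new FakeStore();
        store.repliesToDrop = 1; // first create is applied but its reply is lost
        byte[] mine = "owner-a".getBytes(StandardCharsets.UTF_8);
        createRetriable(store, "/lock", mine, 3);
        System.out.println("create reported success despite the lost reply");

        try {
            createRetriable(store, "/lock", "owner-b".getBytes(StandardCharsets.UTF_8), 3);
        } catch (NodeExistsException e) {
            System.out.println("second creator correctly saw NodeExists");
        }
    }
}
```

Note that the data-and-version check is a heuristic: it cannot distinguish two clients that create the same path with identical data, which is presumably part of the "slight difference in semantics" the issue mentions.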
