On Tue, Jun 27, 2017 at 4:27 PM, Sam Just <sj...@salesforce.com> wrote:

> Should EXPIRED be considered a recoverable error for retry purposes?
> Retrying in that case would mean that operations which might have been
> submitted under the assumption that ephemeral nodes were still present
> would be retried after the ephemeral nodes disappeared.  Don't all users
> have special handling for EXPIRED anyway?
>

If a request fails with EXPIRED, that means the request was never
attempted, so it is a recoverable error that is safe to retry.
When a request fails with EXPIRED, the session that the request was
using has expired, and all ephemeral nodes associated with that session
will be deleted as well, so it is a safe situation for retries.
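
To illustrate, here is a minimal sketch (hypothetical helper, not
BookKeeper's actual retry code; the 30s timeout is arbitrary): on
EXPIRED the old handle is discarded and the operation is resubmitted on
a fresh session, whose predecessor's ephemeral nodes are already gone.

    import java.io.IOException;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ExpiredRetrySketch {
        // Retry an ephemeral create across session expiry.  Safe because
        // EXPIRED guarantees the old session's ephemeral nodes are
        // deleted, so the retried create cannot collide with our own
        // earlier attempt.  (A real implementation would also wait for
        // the new handle to finish connecting.)
        public static ZooKeeper createEphemeralWithExpiryRetry(
                String connectString, String path, byte[] data)
                throws IOException, InterruptedException, KeeperException {
            while (true) {
                ZooKeeper zk = new ZooKeeper(connectString, 30000, null);
                try {
                    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.EPHEMERAL);
                    return zk;  // caller keeps the handle open; the node
                                // lives only as long as this session
                } catch (KeeperException.SessionExpiredException e) {
                    zk.close();  // expired session's ephemerals are gone;
                                 // loop and recreate on a brand-new session
                }
                // ConnectionLoss is deliberately NOT retried here --
                // that is the problematic case discussed in this thread.
            }
        }
    }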


> -Sam
>
> On Tue, Jun 27, 2017 at 4:08 PM, Sijie Guo <guosi...@gmail.com> wrote:
>
> > On Tue, Jun 27, 2017 at 2:36 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> > > On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:
> > > >
> > > > > JV: What do you mean by "May not be perfect for negative testing"?
> > > > >
> > > > > I don't think there's anything inevitable about this particular
> > > > > class of behavior.  ZK could have chosen to avoid this problem
> > > > > entirely by doing duplicate op detection server-side with a
> > > > > per-session transaction log.
> > > > >
> > > > > Since it doesn't, we'll need to solve it ourselves.
> > > > > ZkLedgerUnderreplicationManager relies on either getting success
> > > > > or NodeExists on the ephemeral lock node to determine whether a
> > > > > particular ReplicationWorker is responsible for replicating a
> > > > > particular ledger.  I haven't reproduced this one, but it seems to
> > > > > me that two workers could both try to replicate the same ledger
> > > > > and *both get NodeExists* on the lock node.  This would leave the
> > > > > ledger locked for replication until whichever one actually wrote
> > > > > the node restarts.
> > > > >
> > > > > Users like the above are pretty easy to fix.  One option would be
> > > > > to simply include with the write a payload including a nonce for
> > > > > the client.  Upon a ConnectionLoss event, we read the node and
> > > > > determine whether we "won".  I think ledger creation probably
> > > > > falls into the same category; the metadata could include an
> > > > > identifier for the creator.  (A sketch follows below.)
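> > > > >
> > > > > A rough sketch of that nonce/creator-id idea (hypothetical helper,
> > > > > not existing BookKeeper code; assumes org.apache.zookeeper.* and
> > > > > org.apache.zookeeper.data.Stat are imported):
> > > > >
> > > > >     // After NodeExists following a ConnectionLoss retry, read the
> > > > >     // node back and compare its payload with the id we wrote;
> > > > >     // equality means our "failed" create actually won.
> > > > >     static boolean createdByUs(ZooKeeper zk, String path, byte[] myId)
> > > > >             throws KeeperException, InterruptedException {
> > > > >         try {
> > > > >             zk.create(path, myId, ZooDefs.Ids.OPEN_ACL_UNSAFE,
> > > > >                       CreateMode.PERSISTENT);
> > > > >             return true;   // clean create: we won outright
> > > > >         } catch (KeeperException.NodeExistsException e) {
> > > > >             byte[] data = zk.getData(path, false, new Stat());
> > > > >             return java.util.Arrays.equals(data, myId);
> > > > >         }
> > > > >     }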
> > > > >
> > > >
> > > > For an ephemeral znode, you don't have to add an extra payload.
> > > >
> > > > In the retry logic:
> > > >
> > > > - catch the NodeExists exception
> > > > - call exists to check the znode; you can get the Stat (
> > > > https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()
> > > > ) of this znode
> > > > - compare Stat#getEphemeralOwner with the client's current session
> > > > id: if they match, the node was created by this session; otherwise
> > > > it was created by another session (see the sketch below)
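> > > >
> > > > A sketch of that check (illustrative only; assumes
> > > > org.apache.zookeeper.* and org.apache.zookeeper.data.Stat):
> > > >
> > > >     // After NodeExists on an ephemeral create: did *we* create it?
> > > >     static boolean ownedByThisSession(ZooKeeper zk, String path)
> > > >             throws KeeperException, InterruptedException {
> > > >         Stat stat = zk.exists(path, false);
> > > >         // getEphemeralOwner() is the creating session's id
> > > >         // (0 for persistent nodes)
> > > >         return stat != null
> > > >                 && stat.getEphemeralOwner() == zk.getSessionId();
> > > >     }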
> > > >
> > > >
> > > Ah, I'll do that then, much easier.
> > >
> > >
> > > >
> > > > >
> > > > > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.
> > > > > It relies on setData success/BadVersion atomicity to ensure a
> > > > > consistent read-modify-write.  It's certainly solvable by
> > > > > including a version vector, but I think there's probably a simpler
> > > > > way to do it.  I'm not sure under what circumstances there even
> > > > > *can* be racing writes to the metadata -- is the only case that
> > > > > matters concurrent updates between the writer and a single
> > > > > replication worker?
> > > > >
> > > >
> > > > I don't think it can be addressed by a version vector; there is no
> > > > public *history* of who changed which version.
> > > > But the question is why you need to address #writeLedgerMetadata
> > > > inside the ZooKeeperClient retry loop.  I think there is already
> > > > logic in #writeLedgerMetadata to resolve conflicts due to
> > > > BadVersion: the LedgerHandle will re-read the ledger metadata when
> > > > it encounters version conflicts.
> > > >
> > > >
> > > That's the part I'm unsure of.  I know it has handling for
> > > BadVersion; I don't know whether it's still correct if you get a
> > > BadVersion even though your write actually succeeded.  I need to look
> > > into that more.  Also, I'm not at all suggesting we handle this in
> > > the retry loop; I'm suggesting that for non-idempotent writes, we
> > > should not retry in ZooKeeperClient and should instead propagate the
> > > ConnectionLoss error and let the caller deal with it.
> > >
> >
> > +1, I agree with not retrying non-idempotent writes.
> >
> >
> > > -Sam
> > >
> > >
> > > >
> > > > >
> > > > > The outline is probably:
> > > > > 1) Add a way to do retries automatically on *unconditional*
> > > > > creates and deletes (there are some users that don't care).
> > > > > 2) Audit and modify existing users to either use the methods
> > > > > introduced in 1 or handle the ConnectionLoss events explicitly.
> > > > > 3) Switch ZooKeeperClient to not retry conditional writes,
> > > > > instead propagating the ConnectionLoss event; that should make
> > > > > its semantics match the vanilla ZK client (see the sketch below).
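> > > > >
> > > > > A sketch of the split in 1) and 3) (hypothetical wrapper methods,
> > > > > not the current ZooKeeperClient API):
> > > > >
> > > > >     // 1) Unconditional delete: idempotent if callers treat
> > > > >     // NoNode as success, so ConnectionLoss can be retried
> > > > >     // transparently.
> > > > >     static void deleteUnconditional(ZooKeeper zk, String path)
> > > > >             throws KeeperException, InterruptedException {
> > > > >         while (true) {
> > > > >             try {
> > > > >                 zk.delete(path, -1);  // -1 matches any version
> > > > >                 return;
> > > > >             } catch (KeeperException.NoNodeException e) {
> > > > >                 return;  // already gone: fine for these callers
> > > > >             } catch (KeeperException.ConnectionLossException e) {
> > > > >                 // retry: the end state is the same either way
> > > > >             }
> > > > >         }
> > > > >     }
> > > > >
> > > > >     // 3) Conditional write: no retry.  A ConnectionLoss may hide
> > > > >     // a success, so propagate it and let the caller resolve it --
> > > > >     // the vanilla ZK client's semantics.
> > > > >     static Stat setDataConditional(ZooKeeper zk, String path,
> > > > >                                    byte[] data, int version)
> > > > >             throws KeeperException, InterruptedException {
> > > > >         return zk.setData(path, data, version);
> > > > >     }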
> > > > > -Sam
> > > > >
> > > > > On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:
> > > > >
> > > > > > I agree that in the current callback model we only propagate an
> > > > > > error code to the outer client, so we lose the information about
> > > > > > the detailed cause.
> > > > > >
> > > > > > But I think we also tried to map some zk error codes to bk error
> > > > > > codes (a rough sketch of such a mapping follows below):
> > > > > >
> > > > > > NoNode -> NoLedger
> > > > > > NodeExists -> LedgerExists
> > > > > > ... (other codes)
> > > > > >
> > > > > > Also, we tried to hide the zk metadata related implementation
> > > > > > behind interfaces, so some of the errors should be handled by the
> > > > > > zk ledger manager implementation before propagating to the client.
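> > > > > >
> > > > > > The rough shape of such a mapping (illustrative only; the
> > > > > > constant names here are assumptions, not the exact BookKeeper
> > > > > > code):
> > > > > >
> > > > > >     // Translate a ZooKeeper result code into a BK error code.
> > > > > >     static int zkToBkError(KeeperException.Code code) {
> > > > > >         switch (code) {
> > > > > >             case NONODE:
> > > > > >                 return BKException.Code.NoSuchLedgerExistsException;
> > > > > >             case NODEEXISTS:
> > > > > >                 return BKException.Code.LedgerExistException;
> > > > > >             default:
> > > > > >                 return BKException.Code.ZKException;
> > > > > >         }
> > > > > >     }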
> > > > > >
> > > > > > Sijie
> > > > > >
> > > > > > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <enrico.olive...@diennea.com> wrote:
> > > > > >
> > > > > > > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri
> > > > > > > wrote:
> > > > > > >
> > > > > > > This is nothing different from any network based system, like
> > > > > > > NFS.  So we need to have some kind of logic on the client side
> > > > > > > to make a reasonable assumption.  It may not be perfect for
> > > > > > > negative testing.
> > > > > > >
> > > > > > > Many times I have wanted some "exception cause" on BKException,
> > > > > > > especially for ZK issues.
> > > > > > > The way we use only int error codes hides the root causes of
> > > > > > > the error.  The BookKeeper client writes to the log, but the
> > > > > > > "cause" cannot be reported to higher level logs, and sometimes
> > > > > > > this is annoying.
> > > > > > > In the future I would like to add more details on errors.
> > > > > > >
> > > > > > > -- Enrico
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > JV
> > > > > > >
> > > > > > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi Sam,
> > > > > > >
> > > > > > > Let's assume there is no such retry logic.  How do you expect
> > > > > > > to handle this situation?
> > > > > > >
> > > > > > > Can your application try to create a new ledger or catch the
> > > > > > > NodeExists exception?
> > > > > > >
> > > > > > > - Sijie
> > > > > > >
> > > > > > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hmm, Curator seems to have essentially the same problem:
> > > > > > > https://issues.apache.org/jira/browse/CURATOR-268
> > > > > > > I'm not sure there's a good way to solve this transparently --
> > > > > > > the right answer is probably to plumb the ConnectionLoss event
> > > > > > > through ZooKeeperClient for interested callers, audit for
> > > > > > > metadata users where we depend on atomicity, and update each
> > > > > > > one to handle it appropriately.
> > > > > > > -Sam
> > > > > > >
> > > > > > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > > > > > ZooKeeperClient.  Its purpose appears to be to transparently
> > > > > > > perform retries in the case that ZooKeeper returns
> > > > > > > ConnectionLoss on an operation due to a Disconnect event.
> > > > > > >
> > > > > > > The trouble is that it's possible that a write which received
> > > > > > > a ConnectionLoss error as a return value actually succeeded.
> > > > > > > Once ZooKeeperClient retries, it'll get back NodeExists and
> > > > > > > propagate that error to the caller, even though the write
> > > > > > > succeeded and the node did not, in fact, already exist.
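> > > > > > >
> > > > > > > The problematic pattern, in sketch form (illustrative, not the
> > > > > > > actual ZooKeeperClient code; assumes org.apache.zookeeper.*):
> > > > > > >
> > > > > > >     static String createRetrying(ZooKeeper zk, String path,
> > > > > > >                                  byte[] data)
> > > > > > >             throws KeeperException, InterruptedException {
> > > > > > >         while (true) {
> > > > > > >             try {
> > > > > > >                 return zk.create(path, data,
> > > > > > >                                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
> > > > > > >                                  CreateMode.PERSISTENT);
> > > > > > >             } catch (KeeperException.ConnectionLossException e) {
> > > > > > >                 // The create may have been applied before the
> > > > > > >                 // connection dropped; if so, the next
> > > > > > >                 // iteration's create throws NodeExists even
> > > > > > >                 // though we created the node and it did not
> > > > > > >                 // previously exist.
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }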
> > > > > > >
> > > > > > > It seems as though the same issue would hold for setData and
> > > > > > > delete calls which use the version argument -- you could get a
> > > > > > > spurious BadVersion.
> > > > > > >
> > > > > > > I believe I've reproduced the variant with a spurious
> > > > > > > NodeExists.  It manifested as a spurious BKLedgerExistException
> > > > > > > when running against a 3-instance ZooKeeper cluster with
> > > > > > > dm-delay under the ZooKeeper instance storage to force
> > > > > > > Disconnect events by injecting write delays.  This seems to
> > > > > > > make sense, as AbstractZkLedgerManager.createLedgerMetadata
> > > > > > > uses ZkUtils.asyncCreateFullPathOptimistic to create the
> > > > > > > metadata node and appears to depend on the create atomicity to
> > > > > > > avoid two writers overwriting each other's nodes.
> > > > > > >
> > > > > > > AbstractZkLedgerManager.writeLedger would seem to have the same
> > > > > > > problem with its dependence on using setData with the version
> > > > > > > argument to avoid lost updates.
> > > > > > >
> > > > > > > Am I missing something in this analysis?  It seems to me that
> > > > > > > this behavior could actually be problematic during periods of
> > > > > > > spotty connectivity to the ZooKeeper cluster.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > -Sam
> > > > > > >
> > > > > > > --
> > > > > > > Enrico Olivelli
> > > > > > > Software Development Manager @Diennea
> > > > > > > Tel.: (+39) 0546 066100 - Int. 925
> > > > > > > Viale G.Marconi 30/14 - 48018 Faenza (RA)
> > > > > > > MagNews - E-mail Marketing Solutions http://www.magnews.it
> > > > > > > Diennea - Digital Marketing Solutions http://www.diennea.com
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
