On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <guosi...@gmail.com> wrote:

> On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:
>
> > JV: What do you mean by "May not be perfect for negative testing"?
> >
> > I don't think there's anything inevitable about this particular class of
> > behavior.  ZK could have chosen to avoid this problem entirely by doing
> > duplicate op detection server-side with a per-session transaction log.
> >
> > Since it doesn't, we'll need to solve it ourselves.
> > ZkLedgerUnderreplicationManager relies on either getting success or
> > NodeExists on the ephemeral lock node to determine whether a particular
> > ReplicationWorker is responsible for replicating a particular ledger.  I
> > haven't reproduced this one, but it seems to me that two workers could
> > both try to replicate the same ledger and *both get NodeExists* on the lock
> > node.  This would leave the ledger locked for replication until whichever
> > one actually wrote the node restarts.
> >
> > Users like the above are pretty easy to fix.  One option would be to
> > simply include with the write a payload containing a nonce for the client.
> > Upon a ConnectionLoss event, we read the node and determine whether we
> > "won".  I think ledger creation probably falls into the same category;
> > the metadata could include an identifier for the creator.
> >
>
> For an ephemeral znode, you don't have to add an extra payload.
>
> In the retry logic:
>
> - catch the NodeExists exception
> - call exists() to check the znode; you get the Stat of this znode
> (https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner())
> - compare Stat#getEphemeralOwner with the client's current session id.
> If they match, the node was created by this session; otherwise it was
> created by another session.
>
>
Ah, I'll do that then, much easier.
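
For the archives, roughly what I have in mind -- an untested sketch, with the
surrounding class and method names made up; only the ZooKeeper calls
themselves are the real API:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class EphemeralLock {
    // Try to take an ephemeral lock node.  If a (possibly retried) create
    // comes back with NodeExists, check the ephemeral owner of the node: if
    // it matches our session id, our earlier create actually succeeded
    // before the ConnectionLoss and we hold the lock after all.
    static boolean tryLock(ZooKeeper zk, String lockPath, byte[] data)
            throws KeeperException, InterruptedException {
        try {
            zk.create(lockPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException nee) {
            Stat stat = zk.exists(lockPath, false);
            // stat can be null if the owning session expired in the meantime.
            return stat != null && stat.getEphemeralOwner() == zk.getSessionId();
        }
    }
}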


>
> >
> > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It relies
> > on setData success/BadVersion atomicity to ensure a consistent
> > read-modify-write.  It's certainly solvable by including a version vector,
> > but I think there's probably a simpler way to do it.  I'm not sure under
> > what circumstances there even *can* be racing writes to the metadata -- is
> > the only case that matters concurrent updates between the writer and a
> > single replication worker?
> >
>
> I don't think it can be addressed by a version vector; there is no public
> *history* of who changed which version.
> But the question is why you need to address #writeLedgerMetadata inside the
> ZooKeeperClient retry loop at all. There is already logic in
> #writeLedgerMetadata to resolve conflicts due to BadVersion: the
> LedgerHandle will re-read the ledger metadata when it encounters a version
> conflict.
>
>
That's the part I'm unsure of.  I know it has handling for BadVersion; I
don't know whether that handling is still correct if you get a BadVersion
even though your write actually succeeded, so I need to look into that more.
Also, I'm not at all suggesting we handle this in the retry loop; I'm
suggesting that for non-idempotent writes we should not retry in
ZooKeeperClient, and should instead propagate the ConnectionLoss error and
let the caller deal with it.
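
Concretely, this is the kind of caller-side handling I'm picturing once
ZooKeeperClient stops retrying conditional writes -- just a sketch, the
method and its name are hypothetical, and the payload comparison is only a
heuristic for "did my write land?":

import java.util.Arrays;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class MetadataWriter {
    // Conditional setData where ConnectionLoss is surfaced to the caller.
    // On ConnectionLoss the write may or may not have been applied, so
    // re-read the node and check whether it now carries our payload at
    // expectedVersion + 1 before deciding how to proceed.
    static Stat writeMetadata(ZooKeeper zk, String path, byte[] newData, int expectedVersion)
            throws KeeperException, InterruptedException {
        try {
            return zk.setData(path, newData, expectedVersion);
        } catch (KeeperException.ConnectionLossException cle) {
            Stat stat = new Stat();
            byte[] current = zk.getData(path, false, stat);
            if (stat.getVersion() == expectedVersion + 1 && Arrays.equals(current, newData)) {
                return stat; // our write actually went through
            }
            // Genuinely unknown or failed; let the caller resolve it
            // (e.g. re-read the metadata and redo the read-modify-write).
            throw cle;
        }
    }
}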
-Sam


>
> >
> > The outline is probably:
> > 1) Add a way to do retries automatically on *unconditional* creates and
> > deletes (there are some users that don't care).
> > 2) Audit and modify existing users to either use the methods introduced in
> > 1 or handle the ConnectionLoss events explicitly.
> > 3) Switch ZooKeeperClient to not retry conditional writes, instead
> > propagating the ConnectionLoss event; that should make its semantics match
> > the vanilla ZK client.
> > -Sam
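
To make 1) above concrete, this is the sort of helper I mean (hypothetical
name, just a sketch): an unconditional create where the caller only needs
the node to exist afterwards, so ConnectionLoss can be retried blindly and
NodeExists treated as success.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RetryUtils {
    // Unconditional create: the caller doesn't care who created the node,
    // only that it exists, so ConnectionLoss can simply be retried and
    // NodeExists treated as success.
    static void createIgnoringExists(ZooKeeper zk, String path, byte[] data, int maxRetries)
            throws KeeperException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                return;
            } catch (KeeperException.NodeExistsException nee) {
                return; // someone (possibly our earlier attempt) already created it
            } catch (KeeperException.ConnectionLossException cle) {
                if (attempt >= maxRetries) {
                    throw cle;
                }
                // fall through and retry
            }
        }
    }
}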
> >
> > On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > I agree that in the current callback model we only propagate an error
> > > code to the outer client, so we lose the information about the detailed
> > > cause.
> > >
> > > But I think we also tried to map some ZK error codes to BK error codes:
> > >
> > > NoNode -> NoLedger
> > > NodeExists -> LedgerExists
> > > ... (other codes)
> > >
> > > Also, we tried to hide the ZK metadata implementation behind interfaces,
> > > so some of these errors should be handled by the ZK ledger manager
> > > implementation before propagating to the client.
> > >
> > > Sijie
> > >
> > > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> > > enrico.olive...@diennea.com> wrote:
> > >
> > > > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
> > > >
> > > > This is nothing different from any other network-based system, like
> > > > NFS, so we need some kind of logic on the client side to make
> > > > reasonable assumptions. May not be perfect for negative testing.
> > > >
> > > >
> > > > Many times I have wanted to have some "exception cause" on BKException,
> > > > especially for ZK issues.
> > > > The way we use only int error codes hides the root cause of the error.
> > > > The BookKeeper client writes to its log, but the "cause" cannot be
> > > > reported to higher-level logs, and sometimes this is annoying.
> > > > In the future I would like to add more detail to errors.
> > > >
> > > > -- Enrico
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > JV
> > > >
> > > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > Hi Sam,
> > > >
> > > > Let's assume there is no such retry logic. How do you expect to handle
> > > > this situation?
> > > >
> > > > Can your application try to create a new ledger or catch NodeExists
> > > > exception?
> > > >
> > > > - Sijie
> > > >
> > > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
> > > >
> > > >
> > > >
> > > > Hmm, curator seems to have essentially the same problem:
> > > > https://issues.apache.org/jira/browse/CURATOR-268
> > > > I'm not sure there's a good way to solve this transparently -- the
> > > > right answer is probably to plumb the ConnectionLoss event through
> > > > ZooKeeperClient for interested callers, audit for metadata users where
> > > > we depend on atomicity, and update each one to handle it appropriately.
> > > > -Sam
> > > >
> > > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> > > >
> > > >
> > > >
> > > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > > ZooKeeperClient.  Its purpose appears to be to transparently perform
> > > > retries in the case that ZooKeeper returns ConnectionLoss on an
> > > > operation due to a Disconnect event.
> > > >
> > > > The trouble is that it's possible that a write which received a
> > > > ConnectionLoss error as a return value actually succeeded.  Once
> > > > ZooKeeperClient retries, it'll get back NodeExists and propagate that
> > > > error to the caller, even though the write succeeded and the node did
> > > > not previously exist.
> > > >
> > > > It seems as though the same issue would hold for setData and delete
> > > > calls which use the version argument -- you could get a spurious
> > > > BadVersion.
> > > >
> > > > I believe I've reproduced the variant with a spurious NodeExists.  It
> > > > manifested as a spurious BKLedgerExistException when running against a
> > > > 3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > > > storage to force Disconnect events by injecting write delays.  This
> > > > seems to make sense, as AbstractZkLedgerManager.createLedgerMetadata
> > > > uses ZkUtils.asyncCreateFullPathOptimistic to create the metadata node
> > > > and appears to depend on the create atomicity to avoid two writers
> > > > overwriting each other's nodes.
> > > >
> > > > AbstractZkLedgerManager.writeLedgerMetadata would seem to have the same
> > > > problem, with its dependence on using setData with the version argument
> > > > to avoid lost updates.
> > > >
> > > > Am I missing something in this analysis?  It seems to me that this
> > > > behavior could actually be problematic during periods of spotty
> > > > connectivity to the ZooKeeper cluster.
> > > >
> > > > Thanks!
> > > > -Sam
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
