On Tue, Jun 27, 2017 at 2:36 PM, Sam Just <sj...@salesforce.com> wrote:

> On Tue, Jun 27, 2017 at 2:29 PM, Sijie Guo <guosi...@gmail.com> wrote:
>
> > On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:
> >
> > > JV: What do you mean by "May not be perfect for negative testing"?
> > >
> > > I don't think there's anything inevitable about this particular class
> > > of behavior.  ZK could have chosen to avoid this problem entirely by
> > > doing duplicate op detection server-side with a per-session
> > > transaction log.
> > >
> > > Since it doesn't, we'll need to solve it ourselves.
> > > ZkLedgerUnderreplicationManager relies on either getting success or
> > > NodeExists on the ephemeral lock node to determine whether a
> > > particular ReplicationWorker is responsible for replicating a
> > > particular ledger.  I haven't reproduced this one, but it seems to me
> > > that two workers could both try to replicate the same ledger and
> > > *both get NodeExists* on the lock node.  This would leave the ledger
> > > locked for replication until whichever one actually wrote the node
> > > restarts.
> > >
> > > Cases like the above are pretty easy to fix.  One option would be to
> > > simply include with the write a payload containing a nonce for the
> > > client.  Upon a ConnectionLoss event, we read the node and determine
> > > whether we "won".  I think ledger creation probably falls into the
> > > same category; the metadata could include an identifier for the
> > > creator.
> > >
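The nonce approach described above can be sketched with a toy in-memory stand-in for ZooKeeper. None of the names below are real BookKeeper or ZooKeeper API; this only models the decision logic under the assumption that a create can succeed server-side while its acknowledgement is lost:

```python
import uuid

# Toy in-memory znode store standing in for ZooKeeper: path -> payload.
store = {}

def create(path, payload):
    """Toy create with NodeExists semantics if the path is taken."""
    if path in store:
        raise FileExistsError("NodeExists")
    store[path] = payload

def create_checking_nonce(path, data, nonce):
    """Create a node whose payload embeds a per-client nonce; on
    NodeExists, read the node back to decide whether this client 'won'."""
    try:
        create(path, (nonce, data))
        return True
    except FileExistsError:
        stored_nonce, _ = store[path]
        # A matching nonce means the earlier, unacknowledged write was ours.
        return stored_nonce == nonce

# Simulate a create whose server-side write succeeded but whose ack was
# lost to a Disconnect; the client then retries.
my_nonce = uuid.uuid4().hex
create("/ledgers/L1", (my_nonce, b"metadata"))            # ack lost
assert create_checking_nonce("/ledgers/L1", b"metadata", my_nonce)
assert not create_checking_nonce("/ledgers/L1", b"metadata", "other")
```

The retry resolves the ambiguity by reading back its own nonce rather than trusting the NodeExists error code.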
> >
> > For an ephemeral znode, you don't have to add extra payload.
> >
> > In the retry logic:
> >
> > - catch the NodeExists exception
> > - call exists to check the znode; you get the Stat (
> > https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()
> > ) of this znode
> > - compare Stat#getEphemeralOwner with the client's current session id;
> > if they match, the node was created by this session, otherwise it was
> > created by another session
> >
> >
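The three steps above can be sketched against a toy in-memory model. In the real client the check would go through ZooKeeper#exists and Stat#getEphemeralOwner; everything else here (class and method names) is illustrative only:

```python
# Toy stand-in for ZooKeeper: ephemeral znodes remember the session id
# that created them, like the ephemeralOwner field of Stat.
class ToyZk:
    def __init__(self):
        self.owners = {}  # path -> creator session id

    def create_ephemeral(self, path, session_id):
        if path in self.owners:
            raise FileExistsError("NodeExists")
        self.owners[path] = session_id

    def ephemeral_owner(self, path):
        # Analogue of exists() followed by Stat#getEphemeralOwner.
        return self.owners.get(path)

def create_with_owner_check(zk, path, session_id):
    """The steps from the thread: catch NodeExists, call exists, and
    compare the node's ephemeral owner with our own session id."""
    try:
        zk.create_ephemeral(path, session_id)
        return True
    except FileExistsError:
        return zk.ephemeral_owner(path) == session_id

zk = ToyZk()
# First attempt succeeds server-side but the ack is lost; the retry then
# sees NodeExists and resolves it via the owner check.
zk.create_ephemeral("/lock/ledger-1", session_id=42)
assert create_with_owner_check(zk, "/lock/ledger-1", 42)      # we own it
assert not create_with_owner_check(zk, "/lock/ledger-1", 99)  # other session
```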
> Ah, I'll do that then, much easier.
>
>
> >
> > >
> > > I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> > > relies on setData success/BadVersion atomicity to ensure a consistent
> > > read-modify-write.  It's certainly solvable by including a version
> > > vector, but I think there's probably a simpler way to do it.  I'm not
> > > sure under what circumstances there even *can* be racing writes to
> > > the metadata -- is the only case that matters concurrent updates
> > > between the writer and a single replication worker?
> > >
> >
> > I don't think it can be addressed by a version vector; there is no
> > public *history* of who changed what version.
> > But the question is why you need to address this #writeLedgerMetadata
> > case inside the ZooKeeperClient retry loop.  There is already logic in
> > #writeLedgerMetadata to resolve conflicts due to BadVersion: the
> > LedgerHandle will re-read the ledger metadata when it encounters a
> > version conflict.
> >
> >
> That's the part I'm unsure of.  I know it has handling for BadVersion,
> but I don't know whether it's still correct if you get a BadVersion even
> though your write actually succeeded; I need to look into that more.
> Also, I'm not at all suggesting we handle this in the retry loop.  I'm
> suggesting that for non-idempotent writes, we should not retry in
> ZooKeeperClient, and instead propagate the ConnectionLoss error and let
> the caller deal with it.
>

+1, I agree with not retrying non-idempotent writes.


> -Sam
>
>
> >
> > >
> > > The outline is probably:
> > > 1) Add a way to do retries automatically on *unconditional* creates
> > > and deletes (there are some users that don't care).
> > > 2) Audit and modify existing users to either use the methods
> > > introduced in 1 or handle the ConnectionLoss events explicitly.
> > > 3) Switch ZooKeeperClient to not retry conditional writes, instead
> > > propagating the ConnectionLoss event; that should make its semantics
> > > match the vanilla ZK client.
> > > -Sam
> > >
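The three-step outline above amounts to a retry policy keyed on whether an operation is safe to repeat. A minimal sketch, using a toy ConnectionLoss exception rather than the real KeeperException hierarchy (all names here are illustrative, not the ZooKeeperClient API):

```python
class ConnectionLoss(Exception):
    """Toy analogue of a ZooKeeper connection-loss error."""

def run_with_policy(op, unconditional, attempts=3):
    """Sketch of steps 1-3: auto-retry only unconditional operations;
    for conditional writes, propagate ConnectionLoss to the caller."""
    while True:
        try:
            return op()
        except ConnectionLoss:
            if not unconditional:
                raise            # step 3: caller must resolve the ambiguity
            attempts -= 1        # step 1: safe to retry blindly
            if attempts == 0:
                raise

state = {"tries": 0}
def flaky_unconditional_delete():
    state["tries"] += 1
    if state["tries"] < 2:
        raise ConnectionLoss()   # first attempt loses the connection
    return "deleted"

# An unconditional delete is retried transparently...
assert run_with_policy(flaky_unconditional_delete, unconditional=True) == "deleted"

def conditional_set_data():
    raise ConnectionLoss()       # ambiguous: may or may not have applied

# ...while a conditional write surfaces the error to the caller.
try:
    run_with_policy(conditional_set_data, unconditional=False)
    raise AssertionError("should have propagated ConnectionLoss")
except ConnectionLoss:
    pass
```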
> > > On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > I agree that in the current callback model we only propagate an
> > > > error code to the outer client, so we lose the information about
> > > > the detailed cause.
> > > >
> > > > But I think we also tried to map some zk error codes to bk error
> > > > codes:
> > > >
> > > > NoNode -> NoLedger
> > > > NodeExists -> LedgerExists
> > > > ... (other codes)
> > > >
> > > > Also we tried to hide the zk metadata related implementation behind
> > > > interfaces, so some of the errors should be handled by the zk
> > > > ledger manager implementation before propagating to the client.
> > > >
> > > > Sijie
> > > >
> > > > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> > > > enrico.olive...@diennea.com> wrote:
> > > >
> > > > > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri
> > > > > wrote:
> > > > >
> > > > > This is no different from any network-based system, like NFS.  So
> > > > > we need to have some kind of logic on the client side to make a
> > > > > reasonable assumption.  It may not be perfect for negative
> > > > > testing.
> > > > >
> > > > >
> > > > > Many times I have wanted to have some "exception cause" on
> > > > > BKException, especially for ZK issues.
> > > > > The way we use only int error codes hides the root causes of the
> > > > > error.  The BookKeeper client writes to its log, but the "cause"
> > > > > cannot be reported to higher-level logs, and sometimes this is
> > > > > annoying.
> > > > > In the future I would like to add more details to errors.
> > > > >
> > > > > -- Enrico
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > JV
> > > > >
> > > > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hi Sam,
> > > > >
> > > > > Let's assume there is no such retry logic.  How do you expect to
> > > > > handle this situation?
> > > > >
> > > > > Can your application try to create a new ledger or catch the
> > > > > NodeExists exception?
> > > > >
> > > > > - Sijie
> > > > >
> > > > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hmm, curator seems to have essentially the same problem:
> > > > > https://issues.apache.org/jira/browse/CURATOR-268
> > > > > I'm not sure there's a good way to solve this transparently -- the
> > > > > right answer is probably to plumb the ConnectionLoss event through
> > > > > ZooKeeperClient for interested callers, audit for metadata users
> > > > > where we depend on atomicity, and update each one to handle it
> > > > > appropriately.
> > > > > -Sam
> > > > >
> > > > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > > > ZooKeeperClient.
> > > > > Its purpose appears to be to transparently perform retries in the
> > > > > case that ZooKeeper returns ConnectionLoss on an operation due to
> > > > > a Disconnect event.
> > > > >
> > > > > The trouble is that it's possible that a write which received a
> > > > > ConnectionLoss error as a return value actually succeeded.  Once
> > > > > ZooKeeperClient retries, it'll get back NodeExists and propagate
> > > > > that error to the caller, even though the write succeeded and the
> > > > > node did not in fact already exist.
> > > > >
> > > > > It seems as though the same issue would hold for setData and
> > > > > delete calls which use the version argument -- you could get a
> > > > > spurious BadVersion.
> > > > >
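The spurious-BadVersion case can be illustrated with a toy versioned node. This models the semantics of ZooKeeper's conditional setData, not the real client API, under the assumption that the write applies server-side while its acknowledgement is lost:

```python
# Toy versioned node modelling a conditional setData: the write applies
# only when the caller's expected version matches the current one.
class VersionedNode:
    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        if expected_version != self.version:
            raise ValueError("BadVersion")
        self.data = data
        self.version += 1

node = VersionedNode(b"v0")

# The conditional write reaches the server and applies...
node.set_data(b"v1", expected_version=0)

# ...but the ack is lost to a Disconnect, so a transparent retry layer
# replays the identical call and sees a spurious BadVersion, even though
# the write it is retrying actually succeeded.
try:
    node.set_data(b"v1", expected_version=0)
    spurious_bad_version = False
except ValueError:
    spurious_bad_version = True

assert spurious_bad_version and node.data == b"v1" and node.version == 1
```

The caller cannot distinguish this from a genuine lost race without extra information, which is why blind retry of conditional writes is unsafe.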
> > > > > I believe I've reproduced the variant with a spurious NodeExists.
> > > > > It manifested as a spurious BKLedgerExistException when running
> > > > > against a 3-instance ZooKeeper cluster with dm-delay under the
> > > > > ZooKeeper instance storage to force Disconnect events by
> > > > > injecting write delays.  This seems to make sense, as
> > > > > AbstractZkLedgerManager.createLedgerMetadata uses
> > > > > ZkUtils.asyncCreateFullPathOptimistic to create the metadata node
> > > > > and appears to depend on the create atomicity to avoid two
> > > > > writers overwriting each other's nodes.
> > > > >
> > > > > AbstractZkLedgerManager.writeLedger would seem to have the same
> > > > > problem with its dependence on using setData with the version
> > > > > argument to avoid lost updates.
> > > > >
> > > > > Am I missing something in this analysis?  It seems to me that
> > > > > this behavior could actually be problematic during periods of
> > > > > spotty connectivity to the ZooKeeper cluster.
> > > > >
> > > > > Thanks!
> > > > > -Sam
> > > > >
> > > > > --
> > > > > Enrico Olivelli
> > > > > Software Development Manager @Diennea
> > > > > Tel.: (+39) 0546 066100 - Int. 925
> > > > > Viale G.Marconi 30/14 - 48018 Faenza (RA)
> > > > > MagNews - E-mail Marketing Solutions http://www.magnews.it
> > > > > Diennea - Digital Marketing Solutions http://www.diennea.com
> > > > >
> > > > >
> > > >
> > >
> >
>
