On Tue, Jun 27, 2017 at 10:18 AM, Sam Just <sj...@salesforce.com> wrote:

> JV: What do you mean by "May not be perfect for negative testing"?
>
> I don't think there's anything inevitable about this particular class of
> behavior.  ZK could have chosen to avoid this problem entirely by doing
> duplicate op detection server-side with a per-session transaction log.
>
> Since it doesn't, we'll need to solve it ourselves.
> ZkLedgerUnderreplicationManager relies on either getting success or
> NodeExists on the ephemeral lock node to determine whether a particular
> ReplicationWorker is responsible for replicating a particular ledger.  I
> haven't reproduced this one, but it seems to me that two workers could both
> try to replicate the same ledger and *both get NodeExists* on the lock
> node.  This would leave the ledger locked for replication until whichever
> one actually wrote the node restarts.
>
> Cases like the above are pretty easy to fix.  One option would be to
> simply include with the write a payload containing a nonce for the client.
> Upon a ConnectionLoss event, we read the node and determine whether we
> "won".  I think ledger creation probably falls into the same category; the
> metadata could include an identifier for the creator.
>
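The nonce scheme quoted above can be sketched with a minimal in-memory simulation. The class and method names here are hypothetical stand-ins, not BookKeeper or ZooKeeper APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal simulation of the nonce scheme: each client writes its own nonce
// as the node payload; after an ambiguous ConnectionLoss it re-reads the
// node and compares the stored nonce with its own to learn whether its
// create actually won.
class NonceCreate {
    static final Map<String, String> store = new HashMap<>(); // fake znode store

    /** Create-if-absent; returns true iff this call created the node. */
    static boolean create(String path, String nonce) {
        return store.putIfAbsent(path, nonce) == null;
    }

    /** After an ambiguous failure: did *we* create the node? */
    static boolean won(String path, String myNonce) {
        return myNonce.equals(store.get(path));
    }
}
```

If client A's create is applied but the ack is lost, a retry reports that the node exists; `won(path, nonceA)` then tells A it actually holds the node, while client B learns it lost.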

For an ephemeral znode, you don't have to add an extra payload.

In the retry logic:

- catch the NodeExists exception
- call exists() on the znode to get its Stat (
https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/data/Stat.html#getEphemeralOwner()
)
- compare Stat#getEphemeralOwner with the client's current session id. If
they match, the node was created by this session; otherwise it was created
by another session.
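The steps above can be sketched as follows. The decision itself is a pure helper; the actual ZooKeeper calls (`exists`, `Stat#getEphemeralOwner`, `ZooKeeper#getSessionId`) are shown only in a comment, and the helper assumes, per the linked javadoc, that `getEphemeralOwner()` returns 0 for a non-ephemeral node:

```java
// Sketch: after a ConnectionLoss-triggered retry returns NodeExists on an
// ephemeral lock node, decide ownership by comparing the node's ephemeral
// owner with our own session id.
public class EphemeralLockCheck {
    /** Returns true iff the existing ephemeral node belongs to this session. */
    static boolean ownedByUs(long ephemeralOwner, long mySessionId) {
        // getEphemeralOwner() returns 0 for a persistent node, so a
        // persistent node is never "ours" in this sense.
        return ephemeralOwner != 0 && ephemeralOwner == mySessionId;
    }

    /* With a live ZooKeeper handle it would look roughly like:
     *   Stat stat = zk.exists(lockPath, false);
     *   boolean won = stat != null
     *       && ownedByUs(stat.getEphemeralOwner(), zk.getSessionId());
     */
}
```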


>
> I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It
> relies on setData success/BadVersion atomicity to ensure a consistent
> read-modify-write.  It's certainly solvable by including a version vector,
> but I think there's probably a simpler way to do it.  I'm not sure under
> what circumstances there even *can* be racing writes to the metadata --
> is the only case that matters concurrent updates between the writer and a
> single replication worker?
>

I don't think it can be addressed by a version vector; there is no public
*history* of who changed which version.
But the question is why you need to address #writeLedgerMetadata inside the
ZooKeeperClient retry loop at all. I think there is already logic in
#writeLedgerMetadata to resolve conflicts due to BadVersion:
the LedgerHandle will re-read the ledger metadata when it encounters a
version conflict.
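That resolve-and-retry pattern can be sketched with a hypothetical in-memory versioned store standing in for the ledger metadata (none of these names are the real BookKeeper API):

```java
import java.util.function.UnaryOperator;

// Minimal sketch of the "re-read metadata on version conflict" pattern:
// a compare-and-set store plus a retry loop that re-reads and re-applies
// the update after each BadVersion-style failure.
class VersionedStore {
    private String value = "";
    private int version = 0;

    synchronized int version() { return version; }
    synchronized String read() { return value; }

    /** setData with an expected version; returns false on "BadVersion". */
    synchronized boolean setData(String newValue, int expectedVersion) {
        if (expectedVersion != version) return false;
        value = newValue;
        version++;
        return true;
    }

    /** Read-modify-write that resolves conflicts by re-reading. */
    String update(UnaryOperator<String> modify) {
        while (true) {
            int v = version();
            String next = modify.apply(read());
            if (setData(next, v)) return next; // success
            // BadVersion: someone else wrote first; re-read and retry
        }
    }
}
```

The correctness argument is the same as for the real metadata: a stale writer never overwrites a newer version, it just re-reads and reapplies its change.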


>
> The outline is probably:
> 1) Add a way to do retries automatically on *unconditional* creates and
> deletes (there are some users that don't care).
> 2) Audit and modify existing users to either use the methods introduced in
> 1 or handle the ConnectionLoss events explicitly.
> 3) Switch ZooKeeperClient to not retry conditional writes, instead
> propagating the ConnectionLoss event; that should make its semantics match
> the vanilla ZK client.
> -Sam
>
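The split proposed in steps 1 and 3 of the outline could look like this; the enum and helper are illustrative, not existing ZooKeeperClient code:

```java
// Sketch of the proposed retry policy: transparently retry ConnectionLoss
// only for idempotent, unconditional operations; conditional writes
// (create-if-absent used as a lock, setData/delete with a version) must
// surface ConnectionLoss to the caller to handle explicitly.
class RetryPolicy {
    enum Op { CREATE, DELETE, SET_DATA, READ }

    /** conditional = caller relies on first-writer-wins or version checks. */
    static boolean retryOnConnectionLoss(Op op, boolean conditional) {
        if (op == Op.READ) return true; // reads are always safe to retry
        return !conditional;            // writes: only unconditional ones
    }
}
```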
> On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:
>
> > I agree in current callback model, we only propagate error code to the
> > outer client, so we lose the information about the detail cause.
> >
> > But I think we also tried to map some zk error codes to bk error codes.
> >
> > NoNode -> NoLedger
> > NodeExists -> LedgerExists
> > ... (other codes)
> >
> > Also we tried to hide the zk-metadata-related implementation behind
> > interfaces, so some of the errors should be handled by the zk ledger
> > manager implementation before propagating to the client.
> >
> > Sijie
> >
> > On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> > enrico.olive...@diennea.com> wrote:
> >
> > > On Tue, 27 Jun 2017 at 13:45 +0000, Venkateswara Rao Jujjuri wrote:
> > >
> > > This is no different from any other network-based system, like NFS. So
> > > we need to have some kind of logic on the client side to make
> > > reasonable assumptions. May not be perfect for negative testing.
> > >
> > >
> > > Many times I have wanted some "exception cause" on BKException,
> > > especially for ZK issues.
> > > The way we use only int error codes hides the root cause of the error.
> > > The BookKeeper client writes it to the log, but the "cause" cannot be
> > > reported to higher-level logs, and sometimes this is annoying.
> > > In the future I would like to add more detail to errors.
> > >
> > > -- Enrico
> > >
> > >
> > >
> > >
> > >
> > > JV
> > >
> > > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > >
> > >
> > > Hi Sam,
> > >
> > > Let's assume there is no such retry logic. How do you expect to handle
> > this
> > > situation?
> > >
> > > Can your application try to create a new ledger or catch NodeExists
> > > exception?
> > >
> > > - Sijie
> > >
> > > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
> > >
> > >
> > >
> > > Hmm, Curator seems to have essentially the same problem:
> > > https://issues.apache.org/jira/browse/CURATOR-268
> > > I'm not sure there's a good way to solve this transparently -- the
> > > right answer is probably to plumb the ConnectionLoss event through
> > > ZooKeeperClient for interested callers, audit for metadata users where
> > > we depend on atomicity, and update each one to handle it appropriately.
> > > -Sam
> > >
> > > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> > >
> > >
> > >
> > > BookKeeper has a wrapper class for the ZooKeeper client called
> > > ZooKeeperClient.  Its purpose appears to be to transparently perform
> > > retries in the case that ZooKeeper returns ConnectionLoss on an
> > > operation due to a Disconnect event.
> > >
> > > The trouble is that it's possible that a write which received a
> > > ConnectionLoss error as a return value actually succeeded.  Once
> > > ZooKeeperClient retries, it'll get back NodeExists and propagate that
> > > error to the caller, even though the write succeeded and the node in
> > > fact did not exist.
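The failure mode described above can be modeled in a few lines: the server applies the create but the client sees ConnectionLoss, and the transparent retry then reports NodeExists for a write that actually succeeded. Everything here is a toy in-memory model, not the real client:

```java
import java.util.HashSet;
import java.util.Set;

// Tiny model of the spurious-NodeExists bug: the server applies the create
// but the ack is lost as a ConnectionLoss; the transparent retry then
// observes NodeExists and reports failure for a write that succeeded.
class RetryModel {
    enum Result { OK, CONNECTION_LOSS, NODE_EXISTS }

    static final Set<String> nodes = new HashSet<>(); // fake server state
    static boolean dropNextAck = false;               // fault injection

    static Result create(String path) {
        if (!nodes.add(path)) return Result.NODE_EXISTS;
        if (dropNextAck) { dropNextAck = false; return Result.CONNECTION_LOSS; }
        return Result.OK;
    }

    /** ZooKeeperClient-style transparent retry on ConnectionLoss. */
    static Result createWithRetry(String path) {
        Result r = create(path);
        while (r == Result.CONNECTION_LOSS) r = create(path);
        return r;
    }
}
```

With the fault injected, `createWithRetry` returns NODE_EXISTS even though the node did not exist before the call and the caller's own create put it there.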
> > >
> > > It seems as though the same issue would hold for setData and delete
> > > calls which use the version argument -- you could get a spurious
> > > BadVersion.
> > >
> > > I believe I've reproduced the variant with a spurious NodeExists.  It
> > > manifested as a spurious BKLedgerExistException when running against a
> > > 3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > > storage to force Disconnect events by injecting write delays.  This
> > > seems to make sense, as AbstractZkLedgerManager.createLedgerMetadata
> > > uses ZkUtils.asyncCreateFullPathOptimistic to create the metadata node
> > > and appears to depend on the create atomicity to avoid two writers
> > > overwriting each other's nodes.
> > >
> > > AbstractZkLedgerManager.writeLedgerMetadata would seem to have the same
> > > problem with its dependence on using setData with the version argument
> > > to avoid lost updates.
> > >
> > > Am I missing something in this analysis?  It seems to me that this
> > > behavior could actually be problematic during periods of spotty
> > > connectivity to the ZooKeeper cluster.
> > >
> > > Thanks!
> > > -Sam
> > >
> > > --
> > > Enrico Olivelli
> > > Software Development Manager @Diennea
> > > Tel.: (+39) 0546 066100 - Int. 925
> > > Viale G.Marconi 30/14 - 48018 Faenza (RA)
> > > MagNews - E-mail Marketing Solutions - http://www.magnews.it
> > > Diennea - Digital Marketing Solutions - http://www.diennea.com
> > >
> > >
> >
>
