JV: What do you mean by "May not be perfect for negative testing"?

I don't think there's anything inevitable about this particular class of
behavior.  ZK could have chosen to avoid this problem entirely by doing
duplicate op detection server-side with a per-session transaction log.

Since it doesn't, we'll need to solve it ourselves.
ZkLedgerUnderreplicationManager relies on either getting success or
NodeExists on the ephemeral lock node to determine whether a particular
ReplicationWorker is responsible for replicating a particular ledger.  I
haven't reproduced this one, but it seems to me that two workers could both
try to replicate the same ledger and *both get NodeExists* on the lock
node: the actual winner's retried create collides with the node its own
first attempt already wrote.  This would leave the ledger locked for
replication until whichever one actually wrote the node restarts.
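
To make the race concrete, here's a minimal sketch of the lock pattern as I
understand it (Java against the raw ZooKeeper API; the path and method name
are illustrative, not the actual ZkLedgerUnderreplicationManager code):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class LockSketch {
        // Hypothetical sketch of the pattern described above.
        static boolean tryLockLedger(ZooKeeper zk, long ledgerId)
                throws KeeperException, InterruptedException {
            String lockPath = "/locks/" + ledgerId;  // illustrative path
            try {
                zk.create(lockPath, new byte[0],
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;   // success: we hold the lock
            } catch (KeeperException.NodeExistsException e) {
                // Assumed to mean "another worker holds the lock".  But if
                // our first create succeeded and only the ack was lost, the
                // wrapper's retry lands here too -- so BOTH workers can see
                // NodeExists and conclude they lost.
                return false;
            }
        }
    }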

Call sites like the above are pretty easy to fix.  One option would be to
include in the write a payload containing a nonce for the client.  Upon a
ConnectionLoss event, we read the node and determine whether we "won".  I
think ledger creation probably falls into the same category: the metadata
could include an identifier for the creator.
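
A rough sketch of that nonce idea, assuming we're free to put an owner
identifier in the lock node's payload (the helper below is hypothetical,
not existing API):

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class NonceLockSketch {
        // Stamp the lock node with a per-client nonce so a retried create
        // can tell "our own earlier write" apart from a real loser.
        static boolean tryLock(ZooKeeper zk, String lockPath, String myNonce)
                throws KeeperException, InterruptedException {
            byte[] payload = myNonce.getBytes(StandardCharsets.UTF_8);
            try {
                zk.create(lockPath, payload,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // May be our own write whose ack was lost to a Disconnect:
                // read the node back and check whether we "won".
                byte[] data = zk.getData(lockPath, false, null);
                return myNonce.equals(new String(data, StandardCharsets.UTF_8));
            }
        }
    }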

I'm less sure about AbstractZkLedgerManager.writeLedgerMetadata.  It relies
on setData success/BadVersion atomicity to ensure a consistent
read-modify-write.  It's
certainly solvable by including a version vector, but I think there's
probably a simpler way to do it.  I'm not sure under what circumstances
there even *can* be racing writes to the metadata -- is the only case that
matters concurrent updates between the writer and a single replication
worker?
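
For reference, the pattern in question looks roughly like this (simplified
sketch; the real method operates on LedgerMetadata, not raw bytes):

    import java.util.function.UnaryOperator;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class RmwSketch {
        static void rmw(ZooKeeper zk, String path, UnaryOperator<byte[]> update)
                throws KeeperException, InterruptedException {
            Stat stat = new Stat();
            byte[] current = zk.getData(path, false, stat);
            // If this setData succeeds but the ack is lost to a Disconnect,
            // the wrapper's retry sees the version we ourselves bumped and
            // reports BadVersion -- indistinguishable from a real conflict.
            zk.setData(path, update.apply(current), stat.getVersion());
        }
    }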

The outline is probably:
1) Add a way to do retries automatically on *unconditional* creates and
deletes (some callers don't care who performed the operation); see the
sketch after this list.
2) Audit and modify existing callers to either use the methods introduced in
1 or handle the ConnectionLoss events explicitly.
3) Switch ZooKeeperClient to stop retrying conditional writes and instead
propagate the ConnectionLoss event; that should make its semantics match
the vanilla ZK client.
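
For step 1, something like the following might do (createIgnoreExists is a
hypothetical name, not existing ZooKeeperClient API):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class UnconditionalCreateSketch {
        // Retrying across ConnectionLoss is safe here because the caller
        // only needs the node to exist; NodeExists is an acceptable outcome
        // whether we created the node on a lost-ack attempt or someone else
        // did.
        static void createIgnoreExists(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {
                try {
                    zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
                    return;
                } catch (KeeperException.NodeExistsException e) {
                    return;  // already present -- fine for these callers
                } catch (KeeperException.ConnectionLossException e) {
                    // loop and retry
                }
            }
        }
    }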
-Sam

On Tue, Jun 27, 2017 at 8:59 AM, Sijie Guo <guosi...@gmail.com> wrote:

> I agree that in the current callback model we only propagate the error
> code to the outer client, so we lose the information about the detailed
> cause.
>
> But I think we also tried to map some zk error codes to bk error codes.
>
> NoNode -> NoLedger
> NodeExists -> LedgerExists
> ··· (other codes)
>
> Also, we tried to hide the zk metadata implementation behind interfaces,
> so some of the errors should be handled by the zk ledger manager
> implementation before propagating to the client.
>
> Sijie
>
> On Jun 27, 2017 7:32 AM, "Enrico Olivelli - Diennea" <
> enrico.olive...@diennea.com> wrote:
>
> > On Tue, 27/06/2017 at 13.45 +0000, Venkateswara Rao Jujjuri wrote:
> >
> > This is no different from any network-based system, like NFS.  So we
> > need to have some kind of logic on the client side to make reasonable
> > assumptions.  It may not be perfect for negative testing.
> >
> >
> > Many times I have wanted to have some "exception cause" on BKException,
> > especially for ZK issues.
> > The way we use only int error codes hides the root cause of the error.
> > The BookKeeper client writes it to the log, but the "cause" cannot be
> > reported to higher-level logs, and sometimes this is annoying.
> > In the future I would like to add more details on errors.
> >
> > -- Enrico
> >
> > JV
> >
> > On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:
> >
> > Hi Sam,
> >
> > Let's assume there is no such retry logic.  How do you expect to handle
> > this situation?
> >
> > Can your application try to create a new ledger, or catch the NodeExists
> > exception?
> >
> > - Sijie
> >
> > On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> > Hmm, curator seems to have essentially the same problem:
> > https://issues.apache.org/jira/browse/CURATOR-268
> > I'm not sure there's a good way to solve this transparently -- the right
> > answer is probably to plumb the ConnectionLoss event through
> > ZooKeeperClient for interested callers, audit for metadata users where
> > we depend on atomicity, and update each one to handle it appropriately.
> > -Sam
> >
> > On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:
> >
> > BookKeeper has a wrapper class for the ZooKeeper client called
> > ZooKeeperClient.
> > Its purpose appears to be to transparently perform retries in the case
> > that ZooKeeper returns ConnectionLoss on an operation due to a
> > Disconnect event.
> >
> > The trouble is that it's possible that a write which received a
> > ConnectionLoss error as a return value actually succeeded.  Once
> > ZooKeeperClient retries, it'll get back NodeExists and propagate that
> > error to the caller, even though the write succeeded and the node in
> > fact did not exist.
> >
> > It seems as though the same issue would hold for setData and delete
> > calls which use the version argument -- you could get a spurious
> > BadVersion.
> >
> > I believe I've reproduced the variant with a spurious NodeExists.  It
> > manifested as a spurious BKLedgerExistException when running against a
> > 3-instance ZooKeeper cluster with dm-delay under the ZooKeeper instance
> > storage to force Disconnect events by injecting write delays.  This
> > seems to make sense, as AbstractZkLedgerManager.createLedgerMetadata
> > uses ZkUtils.asyncCreateFullPathOptimistic to create the metadata node
> > and appears to depend on the create atomicity to avoid two writers
> > overwriting each other's nodes.
> >
> > AbstractZkLedgerManager.writeLedger would seem to have the same problem
> > with its dependence on using setData with the version argument to avoid
> > lost updates.
> >
> > Am I missing something in this analysis?  It seems to me that this
> > behavior could be actually problematic during periods of spotty
> > connectivity to the ZooKeeper cluster.
> >
> > Thanks!
> > -Sam
> >
>
