On Tue, 27/06/2017 at 13:45 +0000, Venkateswara Rao Jujjuri wrote:
This is nothing different from any network-based system, like NFS. So we need to have some kind of logic on the client side to make a reasonable assumption. It may not be perfect for negative testing.

JV

Many times I have wanted some kind of "exception cause" on BKException, especially for ZK issues. The way we use only int error codes hides the root cause of an error. The BookKeeper client writes the cause to its own log, but it cannot be reported to higher-level logs, and sometimes this is annoying. In the future I would like to add more detail to errors.

-- Enrico

On Mon, Jun 26, 2017 at 11:19 PM Sijie Guo <guosi...@gmail.com> wrote:

Hi Sam,

Let's assume there is no such retry logic. How would you expect to handle this situation? Could your application try to create a new ledger, or catch the NodeExists exception?

- Sijie

On Mon, Jun 26, 2017 at 5:49 PM, Sam Just <sj...@salesforce.com> wrote:

Hmm, Curator seems to have essentially the same problem: https://issues.apache.org/jira/browse/CURATOR-268

I'm not sure there's a good way to solve this transparently -- the right answer is probably to plumb the ConnectionLoss event through ZooKeeperClient for interested callers, audit the metadata users that depend on atomicity, and update each one to handle it appropriately.

-Sam

On Mon, Jun 26, 2017 at 4:34 PM, Sam Just <sj...@salesforce.com> wrote:

BookKeeper has a wrapper class for the ZooKeeper client called ZooKeeperClient. Its purpose appears to be to transparently retry operations for which ZooKeeper returned ConnectionLoss due to a Disconnect event. The trouble is that a write which received a ConnectionLoss error as its return value may in fact have succeeded. When ZooKeeperClient retries, it gets back NodeExists and propagates that error to the caller, even though the write succeeded and the node did not previously exist. The same issue would seem to hold for setData and delete calls that use the version argument -- you could get a spurious BadVersion.

I believe I've reproduced the variant with a spurious NodeExists. It manifested as a spurious BKLedgerExistException when running against a 3-instance ZooKeeper cluster, with dm-delay under the ZooKeeper instance storage injecting write delays to force Disconnect events. This makes sense: AbstractZkLedgerManager.createLedgerMetadata uses ZkUtils.asyncCreateFullPathOptimistic to create the metadata node and appears to depend on the atomicity of create to keep two writers from overwriting each other's nodes. AbstractZkLedgerManager.writeLedger would seem to have the same problem, since it depends on setData with the version argument to avoid lost updates.

Am I missing something in this analysis? It seems to me that this behavior could be genuinely problematic during periods of spotty connectivity to the ZooKeeper cluster.

Thanks!
-Sam
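For illustration, here is a minimal sketch of the ambiguity Sam describes, assuming the plain ZooKeeper Java API; the class and method names are illustrative, not the actual ZooKeeperClient code:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch of a transparent retry wrapper around create(), similar in
    // spirit to what a ZooKeeperClient-style wrapper does on ConnectionLoss.
    public class RetryAmbiguitySketch {
        static String createWithRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            while (true) {  // retry forever, for simplicity of the sketch
                try {
                    return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                     CreateMode.PERSISTENT);
                } catch (KeeperException.ConnectionLossException e) {
                    // The first attempt may have been committed by the quorum
                    // before the connection dropped. If so, the retry throws
                    // NodeExists even though no other writer touched the path,
                    // and the caller cannot distinguish this from a genuine
                    // conflict with a concurrent creator.
                }
            }
        }
    }

The same reasoning applies to setData(path, data, expectedVersion) and delete(path, expectedVersion): if the first attempt was actually applied, the retry comes back with a spurious BadVersion (or, for delete, NoNode).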
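On Sijie's question about catching the NodeExists: one possible caller-side mitigation (a sketch under assumed names, not existing BookKeeper code) is to make the payload self-identifying, for example by embedding a token unique to the attempt, so that a NodeExists raised by a retried create can be disambiguated by reading the node back:

    import java.util.Arrays;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical helper: assumes the caller built a payload that is unique
    // to this attempt (e.g. contains a random UUID alongside the metadata).
    public class IdempotentCreateSketch {
        static boolean createdByUs(ZooKeeper zk, String path, byte[] payload)
                throws KeeperException, InterruptedException {
            try {
                zk.create(path, payload, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.PERSISTENT);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // Read the node back: if it holds our unique payload, the
                // earlier attempt succeeded before the ConnectionLoss and the
                // NodeExists is spurious; otherwise a concurrent writer won.
                byte[] existing = zk.getData(path, false, null);
                return Arrays.equals(existing, payload);
            }
        }
    }

This does not help the setData/delete variants, where the version check itself is the point; those would still need the ConnectionLoss event plumbed through to the caller, as Sam suggests.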