I wanted to add 2 things.

  1.  Increasing to a 64-bit counter certainly solves the problem, but this 
might require conversion of zk data when the current counter is stored as 32-bit
  2.  A client that relies on a unique version that it uses as a reference 
outside of zk should verify that a version that it receives does not already 
exist outside zk. This applies even if 1. is considered, should a zk quorum be 
reset or lose its data.

Josef Roehrl​
FuseForward | Senior Architect - Professional Services
[https://fuseforward.atlassian.net/wiki/download/attachments/512327681/image001.png?version=1&modificationDate=1537397840684&cacheVersion=1&api=v2]
Website<https://fuseforward.com/?utm_source=Email%20Signature&utm_medium=email%20signature&utm_campaign=email%20signature>
 | 
Newsletter<https://fuseforward.com/subscribe-to-our-newsletter/?utm_source=Email%20Signature&utm_medium=Email%20Signature&utm_campaign=Email%20Signature>
 | Twitter<https://twitter.com/fuseforward> | 
LinkedIn<https://www.linkedin.com/company/fuseforward/?originalSubdomain=ca>

________________________________
From: Ted Dunning <ted.dunn...@gmail.com>
Sent: Thursday, June 15, 2023 6:53 PM
To: dev@zookeeper.apache.org <dev@zookeeper.apache.org>
Subject: Re: Name of sequence node is not unique

The invariant is that the value should be increasing except in failure
modes. You found a somewhat surprising failure mode.

Please compute how long it would take for a 64-bit counter to overflow if
incremented a million times per second. (hint, half a million years).
Remember that zk only does things at less than 100,000 per second

On Thu, Jun 15, 2023, 17:03 Li Wang <li4w...@gmail.com> wrote:

> Thanks a lot for your inputs, Ted.
>
> On Thu, Jun 15, 2023 at 2:52 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > Breaking a semantic invariant is a dangerous solution here.
>
> Totally agree. We should not break a semantic invariant if there is one.
> What's the semantic invariant here and how ZK is supposed to behave in the
> overflow case?
>
>
> > It would be better to use a longer integer.
> >
> Yes, I thought about this too. Longer integer will overflow too, so the
> issue of not generating unique numbers will still exist.
>
> Best,
>
> Li
>
> >
> >
> >
> > On Thu, Jun 15, 2023, 13:35 Li Wang <li4w...@gmail.com> wrote:
> >
> > > Thanks for the response, Enrico.
> > >
> > > Please see comments below.
> > >
> > > On Thu, Jun 15, 2023 at 5:47 AM Enrico Olivelli <eolive...@gmail.com>
> > > wrote:
> > >
> > > > Li,
> > > > thanks for reporting your problem.
> > > >
> > > > Most likely you have found a bug.
> > > >
> > > > I have one question, related to your use case,
> > > > is the problem that the numbers are not "unique" or that the number
> is
> > > > not monotonically increasing ?
> > > >
> > >
> > > Technically speaking, monotonically increasing means either always
> > > increasing or *remaining constant. *With tha*t, * the problem is only
> > > the numbers are not "unique'. In this case, the parent cversion
> > > remains 2147483647
> > > after reaching Integer.MAX_VALUE, not unique any more.
> > >
> > > >
> > > > Do you have 2147483647 concurrent sessions and you found that two
> > > > sessions got the same sequenceId ?
> > > > or are you storing the sequenceId somewhere and you use it as a
> > > > globally unique id, not only among the connected sessions but also
> > > > among all the sessions that are ever connected to the cluster ?
> > > >
> > >
> > > In our use case, we create *persistent* *sequential* nodes. We store
> the
> > > sequence id in the client application and use it as a globally unique
> id.
> > > Currently Zookeeper guarantees the following non-overflow case but not
> > > after overflow.
> > >
> > > 1. Monotonically increasing
> > > 2. Uniqueness
> > > 3. Sequentially increase by 1
> > >
> > > For customers who have an overflow use case and can handle the sequence
> > > number cycling in a circular fashion, how about having a simple patch
> > > to support it and handle the overflow case better?  The change is
> adding
> > a
> > > condition to allow the wraparound when it flows to negative. We can
> also
> > > have a property to control whether to add the additional condition if
> > > needed.
> > >
> > > New
> > > ===
> > > if (parentCVersion > currentParentCVersion
> > >                 *|| parentCVersion == Integer.MIN_VALUE &&
> > > currentParentCVersion == Integer.MAX_VALUE) *{
> > >                 parent.stat.setCversion(parentCVersion);
> > >                 parent.stat.setPzxid(zxid);
> > >           }
> > >
> > > Current
> > > =====
> > > if (parentCVersion > parent.stat.getCversion()) {
> > >                 parent.stat.setCversion(parentCVersion);
> > >                 parent.stat.setPzxid(zxid);
> > >             }
> > >
> > > Please let me know what you think. Any input or feedback would be
> > > appreciated.
> > >
> > > Thanks,
> > >
> > > Li
> > >
> > >
> > > > Enrico
> > > >
> > > > Il giorno ven 9 giu 2023 alle ore 21:10 Li Wang <li4w...@gmail.com>
> ha
> > > > scritto:
> > > > >
> > > > > Hello,
> > > > >
> > > > > We are running 3.7.1 in production and running into an "issue" that
> > the
> > > > > names of sequence nodes are not unique after the counter hits the
> max
> > > int
> > > > > (i.e 2147483647) and overflows.  I would like to start a thread to
> > > > discuss
> > > > > the following
> > > > >
> > > > > 1. Is this a bug or "expected" behavior?
> > > > > 2. Is ZK supposed to support the overflow scenario and need to make
> > > sure
> > > > > the name is unique when overflow happens?
> > > > >
> > > > > The name is not unique after hitting the max int value because of
> we
> > > > > have the following in zk  code base:
> > > > >
> > > > > 1.  The cversion of parent znode is used to build the child name in
> > > > > PrepRequestProcessor
> > > > >
> > > > >         int parentCVersion = parentRecord.stat.getCversion();
> > > > >         if (createMode.isSequential()) {
> > > > >             path = path + String.format(Locale.ENGLISH, "%010d",
> > > > > parentCVersion);
> > > > >         }
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/
> > > > >
> java/org/apache/zookeeper/server/PrepRequestProcessor.java#L668-L671
> > > > >
> > > > >
> > > > > 2. The parent znode is read from either
> zks.outstandingChangesForPath
> > > map
> > > > > or zk database/datatree.
> > > > >
> > > > >            lastChange = zks.outstandingChangesForPath.get(path);
> > > > >             if (lastChange == null) {
> > > > >                 DataNode n = zks.getZKDatabase().getNode(path);
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/PrepRequestProcessor.java#L168-L170
> > > > >
> > > > >
> > > > >
> > > > > 3. The cversion of the parent node in outstandingChangesForPath map
> > is
> > > > > always updated  but not in zk database as we added the following
> code
> > > in
> > > > 3.6
> > > > >
> > > > >             if (parentCVersion > parent.stat.getCversion()) {
> > > > >                 parent.stat.setCversion(parentCVersion);
> > > > >                 parent.stat.setPzxid(zxid);
> > > > >             }
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L477-L480
> > > > >
> > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3249
> > > > >
> > > > >
> > > > > When overflow happens, the new parentCversion is changed to
> > > -2147483648.
> > > > > It's updated in the outstandingChangesForPath map. It's not updated
> > in
> > > > > DataTree and the value stays as 2147483647  because -2147483648 is
> > less
> > > > > than 2147483647, so the cVerson is inconsistent in  ZK.
> > > > >
> > > > > Due to the inconsistent cVersion, when the next request comes in
> > after
> > > > > overflow, the sequence number is non-deterministic and not unique
> > > > depending
> > > > > on where the parent node is read from.  It can be 2147483647 if the
> > > > > parent node is read from DataTree or -2147483648,  -2147483647 and
> so
> > > on
> > > > if
> > > > > it's from the outstandingChangesForPath map.
> > > > >
> > > > > We have the following doc about unique naming but no info on
> > > "expected"
> > > > > behavior after overflow.
> > > > >
> > > > > Sequence Nodes -- Unique Naming
> > > > >
> > > > >
> > > > > When creating a znode you can also request that ZooKeeper append a
> > > > > monotonically increasing counter to the end of path. This counter
> is
> > > > unique
> > > > > to the parent znode. The counter has a format of %010d -- that is
> 10
> > > > digits
> > > > > with 0 (zero) padding (the counter is formatted in this way to
> > simplify
> > > > > sorting), i.e. "0000000001". See Queue Recipe for an example use of
> > > this
> > > > > feature. Note: the counter used to store the next sequence number
> is
> > a
> > > > > signed int (4bytes) maintained by the parent node, the counter will
> > > > > overflow when incremented beyond 2147483647 (resulting in a name
> > > > > "-2147483648").
> > > > >
> > > > >
> > > > >
> > > > > Please let me know if you have any comments or inputs.
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > Li
> > > >
> > >
> >
>

Reply via email to