[jira] [Created] (ZOOKEEPER-4884) FastLeaderElection WorkerSender/WorkerReceiver don't need to be Thread

2024-10-28 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4884:
-

 Summary: FastLeaderElection WorkerSender/WorkerReceiver don't need 
to be Thread
 Key: ZOOKEEPER-4884
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4884
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Kezhu Wang


ZOOKEEPER-1810 replaced them with dedicated threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ZOOKEEPER-4883) Rollover leader epoch when counter part of zxid reach limit

2024-10-27 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4883:
-

 Summary: Rollover leader epoch when counter part of zxid reach 
limit
 Key: ZOOKEEPER-4883
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4883
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Kezhu Wang


Currently, zxid rollover forces a re-election (ZOOKEEPER-1277), which is time 
consuming.

ZOOKEEPER-2789 proposes to use 24 bits for the epoch and 40 bits for the 
counter. I do think it is promising, as [it stretches the rollover interval 
from 49.7 days to 34.9 years assuming 1k/s 
ops|https://github.com/apache/zookeeper/pull/2164#issuecomment-2368107479].
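The arithmetic behind those numbers can be checked with a small sketch of the two layouts (the 24/40 split is the ZOOKEEPER-2789 proposal; the helper names here are illustrative, not ZooKeeper code):

```java
public class ZxidLayout {
    // Current layout: high 32 bits are the epoch, low 32 bits are the counter.
    static long make(long epoch, long counter) { return (epoch << 32) | counter; }
    static long epoch(long zxid) { return zxid >>> 32; }
    static long counter(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long zxid = make(5, 7);
        System.out.println("epoch=" + epoch(zxid) + " counter=" + counter(zxid));
        // Time to exhaust the counter at 1k ops/s (365-day years):
        double days32 = Math.pow(2, 32) / 1000 / 86400;        // current 32-bit counter
        double years40 = Math.pow(2, 40) / 1000 / 86400 / 365; // proposed 40-bit counter
        System.out.printf("32-bit: %.1f days, 40-bit: %.1f years%n", days32, years40);
    }
}
```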

But I think it is a one-way ticket. The change of data format may require a 
community-wide push to upgrade third-party libraries/tools if they are ever 
tied to this. Inside ZooKeeper, `acceptedEpoch` and `currentEpoch` are tied to 
`zxid`: given a snapshot and a txn log, we probably need to deduce those two 
epoch values to join the quorum.

So I present an alternative: roll over the leader epoch when the counter part 
of the zxid reaches its limit.

# Treat the last proposal of an epoch as the rollover proposal.
# Requests from the next epoch are proposed normally.
# Fence the next epoch once the rollover proposal is persisted.
# Proposals from the next epoch are not written to disk before the rollover 
proposal is committed.
# The leader commits the rollover proposal once it gets quorum ACKs.
# Blocked next-epoch proposals are logged once the rollover proposal is 
committed on the corresponding nodes.
 
This results in:

# No other leader could lead using the next epoch number once the rollover 
proposal is considered committed.
# No proposals from the next epoch are written to disk before the rollover 
proposal is considered committed.

Here is the branch; I will draft a PR later.

https://github.com/kezhuw/zookeeper/tree/zxid-rollover





[jira] [Created] (ZOOKEEPER-4882) Data loss after restarting a node that experienced a temporary disk error and rejoined

2024-10-26 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4882:
-

 Summary: Data loss after restarting a node that experienced a temporary 
disk error and rejoined
 Key: ZOOKEEPER-4882
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4882
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.3, 3.8.4
Reporter: Kezhu Wang


The cause is multifold:
1. The leader commits a proposal once a quorum has acked it.
2. A proposal can be committed in a node's memory even if it has not
   been written to that node's disk.
3. In case of a disk error, the txn log can lag behind the memory database.

The above applies to both leader and follower. I have not verified the leader 
branch; let's consider only the follower for now.

f4. A follower that experienced a temporary disk error will have a hole in its
   txn log after re-joining.
f5. A restarted follower will lose the data. Worse, it can win
   election and propagate the data loss to the whole cluster.
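The f4/f5 scenario can be replayed with a toy model (illustrative stand-ins, not ZooKeeper's real classes):

```java
import java.util.ArrayList;
import java.util.List;

public class DiskHoleSketch {
    final List<Long> txnLog = new ArrayList<>();   // what survives a restart
    final List<Long> memoryDb = new ArrayList<>(); // what the quorum sees acked

    void logAndCommit(long zxid, boolean diskOk) {
        if (diskOk) txnLog.add(zxid); // a temporary disk error silently drops the append...
        memoryDb.add(zxid);           // ...while the txn is still committed in memory
    }

    public static void main(String[] args) {
        DiskHoleSketch f = new DiskHoleSketch();
        f.logAndCommit(1, true);
        f.logAndCommit(2, false); // temporary disk error
        f.logAndCommit(3, true);  // disk recovered: zxid 2 is now a hole (f4)
        // After a restart, replaying txnLog yields [1, 3]; zxid 2 is lost (f5).
        System.out.println("txn log after restart: " + f.txnLog);
    }
}
```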

I authored commits in my repo to expose this.

https://github.com/kezhuw/zookeeper/commits/data-loss-temporary-sync-disk-error/





[jira] [Created] (ZOOKEEPER-4881) Support non blocking ZooKeeper::close

2024-10-23 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4881:
-

 Summary: Support non blocking ZooKeeper::close
 Key: ZOOKEEPER-4881
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4881
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Kezhu Wang


{{ZooKeeper::close}} is synchronous: it waits until {{OpCode.closeSession}} 
returns. I think it would be useful to support closing in the background to 
avoid blocking the caller, just like `close(2)` for sockets. We could probably 
use {{ZooKeeperBuilder}} from ZOOKEEPER-4697 to enable it explicitly. That way 
we would not surprise anyone, just like `SO_LINGER` for sockets.
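One possible shape is simply to run today's blocking close on an executor and hand the caller a future (a sketch of the idea, not a proposed API; all names here are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncClose {
    // Wrap a blocking close so the caller gets a Future instead of blocking.
    static Future<Void> closeAsync(ExecutorService executor, Runnable blockingClose) {
        return executor.submit(() -> { blockingClose.run(); return null; });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Void> done = closeAsync(pool, () -> System.out.println("closeSession sent"));
        done.get(); // callers that do care can still wait, akin to SO_LINGER semantics
        pool.shutdown();
    }
}
```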





[jira] [Created] (ZOOKEEPER-4875) Support pre-constructed ZKConfig in server side

2024-10-14 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4875:
-

 Summary: Support pre-constructed ZKConfig in server side
 Key: ZOOKEEPER-4875
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4875
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Kezhu Wang
Assignee: Kezhu Wang


Currently, `ZKConfig` is constructed only right before its usage. That makes 
it hard to run multiple ZooKeeper servers with different configurations in one 
JVM. Though we don't officially claim to support that, I think it would be 
good not to impose such a ban on our side. Also, accepting a pre-constructed 
`ZKConfig` could benefit tests by not mixing up properties between client and 
server.

See also https://github.com/apache/zookeeper/pull/2200#discussion_r1800328858





[jira] [Created] (ZOOKEEPER-4870) Proactive leadership transfer

2024-10-09 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4870:
-

 Summary: Proactive leadership transfer
 Key: ZOOKEEPER-4870
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4870
 Project: ZooKeeper
  Issue Type: New Feature
  Components: java client, server
Reporter: Kezhu Wang


We do have leadership transfer, but it only happens when we are removing the 
leader in a reconfiguration. It would be nice to support it with a dedicated 
API; that would be really useful to reduce unavailability during a rolling 
upgrade or leader shutdown.

Also, I think it could help zxid rollover: inheriting leadership during 
rollover should be similar to leadership transfer at the protocol level.

https://www.usenix.org/conference/atc12/technical-sessions/presentation/shraer

{quote}
we investigate the effect of reconfigurations removing the leader. Note that a 
server can never be added to a cluster as leader as we always prioritize the 
current leader. Figure 8 shows the advantage of designating a new leader when 
removing the current one, and thus avoiding leader election. It depicts the 
average time to recover from a leader crash versus the average time to regain 
system availability following the removal of the leader. The average is taken 
on 10 executions. We can see that designating a default leader saves up to 
1sec, depending on the cluster size. As cluster size increases, leader 
election takes longer while using a default leader takes constant time 
regardless of the cluster size. Nevertheless, as the figure shows, cluster 
size always affects total leader recovery time, as it includes synchronizing 
state with a quorum of followers.
{quote}





[jira] [Created] (ZOOKEEPER-4859) C client tests hang until cancelled quite often

2024-09-10 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4859:
-

 Summary: C client tests hang until cancelled quite often
 Key: ZOOKEEPER-4859
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4859
 Project: ZooKeeper
  Issue Type: Test
  Components: c client, tests
Reporter: Kezhu Wang
Assignee: Kezhu Wang


CPPUNIT tests run sequentially. By comparing the logs with a successful run, I 
think the hanging test could be {{Zookeeper_readOnly::testReadOnlyWithSSL}}.

Logs from a failed run:
{noformat}
 [exec] Zookeeper_multi::testWatch : elapsed 2005 : OK
 [exec] Zookeeper_multi::testSequentialNodeCreateInAsyncMulti : elapsed 
2001 : OK
 [exec] Zookeeper_multi::testBigAsyncMulti : elapsed 3003 : OK
 [exec] Zookeeper_operations::testAsyncWatcher1 : elapsed 54 : OK
 [exec] Zookeeper_operations::testAsyncGetOperation : elapsed 54 : OK
 [exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : 
elapsed 382 : OK
 [exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : 
elapsed 0 : OK
 [exec] Zookeeper_operations::testConcurrentOperations1 : elapsed 21127 : OK
 [exec] Zookeeper_readOnly::testReadOnly ZooKeeper server started ZooKeeper 
server started : elapsed 12214 : OK
{noformat}


Logs from a successful run:
{noformat}
 [exec] Zookeeper_multi::testCheck : elapsed 1007 : OK
 [exec] Zookeeper_multi::testWatch : elapsed 2006 : OK
 [exec] Zookeeper_multi::testSequentialNodeCreateInAsyncMulti : elapsed 
2001 : OK
 [exec] Zookeeper_multi::testBigAsyncMulti : elapsed 3003 : OK
 [exec] Zookeeper_operations::testAsyncWatcher1 : elapsed 54 : OK
 [exec] Zookeeper_operations::testAsyncGetOperation : elapsed 54 : OK
 [exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : 
elapsed 387 : OK
 [exec] Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : 
elapsed 0 : OK
 [exec] Zookeeper_operations::testConcurrentOperations1 : elapsed 22459 : OK
 [exec] Zookeeper_readOnly::testReadOnly ZooKeeper server started ZooKeeper 
server started : elapsed 15515 : OK
 [exec] Zookeeper_readOnly::testReadOnlyWithSSL ZooKeeper server started 
ZooKeeper server started : elapsed 14865 : OK
{noformat}





[jira] [Created] (ZOOKEEPER-4853) erroneous assertion in ZooKeeperQuotaTest#testQuota

2024-08-25 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4853:
-

 Summary: erroneous assertion in ZooKeeperQuotaTest#testQuota
 Key: ZOOKEEPER-4853
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4853
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.9.2
Reporter: Kezhu Wang
 Fix For: 3.10.0


Created for https://github.com/apache/zookeeper/pull/2169.

{code:java}
assertNotNull(server.getZKDatabase().getDataTree().getMaxPrefixWithQuota(path) 
!= null, "Quota is still set");
{code}

The {{!= null}} comparison above evaluates to a non-null {{Boolean}}, so the 
assertion always passes; it should assert on the returned value itself.
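The pitfall can be shown with a minimal stand-in for JUnit's {{assertNotNull}} (the helper below mimics its contract; the real test uses {{org.junit.jupiter.api.Assertions}}):

```java
public class AssertPitfall {
    // Minimal stand-in for JUnit's assertNotNull(actual, message).
    static void assertNotNull(Object actual, String message) {
        if (actual == null) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        Object quotaNode = null; // stand-in for getMaxPrefixWithQuota(path) returning null
        // Buggy form: `quotaNode != null` autoboxes to Boolean.FALSE, which is itself
        // non-null, so the assertion passes even though the quota node is gone.
        assertNotNull(quotaNode != null, "Quota is still set");
        // The correct form asserts on the value itself and would fail here:
        // assertNotNull(quotaNode, "Quota is still set");
        System.out.println("buggy assertion passed despite null quota node");
    }
}
```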





[jira] [Created] (ZOOKEEPER-4848) Possible stack overflow in setup_random

2024-08-06 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4848:
-

 Summary: Possible stack overflow in setup_random
 Key: ZOOKEEPER-4848
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4848
 Project: ZooKeeper
  Issue Type: Bug
  Components: c client
Affects Versions: 3.9.2, 3.8.4
Reporter: Kezhu Wang


Created for https://github.com/apache/zookeeper/pull/2097.

{code:c}
int seed_len = 0;
/* Enter a loop to fill in seed with random data from /dev/urandom.
 * This is done in a loop so that we can safely handle short reads
 * which can happen due to signal interruptions.
 */
while (seed_len < sizeof(seed)) {
    /* Assert we either read something or we were interrupted due to a
     * signal (errno == EINTR) in which case we need to retry.
     */
    int rc = read(fd, &seed + seed_len, sizeof(seed) - seed_len);
    assert(rc > 0 || errno == EINTR);
    if (rc > 0) {
        seed_len += rc;
    }
}
{code}

The above code will overflow {{seed}} in case of a short read: pointer 
arithmetic on {{&seed}} is scaled by the size of the pointee, so 
{{&seed + seed_len}} advances {{seed_len * sizeof(seed)}} bytes rather than 
{{seed_len}} bytes.





[jira] [Created] (ZOOKEEPER-4833) Bring back ARM64 to Github Actions using macos-latest

2024-05-22 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4833:
-

 Summary: Bring back ARM64 to Github Actions using macos-latest
 Key: ZOOKEEPER-4833
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4833
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Reporter: Kezhu Wang


ARM64 CI was added in ZOOKEEPER-3919 but lost in ZOOKEEPER-4642 as a 
consequence of the migration to GitHub Actions. GitHub Actions now offers the 
[{{macos-latest}} runner on 
{{M1}}|https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories].
 We can bring ARM64 back. I think it is valuable for ZooKeeper, as we ship 
{{zookeeper-client-c}}. 








[jira] [Created] (ZOOKEEPER-4821) ConnectRequest got NOTREADONLY ReplyHeader

2024-03-28 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4821:
-

 Summary: ConnectRequest got NOTREADONLY ReplyHeader
 Key: ZOOKEEPER-4821
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4821
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.9.2, 3.8.4
Reporter: Kezhu Wang


I would expect {{ConnectRequest}} to have two kinds of response under normal 
conditions: a {{ConnectResponse}} or a socket close. But if the server is 
configured with {{readonlymode.enabled}} but not {{localSessionsEnabled}}, the 
client can get {{NOTREADONLY}} in reply to a {{ConnectRequest}}. I see no 
handling for this in the Java client, at least; I encountered it while writing 
tests for a Rust client.

I guess this is not by design, and we could probably close the socket early 
instead. It could also be handled on the client side, since 
{{sizeof(ConnectResponse)}} is larger than {{sizeof(ReplyHeader)}}: 
distinguishing the two by size would give us the ability to carry an error for 
{{ConnectRequest}}, which {{ConnectResponse}} itself cannot.





[jira] [Created] (ZOOKEEPER-4819) Can't seek a writable TLS server when connected to a read-only server

2024-03-26 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4819:
-

 Summary: Can't seek a writable TLS server when connected to a 
read-only server
 Key: ZOOKEEPER-4819
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4819
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.9.2, 3.8.4
Reporter: Kezhu Wang


{{[ClientCnxn::pingRwServer|https://github.com/apache/zookeeper/blob/d12aba599233b0fcba0b9b945ed3d2f45d4016f0/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L1280]}}
 uses a raw socket to issue the "isro" 4lw command. This results in a failed 
handshake against a TLS server.





[jira] [Created] (ZOOKEEPER-4750) RequestPathMetricsCollector does not align with FinalRequestProcessor

2023-09-30 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4750:
-

 Summary: RequestPathMetricsCollector does not align with 
FinalRequestProcessor
 Key: ZOOKEEPER-4750
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4750
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.9.0
Reporter: Kezhu Wang


For example, it does not handle {{createTTL}}.

{noformat}
2023-09-30 17:46:59,212 [myid:] - ERROR 
[SyncThread:0:o.a.z.s.u.RequestPathMetricsCollector@216] - We should not handle 
21
{noformat}





[jira] [Created] (ZOOKEEPER-4749) Request timeout is not respected for asynchronous api

2023-09-27 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4749:
-

 Summary: Request timeout is not respected for asynchronous api
 Key: ZOOKEEPER-4749
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4749
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.9.0
Reporter: Kezhu Wang


"zookeeper.request.timeout" is only consulted in the synchronous code path.





[jira] [Created] (ZOOKEEPER-4747) Java api lacks synchronous version of sync() call

2023-09-24 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4747:
-

 Summary: Java api lacks synchronous version of sync() call
 Key: ZOOKEEPER-4747
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4747
 Project: ZooKeeper
  Issue Type: New Feature
  Components: java client
Reporter: Kezhu Wang
Assignee: Kezhu Wang
 Fix For: 3.10.0


Ideally it would be redundant, as [~breed] says in ZOOKEEPER-1167.

{quote}
it wasn't an oversight. there is no reason for a synchronous version. because 
of the ordering guarantees, if you issue an asynchronous sync, the next call, 
whether synchronous or asynchronous will see the updated state.
{quote}

But in the case of connection loss, and absent ZOOKEEPER-22, the client has to 
check the result of the asynchronous sync before the next call. So currently 
we can't simply issue a fire-and-forget asynchronous sync followed by a read 
to get strong consistency. In a synchronous call chain, the client therefore 
has to convert the asynchronous {{sync}} into a synchronous one. This is what 
I do in 
[EagerACLFilterTest::syncClient|https://github.com/apache/zookeeper/blob/f42c01de73867ffbc12707b3e9f9cd7f847fe462/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/EagerACLFilterTest.java#L98];
 it is apparently unfriendly to end users.
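The conversion boils down to the usual latch pattern; here is a minimal sketch with a stand-in async API in place of the real {{ZooKeeper::sync}} (names are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingSync {
    interface VoidCallback { void processResult(int rc, String path, Object ctx); }

    // Stand-in for the asynchronous ZooKeeper::sync; always reports rc = 0 here.
    static void asyncSync(String path, VoidCallback cb, Object ctx) {
        new Thread(() -> cb.processResult(0, path, ctx)).start();
    }

    // Block until the async sync completes and surface a non-zero rc to the
    // caller -- exactly the check a fire-and-forget sync would skip.
    static void sync(String path) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicInteger rc = new AtomicInteger();
        asyncSync(path, (code, p, ctx) -> { rc.set(code); latch.countDown(); }, null);
        latch.await();
        if (rc.get() != 0) throw new IllegalStateException("sync failed: rc=" + rc.get());
    }

    public static void main(String[] args) throws InterruptedException {
        sync("/");
        System.out.println("synced");
    }
}
```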





[jira] [Created] (ZOOKEEPER-4746) cppunit tests hang and get cancelled

2023-09-20 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4746:
-

 Summary: cppunit tests hang and get cancelled
 Key: ZOOKEEPER-4746
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4746
 Project: ZooKeeper
  Issue Type: Test
  Components: tests
Affects Versions: 3.10.0
Reporter: Kezhu Wang


* https://github.com/apache/zookeeper/actions/runs/6007712384/job/16337953123
* https://github.com/apache/zookeeper/actions/runs/6047057349/job/16409786315
* https://github.com/apache/zookeeper/actions/runs/6195151365/job/16819317479
* https://github.com/apache/zookeeper/actions/runs/6196548582/job/16823409398

They hang for so long that the runner cancels them.





[jira] [Created] (ZOOKEEPER-4745) End to End tests fail occasionally

2023-09-20 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4745:
-

 Summary: End to End tests fail occasionally
 Key: ZOOKEEPER-4745
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4745
 Project: ZooKeeper
  Issue Type: Test
  Components: tests
Affects Versions: 3.10.0
Reporter: Kezhu Wang


I saw:
* https://github.com/apache/zookeeper/actions/runs/5587157838/job/15131211778
* https://github.com/kezhuw/zookeeper/actions/runs/5251205631/job/14209201285
* 
https://github.com/kezhuw/zookeeper/actions/runs/6198985701/job/16830576384#step:9:38
* 
https://github.com/apache/zookeeper/actions/runs/6244974218/job/16952757583#step:11:44

{noformat}
2023-07-18 12:08:34,046 [myid:] - ERROR [main:o.a.z.u.ServiceUtils@48] - 
Exiting JVM with code 1
ZooKeeper JMX enabled by default
Using config: 
/home/runner/work/zookeeper/zookeeper/apache-zookeeper-3.7.0-bin/bin/../conf/zoo_sample.cfg
Stopping zookeeper ... STOPPED
Traceback (most recent call last):
  File "/home/runner/work/zookeeper/zookeeper/tools/ci/test-connectivity.py", 
line 48, in <module>
subprocess.run([f'{client_binpath}', 'sync', '/'], check=True)
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 
'['/home/runner/work/zookeeper/zookeeper/bin/zkCli.sh', 'sync', '/']' returned 
non-zero exit status 1.
Error: Process completed with exit code 1.
{noformat}

I guess it could be caused by the asynchronous server start.





[jira] [Created] (ZOOKEEPER-4742) Config watch path gets truncated abnormally for chroot "/zoo" or the like

2023-09-15 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4742:
-

 Summary: Config watch path gets truncated abnormally for chroot 
"/zoo" or the like
 Key: ZOOKEEPER-4742
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4742
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.8.2, 3.9.0, 3.7.1
Reporter: Kezhu Wang
Assignee: Kezhu Wang


This is a leftover of ZOOKEEPER-4565, split from 
[pr#1996|https://github.com/apache/zookeeper/pull/1996] to keep ZOOKEEPER-4601 
focused.





[jira] [Created] (ZOOKEEPER-4704) Flaky test: ReconfigRollingRestartCompatibilityTest.testRollingRestartWithExtendedMembershipConfig

2023-06-10 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4704:
-

 Summary: Flaky test: 
ReconfigRollingRestartCompatibilityTest.testRollingRestartWithExtendedMembershipConfig
 Key: ZOOKEEPER-4704
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4704
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.8.1, 3.7.1
Reporter: Kezhu Wang


https://github.com/apache/zookeeper/actions/runs/5229706434/jobs/9442845028#step:7:619

{quote}
 org.opentest4j.AssertionFailedError: waiting for server 2 being up ==> 
expected: <true> but was: <false>
at 
org.apache.zookeeper.server.quorum.ReconfigRollingRestartCompatibilityTest.testRollingRestartWithExtendedMembershipConfig(ReconfigRollingRestartCompatibilityTest.java:263)
{quote}

The same commit passes in my fork: 
https://github.com/kezhuw/zookeeper/actions/runs/5229684461.

An independent report appears in 
[ZOOKEEPER-4628|https://issues.apache.org/jira/browse/ZOOKEEPER-4628?focusedCommentId=17632044&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17632044];
 unfortunately, the run cache has been cleared.






[jira] [Created] (ZOOKEEPER-4702) State change events during close are indeterminate

2023-06-01 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4702:
-

 Summary: State change events during close are indeterminate
 Key: ZOOKEEPER-4702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4702
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Kezhu Wang


Placing {{sleep(100)}} before {{disconnect}} in {{ClientCnxn::close}} 
reproduces this easily: {{KeeperState.Disconnected}} is not always issued.





[jira] [Created] (ZOOKEEPER-4698) Persistent watch events lost after reconnection

2023-05-22 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4698:
-

 Summary: Persistent watch events lost after reconnection
 Key: ZOOKEEPER-4698
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4698
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.8.1, 3.7.1
Reporter: Kezhu Wang


I found this in reply to [apache#1950 
(comment)|https://github.com/apache/zookeeper/pull/1950#issuecomment-1553742525],
 but it turns out to be a known issue: [apache#1106 
(comment)|https://github.com/apache/zookeeper/pull/1106#issuecomment-543860329].

I think it is worth noting separately in Jira for potential future discussion 
and a fix. I have pushed a [test 
case|https://github.com/kezhuw/zookeeper/commit/31d89e9829380559066fc2b83e3d38462380c5d4]
 for this; it fails as expected.

{noformat}
[ERROR] Failures: 
[ERROR]   WatchEventWhenAutoResetTest.testPersistentRecursiveWatch:237 do not 
receive a NodeDataChanged ==> expected: not <null>
[ERROR]   WatchEventWhenAutoResetTest.testPersistentWatch:211 do not receive a 
NodeDataChanged ==> expected: not <null>
{noformat}

It is hard to fix this with the {{DataTree}} alone. Two independent comments 
[pointed|https://github.com/apache/zookeeper/pull/1106#issuecomment-1366449561] 
[out|https://github.com/kezhuw/zookeeper/commit/31d89e9829380559066fc2b83e3d38462380c5d4#diff-cfd09b7021c88da6631872e8a4a271f830162f7c5a63a140839ba029048493fdR227-R230]
 this. I guess we have to walk through the txn log to deliver a correct fix. 

{quote}
Watches will not be received while disconnected from a server. When a client 
reconnects, any previously registered watches will be reregistered and 
triggered if needed. In general this all occurs transparently. There is one 
case where a watch may be missed: a watch for the existence of a znode not yet 
created will be missed if the znode is created and deleted while disconnected.
{quote}

This is [what our programmer's guide 
says|https://zookeeper.apache.org/doc/r3.8.1/zookeeperProgrammers.html#ch_zkWatches].
 It is well known, at least to me, that we can lose some transient 
intermediate events across a reconnection. But in the case of persistent 
watches, we can lose more. This forces clients to rebuild their knowledge on 
reconnection.





[jira] [Created] (ZOOKEEPER-4697) Add Builder to construct ZooKeeper and possible descendants

2023-05-19 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4697:
-

 Summary: Add Builder to construct ZooKeeper and possible 
descendants
 Key: ZOOKEEPER-4697
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4697
 Project: ZooKeeper
  Issue Type: New Feature
  Components: java client
Reporter: Kezhu Wang


We now have 10 constructor variants for {{ZooKeeper}} and 4 for 
{{ZooKeeperAdmin}}. That is enough to justify resorting to a builder or 
something similar.
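A hypothetical shape for such a builder, to illustrate the idea (all names and defaults here are illustrative, not a committed API):

```java
public class ZooKeeperBuilderSketch {
    static final class Builder {
        private final String connectString;
        private int sessionTimeoutMs = 30_000;
        private boolean canBeReadOnly = false;

        Builder(String connectString) { this.connectString = connectString; }
        Builder sessionTimeoutMs(int ms) { this.sessionTimeoutMs = ms; return this; }
        Builder canBeReadOnly(boolean ro) { this.canBeReadOnly = ro; return this; }

        // A real builder would pick the right ZooKeeper constructor;
        // this sketch just summarizes the accumulated options.
        String build() {
            return connectString + " timeout=" + sessionTimeoutMs
                    + " readOnly=" + canBeReadOnly;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Builder("127.0.0.1:2181").sessionTimeoutMs(10_000).build());
    }
}
```

The point is that each new option becomes one method instead of doubling the constructor count.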





[jira] [Created] (ZOOKEEPER-4695) Forbid multiple mutations of one key in multi

2023-05-16 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4695:
-

 Summary: Forbid multiple mutations of one key in multi
 Key: ZOOKEEPER-4695
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4695
 Project: ZooKeeper
  Issue Type: Wish
  Components: c client, java client, server
Reporter: Kezhu Wang


I guess this might be part of ZOOKEEPER-1289, but let me file it as a separate 
issue for now, until we get a solution for ZOOKEEPER-1289 or other committers 
close this as a "Duplicate".

Currently, when there are multiple mutations of one key in a multi, multiple 
watch events are delivered for that key. Let's assume a {{delete}} and a 
{{create}} of key "/foo" in one {{multi}} operation. The client will receive 
two events, {{NodeDeleted}} and {{NodeCreated}}, and between them it will 
expose a state in which "/foo" does not exist. But in normal reads we should 
never observe such a state, as {{multi}} behaves atomically.

Forbidding this is absolutely a breaking change, as it introduces a new 
failure path or error code in clients.

There are alternatives. One is merging multiple mutations into one on the 
server side, maybe solely for watch events; I guess it might be rare for 
clients to depend on the concrete watch types of changes. But I think this 
approach would be relatively hard.

References:
 * Etcd rejects multiple mutations of one key in its txns. See 
[etcd-io/etcd#4363|https://github.com/etcd-io/etcd/pull/4363] and 
[etcd-io/etcd#4376|https://github.com/etcd-io/etcd/pull/4376].
 * ZOOKEEPER-4655 ([#1950|https://github.com/apache/zookeeper/pull/1950]) 
proposed {{WatchedEvent.zxid}} to carry the {{zxid}} that triggers the 
delivered event. I think this issue undermines that proposal.
* The discovery process and the reason I opened this issue: 
[#1950|https://github.com/apache/zookeeper/pull/1950#issuecomment-1544436457]





[jira] [Created] (ZOOKEEPER-4680) OpCode.check is treated as a quorum write operation while it is not

2023-03-06 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4680:
-

 Summary: OpCode.check is treated as a quorum write operation while 
it is not
 Key: ZOOKEEPER-4680
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4680
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.8.1, 3.7.2
Reporter: Kezhu Wang
Assignee: Kezhu Wang








[jira] [Created] (ZOOKEEPER-4670) Export ZooKeeper server from QuorumPeerMain, ZooKeeperServerMain and ZooKeeperServerEmbedded

2023-02-05 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4670:
-

 Summary: Export ZooKeeper server from QuorumPeerMain, 
ZooKeeperServerMain and ZooKeeperServerEmbedded
 Key: ZOOKEEPER-4670
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4670
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Kezhu Wang


This allows clients to do inspection without reflection.

See also CURATOR-535 and https://github.com/apache/curator/pull/421.





[jira] [Created] (ZOOKEEPER-4667) DataTree.processTxn ignores `OpCode.create2` in `OpCode.multi`

2023-01-28 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4667:
-

 Summary: DataTree.processTxn ignores `OpCode.create2` in 
`OpCode.multi`
 Key: ZOOKEEPER-4667
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4667
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.7.1, 3.8.0, 3.9.0
Reporter: Kezhu Wang


The official Java client does not use {{OpCode.create2}}, but I think 
ZooKeeper intends to support {{OpCode.create2}} nested in {{OpCode.multi}} on 
the server side, as {{FinalRequestProcessor}} handles it.

See also ZOOKEEPER-1297.





[jira] [Created] (ZOOKEEPER-4625) No reliable way to remove watch without interfering others on same paths

2022-10-19 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4625:
-

 Summary: No reliable way to remove watch without interfering 
others on same paths
 Key: ZOOKEEPER-4625
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4625
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.8.0
Reporter: Kezhu Wang


It is possible for one node path to be watched more than once by different 
watchers reusing the same ZooKeeper session. ZOOKEEPER-1910 reported this but 
resorted to "checkWatches" to circumvent it.

I think it might be possible to do some tracking work in the client to support 
"removeWatches" without interfering with other client usages.
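The tracking idea can be sketched as a per-path watcher registry in the client, where only the last local removal is allowed to reach the server (illustrative only, not the real {{ZKWatchManager}}):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WatchTracking {
    final Map<String, Set<Object>> watchers = new HashMap<>();

    void add(String path, Object watcher) {
        watchers.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
    }

    /** Returns true only when it is safe to send removeWatches for this path. */
    boolean remove(String path, Object watcher) {
        Set<Object> set = watchers.get(path);
        if (set == null) return false;
        set.remove(watcher);
        if (set.isEmpty()) {
            watchers.remove(path);
            return true;  // last local watcher gone: server registration can go too
        }
        return false;     // other watchers on the path still rely on the registration
    }
}
```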

Here are some links that lead to this issue:
* CURATOR-654: DistributedBarrier watcher leak and its 
[pr|https://github.com/apache/curator/pull/435]
* [Why removeWatches sends OpCode.checkWatches to the 
server?|https://lists.apache.org/thread/0kcnklcxs0s5656c1sbh3crgdodbb0qg] from 
the mailing list.
* [Drop for watcher|https://github.com/kezhuw/zookeeper-client-rust/issues/2] 
from a Rust implementation.





[jira] [Created] (ZOOKEEPER-4601) Define getConfig Watcher behavior in chroot ZooKeeper

2022-07-26 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4601:
-

 Summary: Define getConfig Watcher behavior in chroot ZooKeeper
 Key: ZOOKEEPER-4601
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4601
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Reporter: Kezhu Wang


After ZOOKEEPER-4565, the {{getConfig}} watcher will receive the path of the 
ZooKeeper config node "/zookeeper/config". But the path the {{getConfig}} 
watcher receives is somewhat sensitive to the chroot path.

* With chroot path "/zookeeper", {{getConfig}} will receive path "/config".
* With other chroot paths, {{getConfig}} will receive path "/zookeeper/config".
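Both behaviors fall out of plain client-side prefix stripping, sketched here (an illustrative helper, not the actual {{ClientCnxn}} code):

```java
public class ChrootStrip {
    // Strip the chroot prefix from a server-side path, as the client
    // does for paths carried in watch events.
    static String strip(String serverPath, String chroot) {
        if (chroot == null || chroot.isEmpty()) return serverPath;
        return serverPath.startsWith(chroot)
                ? serverPath.substring(chroot.length())
                : serverPath;
    }

    public static void main(String[] args) {
        System.out.println(strip("/zookeeper/config", "/zookeeper")); // "/config"
        System.out.println(strip("/zookeeper/config", "/app"));       // unchanged
    }
}
```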

I think we should formally define what path {{getConfig}} will get, to avoid 
surprising behavior.





[jira] [Created] (ZOOKEEPER-4569) Xid out of order caused by early return from eager ACL check

2022-07-08 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4569:
-

 Summary: Xid out of order caused by early return from eager ACL check
 Key: ZOOKEEPER-4569
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4569
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.7.1, 3.8.0, 3.6.3
Reporter: Kezhu Wang


{{ClientCnxn}} 
[enforces|https://github.com/apache/zookeeper/blob/de7c5869d372e46af43979134d0e30b49d2319b1/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java#L917]
 ordered replies from the server. This assumes that requests on the server go 
through the same processing pipeline without early returns. ZOOKEEPER-3418 
breaks this assumption.
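The ordering check amounts to matching each reply against the head of the pending-request queue; a simplified sketch of that check (illustrative, not the real {{ClientCnxn.readResponse}}):

```java
import java.util.ArrayDeque;

public class XidOrdering {
    // Each reply must carry the xid at the head of the pending-request queue;
    // a server-side early return answers out of order and trips this check.
    static void checkReply(ArrayDeque<Integer> pendingXids, int replyXid) {
        Integer expected = pendingXids.pollFirst();
        if (expected == null || expected != replyXid) {
            throw new IllegalStateException(
                "Xid out of order. Got Xid " + replyXid + ", expected Xid " + expected);
        }
    }
}
```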





[jira] [Created] (ZOOKEEPER-4548) Negative ZooKeeperServer#getInProcess

2022-05-24 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4548:
-

 Summary: Negative ZooKeeperServer#getInProcess
 Key: ZOOKEEPER-4548
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4548
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Reporter: Kezhu Wang


{{ZooKeeperServer#submitRequestNow}} passes the request to 
{{firstProcessor.processRequest}} before {{incInProcess}}. This could make 
{{getInProcess}} negative after an asynchronous {{decInProcess}} if there is a 
long pause between {{firstProcessor.processRequest}} and {{incInProcess}}, for 
example under heavy load.
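The race reduces to an ordering bug on a counter. A minimal sketch (the real counter lives in {{ZooKeeperServer}}; this only demonstrates the interleaving):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class InProcessCounterSketch {
    public static void main(String[] args) {
        AtomicInteger inProcess = new AtomicInteger();

        // Buggy ordering (as in submitRequestNow): the request is handed to
        // the first processor before incInProcess. If the whole pipeline,
        // including the asynchronous decInProcess, runs before the submitter
        // resumes, the gauge is observed below zero.
        int observed = inProcess.decrementAndGet(); // pipeline finished first
        inProcess.incrementAndGet();                // incInProcess runs too late
        System.out.println("gauge observed: " + observed); // -1

        // Fixed ordering: increment before the hand-off, so the decrement can
        // never drive the counter below zero.
        inProcess.incrementAndGet();
        inProcess.decrementAndGet();
        System.out.println("final: " + inProcess.get()); // 0
    }
}
{code}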



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (ZOOKEEPER-4512) Flaky test: QuorumPeerMainTest#testLeaderOutOfView

2022-04-05 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4512:
-

 Summary: Flaky test: QuorumPeerMainTest#testLeaderOutOfView
 Key: ZOOKEEPER-4512
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4512
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Affects Versions: 3.8.0, 3.7.1
Reporter: Kezhu Wang


I saw two assertion failures.

{code:none}
org.opentest4j.AssertionFailedError: Corrupt peer should join quorum with 
servers having same server configuration ==> expected:  but was: 
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView(QuorumPeerMainTest.java:904)
{code}
* https://github.com/apache/zookeeper/runs/5770639335?check_suite_focus=true
* https://github.com/apache/zookeeper/runs/5622644945?check_suite_focus=true

{code:none}
 org.opentest4j.AssertionFailedError: expected:  but was: 
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView(QuorumPeerMainTest.java:881)
{code}
* https://github.com/apache/zookeeper/runs/5759448665?check_suite_focus=true
* 
https://ci-hadoop.apache.org/blue/organizations/jenkins/zookeeper-precommit-github-pr/detail/PR-1852/1/pipeline/#step-35-log-741



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (ZOOKEEPER-4511) Flaky test: FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics

2022-04-05 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4511:
-

 Summary: Flaky test: 
FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics
 Key: ZOOKEEPER-4511
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4511
 Project: ZooKeeper
  Issue Type: Bug
  Components: tests
Reporter: Kezhu Wang


* https://github.com/kezhuw/zookeeper/runs/5830250287?check_suite_focus=true
* https://github.com/apache/zookeeper/runs/5759834147?check_suite_focus=true


{code:none}
org.opentest4j.AssertionFailedError: expected: <1> but was: <0>
at 
org.apache.zookeeper.server.persistence.FileTxnSnapLogMetricsTest.testFileTxnSnapLogMetrics(FileTxnSnapLogMetricsTest.java:86)
{code}

This test writes some txns to trigger a snapshot while leaving remaining txns 
in the txn log. But snapshot taking is asynchronous, so all txns could end up 
written to the snapshot. On restart, it is then possible that there are no txns 
to load after the snapshot is restored, which fails the assertion.





[jira] [Created] (ZOOKEEPER-4508) ZooKeeper client run to endless loop in ClientCnxn.SendThread.run if all server down

2022-04-01 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4508:
-

 Summary: ZooKeeper client run to endless loop in 
ClientCnxn.SendThread.run if all server down
 Key: ZOOKEEPER-4508
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4508
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.8.0, 3.7.0, 3.6.3
Reporter: Kezhu Wang


The observable behavior is that the client never gets an expired event from the 
watcher. The cause is twofold:
1. `updateLastSendAndHeard` is called on reconnection, so the session timeout 
never decreases.
2. There is no break out of `ClientCnxn.SendThread.run` after the session 
times out.
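A minimal sketch of why the timeout never fires (field and method names are illustrative, not the actual `ClientCnxn` members):

{code:java}
public class SessionTimeoutSketch {
    static final int SESSION_TIMEOUT_MS = 1000;
    static long lastHeard;

    // Time budget left before the session should be declared expired.
    static long remaining(long now) {
        return SESSION_TIMEOUT_MS - (now - lastHeard);
    }

    public static void main(String[] args) {
        long now = 0;
        lastHeard = now;
        // Every attempt to reach a dead server burns real time, but the buggy
        // loop resets lastHeard on each reconnection (updateLastSendAndHeard),
        // so remaining() never reaches zero and the Expired event is never
        // delivered.
        for (int attempt = 0; attempt < 5; attempt++) {
            now += 400;        // time spent failing to connect
            lastHeard = now;   // bug: reset on reconnection
            System.out.println("attempt " + attempt + ", remaining " + remaining(now));
        }
        // Without the reset, remaining() would have gone negative after
        // 1000 ms and SendThread could have broken out of its run loop.
    }
}
{code}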





[jira] [Created] (ZOOKEEPER-4475) Persistent recursive watcher got NodeChildrenChanged event

2022-02-22 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4475:
-

 Summary: Persistent recursive watcher got NodeChildrenChanged event
 Key: ZOOKEEPER-4475
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4475
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Kezhu Wang


Currently, {{NodeChildrenChanged}} events are sent to all persistent watchers 
unconditionally in the client. This requires the server to never deliver 
{{NodeChildrenChanged}} for a node's descendants, which cannot be guaranteed. 
So we need to filter {{NodeChildrenChanged}} for persistent recursive watchers.

Splits from ZOOKEEPER-4466.
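A hedged sketch of the needed client-side filtering (the enum and method names are illustrative, not the real {{ZKWatchManager}} API):

{code:java}
public class WatcherFilterSketch {
    enum Mode { STANDARD_CHILD, PERSISTENT, PERSISTENT_RECURSIVE }
    enum EventType { NodeDataChanged, NodeChildrenChanged }

    // A recursive watcher must not see NodeChildrenChanged at all: recursion
    // is expressed through per-node create/delete/data events instead.
    static boolean shouldDeliver(Mode mode, EventType type) {
        if (type == EventType.NodeChildrenChanged) {
            return mode != Mode.PERSISTENT_RECURSIVE;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeliver(Mode.PERSISTENT,
                EventType.NodeChildrenChanged));           // true
        System.out.println(shouldDeliver(Mode.PERSISTENT_RECURSIVE,
                EventType.NodeChildrenChanged));           // false
    }
}
{code}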





[jira] [Created] (ZOOKEEPER-4474) ZooDefs.opNames is unused

2022-02-21 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4474:
-

 Summary: ZooDefs.opNames is unused
 Key: ZOOKEEPER-4474
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4474
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Kezhu Wang


It is public, but what is the intended use case? Is {{Request.op2String}} 
sufficient? {{opNames}} is not in sync with {{ZooDefs.OpCode}}, so you can't 
index it by {{OpCode}}.





[jira] [Created] (ZOOKEEPER-4472) Support persistent watchers removing individually

2022-02-20 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4472:
-

 Summary: Support persistent watchers removing individually
 Key: ZOOKEEPER-4472
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4472
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.7.0, 3.6.3
Reporter: Kezhu Wang


Persistent watchers can currently only be removed with {{WatcherType.Any}}. I 
think it is meaningful to remove them individually since they are, as the name 
says, persistent and will not be auto-removed on the server side.

Together with the solution proposed in [ZOOKEEPER-4466], it becomes clear that 
ZooKeeper has four kinds of watchers:
# Standard data watcher(which includes data and exist watcher in client side).
# Standard child watcher.
# Persistent node watcher(aka. data and child watcher for node).
# Persistent recursive watcher(aka. data watcher for node and its descendants).
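The proposal can be sketched as a watch registry that tracks modes per path, so removal can target a single mode instead of the equivalent of {{WatcherType.Any}} (a sketch under assumed names, not the server's real watch manager):

{code:java}
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

public class WatchRegistrySketch {
    enum Mode { STANDARD_DATA, STANDARD_CHILD, PERSISTENT, PERSISTENT_RECURSIVE }

    private final Map<String, EnumSet<Mode>> watches = new HashMap<>();

    void add(String path, Mode mode) {
        watches.computeIfAbsent(path, p -> EnumSet.noneOf(Mode.class)).add(mode);
    }

    // The proposal: remove one mode on one path, leaving the others intact.
    boolean remove(String path, Mode mode) {
        EnumSet<Mode> modes = watches.get(path);
        if (modes == null || !modes.remove(mode)) {
            return false;
        }
        if (modes.isEmpty()) {
            watches.remove(path);
        }
        return true;
    }

    boolean has(String path, Mode mode) {
        EnumSet<Mode> modes = watches.get(path);
        return modes != null && modes.contains(mode);
    }

    public static void main(String[] args) {
        WatchRegistrySketch registry = new WatchRegistrySketch();
        registry.add("/a", Mode.PERSISTENT);
        registry.add("/a", Mode.PERSISTENT_RECURSIVE);
        registry.remove("/a", Mode.PERSISTENT);
        System.out.println(registry.has("/a", Mode.PERSISTENT));           // false
        System.out.println(registry.has("/a", Mode.PERSISTENT_RECURSIVE)); // true
    }
}
{code}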

See also [ZOOKEEPER-4471]





[jira] [Created] (ZOOKEEPER-4471) Removing WatcherType.Children breaks persistent watcher's child events

2022-02-20 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4471:
-

 Summary: Removing WatcherType.Children breaks persistent watcher's 
child events
 Key: ZOOKEEPER-4471
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4471
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.7.0, 3.6.3
Reporter: Kezhu Wang


{{AddWatchMode.PERSISTENT}} is split into a data watch and a child watch on the 
server side. When removing {{WatcherType.Children}}, the child part of 
{{AddWatchMode.PERSISTENT}} is removed but not its data part. This could enable 
tricky usage of a persistent data watch even though there is no official API 
for it. It is better to forbid this by dedicating {{WatcherType.Children}} to 
standard child watches only.

I 
[committed|https://github.com/kezhuw/zookeeper/commit/f7a996646074114830bdc2361e8ff679d08c00bc]
 a modified {{RemoveWatchesTest.testRemoveAllChildWatchesOnAPath}} in my local 
repo to reproduce this.

I think it is better to support {{removeWatches}} for the two persistent 
watcher types too. But that might be a separate issue.





[jira] [Created] (ZOOKEEPER-4467) Missing op code (addWatch) in Request.op2String

2022-02-10 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4467:
-

 Summary: Missing op code (addWatch) in Request.op2String
 Key: ZOOKEEPER-4467
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4467
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Kezhu Wang


{noformat}
Processing request:: sessionid:0x100095a44b2 type:unknown 106 cxid:0x7 
zxid:0xfffe txntype:unknown reqpath:/abc
{noformat}





[jira] [Created] (ZOOKEEPER-4466) Watchers of different modes interfere on overlapping paths

2022-02-10 Thread Kezhu Wang (Jira)
Kezhu Wang created ZOOKEEPER-4466:
-

 Summary: Watchers of different modes interfere on overlapping 
paths
 Key: ZOOKEEPER-4466
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4466
 Project: ZooKeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.6.3, 3.7, 3.6.4
Reporter: Kezhu Wang


I used to think watchers of different modes were orthogonal. I found they are 
not while writing tests for an unfinished Rust client. I wrote [test 
cases|https://github.com/kezhuw/zookeeper/commit/79b05a95d2669a4acd16a4d544f24e2083a264f2#diff-8d31d27ea951fbc1f4fbda48d45748318f7124502839d825b77ad3fb8551bf43L152]
 in Java and confirmed it.

I copied the test cases here for evaluation. You can also clone [my 
fork|https://github.com/kezhuw/zookeeper/tree/watch-overlapping-path-with-different-modes-test-case].

{code:java}
// zookeeper-server/src/test/java/org/apache/zookeeper/test/PersistentRecursiveWatcherTest.java

@Test
public void testPathOverlapWithStandardWatcher() throws Exception {
    try (ZooKeeper zk = createClient(new CountdownWatcher(), hostPort)) {
        CountDownLatch nodeCreated = new CountDownLatch(1);
        zk.addWatch("/a", persistentWatcher, PERSISTENT_RECURSIVE);
        zk.exists("/a", event -> nodeCreated.countDown());

        zk.create("/a", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/a/b", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.delete("/a/b", -1);
        zk.delete("/a", -1);

        assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a");
        assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a/b");
        assertEvent(events, Watcher.Event.EventType.NodeDeleted, "/a/b");
        assertEvent(events, Watcher.Event.EventType.NodeDeleted, "/a");

        assertTrue(nodeCreated.await(5, TimeUnit.SECONDS));
    }
}

@Test
public void testPathOverlapWithPersistentWatcher() throws Exception {
    try (ZooKeeper zk = createClient(new CountdownWatcher(), hostPort)) {
        zk.addWatch("/a", persistentWatcher, PERSISTENT_RECURSIVE);
        zk.addWatch("/a/b", event -> {}, PERSISTENT);
        zk.create("/a", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/a/b", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/a/b/c", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.delete("/a/b/c", -1);
        zk.delete("/a/b", -1);
        zk.delete("/a", -1);
        assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a");
        assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a/b");
        assertEvent(events, Watcher.Event.EventType.NodeCreated, "/a/b/c");
        assertEvent(events, Watcher.Event.EventType.NodeDeleted, "/a/b/c");
        assertEvent(events, Watcher.Event.EventType.NodeDeleted, "/a/b");
        assertEvent(events, Watcher.Event.EventType.NodeDeleted, "/a");
    }
}
{code}

I skimmed the code and found two possible causes:
# {{ZKWatchManager.materialize}} materializes all persistent watchers (including 
recursive ones) for {{NodeChildrenChanged}} events.
# {{WatcherModeManager}} tracks only one watcher mode.


