Hi all,

I took a thread dump to check where the thread is stuck when CuratorFrameworkImpl.close() -> EnsembleTracker.close() -> watcher removal is invoked in this specific case (ZK server down). Basically, close() is blocked waiting for the watcher removal to complete in the foreground, but that cannot happen because the ZK server is down. In 5.7.1 this was much faster, so I assume it was done in the background or in a different manner. I have also seen that there have been some relevant changes in the WatcherRemoval classes. Could you help me debug this problem?
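The dump below shows where the main thread parks. As a rough mental model of that call path (an illustrative sketch with assumed names, timeouts and retry count, not Curator's actual code), the foreground removal runs inside a retry loop, and each attempt blocks on a latch until the client reconnects or the connection timeout expires, so when the server never comes back the close lasts roughly the connection timeout multiplied by the number of attempts:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: names, timeouts and retry count are assumptions,
// not Curator internals.
public class ForegroundRemovalSketch {

    // Stand-in for the latch the client parks on; it would only be counted
    // down on reconnection, which never happens while the server is down.
    private static final CountDownLatch connectedLatch = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        int retries = 1;                  // assumed: a RetryOneTime-like policy
        long connectionTimeoutMs = 2_000; // assumed value, kept small for the demo

        // Simplified picture of a retry loop around a foreground call:
        // every attempt waits until connected or the connection timeout elapses.
        for (int attempt = 0; attempt <= retries; attempt++) {
            boolean connected = connectedLatch.await(connectionTimeoutMs, TimeUnit.MILLISECONDS);
            if (connected) {
                break; // would proceed to send the removeWatches request
            }
            // still disconnected: the retry policy decides whether to try again
        }

        System.out.println("close() stayed blocked for ~"
                + (System.currentTimeMillis() - start) + " ms");
    }
}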
"main" #1 prio=5 os_prio=0 cpu=703,45ms elapsed=11,04s tid=0x00007f1144019dd0 nid=0x229807 waiting on condition [0x00007f1149dfb000] java.lang.Thread.State: TIMED_WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@17.0.14/Native Method) - parking to wait for <0x00000005b4258b38> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.14/LockSupport.java:252) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.14/AbstractQueuedSynchronizer.java:717) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@17.0.14/AbstractQueuedSynchronizer.java:1074) at java.util.concurrent.CountDownLatch.await(java.base@17.0.14/CountDownLatch.java:276) at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:417) at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:80) at org.apache.curator.framework.imps.RemoveWatchesBuilderImpl.pathInForeground(RemoveWatchesBuilderImpl.java:232) at org.apache.curator.framework.imps.RemoveWatchesBuilderImpl.internalRemoval(RemoveWatchesBuilderImpl.java:86) at org.apache.curator.framework.imps.WatcherRemovalManager.removeWatchers(WatcherRemovalManager.java:59) at org.apache.curator.framework.imps.WatcherRemovalFacade.removeWatchers(WatcherRemovalFacade.java:54) at org.apache.curator.framework.imps.EnsembleTracker.close(EnsembleTracker.java:101) at org.apache.curator.framework.imps.CuratorFrameworkImpl.close(CuratorFrameworkImpl.java:382) at com.cheva.grantor.CuratorCloseSlow.tesCuratorCloseSlow(CuratorCloseSlow.java:28) Regards, EVaristo En domingo, 9 de marzo de 2025, 08:31:44 CET, Evaristo José Camarero <evaristojo...@yahoo.es> escribió: Hi all, Hi took recent 5.8.0 release and some project tests were running really slow compared with 5.7.1 I took a closer look and CuratorFramework.close method is really slow when ZK server is stop. I have included a test that makes reproduction easy I am running Manjaro with OpenJDK 17 When test is running with Curator 5.7.1 closing Curator instance takes 1200 millisWhen test is running with Curator 5.8.0 closing Curator instance takes 20000 millis Looks to me there is something wrong here, BUT wanted to double check with you. Best regards, Cheva package com.cheva.grantor; import static java.util.concurrent.TimeUnit.SECONDS;import static org.junit.jupiter.api.Assertions.assertTrue; import java.time.Duration;import java.time.Instant; import org.apache.curator.framework.CuratorFramework;import org.apache.curator.framework.CuratorFrameworkFactory;import org.apache.curator.retry.RetryOneTime;import org.apache.curator.test.BaseClassForTests;import org.junit.jupiter.api.Test; class CuratorCloseSlow extends BaseClassForTests { @Test void tesCuratorCloseSlow() throws Exception { Instant t0; try (CuratorFramework cf = CuratorFrameworkFactory.newClient(server.getConnectString(), new RetryOneTime(1_000))) { cf.start(); assertTrue(cf.blockUntilConnected(2, SECONDS)); cf.create().forPath("/jejeje"); server.stop(); Thread.sleep(100L); t0 = Instant.now(); } Instant t1 = Instant.now(); long closeDurationMillis = Duration.between(t0, t1).toMillis(); System.out.println("Close Duration took " + closeDurationMillis + " secs"); assertTrue(closeDurationMillis < 2_000L); }}