[ https://issues.apache.org/jira/browse/CURATOR-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jordan Zimmerman closed CURATOR-559.
------------------------------------
    Resolution: Fixed

> Inconsistent ZK timeouts
> ------------------------
>
>                 Key: CURATOR-559
>                 URL: https://issues.apache.org/jira/browse/CURATOR-559
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0, 4.3.0
>            Reporter: Grant Digby
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 5.0.0, 4.3.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I've configured a reasonable timeout using BoundedExponentialBackoffRetry,
> and generally it works as I'd expect if ZK is down when I make a call like
> "create().forPath". But if ZK is unavailable when I call acquire on an
> InterProcessReadWriteLock, it takes far longer before it finally times out.
> I call acquire, which is wrapped in "RetryLoop.callWithRetry", and it goes
> on to call findProtectedNodeInForeground, which is also wrapped in
> "RetryLoop.callWithRetry". If I've configured the
> BoundedExponentialBackoffRetry to retry 20 times, the inner loop retries 20
> times for every one of the 20 outer retries, so it retries roughly 400
> times in total.
>
> The class below reproduces it. If you put breakpoints at the commented
> sections and bring ZK down, you can see how long each call takes before it
> gives up, along with the stack traces I've included below.
>
> {code:java}
> package com.gebatech.curator;
>
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
> import org.apache.curator.retry.BoundedExponentialBackoffRetry;
>
> public class GoCurator {
>     public static void main(String[] args) throws Exception {
>         // baseSleepTimeMs = 200, maxSleepTimeMs = 10000, maxRetries = 20
>         CuratorFramework cf = CuratorFrameworkFactory.newClient(
>             "localhost:2181",
>             new BoundedExponentialBackoffRetry(200, 10000, 20)
>         );
>         cf.start();
>         String root = "/myRoot";
>         if (cf.checkExists().forPath(root) == null) {
>             // See stacktrace A: what happens if ZK is down for this call
>             cf.create().forPath(root);
>         }
>         InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf,
>             "/grant/myLock");
>         // See stacktrace B: the nested retry if ZK is down for this call
>         lock.readLock().acquire();
>         lock.readLock().release();
>         System.out.println("done");
>         cf.close();
>     }
> }
> {code}
>
> Stacktrace A (if ZK is down when I'm calling create().forPath). This shows
> the single retry loop, so it exits after the correct number of attempts:
>
> {code:java}
> java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Object.java:-1)
>     at java.lang.Object.wait(Object.java:502)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
>     at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
>     at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
>     at com.gebatech.curator.GoCurator.main(GoCurator.java:25)
> {code}
>
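To give a rough sense of scale for "far longer", the sketch below models only the worst-case time spent sleeping between retries with the BoundedExponentialBackoffRetry(200, 10000, 20) policy used in the reproduction. It assumes that the sleep before retry k (0-based) is 200 × max(1, r) ms, with r drawn from [0, 2^(k+1)) and capped at 10000 ms, and that a loop with 20 retries sleeps 20 times. Treat that formula as an assumption about the policy, not a quote of Curator's source. The class and method names (BackoffBound, worstCaseSleepMs and so on) are hypothetical. The model ignores the time each attempt spends blocked waiting for a connection (stack trace B shows the thread parked in internalBlockUntilConnectedOrTimedOut), so real elapsed times would be longer.

```java
/**
 * Hypothetical worst-case sleep model for a bounded exponential backoff
 * policy. Assumption: the sleep before retry k (0-based) is
 * baseSleepMs * max(1, r), with r drawn from [0, 2^(k+1)), capped at capMs.
 * Time spent waiting for a connection on each attempt is ignored.
 */
final class BackoffBound {

    // Largest possible sleep before retry k under the assumed formula.
    static long worstSleepBeforeRetryMs(long baseSleepMs, long capMs, int k) {
        long largestR = (1L << (k + 1)) - 1; // largest value r can take
        return Math.min(capMs, baseSleepMs * Math.max(1L, largestR));
    }

    // Worst-case total sleep for one retry loop that performs maxRetries retries.
    static long worstCaseSleepMs(long baseSleepMs, long capMs, int maxRetries) {
        long total = 0;
        for (int k = 0; k < maxRetries; k++) {
            total += worstSleepBeforeRetryMs(baseSleepMs, capMs, k);
        }
        return total;
    }

    // Nested case: each of the (maxRetries + 1) outer attempts runs a full
    // inner loop, and the outer loop also sleeps between its own attempts.
    static long nestedWorstCaseSleepMs(long baseSleepMs, long capMs, int maxRetries) {
        long single = worstCaseSleepMs(baseSleepMs, capMs, maxRetries);
        return (maxRetries + 1) * single + single;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSleepMs(200, 10_000, 20));       // 161400
        System.out.println(nestedWorstCaseSleepMs(200, 10_000, 20)); // 3550800
    }
}
```

Under these assumptions a single loop sleeps at most 161,400 ms (about 2.7 minutes), while the nested case can sleep up to 3,550,800 ms (about 59 minutes) before any connection-wait time is added.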
> Stacktrace B (if ZK is down when I call
> InterProcessReadWriteLock#readLock#acquire). This shows the nested retry
> loop, so it doesn't exit until 20*20 attempts.
>
> {code:java}
> java.lang.Thread.State: WAITING
>     at sun.misc.Unsafe.park(Unsafe.java:-1)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
>     at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
>     at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
>     at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
>     at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
>     at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
>     at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
>     at com.gebatech.curator.GoCurator.main(GoCurator.java:29)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
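The 20*20 in stack trace B comes from one retry loop running inside another. The toy model below counts how many times an always-failing innermost operation runs. It is not Curator's RetryLoop: the class NestedRetryModel and its methods are hypothetical, and backoff sleeps are omitted. It assumes each loop makes one initial attempt plus up to maxRetries retries, so with 20 retries it counts 21 × 21 = 441 attempts, the same order as the roughly 400 described in the report. The guarded variant shows one possible way to avoid the multiplication: a ThreadLocal flag so that only the outermost loop on a thread retries, while nested calls run once and let the outer loop decide. This is an illustration of the idea, not necessarily the change that resolved this issue.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical toy model of nested retry loops (not Curator's RetryLoop).
 * Each loop makes one initial attempt plus up to maxRetries retries.
 */
final class NestedRetryModel {

    // True while the current thread is inside the outermost guarded loop.
    private static final ThreadLocal<Boolean> IN_RETRY =
        ThreadLocal.withInitial(() -> false);

    /** Plain retry loop: always retries, even when nested in another loop. */
    static <T> T callWithRetry(int maxRetries, Callable<T> op) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (retry >= maxRetries) {
                    throw e; // retries exhausted, give up
                }
                // backoff sleep omitted to keep the model fast
            }
        }
    }

    /** Guarded loop: only the outermost loop on a thread retries. */
    static <T> T callWithSingleLevelRetry(int maxRetries, Callable<T> op) throws Exception {
        if (IN_RETRY.get()) {
            return op.call(); // nested: run once, let the outer loop retry
        }
        IN_RETRY.set(true);
        try {
            return callWithRetry(maxRetries, op);
        } finally {
            IN_RETRY.set(false);
        }
    }

    // Counts how often the innermost (always failing) operation runs when one
    // retrying call is nested inside another.
    static int countInnerAttempts(boolean guarded, int maxRetries) {
        int[] attempts = {0};
        Callable<Void> inner = () -> {
            attempts[0]++;
            throw new Exception("ZK down"); // simulate an unavailable ensemble
        };
        Callable<Void> outer = guarded
            ? () -> callWithSingleLevelRetry(maxRetries, inner)
            : () -> callWithRetry(maxRetries, inner);
        try {
            if (guarded) {
                callWithSingleLevelRetry(maxRetries, outer);
            } else {
                callWithRetry(maxRetries, outer);
            }
        } catch (Exception expected) {
            // every attempt fails in this model
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("nested:  " + countInnerAttempts(false, 20)); // 441
        System.out.println("guarded: " + countInnerAttempts(true, 20));  // 21
    }
}
```

With the guard, the number of innermost attempts grows linearly with the retry count instead of quadratically, at the cost of the nested operation never retrying on its own.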