[ https://issues.apache.org/jira/browse/CURATOR-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan Zimmerman resolved CURATOR-559.
--------------------------------------
    Resolution: Fixed

For Curator 5.0.0, TestThreadLocalRetryLoop now uses a foreground Curator 
operation so that the tests are reliable.
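The resolution names the test (TestThreadLocalRetryLoop) but not the change it exercises. Assuming the fix works the way that name suggests, a thread-local marker can let only the outermost retry loop on a thread do the retrying, while any nested loop makes a single attempt and passes the failure up. The class `ThreadLocalRetrySketch` and its helpers are hypothetical and written for illustration only. This is not Curator's implementation, just a minimal model of the idea.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalRetrySketch {
    // Hypothetical marker: true while a retry loop is already running on this thread.
    private static final ThreadLocal<Boolean> IN_LOOP = ThreadLocal.withInitial(() -> false);

    // Hypothetical helper: only the outermost loop on a thread retries
    // (one initial attempt plus up to maxRetries retries); nested loops
    // make a single attempt and let the failure propagate to the outer loop.
    static <T> T callWithRetry(int maxRetries, Callable<T> op) throws Exception {
        if (IN_LOOP.get()) {
            return op.call(); // nested: leave retrying to the outer loop
        }
        IN_LOOP.set(true);
        try {
            for (int attempt = 0; ; attempt++) {
                try {
                    return op.call();
                } catch (Exception e) {
                    if (attempt >= maxRetries) {
                        throw e; // retry budget exhausted
                    }
                }
            }
        } finally {
            IN_LOOP.set(false); // clear the marker so later calls start fresh
        }
    }

    // Counts simulated ZooKeeper calls when both loops fail on every attempt.
    static int countNestedCalls(int maxRetries) {
        AtomicInteger calls = new AtomicInteger();
        try {
            callWithRetry(maxRetries, () ->
                    callWithRetry(maxRetries, () -> {
                        calls.incrementAndGet();
                        throw new IllegalStateException("ZooKeeper unavailable");
                    }));
        } catch (Exception e) {
            // expected: the outer loop gave up
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countNestedCalls(20)); // 21
    }
}
```

With 20 retries this model makes 21 calls in total instead of roughly 20 × 20, because the nested loop never multiplies the outer loop's budget.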

> Inconsistent ZK timeouts
> ------------------------
>
>                 Key: CURATOR-559
>                 URL: https://issues.apache.org/jira/browse/CURATOR-559
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0, 4.3.0
>            Reporter: Grant Digby
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 5.0.0, 4.3.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I've configured a reasonable timeout using BoundedExponentialBackoffRetry, 
> and it generally works as I'd expect if ZK is down when I make a call like 
> "create().forPath". But if ZK is unavailable when I call acquire on an 
> InterProcessReadWriteLock, it takes far longer before it finally times out.
> I call acquire, which is wrapped in "RetryLoop.callWithRetry", and it goes on 
> to call findProtectedNodeInForeground, which is also wrapped in 
> "RetryLoop.callWithRetry". If I've configured the 
> BoundedExponentialBackoffRetry to retry 20 times, the inner loop retries 20 
> times for every one of the 20 outer retries, so it retries 400 times.
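The multiplication described above can be modelled without Curator. The sketch assumes each retry loop makes one initial attempt plus up to `maxRetries` retries; the class `NestedRetryDemo` and its `callWithRetry` helper are hypothetical stand-ins, not Curator's `RetryLoop`. Counting the initial attempts, two nested loops configured for 20 retries make 21 × 21 = 441 calls, in line with the roughly 400 reported here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class NestedRetryDemo {
    // Hypothetical retry helper: one initial attempt plus up to maxRetries retries.
    static <T> T callWithRetry(int maxRetries, Callable<T> op) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e; // this loop's retries are exhausted
                }
            }
        }
    }

    // Outer loop (like pathInForeground) wrapping an inner loop
    // (like findProtectedNodeInForeground); every simulated ZK call fails.
    static int countNestedCalls(int maxRetries) {
        AtomicInteger zkCalls = new AtomicInteger();
        try {
            callWithRetry(maxRetries, () ->
                    callWithRetry(maxRetries, () -> {
                        zkCalls.incrementAndGet();
                        throw new IllegalStateException("ZooKeeper unavailable");
                    }));
        } catch (Exception e) {
            // expected: both loops gave up
        }
        return zkCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countNestedCalls(20)); // 441
    }
}
```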
>  
> This class reproduces it. If you put breakpoints at the commented sections 
> and bring ZK down, you can see the different times until it disconnects, and 
> the stack traces, which I've included below.
>  
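> {code:java}
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
> import org.apache.curator.retry.BoundedExponentialBackoffRetry;
>
> public class GoCurator {
>     public static void main(String[] args) throws Exception {
>         // 200 ms base sleep, 10 s maximum sleep, at most 20 retries
>         CuratorFramework cf = CuratorFrameworkFactory.newClient(
>                 "localhost:2181",
>                 new BoundedExponentialBackoffRetry(200, 10000, 20)
>         );
>         cf.start();
>         try {
>             String root = "/myRoot";
>             if (cf.checkExists().forPath(root) == null) {
>                 // See stacktrace A for what happens if ZK is down for this call
>                 cf.create().forPath(root);
>             }
>             InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf, "/grant/myLock");
>             // See stacktrace B for the nested retry if ZK is down for this call
>             lock.readLock().acquire();
>             lock.readLock().release();
>             System.out.println("done");
>         } finally {
>             cf.close(); // release the client's connection and threads
>         }
>     }
> }
> {code}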
>  
> Stacktrace A (if ZK is down when I'm calling create().forPath). This shows 
> the single retry loop, so it exits after the correct number of attempts:
>  
> {code:java}
>  java.lang.Thread.State: WAITING
>   at java.lang.Object.wait(Object.java:-1)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
>   at com.gebatech.curator.GoCurator.main(GoCurator.java:25) {code}
> Stacktrace B (if ZK is down when I call 
> InterProcessReadWriteLock#readLock#acquire). This shows the nested retry 
> loop, so it doesn't exit until 20*20 attempts:
>  
> {code:java}
>  java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Unsafe.java:-1)
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
>   at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
>   at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
>   at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
>   at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
>   at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
>   at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
>   at com.gebatech.curator.GoCurator.main(GoCurator.java:29) {code}
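To put the 20*20 figure in wall-clock terms, the sketch below bounds the backoff sleep alone for BoundedExponentialBackoffRetry(200, 10000, 20). It assumes the sleep before 0-based retry r is at most 200 ms × (2^(r+1) - 1), capped at 10000 ms; that is a reading of the exponential backoff formula, not something this issue states. It also ignores the time each attempt spends blocked waiting for a connection (the CountDownLatch.await frame in stacktrace B), so real delays are longer. The class `BackoffBudget` and its methods are hypothetical.

```java
class BackoffBudget {
    // Assumed upper bound on the sleep before 0-based retry r:
    // baseSleepMs * (2^(r+1) - 1), capped at maxSleepMs.
    static long maxSleepForRetry(int baseSleepMs, int maxSleepMs, int retryCount) {
        long uncapped = (long) baseSleepMs * ((1L << (retryCount + 1)) - 1);
        return Math.min(uncapped, maxSleepMs);
    }

    // Worst-case total sleep for one loop that uses all maxRetries retries.
    static long singleLoopMs(int baseSleepMs, int maxSleepMs, int maxRetries) {
        long total = 0;
        for (int r = 0; r < maxRetries; r++) {
            total += maxSleepForRetry(baseSleepMs, maxSleepMs, r);
        }
        return total;
    }

    // Nested case: the outer loop sleeps between its own attempts, and each of
    // its maxRetries + 1 attempts runs a complete inner loop.
    static long nestedLoopMs(int baseSleepMs, int maxSleepMs, int maxRetries) {
        long single = singleLoopMs(baseSleepMs, maxSleepMs, maxRetries);
        return single + (long) (maxRetries + 1) * single;
    }

    public static void main(String[] args) {
        System.out.println(singleLoopMs(200, 10000, 20)); // 161400
        System.out.println(nestedLoopMs(200, 10000, 20)); // 3550800
    }
}
```

Under these assumptions a single loop sleeps at most about 161 s, while the nested loops can sleep for up to about 59 minutes (22 × 161.4 s), which is consistent with acquire taking "far longer" to time out than create().forPath.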



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
