[jira] [Updated] (ACCUMULO-1449) Connector/ZooCache code enters infinite loop when Zookeeper connection lost.

John Vines (JIRA) Fri, 25 Oct 2013 19:01:03 -0700

     [ 
https://issues.apache.org/jira/browse/ACCUMULO-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


John Vines updated ACCUMULO-1449:
---------------------------------

    Fix Version/s:     (was: 1.5.1)
                       (was: 1.6.0)
                   1.7.0

> Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
> ----------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1449
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: client
>    Affects Versions: 1.5.0
>         Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, 
> CentOS 6.4
>            Reporter: Luke Brassard
>             Fix For: 1.7.0
>
>
> While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop 
> of ZooKeeper "ConnectionLoss" and "Session expired" failures. In a 
> multithreaded application whose threads all shared the same {{Connector}}, 
> every call to {{conn.createScanner()}} and {{conn.createBatchScanner()}} 
> produced errors. Here are two of the stack traces:
> {code}
> 2013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>       at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
>       at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
>       at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
>       at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
>       at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>       at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>       at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>       at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
> {code}    
> {code}    
> 2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>       at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
>       at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
>       at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
>       at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
>       at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
>       at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>       at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>       at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>       at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
> {code}
> The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a 
> {{while(true)}} loop with no retry limit or timeout. It should probably 
> have one, so that the method eventually throws an exception that the client 
> can handle appropriately. As written, the loop never exits while ZooKeeper 
> keeps returning errors.
> Note: A network hiccup may have triggered the bug, but there was no way to 
> recover without restarting the application.
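
For illustration only, here is a minimal sketch of the bounded retry the
description asks for. It is not the Accumulo code and not the eventual fix:
the class BoundedRetry, the exception RetriesExhaustedException, and the
parameters maxAttempts and initialBackoffMs are all hypothetical names, and a
plain Callable stands in for ZooRunnable so that the example compiles without
ZooKeeper on the classpath.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Accumulo): retries an operation a bounded number of times. */
final class BoundedRetry {

    /** Thrown once every attempt has failed; the cause is the last failure seen. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(int attempts, Throwable lastFailure) {
            super("gave up after " + attempts + " attempts", lastFailure);
        }
    }

    private BoundedRetry() {}

    /**
     * Runs op until it succeeds or maxAttempts calls have failed.
     * The pause between attempts doubles each time and is capped at 5 seconds.
     */
    static <T> T retry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1 || initialBackoffMs < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and initialBackoffMs >= 0");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // let interruption propagate instead of retrying
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the backoff
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 5_000L);
            }
        }
        // Unlike a while(true) loop, control always reaches this point eventually.
        throw new RetriesExhaustedException(maxAttempts, last);
    }
}
```

A caller could wrap a lookup as BoundedRetry.retry(() -> lookup(path), 5, 250L),
where lookup is a placeholder for the real ZooKeeper read, and catch
RetriesExhaustedException to report the outage or rebuild the connection
rather than block forever. Whether a fixed attempt count or a wall-clock
timeout is the better limit for ZooCache is left open here.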



