Luke Brassard created ACCUMULO-1449:
---------------------------------------

             Summary: Connector/ZooCache code enters infinite loop when Zookeeper connection lost.
                 Key: ACCUMULO-1449
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1449
             Project: Accumulo
          Issue Type: Bug
          Components: client
    Affects Versions: 1.5.0
         Environment: accumulo-1.5.0-RC4, zookeeper-3.4.5, hadoop-1.0.4, CentOS 6.4
            Reporter: Luke Brassard


While using 1.5.0-RC4, a long-lived {{Connector}} went into an infinite loop of ZooKeeper "ConnectionLoss" and "Session expired" failures. In a multithreaded application whose threads all shared the same {{Connector}}, calls to {{conn.createScanner()}} and {{conn.createBatchScanner()}} produced errors.

Here are a couple of stack traces:

{code}
2013-05-22 09:12:28,250 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
{code}

{code}
2013-05-22 09:12:23,849 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/5e982cc9-6959-4064-9712-2ff3dc1003d8
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:208)
	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:130)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:233)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:188)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:151)
	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:24)
	at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:46)
	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
	at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
	at org.apache.accumulo.core.client.impl.ConnectorImpl.createBatchScanner(ConnectorImpl.java:89)
{code}

The method {{ZooCache.retry(ZooRunnable op)}} (ZooCache.java:128) has a {{while(true)}} loop that should probably have a maximum retry count or a timeout, so that the method eventually throws an exception the client can handle. As written, the loop never exits while ZooKeeper keeps returning errors.

Note: A network hiccup may have triggered the bug, but there was no way to recover without restarting the application.
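
For illustration only, the bounded retry suggested above for {{ZooCache.retry}} might look roughly like the sketch below. It is not the actual Accumulo code: the class {{BoundedRetry}}, the exception {{RetriesExhaustedException}}, and the parameters {{maxAttempts}} and {{sleepMillis}} are hypothetical names, and a plain {{Callable}} stands in for {{ZooRunnable}}.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: a retry loop that gives up instead of spinning forever.
// None of these names come from Accumulo; Callable stands in for ZooRunnable.
final class BoundedRetry {

    /** Thrown once every attempt has failed; the last failure is the cause. */
    public static class RetriesExhaustedException extends RuntimeException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs op up to maxAttempts times, sleeping sleepMillis between failed
     * attempts, and throws RetriesExhaustedException if none succeeds.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                throw new RetriesExhaustedException("Interrupted during attempt " + attempt, e);
            } catch (Exception e) {
                // Remember the failure; the loop decides whether to try again.
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RetriesExhaustedException("Interrupted while waiting to retry", e);
                }
            }
        }
        throw new RetriesExhaustedException("Giving up after " + maxAttempts + " attempts", last);
    }

    private BoundedRetry() {}
}
```

With a cap like this, a call such as {{conn.createScanner()}} would surface an exception after the last attempt, and the application could decide whether to rebuild its connection. Retrying alone does not cure an expired session, since ZooKeeper requires a new client handle after session expiration, so a real fix would likely need to handle that case separately.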