[
https://issues.apache.org/jira/browse/CURATOR-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867791#comment-13867791
]
Evaristo commented on CURATOR-72:
---------------------------------
Hi there,
I am trying to produce a test that shows the problem but I am failing to
isolate a test. Anyhow I think I have some interesting information activating
CURATOR DEBUG information just in case that you could help me to provide some
light about the problem.
As said previously with CURATOR-72 I see RECONNECTED-LOST (LOST should not
happen), but I see also some other strange things.
I have run my app tests with CURATOR 2.1.0 and CURATOR-72 branch code
My app is using several recipes (leaderelection, nodeCache,
pathchildrenCache...), and the app has several CF clients
I am using RetryOneTime(1000) retry policy for CF clients and STO 7500 msecs
and CTO 1500 msecs
Using CURATOR 2.1.0 my app works fine and listeners look ok. When I stop ZK
servers the number of background retries looks to follow the retry policy
pattern.
Using CURATOR-72 branch with current code I see the following:
- When I stop ZK servers the number of background retries exceptions is
huge in the log (for sure it does not follow the retyr policy and CTO
parameters), so I believe some recipe is retrying like hell and not following
the retry policy (in this case the process takes a much higher CPU process that
with CURATOR-2-1.0). There is a huge amount of messages with
ConnectionLossException and messages like:
[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 |
Connection timed out for connection string
(127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500) / elapsed
(6095)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [DEBUG] -
CuratorFrameworkImpl.java:697 | Retry policy did not allow retry
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [ERROR] -
CuratorFrameworkImpl.java:512 | Background exception was not retry-able or
retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 |
Connection timed out for connection string
(127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500) / elapsed
(6097)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[CuratorFrameworkImpl] 2014-01-10 14:12:19 [ERROR] -
CuratorFrameworkImpl.java:512 | Background exception was not retry-able or
retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[ConnectionState] 2014-01-10 14:12:19 [ERROR] - ConnectionState.java:194 |
Connection timed out for connection string
(127.0.0.1:30101,127.0.0.1:30102,127.0.0.1:30103) and timeout (1500) / elapsed
(6098)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss
at
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:191)
at
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:86)
at
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:743)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:729)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$2(CuratorFrameworkImpl.java:710)
at
org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
(Thousands of exceptions; one every 1 msec aprox).
At the end the number of ConnectionLoss Exceptions is huge and highly frequent.
When starting ZK server again, the RECONNECTED is fired but after that LOST is
also fired, so I guess that still there is race condition (although now there
is an issue that triggers so many ConnectionLoss Exceptions)
I hope it helps
Regards,
Evaristo
> Background operations don't wait for connection timeout
> -------------------------------------------------------
>
> Key: CURATOR-72
> URL: https://issues.apache.org/jira/browse/CURATOR-72
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 2.3.0
> Reporter: Evaristo Camarero
> Assignee: Jordan Zimmerman
> Fix For: 2.4.0
>
> Attachments: TestListener.java, TestListenerConnectedAtStart.java,
> TestListenerSequence.java, test.java
>
>
> Background operations don't wait for the configured connection timeout before
> failing. Attached test shows the problem.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)