[
https://issues.apache.org/jira/browse/CURATOR-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789093#comment-13789093
]
Antal Sasvári commented on CURATOR-45:
--------------------------------------
Was this patch also tested with autoRequeue enabled?
I changed TestLeaderSelectorEdges.flappingTest() to call autoRequeue()
on leaderSelector1, and it seems that more and more ephemeral nodes keep
getting created and then deleted (with increasing sequence numbers), and
leaderSelector1 keeps gaining and losing leadership.
It looks like the new leader-election ephemeral node is constantly deleted
in the background and then recreated because of autoRequeue.
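To make the symptom concrete, here is a toy model in plain Java. It does not use Curator or ZooKeeper, and every name in it (RequeueFlapModel, ElectionPath, run) is hypothetical. It only assumes the behavior described above: each election attempt creates a sequential node, something deletes that node in the background, and autoRequeue triggers another attempt. It is a sketch of the observed loop, not of the actual LeaderSelector code path.

```java
import java.util.ArrayList;
import java.util.List;

public class RequeueFlapModel {
    /** Hypothetical stand-in for an election path that hands out sequence numbers. */
    static final class ElectionPath {
        private int nextSequence = 0;
        private Integer liveNode = null; // this model has a single participant

        int createEphemeralSequential() {
            liveNode = nextSequence++;
            return liveNode;
        }

        void deleteInBackground() {
            liveNode = null;
        }

        boolean hasNode() {
            return liveNode != null;
        }
    }

    /** Runs up to `rounds` election attempts and returns the sequence numbers created. */
    static List<Integer> run(int rounds, boolean autoRequeue) {
        ElectionPath path = new ElectionPath();
        List<Integer> created = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            created.add(path.createEphemeralSequential()); // selector gains leadership
            path.deleteInBackground();                      // node removed, leadership lost
            if (!autoRequeue) {
                break;                                      // without requeue the selector stops
            }
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println("autoRequeue off: " + run(5, false));
        System.out.println("autoRequeue on:  " + run(5, true));
    }
}
```

With autoRequeue off, the model creates a single node. With it on, every round creates a node with a higher sequence number, which matches the growing sequence numbers seen in the modified flappingTest().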
> LeaderSelector threw exception, but still created ephemeral node, breaking
> everything
> -------------------------------------------------------------------------------------
>
> Key: CURATOR-45
> URL: https://issues.apache.org/jira/browse/CURATOR-45
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework, Recipes
> Affects Versions: 2.2.0-incubating
> Reporter: Shevek
> Assignee: Jordan Zimmerman
> Fix For: 2.3.0
>
> Attachments: CURATOR-45.patch
>
>
> ZooKeeper hiccupped, and then this happened:
> 2013-06-19 02:23:35,561 DEBUG [LeaderSelector-1] com.netflix.curator.RetryLoop.takeException (RetryLoop.java:184) - Retry-able exception received
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /[REMOVED]/election/_c_1ccdb2b9-7f9a-4570-9555-201c91ec2dcb-lock-
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.5.0.jar:3.5.0--1]
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.5.0.jar:3.5.0--1]
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:876) ~[zookeeper-3.5.0.jar:3.5.0--1]
>     at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609) ~[curator-framework-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) [curator-client-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41) [curator-framework-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:314) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:373) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:46) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at com.netflix.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:195) [curator-recipes-1.3.5-SNAPSHOT.jar:?]
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.6.0_27]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.6.0_27]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) [?:1.6.0_27]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.6.0_27]
>     at java.lang.Thread.run(Thread.java:679) [?:1.6.0_27]
> However, the ephemeral node got created, and this hung leader election for
> this path.
> I'm investigating to work out where to put an extra guaranteed-delete. I see
> the case in LockInternals, which sometimes triggers to do this cleanup, but
> it didn't trigger in this case.
> You must really love our bugs by now.