[ https://issues.apache.org/jira/browse/CURATOR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kezhu Wang closed CURATOR-435.
------------------------------
Resolution: Duplicate
The exception stack is the same as in CURATOR-436. I think it has been fixed by
that JIRA.
> ZNodes are created with incompatible CreateMode under racing conditions
> -----------------------------------------------------------------------
>
> Key: CURATOR-435
> URL: https://issues.apache.org/jira/browse/CURATOR-435
> Project: Apache Curator
> Issue Type: Bug
> Components: Client, Recipes
> Affects Versions: 2.12.0
> Environment: Apache Storm 1.1.1 (uses Zookeeper 3.4.6 and Curator
> 2.12.0)
> Apache Zookeeper 3.4.10
> OpenJDK 1.8.0_131
> Debian 8
> Docker 17.09.0-ce
> Rancher 1.6.9
> Reporter: Daniel Klessing
> Priority: Major
> Attachments: zk_leader-lock_SUCCESS_nimbus.log,
> zk_leader-lock_SUCCESS_zookeeper.log, zk_leader-lock_nimbus.log,
> zk_leader-lock_zookeeper-1.log, zk_leader-lock_zookeeper-2.log,
> zk_leader-lock_zookeeper-3.log
>
>
> We have a microservice-based software stack that starts its core services
> at roughly the same time. For this we rely on Docker and Rancher.
> We noticed that occasionally the Apache Storm Nimbus, which relies on Apache
> Curator's LeaderLatch, cannot agree on a leader. This is due to a missing
> ZNode:
> {code}
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /storm/leader-lock
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) ~[storm-core-1.1.1.jar:1.1.1]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> at org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) ~[storm-core-1.1.1.jar:1.1.1]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.7.0.jar:?]
> at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) ~[clojure-1.7.0.jar:?]
> at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) ~[storm-core-1.1.1.jar:1.1.1]
> at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) ~[storm-core-1.1.1.jar:1.1.1]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader, skipping assignments
> 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader, skipping cleanup
> {code}
> even though a create request has been issued for this ZNode:
> {code}
> 2017-10-06 11:27:04,387 [myid:3] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x35ef170475e0001 type:create cxid:0x1 zxid:0x100000013 txntype:-1 reqpath:n/a Error Path:/storm/leader-lock Error:KeeperErrorCode = NoNode for /storm/leader-lock
> {code}
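The trace shows {{LeaderLatch.getLeader}} listing the children of /storm/leader-lock (via {{LockInternals.getSortedChildren}}) and failing with NoNode because that parent ZNode does not exist. The stdlib-only Java sketch below is a hypothetical model and not Curator code. The class name LatchLeaderModel, the assumed "latch-" marker in child names, and the use of null to represent a missing parent are all illustrative assumptions. It shows that when the parent node is missing there is no participant list to sort, so no leader can be reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, stdlib-only model of picking a latch leader from the children
// of the latch path. It does not call Curator or ZooKeeper.
final class LatchLeaderModel {
    // Assumed marker that precedes the sequence number in each child name.
    static final String LOCK_NAME = "latch-";

    // Keep only the zero-padded sequence suffix that follows the marker.
    static String sequenceOf(String child) {
        int idx = child.lastIndexOf(LOCK_NAME);
        return idx < 0 ? child : child.substring(idx + LOCK_NAME.length());
    }

    // Sort participants by sequence suffix. Equal-width zero padding makes
    // lexicographic order match numeric order.
    static List<String> sortedParticipants(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparing(LatchLeaderModel::sequenceOf));
        return sorted;
    }

    // children == null models a missing parent ZNode (NoNode on getChildren):
    // there is no participant list at all, so no leader is reported.
    static Optional<String> leader(List<String> children) {
        if (children == null || children.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(sortedParticipants(children).get(0));
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "_c_bbb-latch-0000000002",
                "_c_aaa-latch-0000000001",
                "_c_ccc-latch-0000000003");
        System.out.println(leader(children).orElse("<no leader>")); // _c_aaa-latch-0000000001
        System.out.println(leader(null).orElse("<no leader>"));     // <no leader>
    }
}
```

The model sorts on the suffix rather than the full name, so the random per-client prefix does not affect which participant ranks first.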
> Verifying the Zookeeper data via {{zkCli.sh}} shows that the ZNode is in fact
> not present:
> {code}
> [zk: localhost:2181(CONNECTED) 0] ls /storm
> [assignments, backpressure, nimbuses, logconfigs, storms, errors, supervisors, workerbeats, blobstore]
> {code}
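The listing shows that /storm exists while /storm/leader-lock does not. ZooKeeper rejects the creation of any latch child under /storm/leader-lock with NoNode until that missing ancestor is created first. Curator's {{creatingParentsIfNeeded()}} and {{creatingParentContainersIfNeeded()}} builder options exist to handle this. The sketch below is a hypothetical, stdlib-only helper (ZNodeAncestors is not a Curator class) that lists which ancestors of a path are absent from a given set of existing paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Curator: it works on plain path strings only.
final class ZNodeAncestors {
    // All proper ancestors of an absolute path, root-most first, excluding "/".
    // Example: "/storm/leader-lock/x" -> [/storm, /storm/leader-lock]
    static List<String> ancestorsOf(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        List<String> result = new ArrayList<>();
        int slash = path.indexOf('/', 1);
        while (slash > 0) {
            result.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        return result;
    }

    // Ancestors that would have to be created before `path` itself can be created.
    static List<String> missingAncestors(String path, Set<String> existing) {
        List<String> missing = new ArrayList<>();
        for (String ancestor : ancestorsOf(path)) {
            if (!existing.contains(ancestor)) {
                missing.add(ancestor);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A subset of the paths from the zkCli.sh listing; the child name is made up.
        Set<String> existing = new HashSet<>(Arrays.asList(
                "/storm", "/storm/assignments", "/storm/nimbuses", "/storm/storms"));
        System.out.println(missingAncestors("/storm/leader-lock/_c_abc-latch-0000000001", existing));
        // prints [/storm/leader-lock]
    }
}
```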
> We noticed that in such a case the following entry is *missing* from the
> Nimbus log:
> {code}
> [Curator-Framework-0] WARN org.apache.storm.shade.org.apache.curator.utils.ZKPaths - The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
> {code}
> whereas when everything works, the entry is *present*.
> We suspect that the correct {{CreateMode}} for the ZNode somehow cannot be
> determined ("Container" ZNodes are only available from Zookeeper 3.5.1
> onwards).
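The warning quoted above suggests that the client decides at runtime whether container nodes are supported. A common way to do this kind of feature detection is to look up an enum constant by name and fall back if it is missing. The sketch below models that approach using two simplified, hypothetical stand-in enums (CreateMode34, CreateMode35) in place of ZooKeeper's real {{CreateMode}}. It illustrates the technique under those assumptions and is not Curator's {{ZKPaths}} implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of detecting container-node support at runtime.
final class ContainerModeProbe {
    // Simplified stand-ins for CreateMode without (3.4.x) and with (3.5.x) CONTAINER.
    enum CreateMode34 { PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL }
    enum CreateMode35 { PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL, CONTAINER }

    // Returns CONTAINER if the enum defines it. Otherwise records a warning
    // and falls back to PERSISTENT.
    static <E extends Enum<E>> E containerOrPersistent(Class<E> modeType, List<String> warnings) {
        try {
            return Enum.valueOf(modeType, "CONTAINER");
        } catch (IllegalArgumentException notDefined) {
            warnings.add("The version of ZooKeeper being used doesn't support Container nodes. "
                    + "CreateMode.PERSISTENT will be used instead.");
            return Enum.valueOf(modeType, "PERSISTENT");
        }
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(containerOrPersistent(CreateMode35.class, warnings)); // CONTAINER
        System.out.println(containerOrPersistent(CreateMode34.class, warnings)); // PERSISTENT
        System.out.println(warnings.size());                                     // 1
    }
}
```

In this model the warning is recorded only when the fallback branch runs. A missing warning would therefore mean the probe was never reached, not that a different mode was chosen.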
> Please note the startup phase, during which several attempts to connect to
> Zookeeper fail until it becomes fully reachable.
> Full logs are attached. Zookeeper is run as a 3-node cluster in this setup.
> The issue also occurs with a single-node instance.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)