[ https://issues.apache.org/jira/browse/CURATOR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kezhu Wang closed CURATOR-435. ------------------------------ Resolution: Duplicate The exception stack is same as CURATOR-436. I think it has been fixed by that jira. > ZNodes are created with incompatible CreateMode under racing conditions > ----------------------------------------------------------------------- > > Key: CURATOR-435 > URL: https://issues.apache.org/jira/browse/CURATOR-435 > Project: Apache Curator > Issue Type: Bug > Components: Client, Recipes > Affects Versions: 2.12.0 > Environment: Apache Storm 1.1.1 (uses Zookeeper 3.4.6 and Curator > 2.12.0) > Apache Zookeeper 3.4.10 > OpenJDK 1.8.0_131 > Debian 8 > Docker 17.09.0-ce > Rancher 1.6.9 > Reporter: Daniel Klessing > Priority: Major > Attachments: zk_leader-lock_SUCCESS_nimbus.log, > zk_leader-lock_SUCCESS_zookeeper.log, zk_leader-lock_nimbus.log, > zk_leader-lock_zookeeper-1.log, zk_leader-lock_zookeeper-2.log, > zk_leader-lock_zookeeper-3.log > > > We have a microservice-based software stack that starts its core services > relatively simultaneously. For this we rely on Docker and Rancher. > We noticed that occasionally the Apache Storm Nimbus, that relies on Apache > Curator's LeaderLatch, cannot agree on a leader. This comes due a missing > ZNode: > {code} > org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /storm/leader-lock > at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453) > ~[storm-core-1.1.1.jar:1.1.1] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_131] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_131] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131] > at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] > at > org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296) > ~[storm-core-1.1.1.jar:1.1.1] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_131] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_131] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_131] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131] > at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) > ~[clojure-1.7.0.jar:?] > at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) > ~[clojure-1.7.0.jar:?] > at > org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162) > ~[storm-core-1.1.1.jar:1.1.1] > at > org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518) > ~[storm-core-1.1.1.jar:1.1.1] > at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) > ~[storm-core-1.1.1.jar:1.1.1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_131] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_131] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131] > 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader, > skipping assignments > 11:29:13.229 [timer] INFO org.apache.storm.daemon.nimbus - not a leader, > skipping cleanup > {code} > even though a create request has been issued for this ZNode: > {code} > 2017-10-06 11:27:04,387 [myid:3] - INFO [ProcessThread(sid:3 > cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when > processing sessionid:0x35ef170475e0001 type:create cxid:0x1 zxid:0x100000013 > txntype:-1 reqpath:n/a Error Path:/storm/leader-lock Error:KeeperErrorCode = > NoNode for /storm/leader-lock > {code} > Verifying the Zookeeper data via {{zkCli.sh}} shows that the ZNode is in fact > not present: > {code} > [zk: localhost:2181(CONNECTED) 0] ls /storm > [assignments, backpressure, nimbuses, logconfigs, storms, errors, > supervisors, workerbeats, blobstore] > {code} > We noticed, that in such a case, the following log entry in the Nimbus log is > *missing* > {code} > [Curator-Framework-0] WARN > org.apache.storm.shade.org.apache.curator.utils.ZKPaths - The version of > ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT > will be used instead. > {code} > and if everything works, the log entry is *present*. > We assume that the correct {{CreateMode}} for the ZNode someone cannot be > determined (as "Container" ZNodes are only available in Zookeeper 3.5.1 > onwards). > Please notice the startup phase in which several attempts to connect to > Zookeeper are not successful until it is fully reachable. > Full logs are attached. Zookeeper is run as a 3-node cluster in this setup. > Happens with a single node instance also. -- This message was sent by Atlassian Jira (v8.20.10#820010)