[ 
https://issues.apache.org/jira/browse/CURATOR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kezhu Wang closed CURATOR-435.
------------------------------
    Resolution: Duplicate

The exception stack is the same as in CURATOR-436. I think it has been fixed 
by that issue.

> ZNodes are created with incompatible CreateMode under racing conditions
> -----------------------------------------------------------------------
>
>                 Key: CURATOR-435
>                 URL: https://issues.apache.org/jira/browse/CURATOR-435
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client, Recipes
>    Affects Versions: 2.12.0
>         Environment: Apache Storm 1.1.1 (uses Zookeeper 3.4.6 and Curator 
> 2.12.0)
> Apache Zookeeper 3.4.10
> OpenJDK 1.8.0_131
> Debian 8
> Docker 17.09.0-ce
> Rancher 1.6.9
>            Reporter: Daniel Klessing
>            Priority: Major
>         Attachments: zk_leader-lock_SUCCESS_nimbus.log, 
> zk_leader-lock_SUCCESS_zookeeper.log, zk_leader-lock_nimbus.log, 
> zk_leader-lock_zookeeper-1.log, zk_leader-lock_zookeeper-2.log, 
> zk_leader-lock_zookeeper-3.log
>
>
> We have a microservice-based software stack that starts its core services 
> at roughly the same time. For this we rely on Docker and Rancher.
> We noticed that occasionally the Apache Storm Nimbus, which relies on Apache 
> Curator's LeaderLatch, cannot agree on a leader. This is due to a missing 
> ZNode:
> {code}
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /storm/leader-lock
>       at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_131]
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_131]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>       at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
>       at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
>       at 
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_131]
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_131]
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
>       at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
>       at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
>       at 
> org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) 
> ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
>  ~[storm-core-1.1.1.jar:1.1.1]
>       at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> ~[storm-core-1.1.1.jar:1.1.1]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_131]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_131]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 11:29:13.229 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, 
> skipping assignments
> 11:29:13.229 [timer] INFO  org.apache.storm.daemon.nimbus - not a leader, 
> skipping cleanup
> {code}
> even though a create request has been issued for this ZNode:
> {code}
> 2017-10-06 11:27:04,387 [myid:3] - INFO  [ProcessThread(sid:3 
> cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when 
> processing sessionid:0x35ef170475e0001 type:create cxid:0x1 zxid:0x100000013 
> txntype:-1 reqpath:n/a Error Path:/storm/leader-lock Error:KeeperErrorCode = 
> NoNode for /storm/leader-lock
> {code}
> Verifying the Zookeeper data via {{zkCli.sh}} shows that the ZNode is in fact 
> not present:
> {code}
> [zk: localhost:2181(CONNECTED) 0] ls /storm
> [assignments, backpressure, nimbuses, logconfigs, storms, errors, 
> supervisors, workerbeats, blobstore]
> {code}
> We noticed that in such a case the following log entry is *missing* from 
> the Nimbus log:
> {code}
> [Curator-Framework-0] WARN  
> org.apache.storm.shade.org.apache.curator.utils.ZKPaths - The version of 
> ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT 
> will be used instead.
> {code}
> and if everything works, the log entry is *present*.
> We assume that the correct {{CreateMode}} for the ZNode somehow cannot be 
> determined (as "Container" ZNodes are only available in Zookeeper 3.5.1 
> onwards).
> Please note the startup phase, in which several attempts to connect to 
> Zookeeper fail until it is fully reachable.
> Full logs are attached. Zookeeper runs as a 3-node cluster in this setup. 
> The problem also occurs with a single-node instance.
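
The {{CreateMode}} assumption in the description above can be illustrated 
with a small sketch. It shows one way a client could fall back to 
PERSISTENT when its CreateMode enum has no CONTAINER constant, logging the 
warning quoted in the report. This is an illustration only and not 
Curator's actual code: the class CreateModeFallbackSketch, the enum Mode 
and the method resolveContainerMode are hypothetical stand-ins, and the 
sketch uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

public class CreateModeFallbackSketch {
    // Hypothetical stand-in for the CreateMode enum of a ZooKeeper 3.4.x
    // client, which has no CONTAINER constant.
    public enum Mode { PERSISTENT, EPHEMERAL, PERSISTENT_SEQUENTIAL, EPHEMERAL_SEQUENTIAL }

    // Collects warnings in memory instead of using a real logger.
    public static final List<String> warnings = new ArrayList<>();

    // Looks up CONTAINER by name and falls back to PERSISTENT when the
    // enum does not define it (assumed behaviour, for illustration only).
    public static Mode resolveContainerMode() {
        try {
            return Mode.valueOf("CONTAINER");
        } catch (IllegalArgumentException e) {
            warnings.add("The version of ZooKeeper being used doesn't support "
                    + "Container nodes. CreateMode.PERSISTENT will be used instead.");
            return Mode.PERSISTENT;
        }
    }

    public static void main(String[] args) {
        Mode mode = resolveContainerMode();
        System.out.println("mode=" + mode + ", warnings logged=" + warnings.size());
    }
}
```

If the fallback works roughly like this, a missing warning in the Nimbus 
log would suggest that the lookup was never reached on the failing start, 
which would be consistent with the parent ZNode never being created.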



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
