[jira] [Commented] (STORM-3773) Worker Reassignment - Difference between Storm 2.x and Storm 1.x

Pedro Azevedo (Jira) Wed, 21 Aug 2024 08:35:05 -0700


    [ https://issues.apache.org/jira/browse/STORM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875540#comment-17875540 ]


Pedro Azevedo commented on STORM-3773:
--------------------------------------

On another note, I've seen this error when shutting down my nimbus.


{code:java}
2024-08-21T15:09:46.647Z Nimbus [INFO] Shutting down master
2024-08-21T15:09:46.648Z CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2024-08-21T15:09:46.752Z ClientCnxn [INFO] EventThread shut down for session: 0x4000010be7caa5d
2024-08-21T15:09:46.752Z ZooKeeper [INFO] Session: 0x4000010be7caa5d closed
2024-08-21T15:09:46.752Z CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting
2024-08-21T15:09:46.812Z ProcessFunction [ERROR] Internal error processing getLeader
java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
    at org.apache.storm.shade.org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:835) ~[storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.checkState(CuratorFrameworkImpl.java:462) ~[storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.getChildren(CuratorFrameworkImpl.java:507) ~[storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.zookeeper.ClientZookeeper.getChildren(ClientZookeeper.java:209) ~[storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.cluster.ZKStateStorage.get_children(ZKStateStorage.java:155) ~[storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.cluster.StormClusterStateImpl.nimbuses(StormClusterStateImpl.java:279) ~[storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.nimbus.Nimbus.getLeader(Nimbus.java:4907) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:5168) ~[storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:5144) ~[storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:40) [storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:40) [storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:171) [storm-client-2.6.1.jar:2.6.1]
    at org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) [storm-shaded-deps-2.6.1.jar:2.6.1]
    at org.apache.storm.thrift.server.Invocation.run(Invocation.java:19) [storm-shaded-deps-2.6.1.jar:2.6.1]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
{code}
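
One possible reading of the trace, going only by the timestamps and not verified against the Nimbus source: the ZooKeeper session is closed at 15:09:46.752, and a getLeader request is still processed at 15:09:46.812, so the call reaches a client that is already stopped. The sketch below reproduces that ordering in plain Java. FakeZkClient, LeaderHandler, ShutdownRaceSketch and the "/nimbuses" path are hypothetical names used for illustration; none of this is Storm or Curator code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a ZooKeeper client with a lifecycle (not Curator's API).
class FakeZkClient {
    enum State { LATENT, STARTED, STOPPED }

    private final AtomicReference<State> state = new AtomicReference<>(State.LATENT);

    void start() { state.set(State.STARTED); }

    void close() { state.set(State.STOPPED); }

    // Guard similar in spirit to the state check that appears in the trace.
    List<String> getChildren(String path) {
        State current = state.get();
        if (current != State.STARTED) {
            throw new IllegalStateException(
                "Expected state [" + State.STARTED + "] was [" + current + "]");
        }
        return List.of(); // no real ZooKeeper behind this sketch
    }
}

// Hypothetical request handler standing in for a served getLeader call.
class LeaderHandler {
    private final FakeZkClient client;

    LeaderHandler(FakeZkClient client) { this.client = client; }

    String getLeader() {
        List<String> nimbuses = client.getChildren("/nimbuses"); // illustrative path
        return nimbuses.isEmpty() ? "none" : nimbuses.get(0);
    }
}

class ShutdownRaceSketch {
    public static void main(String[] args) {
        FakeZkClient client = new FakeZkClient();
        LeaderHandler handler = new LeaderHandler(client);

        client.start();
        System.out.println("before shutdown: " + handler.getLeader());

        client.close(); // the storage client is closed first ...
        try {
            handler.getLeader(); // ... but a request is still handled afterwards
        } catch (IllegalStateException e) {
            System.out.println("after shutdown: " + e.getMessage());
        }
    }
}
```

If that reading is right, the error would be a side effect of shutdown ordering (the request server still accepting calls after the state storage is closed) rather than a failure during normal operation, but I have not confirmed this.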

> Worker Reassignment - Difference between Storm 2.x  and Storm 1.x
> -----------------------------------------------------------------
>
>                 Key: STORM-3773
>                 URL: https://issues.apache.org/jira/browse/STORM-3773
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Surajeet
>            Priority: Major
>
> We are currently on Storm 1.2.1 and are in the process of upgrading to 
> Storm 2.2.0.
>  We observed the following while upgrading to 2.2.0:
> 1) In a Storm cluster (4 nodes) with 8 topologies running (with a 1-1 
> mapping between workers and topologies), when I bring down nimbus and the 
> supervisor on one of the nodes (let's say Node 1, which is not the nimbus 
> leader), the workers running on that node get reassigned to the other 3 
> nodes, even though they are still running on Node 1. So I have 2 worker 
> processes for the same topology running at the same time (I saw this 
> behaviour with and without Pacemaker). The worker process does get killed 
> when nimbus and the supervisor are brought back up on Node 1.
> 2) I observed from the worker logs that the worker sends heartbeats to the 
> local supervisor and the nimbus leader, which in 1.2.1 used to happen 
> through ZooKeeper (I saw this behaviour in 2.2.0 with and without Pacemaker). 
>  If I bring down nimbus and the supervisor on the node where nimbus is the 
> leader, it reassigns worker processes and in some cases leads to zombie 
> worker processes (they are not killed when storm kill is executed).
> The above behaviour (reassignment of workers) doesn't happen with Storm 1.2.1.
> Since this is a fundamental design change between 1.x and 2.x, is there any 
> documentation that describes it in detail? (I couldn't find any in the 
> Release Notes.)
> (I am raising this as a bug because it's preventing us from moving to 2.2.0 
> due to the issue mentioned in 2).)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
