[ https://issues.apache.org/jira/browse/STORM-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001353#comment-15001353 ]
ASF GitHub Bot commented on STORM-794:
--------------------------------------

Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/542#issuecomment-155947307

    @revans2 Upmerged.


> Trident Topology with some situation seems not handle deactivate during graceful shutdown
> -----------------------------------------------------------------------------------------
>
>                 Key: STORM-794
>                 URL: https://issues.apache.org/jira/browse/STORM-794
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.9.3
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>
> I ran into an issue with a Trident topology in our production environment.
> Normally, when we kill a topology via the UI, Nimbus changes the topology
> status to "killed", and when a spout detects the new status it becomes
> deactivated, so bolts can process the remaining tuples within the wait time.
> AFAIK that's how Storm guarantees graceful shutdown.
> But a Trident topology does not seem to handle "deactivate" when we try to
> shut the topology down gracefully.
> MasterBatchCoordinator never stops starting the next transaction, so the
> Trident spout never stops emitting and the bolts (functions) always have
> tuples to process.
>
> Topology setting
> - 1 worker, 1 acker
> - max spout pending: 1
> - TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS : 5
> -- It may look odd, but MasterBatchCoordinator's default value is 1
>
> * Nimbus log
> {code}
> 2015-04-20 09:59:07.954 INFO [pool-5-thread-41][nimbus] Delaying event :remove for 120 secs for BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015
> ...
> 2015-04-20 09:59:07.955 INFO [pool-5-thread-41][nimbus] Updated BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015 with status {:type :killed, :kill-time-secs 120}
> ...
> 2015-04-20 10:01:07.956 INFO [timer][nimbus] Killing topology: BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015
> ...
> 2015-04-20 10:01:14.448 INFO [timer][nimbus] Cleaning up BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015
> {code}
>
> * Supervisor log
> {code}
> 2015-04-20 10:01:07.960 INFO [Thread-1][supervisor] Removing code for storm id BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015
> 2015-04-20 10:01:07.962 INFO [Thread-2][supervisor] Shutting down and clearing state for id 9719259e-528c-4336-abf9-592c1bb9a00b. Current supervisor time: 1429491667. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1429491667, :storm-id "BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015", :executors #{[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [11 11] [12 12] [13 13] [14 14] [-1 -1] [1 1]}, :port 6706}
> 2015-04-20 10:01:07.962 INFO [Thread-2][supervisor] Shutting down 5bc084a2-b668-4610-86f6-9b93304d40a8:9719259e-528c-4336-abf9-592c1bb9a00b
> 2015-04-20 10:01:08.974 INFO [Thread-2][supervisor] Shut down 5bc084a2-b668-4610-86f6-9b93304d40a8:9719259e-528c-4336-abf9-592c1bb9a00b
> {code}
>
> * Worker log
> {code}
> 2015-04-20 10:01:07.985 INFO [Thread-33][worker] Shutting down worker BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015 5bc084a2-b668-4610-86f6-9b93304d40a8 6706
> 2015-04-20 10:01:07.985 INFO [Thread-33][worker] Shutting down receive thread
> 2015-04-20 10:01:07.988 WARN [Thread-33][ExponentialBackoffRetry] maxRetries too large (300). Pinning to 29
> 2015-04-20 10:01:07.988 INFO [Thread-33][StormBoundedExponentialBackoffRetry] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [300]
> 2015-04-20 10:01:07.988 INFO [Thread-33][Client] New Netty Client, connect to localhost, 6706, config: , buffer_size: 5242880
> 2015-04-20 10:01:07.991 INFO [client-schedule-service-1][Client] Reconnect started for Netty-Client-localhost/127.0.0.1:6706... [0]
> 2015-04-20 10:01:07.996 INFO [Thread-33][loader] Shutting down receiving-thread: [BFDC-topology-DynamicCollect-68c9d7b4-72-1429491015, 6706]
> ...
> 2015-04-20 10:01:08.044 INFO [Thread-33][Client] Closing Netty Client Netty-Client-localhost/127.0.0.1:6706
> 2015-04-20 10:01:08.044 INFO [Thread-33][Client] Waiting for pending batchs to be sent with Netty-Client-localhost/127.0.0.1:6706..., timeout: 600000ms, pendings: 1
> {code}
>
> I found the activating log entry, but cannot find a deactivating one.
> {code}
> 2015-04-20 09:50:24.556 INFO [Thread-30-$mastercoord-bg0][executor] Activating spout $mastercoord-bg0:(1)
> {code}
>
> Please note that it doesn't work when I just push the button to "deactivate"
> the topology via the UI.
> We're changing our topology to a normal spout-bolt one, but personally I'd
> like to see this resolved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
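
For readers of the archive, the behaviour the report expects can be illustrated with a small model. This is only a sketch and is not Storm code: the class `GatedCoordinatorSketch` and its methods (`activate`, `deactivate`, `tick`, `ack`, `pendingCount`) are hypothetical names invented for illustration, and it makes no claim about how MasterBatchCoordinator is implemented or how the pull request changes it. It assumes a coordinator that is ticked periodically, with a pending limit of 1 as in the reported settings, and shows that once the coordinator is deactivated no new transaction starts, so the in-flight batch can drain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Storm code: a coordinator that starts a new
// transaction on each tick only while it is active and below its pending limit.
class GatedCoordinatorSketch {
    private final int maxPending;                 // plays the role of "max spout pending"
    private final List<Long> pending = new ArrayList<>();
    private boolean active = false;               // inactive until activate() is called
    private long nextTxId = 1;

    GatedCoordinatorSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    void activate() { active = true; }

    void deactivate() { active = false; }

    // Called periodically. Returns the id of the transaction that was started,
    // or -1 if none was started (inactive, or pending limit reached).
    long tick() {
        if (!active || pending.size() >= maxPending) {
            return -1;
        }
        long txId = nextTxId++;
        pending.add(txId);
        return txId;
    }

    // Marks a transaction as completed so it no longer counts as pending.
    void ack(long txId) {
        pending.remove(Long.valueOf(txId));
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        GatedCoordinatorSketch c = new GatedCoordinatorSketch(1);
        c.activate();
        long tx = c.tick();                   // starts transaction 1
        c.deactivate();                       // e.g. topology status became "killed"
        c.ack(tx);                            // the in-flight batch completes
        System.out.println(c.tick());         // prints -1: no new transaction
        System.out.println(c.pendingCount()); // prints 0: nothing left in flight
    }
}
```

In this model the symptom described in the report corresponds to `tick()` ignoring the `active` flag: a new transaction would start on every tick, so the pending count would never stay at zero during the kill wait time.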