Github user tedxia commented on the pull request:

https://github.com/apache/storm/pull/268#issuecomment-61752674

I tested this patch on our production cluster, which has five machines with a maximum of 6 workers each. A Trident-based topology ran for about 5 hours without failures. I then killed one worker, called A, and found the log below on worker B. Worker B did not exit when worker A died.

```
2014-11-04 17:18:08 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [47]
2014-11-04 17:18:12 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [48]
2014-11-04 17:18:16 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [49]
2014-11-04 17:18:20 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [50]
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-A/xxx.xxx.xxx.xxx:21812
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-A/xxx.xxx.xxx.xxx:21812..., timeout: 600000ms, pendings: 0
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does not take requests any more, drop the messages...
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does not take requests any more, drop the messages...
```

After worker A died, Nimbus scheduled a new worker F, and worker B then connected to worker F.

```
2014-11-04 17:16:53 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [21]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-F/xxx.xxx.xxx.xxx:21813... [17]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] connection established to a remote host Netty-Client-F/xxx.xxx.xxx.xxx:21813, [id: 0xbf721a18, /xxx.xxx.xxx.xxx:63811 => F/xxx.xxx.xxx.xxx:21813]
2014-11-04 17:16:55 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/10.2.201.65:21812... [22]
```

Worker B connected to worker F successfully before it closed its connection to worker A.

Because this is our production cluster, I rewrote the hostnames and IPs in the log.
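
For anyone who wants to check the same pattern in their own worker logs, here is a minimal sketch that pulls the reconnect attempt counters and the gaps between attempts out of lines in the format shown above. The function names, the regular expression, and the `SAMPLE` constant are illustrative assumptions and not part of Storm; the sample lines are copied from the first excerpt. In that excerpt the attempts are 4 seconds apart and the client is closed right after attempt 50, but the script only reports what the log contains and makes no claim about how the retry limit is configured.

```python
import re
from datetime import datetime

# Hypothetical pattern for the "Reconnect started" lines quoted in this comment.
RECONNECT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) b\.s\.m\.n\.Client \[INFO\] "
    r"Reconnect started for (?P<client>\S+?)\.\.\. \[(?P<attempt>\d+)\]$"
)

# Four lines copied from the first log excerpt.
SAMPLE = """\
2014-11-04 17:18:08 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [47]
2014-11-04 17:18:12 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [48]
2014-11-04 17:18:16 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [49]
2014-11-04 17:18:20 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [50]
"""


def reconnect_attempts(lines):
    """Return {client: [(timestamp, attempt), ...]} for every reconnect line."""
    result = {}
    for line in lines:
        m = RECONNECT.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            entry = (ts, int(m.group("attempt")))
            result.setdefault(m.group("client"), []).append(entry)
    return result


def retry_gaps_seconds(entries):
    """Seconds elapsed between consecutive reconnect attempts."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(entries, entries[1:])]


if __name__ == "__main__":
    for client, entries in reconnect_attempts(SAMPLE.splitlines()).items():
        print(client, [n for _, n in entries], retry_gaps_seconds(entries))
```

Running the same functions over a full worker log, instead of `SAMPLE`, would show per client how many reconnect attempts were made before the connection was closed or re-established.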