Github user tedxia commented on the pull request:

    https://github.com/apache/storm/pull/268#issuecomment-61752674
  
    I tested this patch on our production cluster of five machines, each 
running at most 6 workers.
    
    A Trident-based topology ran for about 5 hours without failures.
    
    
    Then I killed one worker, called A, and found the log below on worker 
B. Worker B did not exit when worker A died.
    ```
    2014-11-04 17:18:08 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [47]
    2014-11-04 17:18:12 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [48]
    2014-11-04 17:18:16 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [49]
    2014-11-04 17:18:20 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [50]
    2014-11-04 17:18:24 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-A/xxx.xxx.xxx.xxx:21812
    2014-11-04 17:18:24 b.s.m.n.Client [INFO] Waiting for pending batchs to be 
sent with Netty-Client-A/xxx.xxx.xxx.xxx:21812..., timeout: 600000ms, pendings: 0
    2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does 
not take requests any more, drop the messages...
    2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does 
not take requests any more, drop the messages...
    ```
    
    Because worker A died, Nimbus scheduled a new worker F, and worker B 
then connected to worker F.
    ```
    2014-11-04 17:16:53 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [21]
    2014-11-04 17:16:54 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-F/xxx.xxx.xxx.xxx:21813... [17]
    2014-11-04 17:16:54 b.s.m.n.Client [INFO] connection established to a 
remote host Netty-Client-F/xxx.xxx.xxx.xxx:21813, [id: 0xbf721a18, 
/xxx.xxx.xxx.xxx:63811 => F/xxx.xxx.xxx.xxx:21813]
    2014-11-04 17:16:55 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-A/10.2.201.65:21812... [22]
    ```
    Worker B connected to worker F successfully before it closed its 
connection to worker A.
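
    The ordering described above can be checked mechanically from the 
timestamps. The following is a minimal sketch, not part of the patch: it 
takes a few of the log lines quoted in this comment (with the email line 
wrapping joined), and the helper names `parse` and `first_time` are 
made up for illustration.

```python
import re
from datetime import datetime

# Log lines copied from worker B's log quoted in this comment
# (wrapped lines joined; hostnames and IPs are already redacted).
LOG = """\
2014-11-04 17:16:53 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [21]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-F/xxx.xxx.xxx.xxx:21813... [17]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] connection established to a remote host Netty-Client-F/xxx.xxx.xxx.xxx:21813, [id: 0xbf721a18, /xxx.xxx.xxx.xxx:63811 => F/xxx.xxx.xxx.xxx:21813]
2014-11-04 17:18:20 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-A/xxx.xxx.xxx.xxx:21812... [50]
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-A/xxx.xxx.xxx.xxx:21812
"""

# timestamp, logger name, [LEVEL], message
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ \[\w+\] (.*)$")


def parse(log):
    """Return a list of (timestamp, message) tuples, one per log line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def first_time(events, needle):
    """Return the timestamp of the first message containing needle."""
    for ts, msg in events:
        if needle in msg:
            return ts
    return None


events = parse(LOG)
connected_f = first_time(events, "connection established to a remote host Netty-Client-F")
closed_a = first_time(events, "Closing Netty Client Netty-Client-A")
gap = (closed_a - connected_f).total_seconds()
print(f"B connected to F {gap:.0f}s before closing its client for A")
```

    For the lines above, the connection to F (17:16:54) precedes the close 
of the client for A (17:18:24) by 90 seconds.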
    
    Because this is our production cluster, I have rewritten the hostnames 
and IPs in the log.
    


