[ 
https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-946:
-------------------------------
    Component/s: storm-core

> We should remove Closed Client from cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
>                 Key: STORM-946
>                 URL: https://issues.apache.org/jira/browse/STORM-946
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: xiajun
>
> A Client may end up in the Closed state after its reconnect attempts fail,
> and we already remove closed Clients from the Context to avoid a memory leak.
> However, cached-node+port->socket in the worker still holds a reference to
> the closed Client, so we should also remove closed Clients from
> cached-node+port->socket.
> There is a second reason to do so. Consider this situation: worker A is
> connected to workers B1 and B2, and for some reason B1 and B2 die at the
> same time, so nimbus reschedules them. The new B1 and B2 may partly land on
> the same host:port as the old ones, for example (old B1: host1+port1,
> old B2: host2+port2, new B1: host2+port2, new B2: host3+port3).
> Worker A notices that B1 and B2 have died and starts reconnecting to them.
> Because new B1 and old B2 share the same host+port, the current logic
> removes the old B1 Client and creates a new Client for new B2, but does
> nothing about old B2 and new B1. The closed Client for old B2 therefore
> stays cached under host2+port2, and the topology stops processing tuples.
> If we remove closed Clients from cached-node+port->socket before
> refresh-connections, this will not happen.
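
The scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions, not Storm's actual worker code: the `Client` class, `refresh_connections`, `simulate`, and the `evict_closed` flag are hypothetical names, and the cache is modelled as a plain dict keyed by "host:port" strings. It assumes the refresh step compares endpoints only, as the description says.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for a messaging client; tracks only a closed flag."""
    endpoint: str
    closed: bool = False


def refresh_connections(cache, needed, evict_closed):
    """Update `cache` (endpoint -> Client) in place to match `needed` endpoints."""
    if evict_closed:
        # Proposed fix: drop closed clients first so they get recreated below.
        for endpoint in [e for e, c in cache.items() if c.closed]:
            del cache[endpoint]
    # Diff by endpoint only (assumed to mirror the logic in the description).
    for endpoint in set(cache) - needed:
        cache.pop(endpoint).closed = True   # no longer needed: close and remove
    for endpoint in needed - set(cache):
        cache[endpoint] = Client(endpoint)  # newly needed: create a fresh client
    return cache


def simulate(evict_closed):
    # Worker A's cache: clients for old B1 and old B2, both closed after
    # reconnecting failed.
    cache = {
        "host1:port1": Client("host1:port1", closed=True),
        "host2:port2": Client("host2:port2", closed=True),
    }
    # New assignment: new B1 on host2:port2, new B2 on host3:port3.
    needed = {"host2:port2", "host3:port3"}
    refresh_connections(cache, needed, evict_closed)
    # Report, per endpoint, whether the cached client is closed.
    return {endpoint: client.closed for endpoint, client in cache.items()}
```

Without eviction, the entry for host2:port2 survives the diff and stays closed, so nothing can be sent to new B1. With eviction, both remaining entries are fresh clients.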



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
