[ https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-946:
-------------------------------
    Component/s: storm-core

> We should remove Closed Client from cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
>                 Key: STORM-946
>                 URL: https://issues.apache.org/jira/browse/STORM-946
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: xiajun
>
> A Client may end up in the Closed state after a failed reconnect, and we
> already remove closed Clients from the Context to avoid a memory leak.
> However, cached-node+port->socket in the worker still holds a reference to
> the closed Client, so we should remove it from there as well.
>
> There is a second reason to do so. Consider this situation: worker A is
> connected to workers B1 and B2, and for some reason B1 and B2 die at the
> same time. Nimbus then reschedules B1 and B2, and the new workers may
> partly land on the same host:port as the old ones, for example:
>
>   old B1: host1+port1
>   old B2: host2+port2
>   new B1: host2+port2
>   new B2: host3+port3
>
> Worker A notices that B1 and B2 died and starts reconnecting. Because new
> B1 and old B2 share the same host+port, the current logic removes the
> Client for old B1, creates a new Client for new B2, and does nothing for
> host2+port2. The closed Client for old B2 therefore stays in the cache in
> place of a connection to new B1, and the topology stops processing tuples.
> If we remove closed Clients from cached-node+port->socket before
> refresh-connections, this will not happen.
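
The rescheduling scenario in the description can be reproduced with a small simulation. This is only a sketch, written in Python rather than taken from the Storm sources: `Client`, `refresh_connections`, `simulate`, and the `prune_closed` flag are hypothetical names invented for illustration, and the cache is modelled as a plain dict keyed by (host, port). It assumes, as the report describes, that an endpoint already present in the cache is left untouched during a refresh.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for a worker-to-worker connection."""
    endpoint: tuple
    closed: bool = False


def refresh_connections(cache, needed, prune_closed):
    """Return a cache covering exactly the endpoints in `needed`."""
    if prune_closed:
        # Proposed fix: forget closed clients before computing the diff.
        cache = {ep: c for ep, c in cache.items() if not c.closed}
    refreshed = {}
    for ep in needed:
        # An endpoint that is already cached keeps its existing client;
        # endpoints no longer needed are simply dropped.
        refreshed[ep] = cache.get(ep) or Client(ep)
    return refreshed


def simulate(prune_closed):
    """Replay the B1/B2 rescheduling and report closed state per endpoint."""
    old_b1, old_b2 = ("host1", "port1"), ("host2", "port2")
    new_b1, new_b2 = ("host2", "port2"), ("host3", "port3")
    cache = {ep: Client(ep) for ep in (old_b1, old_b2)}
    # Both old workers die and reconnecting fails, so both clients close.
    for client in cache.values():
        client.closed = True
    cache = refresh_connections(cache, {new_b1, new_b2}, prune_closed)
    return {ep: c.closed for ep, c in cache.items()}
```

With `prune_closed=False` the entry for host2+port2 is still the closed client that pointed at old B2, so worker A never gets a live connection to new B1. With `prune_closed=True` every endpoint in the refreshed cache has a fresh, open client.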