[ 
https://issues.apache.org/jira/browse/STORM-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463945#comment-16463945
 ] 

zhangbiao commented on STORM-3055:
----------------------------------

The problem is caused by the context's connection cache.
For example, suppose the supervisor with id 'a' restarts with a corrupt local 
version store, so it generates a new id, say 'b'.
Once 'b' is up, nimbus assigns some tasks to 'b'. If the old assignment is 
[a:6700, c:6700] and the new assignment is [b:6700, c:6700], 
then task c:6700 first connects to [b:6700] and then closes and removes the 
connection to [a:6700].
Because a and b have the same IP, b:6700 shares the cached connection of 
a:6700, so removing a:6700 closes the very connection that b:6700 is using.
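
The sequence above can be sketched in a few lines of Java. This is only an 
illustration of the failure mode, not Storm's actual code: the Conn and Cache 
classes, the "ip:port" cache key, and the 10.0.0.1 address are all assumptions 
made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionCacheSketch {
    /** Hypothetical stand-in for a Netty client connection (not Storm's class). */
    static final class Conn {
        final String endpoint;
        boolean closed = false;
        Conn(String endpoint) { this.endpoint = endpoint; }
    }

    /** Hypothetical cache keyed by "ip:port", so supervisor ids are invisible to it. */
    static final class Cache {
        private final Map<String, Conn> byEndpoint = new HashMap<>();

        /** Returns the cached connection for ip:port, creating it if absent. */
        Conn connect(String ip, int port) {
            return byEndpoint.computeIfAbsent(ip + ":" + port, Conn::new);
        }

        /** Closes and evicts whatever connection is cached for ip:port. */
        void closeAndRemove(String ip, int port) {
            Conn c = byEndpoint.remove(ip + ":" + port);
            if (c != null) {
                c.closed = true;
            }
        }
    }

    /** Replays the reassignment; returns true if b's connection ends up closed. */
    static boolean simulate() {
        Cache cache = new Cache();
        String sharedIp = "10.0.0.1"; // assumed address shared by supervisors a and b

        Conn toA = cache.connect(sharedIp, 6700); // old assignment: a:6700
        Conn toB = cache.connect(sharedIp, 6700); // new assignment: b:6700, same key
        cache.closeAndRemove(sharedIp, 6700);     // drop a:6700 after reassignment

        // toA and toB are the same object, so closing a's connection closed b's too.
        return toA == toB && toB.closed;
    }

    public static void main(String[] args) {
        System.out.println("connection to b closed: " + simulate());
    }
}
```

Under these assumptions, the new connection is already dead right after the 
reassignment, which would match the "is being closed" errors in the log.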

> never refresh connection
> ------------------------
>
>                 Key: STORM-3055
>                 URL: https://issues.apache.org/jira/browse/STORM-3055
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.1.1
>            Reporter: zhangbiao
>            Priority: Major
>
> In our environment, some workers' connections to other workers are closed 
> and never reconnect.
> The log shows:
> 2018-05-02 10:28:49.302 o.a.s.m.n.Client 
> Thread-90-disruptor-worker-transfer-queue [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.31.1:6800 is being closed
> ......
> 2018-05-02 11:00:29.540 o.a.s.m.n.Client 
> Thread-90-disruptor-worker-transfer-queue [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.31.1:6800 is being closed
> The log shows that the client can never reconnect. I can only fix it by 
> restarting the topology.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
