ryan.jin created STORM-2596:
-------------------------------

             Summary: Storm Worker not reconnect the Netty Client
                 Key: STORM-2596
                 URL: https://issues.apache.org/jira/browse/STORM-2596
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.1.0
            Reporter: ryan.jin
            Priority: Critical

I reported a similar bug in [STORM-2561|https://issues.apache.org/jira/browse/STORM-2561] against version 0.10.1. I have since upgraded Storm to 1.1.0, but today the bug appeared again. The worker.log shows:

{code:java}
$ cat worker.log|grep '10.24.40.254:6812'|more
2017-06-22 15:14:25.295 o.a.s.m.n.Client main [INFO] creating Netty Client, connecting to 10.24.40.254:6812, bufferSize: 5242880
2017-06-23 11:23:32.570 o.a.s.m.n.StormClientHandler client-worker-1 [INFO] Connection to /10.24.40.254:6812 failed:
2017-06-23 11:23:35.654 o.a.s.m.n.Client refresh-connections-timer [INFO] closing Netty Client Netty-Client-/10.24.40.254:6812
2017-06-23 11:23:35.655 o.a.s.m.n.Client refresh-connections-timer [INFO] waiting up to 600000 ms to send 0 pending messages to Netty-Client-/10.24.40.254:6812
2017-06-23 14:57:03.352 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
2017-06-23 14:57:59.777 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
2017-06-23 14:59:16.038 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
2017-06-23 15:01:27.092 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
2017-06-23 15:04:08.654 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
2017-06-23 15:06:59.777 o.a.s.m.n.Client Thread-10-disruptor-worker-transfer-queue [ERROR] discarding 1 messages because the Netty client to Netty-Client-/10.24.40.254:6812 is being closed
{code}

The worker closed the Netty client at 2017-06-23 11:23:35.654 and never started a new one, so the messages that worker sent to this destination afterwards were discarded.

At that time the Storm node (10.24.40.254:6812) had run out of memory (OOM):

{code:java}
2017-06-23 11:22:59.623 g.a.s.s.t.SolrPersistApi pool-10-thread-8 [INFO] write 200 doc at:invoketrace success cost 228060
2017-06-23 11:22:59.625 g.a.s.s.t.SolrPersistApi pool-10-thread-5 [INFO] write 66 doc at:invoketrace success cost 226739
2017-06-23 11:22:59.626 g.a.s.s.t.SolrPersistApi pool-10-thread-7 [INFO] write 200 doc at:invoketrace success cost 167869
2017-06-23 11:23:32.242 STDERR Thread-2 [INFO] java.lang.OutOfMemoryError: Java heap space
2017-06-23 11:23:32.253 STDERR Thread-2 [INFO] Dumping heap to artifacts/heapdump ... @
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)