James Xu created STORM-95:
-----------------------------
Summary: Topology hangs with worker processor threads
TIMED_WAITING.
Key: STORM-95
URL: https://issues.apache.org/jira/browse/STORM-95
Project: Apache Storm (Incubating)
Issue Type: Improvement
Reporter: James Xu
https://github.com/nathanmarz/storm/issues/763
Hi Nathan,
We are seeing this issue very frequently now while using Storm 0.8.2. There are no
errors in the worker logs, supervisor.log, or nimbus.log. However, the topology
stops processing tuples.
On collecting a thread dump of the worker process, we can see that all the
threads go into the TIMED_WAITING state and the topology hangs.
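For reference, a thread dump can be summarized by thread state with a short script. The following is a minimal sketch, assuming the dump is in the plain-text format printed by jstack, where each thread has a "java.lang.Thread.State:" line. The function name count_thread_states and the sample excerpt are illustrative only and are not taken from the attached dumps.

```python
import re
from collections import Counter

# Matches the state line jstack prints under each thread header, e.g.
#   java.lang.Thread.State: TIMED_WAITING (parking)
STATE_LINE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def count_thread_states(dump_text):
    """Return a Counter mapping thread state name to number of threads."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical excerpt for illustration only (not the attached dumps).
SAMPLE_DUMP = '''
"Thread-10" prio=10 tid=0x01 nid=0x1a waiting on condition [0x02]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)

"Thread-11" prio=10 tid=0x03 nid=0x1b waiting on condition [0x04]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)

"main" prio=10 tid=0x05 nid=0x1c runnable [0x06]
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    # Print each state with its thread count, most common first.
    for state, n in count_thread_states(SAMPLE_DUMP).most_common():
        print(state, n)
```

Running this against dumps taken a few seconds apart would show whether the same threads stay in TIMED_WAITING over time or whether some are still making progress.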
The following is a brief description of our topology.
We are using BaseIRich Spout and bolts.
We have one file reader spout and three processing bolts (24, 48 and 24
executors).
Each tuple contains 100 messages of 10 KB each, totaling 1 MB.
We aim to process 30 million such records within 6 hours.
We are running it on SUSE Linux Enterprise Server 11.
We are using all the recommended versions (Storm 0.8.2, Java 1.7, ZooKeeper
3.4.5, ZeroMQ 2.1.7, JZMQ-)
Below is the list of the various combinations of Storm configuration we tried.
Conf-3
worker.childopts: "-Xmx3072m"
topology.acker.executors: 20
topology.max.spout.pending: 50
topology.message.timeout.secs: 300
topology.executor.receive.buffer.size: 16384 #batched
topology.executor.send.buffer.size: 16384
Conf-2
worker.childopts: "-Xmx3072m"
topology.acker.executors: 20
topology.max.spout.pending: 300
topology.message.timeout.secs: 300
topology.executor.receive.buffer.size: 16384 #batched
topology.executor.send.buffer.size: 16384
Conf-1
worker.childopts: "-Xmx3072m"
topology.acker.executors: 20
topology.max.spout.pending: 1000
topology.message.timeout.secs: 300
Also attaching the thread dumps for your reference.
We desperately need your help to resolve this issue as we are looking to go
live soon.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)