[
https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370998#comment-15370998
]
Abhishek Agarwal commented on STORM-1949:
-----------------------------------------
[~sriharsha] We can disable back-pressure, but I want to know what the
alternative will be for non-acking topologies. If back-pressure is not enabled,
there should be some other way to bound the send/receive queues.
> Backpressure can cause spout to stop emitting and stall topology
> ----------------------------------------------------------------
>
> Key: STORM-1949
> URL: https://issues.apache.org/jira/browse/STORM-1949
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Roshan Naik
>
> Problem can be reproduced by this [Word count
> topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java]
> within an IDE.
> I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt
> instances.
> The problem is more easily reproduced with the WC topology because splitting
> each sentence tuple into word tuples causes an explosion of tuples. As
> the bolts have to process more tuples than the spout produces, the spout
> needs to operate more slowly.
> The time it takes for the topology to stall varies, but it is
> typically under 10 minutes.
> *My theory:* I suspect there is a race condition in the way ZK is being
> used to enable/disable back pressure. When congested (i.e., pressure
> exceeds the high water mark), the bolt's worker records this congestion
> in ZK by creating a node. Once the congestion drops below the low water
> mark, it deletes this node.
> The spout's worker has set up a watch on the parent node, expecting a callback
> whenever there is a change in the child nodes. On receiving the callback, the
> spout's worker lists the parent node to check whether it has any child
> nodes; it is essentially trying to figure out the nature of the state change
> in ZK to determine whether to throttle or not. Subsequently it sets up
> another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, these ZK nodes can be created and deleted
> rapidly. Between the time the worker receives a callback and the time it sets
> up the next watch, many changes may have occurred in ZK, and these go
> unnoticed by the spout.
> As a result, the spout may never notice that the bolts are no longer
> congested.
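
The suspected race can be illustrated with a small in-memory model. This is only a sketch of the theory above, not Storm's code and not the ZooKeeper client API: ParentNode, setWatch, listAndWatch and the "bolt-1" child name are hypothetical stand-ins for a ZK parent node with a one-shot child watch. The first method lists the children in the callback and re-arms the watch later as a separate step; the second re-reads the children in the same step that re-arms the watch.

```java
import java.util.HashSet;
import java.util.Set;

class OneShotWatchRace {

    /** Toy in-memory parent node with a single one-shot child watch (hypothetical). */
    static class ParentNode {
        private final Set<String> children = new HashSet<>();
        private Runnable watch; // cleared every time it fires

        Set<String> list() { return new HashSet<>(children); }

        /** Arms the watch without reading the current children. */
        void setWatch(Runnable w) { watch = w; }

        /** Arms the watch and returns the current children in one step. */
        Set<String> listAndWatch(Runnable w) { watch = w; return list(); }

        void create(String child) { if (children.add(child)) fire(); }
        void delete(String child) { if (children.remove(child)) fire(); }

        private void fire() {
            Runnable w = watch;
            watch = null; // one-shot: must be re-armed by the watcher
            if (w != null) w.run();
        }
    }

    /** Spout lists in the callback and re-arms later, as a separate step. */
    static boolean throttledAfterSeparateRearm() {
        ParentNode zk = new ParentNode();
        boolean[] throttled = {false};
        Runnable onChange = () -> throttled[0] = !zk.list().isEmpty();
        zk.setWatch(onChange);
        zk.create("bolt-1");   // watch fires: one child, so the spout throttles
        zk.delete("bolt-1");   // no watch is armed, so this change goes unnoticed
        zk.setWatch(onChange); // re-armed too late; the state is not re-read
        return throttled[0];
    }

    /** Spout re-reads the children in the same step that re-arms the watch. */
    static boolean throttledAfterAtomicRearm() {
        ParentNode zk = new ParentNode();
        boolean[] throttled = {false};
        Runnable onChange = () -> throttled[0] = !zk.list().isEmpty();
        zk.setWatch(onChange);
        zk.create("bolt-1");   // watch fires: the spout throttles
        zk.delete("bolt-1");   // still unnoticed at this point
        throttled[0] = !zk.listAndWatch(onChange).isEmpty(); // picks up the deletion
        return throttled[0];
    }

    public static void main(String[] args) {
        System.out.println("separate re-arm, still throttled: " + throttledAfterSeparateRearm());
        System.out.println("atomic re-arm, still throttled: " + throttledAfterAtomicRearm());
    }
}
```

In this model the first variant leaves the spout throttled even though no congestion node remains, while the second clears the throttle. The real ZooKeeper client's getChildren(path, watcher) reads the children and leaves a watch in a single call, so whether the actual stall follows this pattern depends on how the worker code orders its list and watch calls.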