[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043409#comment-16043409
 ] 

Stig Rohde Døssing commented on STORM-2359:
-------------------------------------------

{quote}
The resets are managed internally by the ACKer bolt. The spout only gets 
notified if the timeout expires or if tuple-tree is fully processed. 
{quote}
How will this work? The current implementation has a pending map in both the 
acker and the spout, which rotate every topology.message.timeout.secs. If the 
acker doesn't forward reset requests to the spout, the spout will just expire 
the tuple tree on its own when the message timeout has passed.
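To make the rotation mechanics concrete, here is a minimal Python sketch of a rotating pending map like the ones the spout and acker each keep (illustrative only, not Storm's actual RotatingMap class): entries live in buckets, and each rotation, one per topology.message.timeout.secs, drops the oldest bucket, expiring everything still in it.

```python
class RotatingMap:
    """Illustrative sketch of the rotating pending map kept by both
    the spout and the acker (not Storm's actual class)."""
    def __init__(self, num_buckets=3):
        self.buckets = [dict() for _ in range(num_buckets)]

    def put(self, key, value):
        # New entries always go into the freshest bucket.
        self.buckets[0][key] = value

    def remove(self, key):
        # An ack removes the entry from whichever bucket holds it.
        for b in self.buckets:
            if key in b:
                return b.pop(key)
        return None

    def rotate(self):
        # Called once per topology.message.timeout.secs: the oldest
        # bucket is dropped and its entries are expired.
        expired = self.buckets.pop()
        self.buckets.insert(0, {})
        return expired

pending = RotatingMap(num_buckets=2)
pending.put("msg-1", "tuple-tree-1")
pending.rotate()            # msg-1 ages toward expiry
expired = pending.rotate()  # msg-1 expires here
print(sorted(expired))      # ['msg-1']
```

The point is that the spout runs this expiry independently: unless reset requests are forwarded to it, its own rotation will expire the tree regardless of what the acker does.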

{quote}
That case would be more accurately classified as "progress is being made" ... 
but slower than expected.
The case of 'progress is not being made' is when a worker that is processing 
part of the tuple tree dies.
{quote}
Yes, you are right. But currently the topology can degrade to the point where 
no progress is made, even though each individual tuple could be processed 
within the message timeout: tuples can expire while queued and get reemitted, 
and the fresh copies are then delayed by their own expired duplicates ahead of 
them in the queue. For IRichBolts, this can be mitigated by writing the bolt 
to accept tuples immediately and queue them internally, resetting their 
timeouts manually if necessary, but for IBasicBolt this is not possible.
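A hedged sketch of that IRichBolt mitigation (the TimeoutTracker below is a stand-in for the spout/acker pending map, and reset_timeout models what the bolt's manual timeout reset accomplishes; names are illustrative):

```python
class TimeoutTracker:
    """Stand-in for the spout/acker pending map: tracks one
    deadline per tuple id."""
    def __init__(self, timeout_secs):
        self.timeout_secs = timeout_secs
        self.deadlines = {}

    def track(self, tuple_id, now):
        self.deadlines[tuple_id] = now + self.timeout_secs

    def reset_timeout(self, tuple_id, now):
        # Pushes the deadline out again -- the effect of a bolt
        # resetting a queued tuple's timeout manually.
        self.deadlines[tuple_id] = now + self.timeout_secs

    def is_expired(self, tuple_id, now):
        return now > self.deadlines[tuple_id]

tracker = TimeoutTracker(timeout_secs=30)

# An IRichBolt that takes tuples off the executor queue immediately
# and buffers them internally can keep a slow tuple alive:
tracker.track("rich-bolt-tuple", now=0)
tracker.reset_timeout("rich-bolt-tuple", now=25)   # still buffered at t=25
assert not tracker.is_expired("rich-bolt-tuple", now=40)

# An IBasicBolt cannot reset timeouts, so the same tuple expires:
tracker.track("basic-bolt-tuple", now=0)
assert tracker.is_expired("basic-bolt-tuple", now=40)
```

The asymmetry between the two asserts is exactly the gap described above: the reset path exists for IRichBolt but has no equivalent for IBasicBolt.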

Just to give a concrete example, we had an IBasicBolt enriching tuples with 
data from a database. Most tuples were processed very quickly, but a few were 
slow. Even the slow tuples never took longer than our message timeout 
individually. We then had an instance where a batch of slow tuples happened to 
arrive close together on the stream. The first few were processed before they 
expired, but the rest expired while queued. The spout then reemitted the 
expired tuples, and they landed in the queue behind their own expired 
instances. Since the bolt won't skip expired tuples, the freshly emitted 
tuples also expired, which caused another reemit. This repeated until the 
topology was restarted so the queues could be cleared.
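That feedback loop can be reproduced with a small simulation (the timings are made up and it assumes the spout reemits failed tuples, as our topology did; the mechanism is the point): every reemit of an expired tuple lands behind the stale copies already in the queue, so it expires too.

```python
from collections import deque

def simulate(n_tuples, per_tuple_secs, timeout_secs, max_steps=100):
    """FIFO queue of slow tuples, all emitted at t=0, each taking
    per_tuple_secs to process. A copy that finishes after its timeout
    was already reemitted by the spout (at emit time + timeout), yet
    the stale copy still consumed processing time. Returns how many
    tuples ever acked in time and how many steps ran."""
    queue = deque((i, 0) for i in range(n_tuples))
    t, acked, steps = 0, 0, 0
    while queue and steps < max_steps:
        steps += 1
        tuple_id, emitted = queue.popleft()
        t += per_tuple_secs  # the bolt processes the copy regardless
        if t - emitted <= timeout_secs:
            acked += 1       # finished in time: tuple tree completes
        else:
            # Expired while queued: the reemitted copy joins the
            # queue behind the other stale copies.
            queue.append((tuple_id, emitted + timeout_secs))
    return acked, steps

# Five slow tuples, 10s each, 25s timeout: the first two ack, the
# rest expire while queued, and each round of reemits falls 5s
# further behind, so the queue never drains.
acked, steps = simulate(5, per_tuple_secs=10, timeout_secs=25)
print(acked, steps)  # 2 100
```

Each round of three stale copies costs 30s of processing while the reemit deadlines only advance by 25s, so the lateness grows by 5s per round; with no intervention the simulation runs until the step cap, just as the real topology looped until restart.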

> Revising Message Timeouts
> -------------------------
>
>                 Key: STORM-2359
>                 URL: https://issues.apache.org/jira/browse/STORM-2359
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 2.0.0
>            Reporter: Roshan Naik
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
