[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739639#comment-16739639
 ] 

Stig Rohde Døssing commented on STORM-2359:
-------------------------------------------

I've been taking a look at the feasibility of automatically resetting timeouts 
for tuples that are still being processed, and I think we can do it without 
much overhead.

The idea is to track the anchor ids of each non-system message that enters an 
executor's inbound or outbound queue. For the inbound queue, an anchor stops 
being in progress when the associated tuple is acked or failed. For the 
outbound queue (pendingEmits in the Executor), an anchor stops being in 
progress when the associated tuple is flushed from pendingEmits.

Periodically, a thread will check the set of in-progress anchors for the worker 
and send reset messages for them to the relevant ackers. To avoid sending too 
many messages, this thread snapshots the anchor set when it runs, and only 
sends reset messages for anchors that have been in progress in that worker for 
longer than a grace period.
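
As a rough illustration of the snapshot-plus-grace-period selection (a sketch 
only, not taken from the linked branch): the class name TimeoutResetter, the 
method anchorsToReset and the gracePeriodMs parameter are all hypothetical, and 
the sketch assumes an anchor's age is measured from the first run in which the 
resetter observed it rather than from the moment it entered the worker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides which in-progress anchors get a reset message. */
class TimeoutResetter {
    private final long gracePeriodMs;
    // Assumption: age is tracked from the first snapshot that contained the anchor.
    private final Map<Long, Long> firstSeenMs = new HashMap<>();

    TimeoutResetter(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /**
     * One run of the resetter thread.
     *
     * @param snapshot anchor ids in progress in this worker when the run starts
     * @param nowMs    current time in milliseconds
     * @return anchors that have been in progress for at least the grace period
     */
    List<Long> anchorsToReset(Set<Long> snapshot, long nowMs) {
        // Forget anchors that finished since the previous run.
        firstSeenMs.keySet().retainAll(snapshot);

        List<Long> toReset = new ArrayList<>();
        for (Long anchor : snapshot) {
            // Record the time of first observation, or reuse the earlier one.
            long seen = firstSeenMs.computeIfAbsent(anchor, a -> nowMs);
            if (nowMs - seen >= gracePeriodMs) {
                toReset.add(anchor);
            }
        }
        return toReset;
    }
}
```

With gracePeriodMs set to 0, every in-progress anchor is selected on every run, 
which corresponds to the worst case of resetting all timeouts each time.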

Since there may be more than one tuple per anchor, anchors are tracked as a 
count in a multiset, rather than just presence in a set.
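
A minimal sketch of such a counting multiset, to show why a plain set is not 
enough: with two tuples sharing an anchor, the anchor must stay in progress 
until both are done. The class AnchorMultiset and its method names are 
hypothetical and are not the ones used in the linked branch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical multiset of in-progress anchor ids, keyed by anchor id. */
class AnchorMultiset {
    private final Map<Long, Integer> counts = new ConcurrentHashMap<>();

    /** Call when a tuple carrying this anchor enters a tracked queue. */
    void increment(long anchorId) {
        counts.merge(anchorId, 1, Integer::sum);
    }

    /** Call when that tuple is acked, failed, or flushed from pendingEmits. */
    void decrement(long anchorId) {
        // Returning null from the remapping function removes the entry,
        // so an anchor disappears only when its last tuple is done.
        counts.computeIfPresent(anchorId, (id, n) -> n > 1 ? n - 1 : null);
    }

    /** True while at least one tuple with this anchor is still tracked. */
    boolean isInProgress(long anchorId) {
        return counts.containsKey(anchorId);
    }
}
```

For example, after two increment calls and one decrement call for the same 
anchor id, isInProgress still returns true; it returns false only after the 
second decrement.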

I've updated the spreadsheet with benchmark numbers for TVL with this 
functionality enabled. For the 90k example I also did a run where the grace 
period is disabled, to show the penalty for sending resets in the worst case, 
i.e. all in-progress tuples have their timeouts reset every time the resetter 
thread runs.

The code is available at https://github.com/srdo/storm/tree/auto-reset-timeout. 
Only the latest commit is new.

> Revising Message Timeouts
> -------------------------
>
>                 Key: STORM-2359
>                 URL: https://issues.apache.org/jira/browse/STORM-2359
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 2.0.0
>            Reporter: Roshan Naik
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>         Attachments: STORM-2359.ods, STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
