[ 
https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992758#comment-15992758
 ] 

Arun Mahadevan commented on STORM-2489:
---------------------------------------

[~wangkui], the initial tuples expired because the trigger did not fire 
exactly at the end of the window interval but after a delay. When I tested in 
local mode with the spout emitting without a delay, the trigger fired after 6s 
(for a 4s tumbling window). This may be because the system is overwhelmed with 
data and cannot schedule the trigger thread on time. In this case the initial 
tuples (0 - 2s) are already older than the window length when the trigger 
fires, so they are not considered in the first window. 
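
To illustrate the effect, here is a minimal standalone sketch (not Storm code). 
It assumes a tuple expires once its age at trigger time reaches the window 
length; the class and method names and the emission timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DelayedTriggerSketch {

    /** Hypothetical spout: one tuple every stepMs, starting at startMs, before endMs. */
    public static List<Long> emitted(long startMs, long stepMs, long endMs) {
        List<Long> ts = new ArrayList<>();
        for (long t = startMs; t < endMs; t += stepMs) {
            ts.add(t);
        }
        return ts;
    }

    /** Tuples that have arrived by triggerMs and are younger than the window length. */
    public static List<Long> inWindow(List<Long> timestampsMs, long triggerMs, long windowMs) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestampsMs) {
            if (ts <= triggerMs && triggerMs - ts < windowMs) {
                kept.add(ts);
            }
        }
        return kept;
    }

    /** Tuples that have arrived by triggerMs but are already as old as the window length. */
    public static List<Long> expired(List<Long> timestampsMs, long triggerMs, long windowMs) {
        List<Long> gone = new ArrayList<>();
        for (long ts : timestampsMs) {
            // Assumed eviction rule: age >= window length means the tuple is expired.
            if (ts <= triggerMs && triggerMs - ts >= windowMs) {
                gone.add(ts);
            }
        }
        return gone;
    }

    public static void main(String[] args) {
        List<Long> ts = emitted(100, 500, 6000); // 100, 600, ..., 5600
        long windowMs = 4000;                    // 4s tumbling window

        // Trigger fires on time at 4s: nothing has expired yet.
        System.out.println("on time, expired: " + expired(ts, 4000, windowMs));

        // Trigger fires late at 6s: tuples from the first 2s are already expired.
        System.out.println("late, expired:    " + expired(ts, 6000, windowMs));
        System.out.println("late, in window:  " + inWindow(ts, 6000, windowMs));
    }
}
```

With the trigger at 6s the tuples stamped 100, 600, 1100 and 1600 ms fall 
outside the 4s window, which matches the (0 - 2s) range described above.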

Typically the window duration should be such that all the tuples within a 
window can be processed before the next window trigger; otherwise the next 
window trigger is delayed, which leads to incorrect results. You should use a 
real cluster with multiple hosts/workers and split the data among these 
workers to handle such high data rates.
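
As a rough back-of-the-envelope check of that sizing rule, the sketch below 
compares the estimated time to process one window's tuples against the window 
length. The rate of about 1000 tuples/s is read off the reported output (about 
4000 tuples per 4s window); the 1.5 ms per-tuple cost, the even split across 
workers, and all names are assumptions for illustration only.

```java
public class WindowCapacitySketch {

    /** Estimated time in ms that each worker needs for its share of one window. */
    public static double processingMsPerWindow(double tuplesPerSec, double windowSec,
                                               double costMsPerTuple, int workers) {
        double tuplesPerWindow = tuplesPerSec * windowSec;
        // Assumes the tuples are split evenly among the workers.
        return tuplesPerWindow * costMsPerTuple / workers;
    }

    /** True if one window's tuples can be processed before the next trigger is due. */
    public static boolean keepsUp(double tuplesPerSec, double windowSec,
                                  double costMsPerTuple, int workers) {
        return processingMsPerWindow(tuplesPerSec, windowSec, costMsPerTuple, workers)
                < windowSec * 1000.0;
    }

    public static void main(String[] args) {
        double rate = 1000.0;   // tuples per second, approximated from the report
        double windowSec = 4.0; // 4s tumbling window
        double costMs = 1.5;    // hypothetical per-tuple processing cost

        System.out.println("1 worker keeps up:  " + keepsUp(rate, windowSec, costMs, 1));
        System.out.println("2 workers keep up:  " + keepsUp(rate, windowSec, costMs, 2));
    }
}
```

Under these assumed numbers a single worker needs about 6s per 4s window and 
falls behind, while two workers need about 3s each and keep up.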

> Overlap and data loss on WindowedBolt based on Duration
> -------------------------------------------------------
>
>                 Key: STORM-2489
>                 URL: https://issues.apache.org/jira/browse/STORM-2489
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.2
>         Environment: windows 10, eclipse, jdk1.7
>            Reporter: wangkui
>            Assignee: Arun Mahadevan
>         Attachments: TumblingWindowIssue.java
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The attachment is my test script. One of my test results is:
> ```
> expired=1...55
> get=56...4024
> new=56...4024
> Recived=3969,RecivedTotal=3969
> expired=56...4020
> get=4021...8191
> new=4025...8191
> Recived=4171,RecivedTotal=8140
> SendTotal=12175
> expired=4021...8188
> get=8189...12175
> new=8192...12175
> Recived=3987,RecivedTotal=12127
> ```
> This test result shows that some tuples appear directly in the expired list, 
> so we lose this data if we only use get() to read tuples. This is the first 
> bug. The second: the tuples returned by get() overlap between consecutive 
> windows, while getNew() seems all right.
> The problem does not happen every time; you may need to try several times.
> Actually, I'm new to Storm, so I'm not sure whether this is indeed a bug or 
> whether I am using it the wrong way.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
