[ https://issues.apache.org/jira/browse/STORM-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992758#comment-15992758 ]

Arun Mahadevan commented on STORM-2489:
---------------------------------------

[~wangkui], the initial tuples expired because the trigger did not fire exactly at the end of the window interval but after a delay. When I tested in local mode with the spout emitting without a delay, the trigger fired after 6s (for a 4s tumbling window). This may be because the system is overwhelmed with data and cannot schedule the trigger thread on time. In this case the window covers only the last 4s (2s - 6s), so the initial tuples (0 - 2s) will not be considered in the first window.

Typically the window duration should be chosen so that all the tuples within a window can be processed before the next window trigger. Otherwise the next trigger will be delayed, which leads to incorrect results. You should use a real cluster with multiple hosts/workers and split the data among these workers to handle such high data rates.

> Overlap and data loss on WindowedBolt based on Duration
> -------------------------------------------------------
>
>                 Key: STORM-2489
>                 URL: https://issues.apache.org/jira/browse/STORM-2489
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.2
>         Environment: windows 10, eclipse, jdk1.7
>            Reporter: wangkui
>            Assignee: Arun Mahadevan
>         Attachments: TumblingWindowIssue.java
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The attachment is my test script, one of my test results is:
> ```
> expired=1...55
> get=56...4024
> new=56...4024
> Recived=3969,RecivedTotal=3969
> expired=56...4020
> get=4021...8191
> new=4025...8191
> Recived=4171,RecivedTotal=8140
> SendTotal=12175
> expired=4021...8188
> get=8189...12175
> new=8192...12175
> Recived=3987,RecivedTotal=12127
> ```
> This test result shows that some tuples appear in the expired list directly,
> so we lose this data if we only use get() to fetch tuples. This is the first bug.
> The second: the tuples returned by get() overlap between windows, while
> getNew() seems all right.
> The problem does not happen every time; you may need to try several times.
> Actually, I'm new to Storm, so I'm not sure whether this is really a bug or
> whether I'm using it the wrong way.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
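
To make the delayed-trigger explanation in the comment above concrete, here is a minimal sketch in plain Java. It is a simplified model of a processing-time tumbling window, not Storm's actual implementation. The class name, the method names, the 500 ms emit interval and the eviction rule (a tuple expires once it is older than the window length at trigger time) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a time-based window trigger; NOT Storm's real code. */
final class DelayedTriggerSketch {

    /** Tuples that are already older than the window when the trigger fires. */
    static List<Long> expired(List<Long> timestampsMs, long triggerTimeMs, long windowLengthMs) {
        List<Long> result = new ArrayList<>();
        for (long ts : timestampsMs) {
            // Assumed eviction rule: older than the window length at trigger time.
            if (triggerTimeMs - ts > windowLengthMs) {
                result.add(ts);
            }
        }
        return result;
    }

    /** Tuples that have arrived by the trigger time and are still inside the window. */
    static List<Long> inWindow(List<Long> timestampsMs, long triggerTimeMs, long windowLengthMs) {
        List<Long> result = new ArrayList<>();
        for (long ts : timestampsMs) {
            boolean arrived = ts <= triggerTimeMs;
            boolean fresh = triggerTimeMs - ts <= windowLengthMs;
            if (arrived && fresh) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical spout: one tuple every 500 ms for 6 s.
        List<Long> tuples = new ArrayList<>();
        for (long ts = 0; ts < 6000; ts += 500) {
            tuples.add(ts);
        }
        long windowMs = 4000; // 4s tumbling window, as in the test above

        // Trigger fires on time (4s) versus late (6s).
        System.out.println("on time, expired: " + expired(tuples, 4000, windowMs));
        System.out.println("late,    expired: " + expired(tuples, 6000, windowMs));
        System.out.println("late,    window : " + inWindow(tuples, 6000, windowMs));
    }
}
```

In this model an on-time trigger expires nothing, while a trigger that fires 2s late expires every tuple from the first 2s before any window has delivered it, which matches the `expired=1...55` line in the reported output.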