[ 
https://issues.apache.org/jira/browse/FLINK-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465709#comment-16465709
 ] 

Stephan Ewen commented on FLINK-7001:
-------------------------------------

Improving windowing performance is something we are also very interested in. 
The thoughts so far were the following:

  - The current window operator is very generic, assuming **unaligned** 
windows, like session windows.

  - Using an **aligned window** implementation wherever possible (tumbling and 
sliding windows) which uses panes (and still duplicates data in sliding 
windows) would already help by having only one timer per window, rather than 
per window and key.

  - An optimization that avoids duplicating data through panes would be great, 
but it is not very simple. We need a way to efficiently merging panes on 
firing, and need to build the blocks for that such that it also works on 
RocksDB.

  - We need to bear in mind that all solutions need to be able to handle late 
events and early/late firing triggers.

  - Because the window operator is a very core stateful operator, we need to 
have a solution to maintain compatibility with prior savepoints.


I have to look at the multi-query optimization part suggested above. I would 
like to understand what the tradeoff in latency and complexity is, versus how 
often does this implementation help.

> Improve performance of Sliding Time Window with pane optimization
> -----------------------------------------------------------------
>
>                 Key: FLINK-7001
>                 URL: https://issues.apache.org/jira/browse/FLINK-7001
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Jark Wu
>            Assignee: Jark Wu
>            Priority: Major
>
> Currently, the implementation of time-based sliding windows treats each 
> window individually and replicates records to each window. For a window of 10 
> minute size that slides by 1 second the data is replicated 600 fold (10 
> minutes / 1 second). We can optimize sliding window by divide windows into 
> panes (aligned with slide), so that we can avoid record duplication and 
> leverage the checkpoint.
> I will attach a more detail design doc to the issue.
> The following issues are similar to this issue: FLINK-5387, FLINK-6990



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to