[ 
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066306#comment-16066306
 ] 

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Hi @fhueske , jincheng and me have discussed offline and have an unified 
opinion.
    
    We think the approach of window(with custom trigger) + custom operator will 
increase the latency when multiple window aggregate is applied. For example, 
defer 1 hour to compute with two level window aggregates. The output and 
watermark of the first level window have been deferred 1 hour. In fact, the 
output is in order now. However, the second window will still defer 1 hour to 
compute which result in the result is delayed for two hours. 
    
    We think we only need to hold the watermark back at source, then all the 
downstream window operators will defer the given offset to compute. And the 
end-to-end latency is the given offset.
    
    Therefor, we propose to add a custom operator (to offset the watermark) 
after the source (the DataStream in the physical plan tree root). And no custom 
trigger needed.
    
    What do you think? 
    
    
    



> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid 
> updates of previous results. Instead of computing a result as soon as it is 
> possible (i.e., when a corresponding watermark was received), deferred 
> computation adds a configurable amount of slack time in which late data is 
> accepted before the result is compute. For example, instead of computing a 
> tumbling window of 1 hour at each full hour, we can add a deferred 
> computation interval of 15 minute to compute the result quarter past each 
> full hour.
> This approach adds latency but can reduce the number of update esp. in use 
> cases where the user cannot influence the generation of watermarks. It is 
> also useful if the data is emitted to a system that cannot update result 
> (files or Kafka). The deferred computation interval should be configured via 
> the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to