[
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072367#comment-16072367
]
ASF GitHub Bot commented on FLINK-6969:
---------------------------------------
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/4183
Thanks for your thoughts @sunjincheng121 and @wuchong.
I thought about this again and agree with you. We should have a separate
parameters to specify watermark adjustments and early firings.
I propose the following:
1. We add a parameter `lateDataTimeOffset` which adjusts the watermarks at
the source (actually at all sources of a query) by injecting a custom operator.
The parameter can be positive or negative and adjusts the watermarks. I think
the name is good because the watermarks control the lateness of records. Also,
Table API / SQL users should not need to know about the concept of watermarks.
**This is done as part of this issue / PR.**
2. We add a parameter `earlyResultTimeOffset` which defines the time when
the first early result (e.g., of a windowed aggregate) is computed. The
parameter should be negative, i.e., a value of `-30.mins` results in early
results which start 30 minute before the watermark reaches the end of a window.
3. The same `updateRate` parameter as in
[FLINK-6649](https://issues.apache.org/jira/browse/FLINK-6649) / PR #4157 is
used to control how often early results are updated. I don't think we need a
special parameters for early result or late data updates.
**2. and 3. are addressed in a separate issue.**
What do you think?
> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
> Key: FLINK-6969
> URL: https://issues.apache.org/jira/browse/FLINK-6969
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid
> updates of previous results. Instead of computing a result as soon as it is
> possible (i.e., when a corresponding watermark was received), deferred
> computation adds a configurable amount of slack time in which late data is
> accepted before the result is compute. For example, instead of computing a
> tumbling window of 1 hour at each full hour, we can add a deferred
> computation interval of 15 minute to compute the result quarter past each
> full hour.
> This approach adds latency but can reduce the number of update esp. in use
> cases where the user cannot influence the generation of watermarks. It is
> also useful if the data is emitted to a system that cannot update result
> (files or Kafka). The deferred computation interval should be configured via
> the {{QueryConfig}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)