[
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066186#comment-16066186
]
ASF GitHub Bot commented on FLINK-6969:
---------------------------------------
Github user sunjincheng121 commented on the issue:
https://github.com/apache/flink/pull/4183
Hi @fhueske @wuchong, thanks for your review and comments!
1. For the parameter rename, I am not sure whether the name can be shared with
the `early fire` feature. I suggest keeping the current name; we can change it
if needed when we develop the `early fire` feature. That said, feel free to
rename it in this PR. If you and @wuchong insist on renaming, I am fine with that. :)
2. About timestamp and watermark:
- Timestamp:
I think we can emit records with the correct timestamps (behind the
watermark, but corresponding to the window time). I think the call
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` in
`WindowOperator#emitWindowContents` guarantees that. This means the timestamp
is behind the watermark, but it is still correct.
- Watermark:
In the current Flink framework, GroupWindow and OverWindow depend on the
`Watermark`. So if I understand you correctly, you are worried about a
GroupWindow followed by a GroupWindow or an OverWindow. Let's look at each case:
- GroupWindow followed by a GroupWindow:
As we know, `deferredComputationTime` is a global configuration, i.e.,
within one job all GroupWindows use the same trigger
(`DeferredComputationTrigger`), which only fires when the current watermark is
not smaller than
`(window.maxTimestamp + queryConfig.getDeferredComputationTime)`.
The Level N window emits its records late, and so does the Level N+1 window.
The delay is always `getDeferredComputationTime`, i.e., this approach adds
latency but can reduce the number of updates (which is what we wanted).
- GroupWindow followed by an OverWindow:
I think this approach works well for a row-time OverWindow, because the
OVER clause uses the timestamp value to define the window range. It works well
as long as we emit the correct timestamp for records (and we do, via
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` in
`WindowOperator#emitWindowContents`).
- For select, filter, etc., I think it also works well (it only adds latency).
3. About `Trigger + AssignerWithPunctuatedWatermarks` (@wuchong's comment
above): the Trigger fires `deferredComputationTime` late and the Watermark is
made smaller by `deferredComputationTime`. If we have
`window1(..).window2(...).window3()`, the delay keeps increasing.
**So the current approach only improves the Level 1 group window, and the
end-to-end latency is `deferredComputationTime`.**
**If we want every window to fire only `deferredComputationTime` late,
we should think about an SLA mechanism (which we have discussed before).**
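To make the firing rule in point 2 concrete, here is a minimal sketch in plain Java. It is not Flink code: the class `DeferredFiringSketch` and its methods are hypothetical names used only for illustration, and it assumes a window's max timestamp is its exclusive end minus one millisecond. It only models the condition "fire when the watermark is not smaller than `window.maxTimestamp` plus the deferred computation time".

```java
// Hypothetical model for illustration only; none of these names exist in Flink.
final class DeferredFiringSketch {

    private DeferredFiringSketch() {}

    // Assumption: the largest timestamp that belongs to a window [start, end)
    // is end - 1 (all values in milliseconds).
    static long maxTimestamp(long windowEndExclusive) {
        return windowEndExclusive - 1;
    }

    // Smallest watermark at which a deferred window is allowed to fire.
    static long fireThreshold(long windowMaxTimestamp, long deferredMs) {
        return windowMaxTimestamp + deferredMs;
    }

    // The firing rule described above: the watermark must not be smaller than
    // the window's max timestamp plus the deferred computation time.
    static boolean shouldFire(long watermark, long windowMaxTimestamp, long deferredMs) {
        return watermark >= fireThreshold(windowMaxTimestamp, deferredMs);
    }
}
```

For the example from the issue description (a 1 hour tumbling window with a 15 minute deferral), `maxTimestamp(3_600_000)` is `3_599_999`, so `shouldFire` stays false at the full hour and first becomes true once the watermark reaches `3_599_999 + 900_000`. The emitted records would still carry the window's max timestamp, which is why they are behind the watermark but correct.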
The above is just my point of view, so feel free to correct
me if any part of the analysis is wrong.
Please let me know what you think.
Best,
SunJincheng
> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
> Key: FLINK-6969
> URL: https://issues.apache.org/jira/browse/FLINK-6969
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid
> updates of previous results. Instead of computing a result as soon as it is
> possible (i.e., when a corresponding watermark was received), deferred
> computation adds a configurable amount of slack time in which late data is
> accepted before the result is computed. For example, instead of computing a
> tumbling window of 1 hour at each full hour, we can add a deferred
> computation interval of 15 minutes to compute the result at quarter past each
> full hour.
> This approach adds latency but can reduce the number of updates, esp. in use
> cases where the user cannot influence the generation of watermarks. It is
> also useful if the data is emitted to a system that cannot update results
> (files or Kafka). The deferred computation interval should be configured via
> the {{QueryConfig}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)