Github user sunjincheng121 commented on the issue:
https://github.com/apache/flink/pull/4183
Hi @fhueske @wuchong Thanks for your reviewing and comments. Thanks!
1. For the param rename I am not sure whether it can be shared with `early
fire` feature or not. I suggest using current name, and we can change it if we
need when we dev the `early fire` feature. And feel free to rename it in this
PR. if you and @wuchong Insist on to renaming, I am fine about that. :)
2. About timestamp and watermark:
- Timestamp:
I think we can emit records with the correct timestamps(late than
watermark,but corresponds to window time ), I thinks the code
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of
`WindowOperator#emitWindowContents` can guarantee that logic. That's meant it
late than watermark,but is correct.
- Watermark:
In current flink frameworkï¼GroupWindow and OverWindow related the
`Watermark`. So If i understand you correctly, you worry about GroupWindow
followed by a GroupWindow or a OverWindow. Let's see follows:
- For followed by GroupWindow case:
As we know `deferredComputationTime` is global configuration, i.e.
In one job all GroupWindow will using the same
TRIGGER(`DeferredComputationTrigger`), that only fires when current watermark
not smaller than ` (window.maxTimestamp +
queryConfig.getDeferredComputationTime)`. The records will be late emitted by
the Level N window, So dose the Level N+1 window. The late always is
`getDeferredComputationTime` time. i.e., This approach adds latency but can
reduce the number of update esp.(that we wanted)
- For followed by OverWindow case:
I think this approach works well for row-time OverWindow, because
Over Clause using timestamp value range the window. I think it works well If we
emit correct timestamp for records.(And we did it by
`timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp())` of
`WindowOperator#emitWindowContents`)
- For select, filter...etc. I think also work well (adds latency).
3. About `Trigger+ AssignerWithPunctuatedWatermarks` (@wuchong comment
above), Trigger is late `deferredComputationTime` and Watermark change it
smaller (`deferredComputationTime`), If we have
`window1(..).winodw2(...).window3()`. the delay is increasing.
**So, the current approach only improved Level 1 group window, and
end-to-end latency is `deferredComputationTime`.**
**If we want let all the window only fired late `deferredComputationTime`,
we should think about SLA mechanism. (Which we had discussed before).**
The above description just from point of my view. So feel free to correct
me if there are any incorrect analysis.
Please let me know what you think?
Best,
SunJincheng
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---