Thank you for your reply, @Leonard.

> Firstly the example result is a little strange for me too, the print
> window_time looks incorrect, Could you post your entire example especially
> your session time zone?
You can modify any of the tests in WindowAggregateITCase[1], e.g.
testEventTimeTumbleWindow:
  @TestTemplate
  def testEventTimeTumbleWindow(): Unit = {
    val sql =
      """
        |SELECT
        |  `name`,
        |  window_start,
        |  window_end,
        |  window_time,
        |  COUNT(*),
        |  SUM(`bigdec`),
        |  MAX(`double`),
        |  MIN(`float`),
        |  COUNT(DISTINCT `string`),
        |  concat_distinct_agg(`string`)
        |FROM TABLE(
        |  TUMBLE(TABLE T1, DESCRIPTOR(rowtime), INTERVAL '5' SECOND))
        |GROUP BY `name`, window_start, window_end, window_time
      """.stripMargin
    val sink = new TestingAppendSink
    tEnv.sqlQuery(sql).toDataStream.addSink(sink)
    env.execute()
  }
and you get the misleading results for timestamp_ltz:
a,2020-10-10T00:00,2020-10-10T00:00:05,2020-10-09T16:00:04.999Z,4,11.10,5.0,1.0,2,Hi|Comment#1
a,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-09T16:00:09.999Z,1,3.33,null,3.0,1,Comment#2
b,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-09T16:00:09.999Z,2,6.66,6.0,3.0,2,Hello|Hi
b,2020-10-10T00:00:15,2020-10-10T00:00:20,2020-10-09T16:00:19.999Z,1,4.44,4.0,4.0,1,Hi
b,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-09T16:00:34.999Z,1,3.33,3.0,3.0,1,Comment#3
null,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-09T16:00:34.999Z,1,7.77,7.0,7.0,0,null
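
To make the mismatch concrete, here is a minimal sketch in plain java.time (not Flink code). It assumes window_time is window_end minus 1 ms, which matches the rows above, and it assumes a UTC+8 session time zone such as Asia/Shanghai. The zone is my guess from the 8-hour offset in the output, not something the test states, and the object and method names are purely illustrative.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object WindowTimeOffset {
  // window_time as an Instant: window_end minus 1 ms, read in the given zone.
  def windowTime(windowEnd: LocalDateTime, zone: ZoneId): Instant =
    windowEnd.minusNanos(1000000L).atZone(zone).toInstant

  def main(args: Array[String]): Unit = {
    // ASSUMPTION: the session time zone is UTC+8; the test does not state it.
    val assumedZone = ZoneId.of("Asia/Shanghai")
    val windowEnd = LocalDateTime.parse("2020-10-10T00:00:05")
    // Wall-clock window_end next to the instant-typed window_time
    println(s"$windowEnd -> ${windowTime(windowEnd, assumedZone)}")
  }
}
```

Under that assumption the first row is internally consistent, but window_start and window_end are printed as wall-clock values while window_time is printed as an instant, which is why the row looks wrong at first glance.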
> We aims to address window correctness issue in DST timezone, there’re
> detailed explanation in CALCITE-4563.
Could you please explain that a bit more? I don't understand the problem.
From my point of view, the problem you're describing there originates
exactly from the fact that we mix up TIMESTAMP_LTZ with TIMESTAMP. The way
I see it, we are trying to put TIMESTAMP_LTZ values into windows of
TIMESTAMP type. TIMESTAMP_LTZ has Instant semantics, and as such I don't
really understand how DST comes into play there. An Instant unambiguously
identifies a point in time and thus should group cleanly into equal-sized
windows.
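
A minimal sketch of what I mean by Instant semantics, again in plain java.time rather than Flink code. The helper windowStart is hypothetical and only mimics a tumbling window assigner working on epoch milliseconds; the two sample instants sit on either side of the end of DST in Europe/Berlin (27 Oct 2024, 01:00 UTC).

```scala
import java.time.{Duration, Instant}

object InstantWindows {
  // Start of the tumbling window containing `ts`, computed purely on
  // epoch milliseconds. No time zone is involved anywhere.
  def windowStart(ts: Instant, size: Duration): Instant = {
    val millis = ts.toEpochMilli
    Instant.ofEpochMilli(millis - java.lang.Math.floorMod(millis, size.toMillis))
  }

  def main(args: Array[String]): Unit = {
    val size = Duration.ofHours(1)
    val beforeShift = Instant.parse("2024-10-27T00:30:00Z")
    val afterShift = Instant.parse("2024-10-27T01:30:00Z")
    println(windowStart(beforeShift, size)) // 2024-10-27T00:00:00Z
    println(windowStart(afterShift, size))  // 2024-10-27T01:00:00Z
  }
}
```

The two events land in two distinct one-hour windows of equal length, and the DST transition has no effect on the assignment.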
What you're describing in the linked JIRA, in my opinion, is that you have
a TIMESTAMP_LTZ time attribute (instant semantics), but you want to group
by wall clock semantics (TIMESTAMP). I think this should be achieved, if
necessary, by first casting the time attribute to TIMESTAMP and then
performing the grouping. The casting would already take care of the DST
shift.
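
As a rough illustration of the cast-then-group approach, consider the sketch below. The names are hypothetical: toWallClock merely stands in for what CAST(ts AS TIMESTAMP) would do under a given session time zone, and Europe/Berlin is picked only because its DST ended on 27 Oct 2024 at 01:00 UTC.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

object WallClockWindows {
  // Hypothetical stand-in for CAST(ts AS TIMESTAMP) in a session time zone.
  def toWallClock(ts: Instant, zone: ZoneId): LocalDateTime =
    LocalDateTime.ofInstant(ts, zone)

  // Start of the one-hour window, computed on the wall-clock value.
  def hourWindowStart(wallClock: LocalDateTime): LocalDateTime =
    wallClock.truncatedTo(ChronoUnit.HOURS)

  def main(args: Array[String]): Unit = {
    val zone = ZoneId.of("Europe/Berlin")
    // Two instants one hour apart, on either side of the end of DST
    val beforeShift = Instant.parse("2024-10-27T00:30:00Z") // 02:30 CEST
    val afterShift = Instant.parse("2024-10-27T01:30:00Z")  // 02:30 CET
    println(hourWindowStart(toWallClock(beforeShift, zone))) // 2024-10-27T02:00
    println(hourWindowStart(toWallClock(afterShift, zone)))  // 2024-10-27T02:00
  }
}
```

Here both events fall into the same wall-clock window starting at 02:00, which is the DST effect. It appears only because the user explicitly asked for wall-clock grouping through the cast.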
I still believe that window_start, window_end and window_time should return
the same type based on the input time attribute type.
Happy to hear your thoughts.
Best,
Dawid
On Fri, 8 Nov 2024 at 08:21, Leonard Xu <[email protected]> wrote:
> Thanks Dawid for bringing this ticket to dev mailing list and Timo’s
> kindly ping.
>
> Firstly the example result is a little strange for me too, the print
> window_time looks incorrect, Could you post your entire example especially
> your session time zone?
>
> Back to the window_start/end return type, both window TVF and legacy
> SqlGroupedWindowFunction share same return type TIMESTAMP which means
> timestamp literal, and it’s by design. We aims to address window
> correctness issue in DST timezone, there’re detailed explanation in
> CALCITE-4563.
>
>
> Best,
> Leonard
>
> [1]https://issues.apache.org/jira/browse/CALCITE-4563
>
>
>
> >> I wanted to bring your attention to FLINK-36665[1].
> >> I believe the current behaviour is confusing and I'd like to fix it.
> >> However, since window operations are a very important feature I'd like
> >> to gather feedback on to what extent we should keep backwards
> >> compatibility.
> >> 1. How should newly submitted queries behave? Are we fine with
> >> changing the inference of these functions or would you prefer to have
> >> a feature flag that would let us revert to the old inference logic? My
> >> preference would be to simply change the inference. The current
> >> behaviour is very confusing and I'd keep the behaviour for restored
> >> queries (see 2.)
> >> 2. My plan for migrated queries (queries restored from a compiled
> >> plan) is that they won't be impacted. They'll keep producing the same
> >> results. We have the output types serialized in the compiled plan
> >> which we can use to produce the same type as before.
> >> What do you think?
> >> Best,
> >> Dawid
> >> [1] https://issues.apache.org/jira/browse/FLINK-36665
> >
>
>