Hi Zihao, Martijn,

+1 for introducing a new window type, as this is not a change to the
trigger mechanism itself, but rather a fundamental redefinition of how data
is partitioned into windows.


Best,

Feng





On Sat, May 23, 2026 at 12:07 PM zihao chen <[email protected]> wrote:

> Hi Martijn,
>
> Thanks for your insightful feedback and careful review.
>
> Your point about avoiding the mixing of physical concerns with
> logical semantics makes perfect sense, and it prompted me to rethink
> the design more thoroughly.
>
> I would like to share an updated direction below and see whether this
> aligns better with your expectations.
>
> 1. Original Proposal: Withdrawn
>
> I initially proposed extending the existing TUMBLE window with an
> optional STAGGER parameter, inspired by the existing DataStream
> WindowStagger, which shifts window boundaries.
>
> However, I agree with your analysis that doing so in SQL would
> silently break the deterministic alignment contract of TUMBLE.
>
> Therefore, I would like to withdraw this part of the proposal.
>
> 2. Hints and PTF: Deferred for Now
>
>    - Regarding Hints
>
> I agree that a hint is probably not the right abstraction here. Staggering
> changes the resulting window boundaries, while hints in Flink are
> generally treated as plan-intervention mechanisms that do not alter
> query semantics.
>
> In addition, there is currently no precedent for window-related hints
> in Flink SQL.
>
>
>    - Regarding PTF (Process Table Functions)
>
> I agree that PTF could ultimately become a powerful extension point
> for custom or user-defined windows.
>
> However, building a comprehensive PTF-based windowing framework is
> itself a substantial design effort and likely deserves a dedicated
> discussion.
>
> To keep the scope of this FLIP manageable, I would prefer to leave
> PTF integration as future work for now.
>
> ------------------------------
> 3. Revised Proposal: Introduce a New TVF, STAGGER_TUMBLE
>
> Since staggering fundamentally changes the window definition, I now
> believe it should be treated as a logical semantic change rather than
> a pure physical optimization.
>
> Therefore, instead of modifying TUMBLE, the cleaner approach would
> be to introduce a separate TVF with an explicit contract:
>
> STAGGER_TUMBLE(
>     TABLE data,
>     DESCRIPTOR(timecol),
>     size,
>     stagger_strategy
> )
>
> -- stagger_strategy:
> --   'RANDOM'
> --   'NATURAL'
> --   'KEY_BASED'
>
> For KEY_BASED, the requirement of a keyed context (for example,
> Window Aggregation with GROUP BY) would be validated at compile
> time.
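
To make the effect of KEY_BASED on window boundaries concrete, here is a minimal sketch. It is not taken from the FLIP: the function names, the use of CRC32 as the key hash, and the sample values are all assumptions chosen for illustration. It only shows that a stable per-key offset shifts window_start relative to the aligned TUMBLE boundary.

```python
import zlib


def window_start(ts_ms: int, size_ms: int, offset_ms: int = 0) -> int:
    """Start of the tumbling window that contains ts_ms.

    With offset_ms == 0 the boundaries are aligned to multiples of size_ms.
    A non-zero offset shifts every boundary by that amount.
    """
    # Python's % is non-negative for a positive modulus, so this also
    # works when ts_ms is smaller than offset_ms.
    return ts_ms - (ts_ms - offset_ms) % size_ms


def key_based_offset(key: str, size_ms: int) -> int:
    """Hypothetical KEY_BASED strategy: a stable offset in [0, size_ms).

    CRC32 is an assumption made for this sketch; it is deterministic
    across runs, unlike Python's salted built-in hash().
    """
    return zlib.crc32(key.encode("utf-8")) % size_ms


size = 60_000   # 1-minute window
ts = 125_000    # event timestamp in milliseconds

aligned_start = window_start(ts, size)  # 120000, same for every key
staggered_start = window_start(ts, size, key_based_offset("user-a", size))
```

Because staggered_start depends on the key, two rows with the same timestamp can land in windows with different window_start / window_end values, which is why the result set differs from TUMBLE.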
>
> Key properties of this approach:
>
>    - *Zero impact on TUMBLE*
>
>      The semantic contract of the existing TUMBLE TVF remains fully
>      preserved.
>
>    - *Explicit semantics*
>
>      STAGGER_TUMBLE would define its own semantics explicitly,
>      including that window boundaries may vary depending on the selected
>      stagger strategy.
>
> ------------------------------
> 4. Future Work
>
> A potentially cleaner long-term direction may be to separate:
>
>    - logical window boundary assignment, and
>    - physical emission scheduling
>
> In other words, window boundaries would stay perfectly aligned while
> only the emission timing is staggered.
>
> That would constitute a true physical optimization without changing
> query results.
>
> This could potentially evolve into an optional parameter such as
> shift_window_boundary in STAGGER_TUMBLE, and can be explored in a
> follow-up FLIP.
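
The separation described above can be sketched as follows. This is an illustration only, not a proposed implementation: the function names, the CRC32-based delay, and the max_delay_ms parameter are assumptions. Window assignment ignores the key entirely, so results stay identical to TUMBLE, and only the time at which each key's result is emitted is spread out.

```python
import zlib


def assign_window(ts_ms: int, size_ms: int) -> tuple[int, int]:
    """Logical assignment: aligned [start, end) window, independent of the key."""
    start = ts_ms - ts_ms % size_ms
    return start, start + size_ms


def emission_time(window_end_ms: int, key: str, max_delay_ms: int) -> int:
    """Physical scheduling: delay emission by a stable per-key amount.

    The delay lies in [0, max_delay_ms); CRC32 is an assumption made for
    this sketch so that the delay is deterministic across runs.
    """
    return window_end_ms + zlib.crc32(key.encode("utf-8")) % max_delay_ms


start, end = assign_window(125_000, 60_000)        # (120000, 180000) for every key
emit_a = emission_time(end, "user-a", 5_000)       # somewhere in [180000, 185000)
emit_b = emission_time(end, "user-b", 5_000)       # same window, possibly a different emit time
```

In this variant window_start and window_end never change, so the trade-off is added output latency of up to max_delay_ms rather than a different result set.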
> ------------------------------
>
> Does this revised direction address your core concerns?
>
> I would also greatly appreciate feedback from others on the mailing
> list.
>
> If there is general consensus around this direction, I will update
> the FLIP document accordingly. Otherwise, I am happy to continue
> iterating on the design.
>
> Best regards,
>
> Zihao
>
> On Thu, May 21, 2026 at 1:05 AM Martijn Visser <[email protected]> wrote:
>
> > Hi Zihao,
> >
> > Thanks for the FLIP. I am worried that the proposal is mixing physical
> > concerns (the downstream bursts of data) into logical semantics. I
> > think hints are a more natural escape hatch. I also think that
> > KEY_BASED is not really a physical optimization anyway, since it
> > shifts window_start / window_end values in the output and therefore
> > changes the result set. That makes it a poor fit for both a TVF
> > argument and a hint, and probably a better fit for a PTF where the
> > user explicitly owns the boundary assignment function.
> >
> > Looking forward to your thoughts.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, May 20, 2026 at 2:32 PM rocxing <[email protected]> wrote:
> > >
> > > Hi Zihao and all,
> > >
> > >
> > > Thanks a lot for this practical proposal.
> > > This is a valuable feature for Flink SQL users, and we have also
> > > encountered exactly the same pain points in our production environments.
> > > Furthermore, the KEY_BASED deterministic stagger strategy is a good way
> > > to eliminate non-determinism problems.
> > >
> > >
> > > Best regards,
> > > Pengxiang Wang
> >
>
