Hi Martijn,

I hope you are doing well.

I wanted to follow up on the revised proposal for STAGGER_TUMBLE that I shared last week. I am particularly interested to hear whether this updated direction addresses the concerns you raised about mixing physical concerns with logical semantics.

Your feedback would be greatly appreciated. If you have any additional thoughts or suggestions, I would be happy to incorporate them.

Thank you very much for your time and guidance.

Best regards,
Zihao

On Mon, May 25, 2026 at 18:58, Feng Jin <[email protected]> wrote:
> Hi Zihao, Martijn,
>
> +1 for introducing a new window type, as this is not a change to the
> trigger mechanism itself, but rather a fundamental redefinition of how data
> is partitioned into windows.
>
> Best,
>
> Feng
>
> On Sat, May 23, 2026 at 12:07 PM zihao chen <[email protected]> wrote:
> > Hi Martijn,
> >
> > Thanks for your insightful feedback and careful review.
> >
> > Your point about avoiding the mixing of physical concerns with
> > logical semantics makes perfect sense, and it prompted me to rethink
> > the design more thoroughly.
> >
> > I would like to share an updated direction below and see whether this
> > aligns better with your expectations.
> >
> > 1. Original Proposal (Withdrawn)
> >
> > I initially proposed extending the existing TUMBLE window with an
> > optional STAGGER parameter, inspired by the existing DataStream
> > WindowStagger, which shifts window boundaries.
> >
> > However, I agree with your analysis that doing so in SQL would
> > silently break the deterministic alignment contract of TUMBLE.
> >
> > Therefore, I would like to withdraw this part of the proposal.
> >
> > 2. Hints and PTF (Deferred for Now)
> >
> > - Regarding Hints
> >
> > I agree that a hint is probably not the right abstraction here. Staggering
> > changes the resulting window boundaries, while hints in Flink are
> > generally treated as plan-intervention mechanisms that do not alter
> > query semantics.
> >
> > In addition, there is currently no precedent for window-related hints
> > in Flink SQL.
> >
> > - Regarding PTF (Process Table Functions)
> >
> > I agree that PTF could ultimately become a powerful extension point
> > for custom or user-defined windows.
> >
> > However, building a comprehensive PTF-based windowing framework is
> > itself a substantial design effort and likely deserves a dedicated
> > discussion.
> >
> > To keep the scope of this FLIP manageable, I would prefer to leave
> > PTF integration as future work for now.
> >
> > ------------------------------
> > 3. Revised Proposal: Introduce a New STAGGER_TUMBLE TVF
> >
> > Since staggering fundamentally changes the window definition, I now
> > believe it should be treated as a logical semantic change rather than
> > a pure physical optimization.
> >
> > Therefore, instead of modifying TUMBLE, the cleaner approach would
> > be to introduce a separate TVF with an explicit contract:
> >
> >     STAGGER_TUMBLE(
> >       TABLE data,
> >       DESCRIPTOR(timecol),
> >       size,
> >       stagger_strategy
> >     )
> >
> >     -- stagger_strategy:
> >     --   'RANDOM'
> >     --   'NATURAL'
> >     --   'KEY_BASED'
> >
> > For KEY_BASED, the requirement of a keyed context (for example,
> > Window Aggregation with GROUP BY) would be validated at compile
> > time.
> >
> > Key properties of this approach:
> >
> > - *Zero impact on TUMBLE*: the semantic contract of the existing
> >   TUMBLE TVF remains fully preserved.
> > - *Explicit semantics*: STAGGER_TUMBLE would define its own semantics
> >   explicitly, including that window boundaries may vary depending on
> >   the selected stagger strategy.
> >
> > ------------------------------
> > 4. Future Work
> >
> > A potentially cleaner long-term direction may be to separate:
> >
> > - logical window boundary assignment, and
> > - physical emission scheduling.
> >
> > In other words, window boundaries would remain perfectly aligned while
> > only the emission timing is staggered.
> >
> > That would constitute a true physical optimization without changing
> > query results.
> >
> > This could potentially evolve into an optional parameter such as
> > shift_window_boundary in STAGGER_TUMBLE, and can be explored in a
> > follow-up FLIP.
> >
> > ------------------------------
> >
> > Does this revised direction address your core concerns?
> >
> > I would also greatly appreciate feedback from others on the mailing
> > list.
> >
> > If there is general consensus around this direction, I will update
> > the FLIP document accordingly. Otherwise, I am happy to continue
> > iterating on the design.
> >
> > Best regards,
> >
> > Zihao
> >
> > On Thu, May 21, 2026 at 01:05, Martijn Visser <[email protected]> wrote:
> > > Hi Zihao,
> > >
> > > Thanks for the FLIP. I am worried that the proposal is mixing physical
> > > concerns (the downstream bursts of data) into logical semantics. I
> > > think hints are a more natural escape hatch. I also think that
> > > KEY_BASED is not really a physical optimization anyway, since it
> > > shifts window_start / window_end values in the output and therefore
> > > changes the result set. That makes it a poor fit for both a TVF
> > > argument and a hint, and probably a better fit for a PTF where the
> > > user explicitly owns the boundary assignment function.
> > >
> > > Looking forward to your thoughts.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Wed, May 20, 2026 at 14:32, rocxing <[email protected]> wrote:
> > > > Hi Zihao and all,
> > > >
> > > > Thanks a lot for this practical proposal.
> > > >
> > > > This is a valuable feature for Flink SQL users, and we have also
> > > > encountered exactly the same pain points in our production environments.
> > > >
> > > > Furthermore, the KEY_BASED deterministic stagger strategy is a good way
> > > > to eliminate non-determinism problems.
> > > >
> > > > Best regards,
> > > > Pengxiang Wang
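To make the distinction debated in this thread concrete, the following Java sketch contrasts the two ideas: KEY_BASED staggering, which shifts window_start / window_end per key (the reason Martijn notes it changes the result set), and the emission-only staggering described under Future Work, which keeps boundaries aligned and only delays when a window is emitted. This is a minimal sketch under stated assumptions, not Flink code: the class StaggerSketch and all of its methods are hypothetical, and deriving the KEY_BASED offset from String.hashCode() modulo the window size is an assumed choice, made only so that the offset is deterministic per key.

```java
/**
 * Hypothetical sketch (not Flink code): contrasts KEY_BASED boundary staggering
 * with emission-only staggering for a tumbling window of a fixed size.
 */
public final class StaggerSketch {

    /** A half-open window [start, end). */
    record Window(long start, long end) {}

    /** Start of the tumbling window containing ts, with boundaries shifted by offset (size > 0). */
    static long windowStart(long ts, long offset, long size) {
        // floorMod returns a value in [0, size), so negative timestamps and offsets are handled
        return ts - Math.floorMod(ts - offset, size);
    }

    /** Hypothetical KEY_BASED offset: derived only from the key, so it is deterministic. */
    static long keyBasedOffset(String key, long size) {
        // String.hashCode is specified by the JDK, so the offset is stable across runs
        return Math.floorMod(key.hashCode(), size);
    }

    /** Boundary staggering: each key gets its own shifted windows (window_start/window_end change). */
    static Window staggeredWindow(String key, long ts, long size) {
        long start = windowStart(ts, keyBasedOffset(key, size), size);
        return new Window(start, start + size);
    }

    /** Emission-only staggering: windows stay aligned; only the emission time is delayed per key. */
    static long emissionTime(String key, long ts, long size, long maxDelay) {
        long alignedEnd = windowStart(ts, 0L, size) + size;
        return alignedEnd + Math.floorMod(key.hashCode(), maxDelay);
    }

    public static void main(String[] args) {
        long size = 60_000L; // 1-minute windows, in milliseconds
        long ts = 125_000L;  // an event at t = 125 s
        for (String key : new String[] {"user-a", "user-b"}) {
            Window w = staggeredWindow(key, ts, size);
            System.out.printf("%s: staggered window [%d, %d), aligned window emitted at %d%n",
                    key, w.start(), w.end(), emissionTime(key, ts, size, 10_000L));
        }
    }
}
```

Running main prints, for each key, its own shifted window next to the emission time of the aligned window [120000, 180000). Only the first approach changes the window_start / window_end values a query would return; the second changes when results appear, not what they are.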
