Hi Zihao, Martijn,

+1 for introducing a new window type, as this is not a change to the trigger mechanism itself, but rather a fundamental redefinition of how data is partitioned into windows.

Best,
Feng

On Sat, May 23, 2026 at 12:07 PM zihao chen <[email protected]> wrote:

> Hi Martijn,
>
> Thanks for your insightful feedback and careful review.
>
> Your point about avoiding the mixing of physical concerns with
> logical semantics makes perfect sense, and it prompted me to rethink
> the design more thoroughly.
>
> I would like to share an updated direction below and see whether this
> aligns better with your expectations.
>
> 1. Original Proposal — Withdrawn
>
> I initially proposed extending the existing TUMBLE window with an
> optional STAGGER parameter, inspired by the existing DataStream
> WindowStagger, which shifts window boundaries.
>
> However, I agree with your analysis that doing so in SQL would
> silently break the deterministic alignment contract of TUMBLE.
>
> Therefore, I would like to withdraw this part of the proposal.
>
> 2. Hints and PTF — Deferred for Now
>
> - Regarding Hints
>
>   I agree that a hint is probably not the right abstraction here.
>   Staggering changes the resulting window boundaries, while hints in
>   Flink are generally treated as plan-intervention mechanisms that do
>   not alter query semantics.
>
>   In addition, there is currently no precedent for window-related hints
>   in Flink SQL.
>
> - Regarding PTF (Process Table Functions)
>
>   I agree that PTF could ultimately become a powerful extension point
>   for custom or user-defined windows.
>
>   However, building a comprehensive PTF-based windowing framework is
>   itself a substantial design effort and likely deserves a dedicated
>   discussion.
>
>   To keep the scope of this FLIP manageable, I would prefer to leave
>   PTF integration as future work for now.
>
> ------------------------------
>
> 3. Revised Proposal — Introduce a New TVF: STAGGER_TUMBLE
>
> Since staggering fundamentally changes the window definition, I now
> believe it should be treated as a logical semantic change rather than
> a pure physical optimization.
>
> Therefore, instead of modifying TUMBLE, the cleaner approach would
> be to introduce a separate TVF with an explicit contract:
>
>     STAGGER_TUMBLE(
>       TABLE data,
>       DESCRIPTOR(timecol),
>       size,
>       stagger_strategy
>     )
>
>     -- stagger_strategy:
>     --   'RANDOM'
>     --   'NATURAL'
>     --   'KEY_BASED'
>
> For KEY_BASED, the requirement of a keyed context (for example,
> Window Aggregation with GROUP BY) would be validated at compile
> time.
>
> Key properties of this approach:
>
> - *Zero impact on TUMBLE*
>
>   The semantic contract of the existing TUMBLE TVF remains fully
>   preserved.
>
> - *Explicit semantics*
>
>   STAGGER_TUMBLE would define its own semantics explicitly,
>   including that window boundaries may vary depending on the selected
>   stagger strategy.
>
> ------------------------------
>
> 4. Future Work
>
> A potentially cleaner long-term direction may be to separate:
>
> - logical window boundary assignment, and
> - physical emission scheduling.
>
> In other words, window boundaries would stay perfectly aligned while
> only the emission timing is staggered.
>
> That would constitute a true physical optimization without changing
> query results.
>
> This could potentially evolve into an optional parameter such as
> shift_window_boundary in STAGGER_TUMBLE, and can be explored in a
> follow-up FLIP.
>
> ------------------------------
>
> Does this revised direction address your core concerns?
>
> I would also greatly appreciate feedback from others on the mailing
> list.
>
> If there is general consensus around this direction, I will update
> the FLIP document accordingly. Otherwise, I am happy to continue
> iterating on the design.
>
> Best regards,
>
> Zihao
>
> On Thu, May 21, 2026 at 01:05, Martijn Visser <[email protected]> wrote:
>
> > Hi Zihao,
> >
> > Thanks for the FLIP. I am worried that the proposal is mixing physical
> > concerns (the downstream bursts of data) into logical semantics. I
> > think hints are a more natural escape hatch. I also think that
> > KEY_BASED is not really a physical optimization anyway, since it
> > shifts window_start / window_end values in the output and therefore
> > changes the result set. That makes it a poor fit for both a TVF
> > argument and a hint, and probably a better fit for a PTF where the
> > user explicitly owns the boundary assignment function.
> >
> > Looking forward to your thoughts.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, May 20, 2026 at 14:32, rocxing <[email protected]> wrote:
> >
> > > Hi Zihao and all,
> > >
> > > Thanks a lot for this practical proposal.
> > > This is a valuable feature for Flink SQL users, and we have also
> > > encountered exactly the same pain points in our production environments.
> > > Furthermore, the KEY_BASED deterministic stagger strategy is a good way
> > > to eliminate non-determinism problems.
> > >
> > > Best regards,
> > > Pengxiang Wang
> > >
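
A small Java sketch can illustrate Martijn's point that KEY_BASED moves window_start / window_end. It shows how a per-key stagger offset could shift tumbling-window boundaries. This is not Flink code and does not claim to reflect the FLIP's actual design. The following are hypothetical assumptions:

- the class `StaggerSketch` and its methods;
- the choice of `String.hashCode()` modulo the window size as the KEY_BASED offset.

RANDOM and NATURAL are left out. In the DataStream API's `WindowStagger`, their offsets come from runtime state rather than from the data: either a random draw, or the processing time when the first element reaches the operator.

```java
/**
 * Hypothetical sketch, not Flink code: how a KEY_BASED stagger strategy could
 * shift tumbling-window boundaries per key. Class and method names are illustrative.
 */
final class StaggerSketch {

    /** Deterministic per-key offset in [0, size): the same key always gets the same offset. */
    static long keyBasedOffset(String key, long size) {
        // Assumption: the offset is derived from String.hashCode(), whose value is
        // fixed by the Java specification, so it is stable across JVMs and restarts.
        return Math.floorMod((long) key.hashCode(), size);
    }

    /** Start of the tumbling window of length size containing ts, with boundaries shifted by offset. */
    static long windowStart(long ts, long offset, long size) {
        // floorMod keeps the result correct when ts - offset is negative.
        return ts - Math.floorMod(ts - offset, size);
    }

    public static void main(String[] args) {
        long size = 60_000L; // 1-minute windows, in milliseconds
        long ts = 125_000L;  // one event at 00:02:05 (epoch millis)

        // Offset 0: every key lands in the same epoch-aligned window [120000, 180000).
        System.out.println("aligned: " + windowStart(ts, 0L, size));

        // Hypothetical KEY_BASED stagger: the boundary depends on the key, so
        // window_start / window_end for the same event time can differ per key.
        for (String key : new String[] {"user-a", "user-b", "user-c"}) {
            long offset = keyBasedOffset(key, size);
            long start = windowStart(ts, offset, size);
            System.out.println(key + ": offset=" + offset
                    + ", window=[" + start + ", " + (start + size) + ")");
        }
    }
}
```

Running `main` prints `aligned: 120000` and then one line per key. Each key's window still contains the event, but its start is shifted by that key's offset. That is why this strategy changes query results rather than only emission timing.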
