Hi, Shengkai.
> I think we shouldn't remove the operator if we can not give a solution to > help users upgrade their jobs. But I think we can delay the discussion until > we need to remove the operator. +1 for it. -- Best! Xuyang 在 2023-12-08 19:22:40,"Shengkai Fang" <fskm...@gmail.com> 写道: Hi, Xuyang. Thanks for your response. I just thought an another way to solve this question instead of introducing a new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, the rewritten sql can be GROUP BY f0, window_start, window_end(window_start and window_end is produced by WINDOW TVF). We can use field index here and use a new Calc node to alias them to avoid field name conflict in WindowAggregate node. The solution is much better than before and I think it can solve the problem I mentioned before. What about using config named "table.optimizer.window-rewrite-enabled" +1 IIRC, currently compatibility across middle versions of SQL is not guaranteed. Should we add constraints on this part? I think we shouldn't remove the operator if we can not give a solution to help users upgrade their jobs. But I think we can delay the discussion until we need to remove the operator. Best, Shengkai Xuyang <xyzhong...@163.com> 于2023年12月8日周五 18:07写道: Hi, Martijn, thanks for your share. >On the topic of syntax for early/late fires, there is existing >configuration for the legacy group windows: > >SET table.exec.emit.early-fire.enabled = true; >SET table.exec.emit.early-fire.delay = 5s; >SET table.exec.emit.late-fire.enabled = true; >SET table.exec.emit.late-fire.delay = 0; >SET table.exec.emit.allow-lateness = 5s; > >We should stick with the syntax for the TVFs, and not modify that. Agree with you. We should follow the syntax defined in Flip-145. As for how to let these options to only take effect on a single window agg instead of all window aggs, we need to think of another way. >On the topic of column naming, for other situations where a user wants >to use a value that's already reserved, we require the user to include >backticks to indicate that Flink should not use the reserved keyword >implementation. Why isn't that sufficient in this case? I rather stay >consistent with this behaviour instead of introducing new config >options. > I just thought an another way to solve this question instead of introducing a new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, the rewritten sql can be GROUP BY f0, window_start, window_end(window_start and window_end is produced by WINDOW TVF). We can use field index here and use a new Calc node to alias them to avoid field name conflict in WindowAggregate node. What about this idea, cc @Shengkai Fang ? >I would propose to rename the FLIP to >something like "Add Missing Table Valued Functions Features to Replace >Legacy Group Window Aggregation". IMO, the original title "deprecate" maybe more clear. Because although the doc has deprecated the legacy Group Window Aggregation syntax just like what I said in this Flip, but the DEPRECATED tag in doc is attached when doing the Flip-145. Let's review the content in Flip-145 again. The Flip-145 said "The existing Grouped window functions, i.e. GROUP BY TUMBLE... are still supported, but will be deprecated". So as a work to follow Flip-145, this Flip officially deprecates the legacy window syntax at this time. WDTK? -- Best! Xuyang At 2023-12-07 16:51:44, "Martijn Visser" <martijnvis...@apache.org> wrote: >Hi Xuyang, > >Thanks a lot for starting this discussion. > >At first, I was a bit confused because the FLIP talks about >deprecating the Legacy Group Window Aggregations, but they have >already been marked as deprecated in the documentation [1]. >My understanding was that the big challenge was that we don't yet >support SESSION windows in the Window TVF, and that the other features >you've mentioned in the discussion threads are additional >capabilities. > >However, when reading up on the actual FLIP (your discussion email >didn't include a link [2] to it) I now understand the situation. The >appendix table is actually the most valuable for me, because it gives >me the overview of the missing capabilities between TVF implementation >and the Legacy Group Window Aggregations. > >On the topic of syntax for early/late fires, there is existing >configuration for the legacy group windows: > >SET table.exec.emit.early-fire.enabled = true; >SET table.exec.emit.early-fire.delay = 5s; >SET table.exec.emit.late-fire.enabled = true; >SET table.exec.emit.late-fire.delay = 0; >SET table.exec.emit.allow-lateness = 5s; > >We should stick with the syntax for the TVFs, and not modify that. > >On the topic of column naming, for other situations where a user wants >to use a value that's already reserved, we require the user to include >backticks to indicate that Flink should not use the reserved keyword >implementation. Why isn't that sufficient in this case? I rather stay >consistent with this behaviour instead of introducing new config >options. > >Overall, +1 on this topic. I would propose to rename the FLIP to >something like "Add Missing Table Valued Functions Features to Replace >Legacy Group Window Aggregation". > >Best regards, > >Martijn > >[1] >https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-agg/#group-window-aggregation >[2] >https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation > > >On Wed, Dec 6, 2023 at 9:32 AM Xuyang <xyzhong...@163.com> wrote: >> >> Hi, Shengkai. Thanks to share your thought. Let me answer related questions。 >> >> >> > Could you give an example about the pass-through column. A session window >> > may contain multiple rows, which value is selected by the windowoperator? >> >> >> The table function make the entire inpyt row available in the output. Take >> an example in Flink, if the input row is <a,b,c> with schema <f0,f1,f2>, >> the output of the window tvf can also access the <a,b,c> with schema >> <f0,f1,f2>. The session window always output these multi rows with attached >> window_start, >> window_end and window_time column. >> >> >> > What's the behavior if all the data in the window have been removed? >> >> >> Align the behavior of existing group window agg, and do not output data when >> the window is empty. >> >> >> > Could you explain more details about the how window process CDC? For >> > example, what's the behavior if the window only gets the DELETE data from >> > the upstream Operator. >> >> >> Also align the behavior of existing group window agg. However, there is a >> bug in group window agg that currently group window agg has different >> results when only >> consuming -D records while using or not using minibatch. Refs more at >> FLINK-33760. >> >> >> > The subtitle is not correct here. >> >> >> Updated the doc to fix it. >> >> >> > It's better if we can introduce a syntax like the `emit` keyword to set >> > the emit strategy for every window. >> I agree with you. But I don't recommend using this syntax SESSION(data >> [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because it >> breaks the syntax in Flip-145. >> I think using query hint is a better idea. Anyway, this work should belong >> to another flip. >> >> >> > I think more work should be mentioned in the FLIP. What's the behavior if >> > the input schema contains a column named `window_start`? >> > In the current design, `window_start` is a reserved keyword in the window >> > TVF syntax, but it is legal in the legacy implementation. >> >> >> This is a good question. Perhaps we can introduce a config to add a specific >> config (named "table.window-additional-columns.prefix") to add a prefix >> of all window addition columns to solve this situation. For example, user >> can set the conf to "$", and the additional column from window will become >> "$window_start", "$window_end" and "$window_time". WDYT? >> >> >> > In the FLIP, you mention the FLIP should introduce an option to fall back >> > to the legacy behavior. Could you tell us what's the name of the option? >> > BTW, I think we should unify the implementation when window TVF can do all >> > the work that the legacy operator can do and there is no need to introduce >> > an option to fallback. >> >> >> What about using config named "table.optimizer.window-rewrite-enabled"? If >> I agree with you that util all features are aligned and everything is ok >> about window tvf, we should also remove this config about fallback. >> But I think to be on the safe side, we can observe one or two versions of >> this rewrite, and allows users to roll back when problems arise. >> >> >> > If we remove the legacy window operator in the future, how users upgrade >> > their jobs? Do you have any plan to support state migration from the >> > legacy window to Windows TVF? >> IIRC, currently compatibility across middle versions of SQL is not >> guaranteed. Should we add constraints on this part? >> >> >> Look for your feedback! >> >> >> >> >> >> >> >> -- >> >> Best! >> Xuyang >> >> >> >> >> >> At 2023-12-05 12:06:27, "liu ron" <ron9....@gmail.com> wrote: >> >Hi, xuyang >> > >> >Thanks for starting this FLIP discussion, currently there are two types of >> >window aggregation in Flink SQL, namely legacy group window aggregation and >> >window tvf aggregation, these two types of window aggregation are not fully >> >aligned in behavior, which will bring a lot of confusion to the users, so >> >there is a need to unify and align them. I think the final ideal state >> >should be that there is only one window tvf aggregation, which supports >> >Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data >> >streams. There is also support for configuring EARLY-FIRE and LATER-FIRE. >> > >> >This FLIP is a continuation of FLIP-145, and also supports legacy group >> >window aggregation to flat-migrate to the new window tvf agregation, which >> >is very useful, especially for the support of CDC streams, a pain point >> >that users often feedback. Big +1 for this FLIP. >> > >> >Best, >> >Ron >> > >> >Xuyang <xyzhong...@163.com> 于2023年12月5日周二 11:11写道: >> > >> >> Hi, Feng and David. >> >> >> >> >> >> Thank you very much to share your thoughts. >> >> >> >> >> >> This flip does not include the official exposure of these experimental >> >> conf to users. Thus there is not adetailed description of this part. >> >> However, in view that some technical users may have added these >> >> experimental conf in actual production jobs, the processing >> >> of these conf while using window tvf syntax has been added to this flip. >> >> >> >> >> >> Overall, the behavior of using these experimental parameters is no >> >> different from before, and I think we should provide the compatibility >> >> about using these experimental conf. >> >> >> >> >> >> Look for your thoughs. >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Best! >> >> Xuyang >> >> >> >> >> >> >> >> >> >> >> >> At 2023-12-05 09:17:49, "David Anderson" <dander...@apache.org> wrote: >> >> >The current situation (where we have both the legacy windows and the >> >> >TVF-based windows) is confusing for users, and I'd like to see us move >> >> >forward as rapidly as possible. >> >> > >> >> >Since the early fire, late fire, and allowed lateness features were never >> >> >documented or exposed to users, I don't feel that we need to provide >> >> >replacements for these internal, experimental features before officially >> >> >deprecating the legacy group window aggregations, and I'd rather not >> >> >wait. >> >> > >> >> >However, I'd be delighted to see a proposal for what that might look >> >> >like. >> >> > >> >> >Best, >> >> >David >> >> > >> >> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin <jinfeng1...@gmail.com> wrote: >> >> > >> >> >> Hi xuyang, >> >> >> >> >> >> Thank you for initiating this proposal. >> >> >> >> >> >> I'm glad to see that TVF's functionality can be fully supported. >> >> >> >> >> >> Regarding the early fire, late fire, and allow lateness features, how >> >> will >> >> >> they be provided to users? The documentation doesn't seem to provide a >> >> >> detailed description of this part. >> >> >> >> >> >> Since this FLIP will also involve a lot of feature development, I am >> >> more >> >> >> than willing to help, including development and code review. >> >> >> >> >> >> Best, >> >> >> Feng >> >> >> >> >> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang <xyzhong...@163.com> wrote: >> >> >> >> >> >> > Hi all. >> >> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy >> >> >> > Group >> >> >> > Window Aggregation. >> >> >> > >> >> >> > >> >> >> > Although the current Flink SQL Window Aggregation documentation[1] >> >> >> > indicates that the legacy Group Window Aggregation >> >> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has >> >> not >> >> >> > fully covered all of the features of the legacy one. >> >> >> > >> >> >> > >> >> >> > Compared to Group Window Aggergation, Window TVF Aggergation has >> >> several >> >> >> > advantages, such as two-stage optimization, >> >> >> > support for standard GROUPING SET syntax, and so on. However, it >> >> needs to >> >> >> > supplement and enrich the following features. >> >> >> > >> >> >> > >> >> >> > 1. Support for SESSION Window TVF Aggregation >> >> >> > 2. Support for consuming CDC stream >> >> >> > 3. Support for HOP window size with non-integer step length >> >> >> > 4. Support for configurations such as early fire, late fire and allow >> >> >> > lateness >> >> >> > (which are internal experimental configurations in Group Window >> >> >> > Aggregation and not public to users yet.) >> >> >> > 5. Unification of the Window TVF Aggregation operator in runtime at >> >> the >> >> >> > implementation layer >> >> >> > (In the long term, the cost to maintain the operators about Window >> >> >> > TVF >> >> >> > Aggregation and Group Window Aggregation is too expensive.) >> >> >> > >> >> >> > >> >> >> > This flip aims to continue the unfinished work in FLIP-145[2], which >> >> is >> >> >> to >> >> >> > fully enable the capabilities of Window TVF Aggregation >> >> >> > and officially deprecate the legacy syntax Group Window Aggregation, >> >> to >> >> >> > prepare for the removal of the legacy one in Flink 2.0. >> >> >> > >> >> >> > >> >> >> > I have already done some preliminary POC to validate the feasibility >> >> of >> >> >> > the related work in this flip as follows. >> >> >> > 1. POC for SESSION Window TVF Aggregation [3] >> >> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4] >> >> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5] >> >> >> > >> >> >> > >> >> >> > Looking forward to your feedback and thoughts! >> >> >> > >> >> >> > >> >> >> > >> >> >> > [1] >> >> >> > >> >> >> >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/ >> >> >> > >> >> >> > [2] >> >> >> > >> >> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows >> >> >> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024 >> >> >> > [4] >> >> >> > >> >> >> >> >> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate >> >> >> > [5] >> >> >> > >> >> >> >> >> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > >> >> >> > Best! >> >> >> > Xuyang >> >> >> >> >>