Hi Jing,
Thanks for the reminder. The aim of this flip is letting the sql users to use those features in the Datastream API, we don't intend to extend flip-217. In my opinion, the watermark alignment with only one source can be configured by the options given in flip, and if the source connector does not implement flip-217, the task will run with an error, reminding the user to use `pipeline.watermark-alignment.allow- unaligned-source-splits`, but maybe these behaviors are not understood by everyone, I will add some tips about flip-217 in the flip to let users understand the behavior in the case of source splits. Best, Kui Yuan Jing Ge <j...@ververica.com.invalid> 于2023年3月7日周二 04:23写道: > Hi Kui, > > Thanks for pointing that out. I knew FLIP-217 which was done by an > engineer working in my team. As far as I am concerned, your FLIP should > answer the following questions: > > 1. How to enable the watermark alignment of a source splits with Flink SQL? > e.g. which options should be used if only one source is used? > > 2. Default behaviour. i.e. Flink SQL users should be aware that watermark > alignment of source split will only work for sources that implement > FLIP-217 properly. Should users take care of > `pipeline.watermark-alignment.allow-unaligned-source-splits` > while using Flink SQL? > > Best regards, > Jing > > > On Fri, Mar 3, 2023 at 8:46 AM Kui Yuan <catye...@gmail.com> wrote: > > > Hi all, > > > > Thanks for all. There are more questions and I will answer one by one. > > > > @Jark Thanks for your tips. For the first question, I will add more > details > > in the flip, and give a POC[1] so that pepole can know how I'm currently > > implementing these features. > > > > > IIRC, this is the first time we introduce the framework-level connector > > > options that the option is not recognized and handled by connectors. > > > The FLIP should cover how framework filters the watermark related > options > > > to avoid discover connector factory failed, and what happens if the > > > connector already supported the conflict options > > > > For the second question, We know that the default strategy is > 'on-periodic' > > in SQL layer, and the default interval is 200ms. The reason for emiting > > watermark periodically is that the time advancement of consecutive events > > may be very small, we don't need to calculate watermark for each event. > > Same for 'on-event' strategy, so my idea is that we can set a fixed gap > for > > 'on-event' strategy. > > > > > I'm not sure about the usage scenarios of event gap emit strategy. Do > > > you have any specific use case of this strategy? I'm confused why no > one > > > requested this strategy before no matter in DataStream or SQL, but > maybe > > > I missed something. I'm not against to add this option, but just want > to > > be > > > careful when adding new API because it's hard to remove in the future. > > > > As @Timo said, There is no default features like 'on-event-gap' in > > DataStream API, but the users can achieve the 'on-event-gap' feature by > > using `WatermarkGenerator` interface, just like the implemention in my > > POC[1]. However, If we don't provide it in SQL layer, there is no way > for > > users to use similar features. > > > > > Jark raised a very good point. I thought we only expose what is > > > contained in DataStream API already. If this strategy is not part of > > > DataStream API, would like to exclude it from the FLIP. We need to be > > > careful which strategies we offer by default. > > > > @Jark @Timo I'm sorry, perhaps I don't understand what are your concerns > > about CompiledPlan, maybe I missed something else, maybe you can look at > my > > POC first to see if there is somewhere to worry about. > > > > > Sorry, I forgot to remind you that Timo's concern about the changes to > > the > > > CompiledPlan looks like is still not covered in the FLIP. > > > > @Jing We could have more discussion about naming, but I prefer that the > > naming should be consistent with the DataStream API. > > About aligning splits/partitions/shards, maybe you missed FLIP-217[2] > which > > aims to support watermark alignment of source splits. > > > > > After reading the most up-to-date Flip, I didn't find any information > if > > > this solution will support aligning splits/partitions/shards [1]. Did I > > > miss anything? > > > > Best > > Kui Yuan > > > > [1] the POC: > > https://github.com/yuchengxin/flink/tree/yuankui/watermark_params > > [2] FLIP-217: > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits > > > > > > Jing Ge <j...@ververica.com.invalid> 于2023年3月3日周五 08:03写道: > > > > > Hi, > > > > > > Thanks Kui for driving this Flip and thanks all for the informative > > > discussion. > > > > > > @Timo > > > > > > Your suggestion about the naming convention is excellent. Thanks! I was > > > wondering why you, exceptionally, suggested 'scan.idle-timeout' instead > > of > > > 'scan.watermark.idle-timeout'. I must miss something here. > > > > > > There is one more NIT. I am just aware that "drift" is used for the > > > watermark alignment. It seems to be fine while using DataStream API, > > > because we will not really see it. But with the OPTIONS in SQL, a much > > > bigger group of users (including SRE, tech support, etc) will see the > > word > > > "drift". Given that "drift" wasn't used widely yet and with all > training > > > materials, Flink doc [1][2][3] (search with "lag"), "lag" has been used > > to > > > describe timestamp difference between watermark and its > > > corresponding event. Do we really need to introduce another term for > the > > > same thing? How about using 'scan.watermark.alignment.max-lag'='1min' > and > > > change the parameter name from maxAllowedWatermarkDrift to > > > maxAllowedWatermarkLag [4] because of naming consistency? Just my two > > cents > > > worth. > > > > > > @Kui > > > > > > After reading the most up-to-date Flip, I didn't find any information > if > > > this solution will support aligning splits/partitions/shards [1]. Did I > > > miss anything? > > > > > > +1 for the concern about Table API. We'd be better keep Table API and > SQL > > > synced for new features. > > > > > > Best regards, > > > Jing > > > > > > > > > [1] > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_ > > > [2] > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/event-time/built_in/#fixed-amount-of-lateness > > > > > > [3] > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/ > > > [4] > > > > > > > > > https://github.com/apache/flink/blob/4aacff572a9e3996c5dee9273638831e4040c767/flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.java#L169 > > > > > > > > > > > > On Wed, Mar 1, 2023 at 3:54 PM Timo Walther <twal...@apache.org> > wrote: > > > > > > > Reg. 2: > > > > > event gap emit strategy [...] no matter in DataStream or SQL > > > > > > > > Jark raised a very good point. I thought we only expose what is > > > > contained in DataStream API already. If this strategy is not part of > > > > DataStream API, would like to exclude it from the FLIP. We need to be > > > > careful which strategies we offer by default. > > > > > > > > Reg 1: > > > > This already has a JIRA ticket with additional thoughts on this > topic: > > > > https://issues.apache.org/jira/browse/FLINK-25221 > > > > > > > > Regards, > > > > Timo > > > > > > > > > > > > > > > > On 01.03.23 12:31, Jark Wu wrote: > > > > > Sorry, I forgot to remind you that Timo's concern about the changes > > to > > > > the > > > > > CompiledPlan looks like is still not covered in the FLIP. > > > > > > > > > > Best, > > > > > Jark > > > > > > > > > > On Wed, 1 Mar 2023 at 19:28, Jark Wu <imj...@gmail.com> wrote: > > > > > > > > > >> Hi Kui, > > > > >> > > > > >> Thank you for the great proposal, I think this is already in a > good > > > > shape. > > > > >> > > > > >> Just a kind reminder, according to the community guidelines[1], > > > > >> if there are unresponsive reviewers, a typical reasonable time > > > > >> to wait for responses is one week, but be pragmatic about it. > > > > >> > > > > >> Regarding the FLIP, I have some comments below: > > > > >> > > > > >> 1. IIRC, this is the first time we introduce the framework-level > > > > connector > > > > >> options that the option is not recognized and handled by > connectors. > > > > >> The FLIP should cover how framework filters the watermark related > > > > options > > > > >> to avoid discover connector factory failed, and what happens if > the > > > > >> connector > > > > >> already supported the conflict options. > > > > >> > > > > >> 2. I'm not sure about the usage scenarios of event gap emit > > strategy. > > > Do > > > > >> you have any specific use case of this strategy? I'm confused why > no > > > one > > > > >> requested this strategy before no matter in DataStream or SQL, but > > > maybe > > > > >> I missed something. I'm not against to add this option, but just > > want > > > to > > > > >> be > > > > >> careful when adding new API because it's hard to remove in the > > future. > > > > >> > > > > >> > > > > >> 3. Adding a "Public Interface"[2] section to summarize the > > > > >> proposed APIs and options would be better for developers to > > > > >> know the impact. Currently, the APIs are scattered in the long > > > > >> design sections. > > > > >> > > > > >> Best, > > > > >> Jark > > > > >> > > > > >> > > > > >> [1]: > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > > > >> [2]: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > > > > >> > > > > >> On Wed, 1 Mar 2023 at 16:56, Kui Yuan <catye...@gmail.com> wrote: > > > > >> > > > > >>> Hi all, > > > > >>> > > > > >>> Thanks for all discussions! > > > > >>> > > > > >>> Anyone else have questions or suggestions? if not, I will start a > > > vote > > > > >>> thread later. > > > > >>> > > > > >>> Best > > > > >>> Kui Yuan > > > > >>> > > > > >>> kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道: > > > > >>> > > > > >>>> Hi Timo, > > > > >>>> > > > > >>>> Thanks for your advice. I totally agree with your suggestion of > > > naming > > > > >>>> convention, I will rename these options and update the flip > later, > > > > >>> thanks > > > > >>>> very much. > > > > >>>> > > > > >>>> In our internal implementation we had put these options inside > the > > > > >>>> `FactoryUtil`, just as you expect. We have also taken into > > account > > > > the > > > > >>>> changes to the CompiledPlan and we have packaged these options > > > > >>>> appropriately to minimize intrusiveness and ensure the > > compatibility > > > > to > > > > >>> the > > > > >>>> `WatermarkPushDownSpec`. > > > > >>>> > > > > >>>>> A hint to the implementation: I would suggest that we add those > > > > >>> options > > > > >>>>> to `FactoryUtil`. All cross-connector options should end up > > there. > > > > >>>> > > > > >>>> > > > > >>>>> Please also consider the changes to the CompiledPlan in your > > FLIP. > > > > >>> This > > > > >>>>> change has implications on the JSON format as watermark > strategy > > of > > > > >>>>> ExecNode becomes more complex, see WatermarkPushDownSpec > > > > >>>> > > > > >>>> Best > > > > >>>> Kui Yuan > > > > >>>> > > > > >>>> Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道: > > > > >>>> > > > > >>>>> Hi Kui Yuan, > > > > >>>>> > > > > >>>>> thanks for working on this FLIP. Let me also give some comments > > > about > > > > >>>>> the proposed changes. > > > > >>>>> > > > > >>>>> I support the direction of this FLIP about handling these > > > > >>>>> watermark-specific properties through options and > /*+OPTIONS(...) > > > */ > > > > >>>>> hints. > > > > >>>>> > > > > >>>>> Regarding naming, I would like to keep the options in sync with > > > > >>> existing > > > > >>>>> options: > > > > >>>>> > > > > >>>>> > 'watermark.emit.strategy'='ON_EVENT' > > > > >>>>> > > > > >>>>> Let's use lower case (e.g. `on-event`) that matches with > > properties > > > > >>> like > > > > >>>>> sink.partitioner [1] or sink.delivery-guarantee [2]. > > > > >>>>> > > > > >>>>> > 'source.idle-timeout'='1min' > > > > >>>>> > > > > >>>>> According to FLIP-122 [3], we want to prefix all scan-source > > > related > > > > >>>>> properties with `scan.*`. This clearly includes idle-timeout > and > > > > >>>>> actually also watermark strategies which don't apply for lookup > > > > >>> sources. > > > > >>>>> > > > > >>>>> Summarizing the comments above, we should use the following > > > options: > > > > >>>>> > > > > >>>>> 'scan.watermark.emit.strategy'='on-event', > > > > >>>>> 'scan.watermark.emit.on-event.gap'='10000', > > > > >>>>> 'scan.idle-timeout'='1min', > > > > >>>>> 'scan.watermark.alignment.group'='alignment-group-1', > > > > >>>>> 'scan.watermark.alignment.max-drift'='1min', > > > > >>>>> 'scan.watermark.alignment.update-interval'='1s' > > > > >>>>> > > > > >>>>> I know that this makes the keys even longer, but given that > those > > > > >>>>> options are for power users this should be acceptable. It also > > > > clearly > > > > >>>>> indicates which options are for sinks, scans, and lookups. This > > > > >>>>> potentially also helps in allow lists. > > > > >>>>> > > > > >>>>> A hint to the implementation: I would suggest that we add those > > > > options > > > > >>>>> to `FactoryUtil`. All cross-connector options should end up > > there. > > > > >>>>> > > > > >>>>> Please also consider the changes to the CompiledPlan in your > > FLIP. > > > > This > > > > >>>>> change has implications on the JSON format as watermark > strategy > > of > > > > >>>>> ExecNode becomes more complex, see WatermarkPushDownSpec [4]. > > > > >>>>> > > > > >>>>> Regards, > > > > >>>>> Timo > > > > >>>>> > > > > >>>>> > > > > >>>>> [1] > > > > >>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner > > > > >>>>> [2] > > > > >>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee > > > > >>>>> [3] > > > > >>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory > > > > >>>>> [4] > > > > >>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java > > > > >>>>> > > > > >>>>> > > > > >>>>> On 24.02.23 04:55, kui yuan wrote: > > > > >>>>>> Hi all, > > > > >>>>>> > > > > >>>>>> I have updated the flip according to the discussion, and we > will > > > > >>> extend > > > > >>>>> the > > > > >>>>>> watermark-related features with both table options and > 'OPTIONS' > > > > >>> hint, > > > > >>>>> like > > > > >>>>>> this: > > > > >>>>>> > > > > >>>>>> ``` > > > > >>>>>> -- configure in table options > > > > >>>>>> CREATE TABLE user_actions ( > > > > >>>>>> ... > > > > >>>>>> user_action_time TIMESTAMP(3), > > > > >>>>>> WATERMARK FOR user_action_time AS user_action_time - > > INTERVAL > > > > '5' > > > > >>>>> SECOND > > > > >>>>>> ) WITH ( > > > > >>>>>> 'watermark.emit.strategy'='ON_PERIODIC', > > > > >>>>>> ... > > > > >>>>>> ); > > > > >>>>>> > > > > >>>>>> -- use 'OPTIONS' hint > > > > >>>>>> select ... from source_table /*+ > > > OPTIONS('watermark.emit.strategy'= > > > > >>>>>> 'ON_PERIODIC') */ > > > > >>>>>> ``` > > > > >>>>>> > > > > >>>>>> Does everybody have any other questions? > > > > >>>>>> > > > > >>>>>> Best > > > > >>>>>> Kui Yuan > > > > >>>>>> > > > > >>>>>> kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道: > > > > >>>>>> > > > > >>>>>>> Hi all, > > > > >>>>>>> > > > > >>>>>>> Thanks for all suggestions. > > > > >>>>>>> > > > > >>>>>>> We will extend the watermark-related features in SQL layer > with > > > > >>> dynamic > > > > >>>>>>> table options and 'OPTIONS' hint, just as everyone expects. I > > > will > > > > >>>>> modify > > > > >>>>>>> Flip-296 as discussed. > > > > >>>>>>> > > > > >>>>>>> @Martijn As far as I know, there is no hint interface in the > > > table > > > > >>> API, > > > > >>>>>>> so we can't use hint in table API directly. if we need to > > extend > > > > the > > > > >>>>> hint > > > > >>>>>>> interface in the table API, maybe we need another flip. > > However, > > > if > > > > >>> we > > > > >>>>>>> extend the watermark-related features in the dynamic table > > > options, > > > > >>>>> maybe > > > > >>>>>>> we are able to use them indirectly in the table API like > > this[1]: > > > > >>>>>>> > > > > >>>>>>> ``` > > > > >>>>>>> // register a table named "Orders" > > > > >>>>>>> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, > > product > > > > >>>>> STRING, > > > > >>>>>>> amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)"); > > > > >>>>>>> ``` > > > > >>>>>>> > > > > >>>>>>> [1] > > > > >>>>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/ > > > > >>>>>>> > > > > >>>>>>> Best > > > > >>>>>>> Kui Yuan > > > > >>>>>>> > > > > >>>>>>> Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道: > > > > >>>>>>> > > > > >>>>>>>> Thanks for the warm discussions! > > > > >>>>>>>> > > > > >>>>>>>> I had an offline discussion with Kui about the replies. I > > think > > > I > > > > >>>>> could > > > > >>>>>>>> give some explanations on the original intention to > introduce > > > > >>> another > > > > >>>>>>>> WATERMARK_PARAMS. If we take a look at the current > datastream > > > API, > > > > >>> the > > > > >>>>>>>> watermark strategy does not belong to any specific > connector. > > > And > > > > >>> we > > > > >>>>>>>> thought the dynamic table options were more like the > > > > configurations > > > > >>>>> within > > > > >>>>>>>> some specific connector. > > > > >>>>>>>> > > > > >>>>>>>> From the review comments, I think most people feel good to > > > make > > > > it > > > > >>>>> part > > > > >>>>>>>> of the dynamic table options. I think this is fine if we > give > > > more > > > > >>>>> clear > > > > >>>>>>>> scope definition of the dynamic table options here. And I > also > > > > >>> agree > > > > >>>>> with > > > > >>>>>>>> Jingsong's concern about adding SQL syntax which is the most > > > > >>>>> concerning > > > > >>>>>>>> part before launching this discussion. > > > > >>>>>>>> > > > > >>>>>>>> For Martijn's concern, if we accept to make the > > > watermark-related > > > > >>>>> options > > > > >>>>>>>> part of dynamic table options, the problem becomes another > > > topic: > > > > >>> how > > > > >>>>> to > > > > >>>>>>>> support the dynamic table options in table API, which is > > > deserved > > > > >>> to > > > > >>>>> create > > > > >>>>>>>> another FLIP. > > > > >>>>>>>> > > > > >>>>>>>> Best > > > > >>>>>>>> Yun Tang > > > > >>>>>>>> ________________________________ > > > > >>>>>>>> From: Martijn Visser <martijnvis...@apache.org> > > > > >>>>>>>> Sent: Thursday, February 23, 2023 17:14 > > > > >>>>>>>> To: dev@flink.apache.org <dev@flink.apache.org> > > > > >>>>>>>> Subject: Re: [DISCUSS] FLIP-296: Watermark options for table > > > API & > > > > >>> SQL > > > > >>>>>>>> > > > > >>>>>>>> Hi, > > > > >>>>>>>> > > > > >>>>>>>> While I can understand that there's a desire to first focus > on > > > > >>> solving > > > > >>>>>>>> this > > > > >>>>>>>> problem for SQL, I do wonder if we should ignore the Table > API > > > at > > > > >>> this > > > > >>>>>>>> point. If we could include the syntax for the Table API, it > > > > >>>>> potentially > > > > >>>>>>>> could also be implemented by another contributor without > > needing > > > > to > > > > >>>>> create > > > > >>>>>>>> another FLIP. If we don't design it right now, my concern is > > > that > > > > >>> this > > > > >>>>>>>> will > > > > >>>>>>>> increase sparsity for the Table API which ultimately hurts > > > > >>> adoption. > > > > >>>>>>>> > > > > >>>>>>>> With regards to the syntax, I have a preference to solve > this > > > via > > > > >>> the > > > > >>>>>>>> connector options (e.g. like you can currently specify > things > > as > > > > >>>>>>>> scan.startup.specific-offsets or scan.bounded.mode for the > > Kafka > > > > >>>>>>>> connector). You could still use the dynamic table options to > > > > >>>>> override/add > > > > >>>>>>>> them. > > > > >>>>>>>> > > > > >>>>>>>> Best regards, > > > > >>>>>>>> > > > > >>>>>>>> Martijn > > > > >>>>>>>> > > > > >>>>>>>> On Thu, Feb 23, 2023 at 7:21 AM Shammon FY < > zjur...@gmail.com > > > > > > > >>> wrote: > > > > >>>>>>>> > > > > >>>>>>>>> Hi kui > > > > >>>>>>>>> > > > > >>>>>>>>> Thanks for your answer and +1 to yuxia too > > > > >>>>>>>>> > > > > >>>>>>>>>> we should not bind the watermark-related options to a > > > connector > > > > >>> to > > > > >>>>>>>> ensure > > > > >>>>>>>>> semantic clarity. > > > > >>>>>>>>> > > > > >>>>>>>>> In my opinion, adding watermark-related options to a > > connector > > > is > > > > >>>>> much > > > > >>>>>>>> more > > > > >>>>>>>>> clear. Currently users can define simple watermark strategy > > in > > > > >>> DDL, > > > > >>>>>>>> adding > > > > >>>>>>>>> more configuration items in connector options is easy to > > > > >>> understand > > > > >>>>>>>>> > > > > >>>>>>>>> Best, > > > > >>>>>>>>> Shammon > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li < > > > > >>> jingsongl...@gmail.com > > > > >>>>>> > > > > >>>>>>>>> wrote: > > > > >>>>>>>>> > > > > >>>>>>>>>> Thanks for your proposal. > > > > >>>>>>>>>> > > > > >>>>>>>>>> +1 to yuxia, consider watermark-related hints as option > > hints. > > > > >>>>>>>>>> > > > > >>>>>>>>>> Personally, I am cautious about adding SQL syntax, > > > > >>> WATERMARK_PARAMS > > > > >>>>> is > > > > >>>>>>>>>> also SQL syntax to some extent. > > > > >>>>>>>>>> > > > > >>>>>>>>>> We can use OPTIONS to meet this requirement if possible. > > > > >>>>>>>>>> > > > > >>>>>>>>>> Best, > > > > >>>>>>>>>> Jingsong > > > > >>>>>>>>>> > > > > >>>>>>>>>> On Thu, Feb 23, 2023 at 10:41 AM yuxia < > > > > >>> luoyu...@alumni.sjtu.edu.cn > > > > >>>>>> > > > > >>>>>>>>>> wrote: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Hi, Yuan Kui. > > > > >>>>>>>>>>> Thanks for driving it. > > > > >>>>>>>>>>> IMO, the 'OPTIONS' hint may be not only specific to the > > > > >>> connector > > > > >>>>>>>>>> options. Just as a reference, we also have > > > `sink.parallelism`[1] > > > > >>> as > > > > >>>>> a > > > > >>>>>>>>>> connector options. It enables > > > > >>>>>>>>>>> user to specific the writer's parallelism dynamically > > > > per-query. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Personally, I perfer to consider watermark-related hints > as > > > > >>> option > > > > >>>>>>>>>> hints. So, user can define a default watermark strategy > for > > > the > > > > >>>>> table, > > > > >>>>>>>>> and > > > > >>>>>>>>>> if user dosen't needed to changes it, they need to do > > nothing > > > in > > > > >>>>> their > > > > >>>>>>>>>> query instead of specific it ervery time. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> [1] > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Best regards, > > > > >>>>>>>>>>> Yuxia > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Best regards, > > > > >>>>>>>>>>> Yuxia > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> ----- 原始邮件 ----- > > > > >>>>>>>>>>> 发件人: "kui yuan" <catye...@gmail.com> > > > > >>>>>>>>>>> 收件人: "dev" <dev@flink.apache.org> > > > > >>>>>>>>>>> 抄送: "Jark Wu" <imj...@gmail.com> > > > > >>>>>>>>>>> 发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11 > > > > >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-296: Watermark options for table > > API & > > > > >>> SQL > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Hi all, > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Thanks for the lively discussion and I will respond to > > these > > > > >>>>>>>> questions > > > > >>>>>>>>>> one > > > > >>>>>>>>>>> by one. However, there are also some common questions > and I > > > > will > > > > >>>>>>>> answer > > > > >>>>>>>>>>> together. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> @郑 Thanks for your reply. The features mentioned in this > > flip > > > > >>> are > > > > >>>>>>>> only > > > > >>>>>>>>>> for > > > > >>>>>>>>>>> those source connectors that implement the > > > > >>>>> SupportsWatermarkPushDown > > > > >>>>>>>>>>> interface, generating watermarks in other graph locations > > is > > > > >>> not in > > > > >>>>>>>> the > > > > >>>>>>>>>>> scope of this discussion. Perhaps another flip can be > > > proposed > > > > >>>>>>>> later to > > > > >>>>>>>>>>> implement this feature. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> @Shammon Thanks for your reply. In Flip-296, a rejected > > > > >>> alternative > > > > >>>>>>>> is > > > > >>>>>>>>>>> adding watermark related options in the connector > > options,we > > > > >>>>> believe > > > > >>>>>>>>> that > > > > >>>>>>>>>>> we should not bind the watermark-related options to a > > > connector > > > > >>> to > > > > >>>>>>>>> ensure > > > > >>>>>>>>>>> semantic clarity. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> What will happen if we add watermark related options in > > `the > > > > >>>>>>>>> connector > > > > >>>>>>>>>>>> options`? Will the connector ignore these options or > throw > > > an > > > > >>>>>>>>>> exception? > > > > >>>>>>>>>>>> How can we support this? > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> If user defines different watermark configurations for > one > > > > >>> table in > > > > >>>>>>>> two > > > > >>>>>>>>>>> places, I tend to prefer the first place would prevail, > but > > > we > > > > >>> can > > > > >>>>>>>>> also > > > > >>>>>>>>>>> throw exception or just print logs to prompt the user, > > which > > > > are > > > > >>>>>>>>>>> implementation details. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> If one table is used by two operators with different > > > watermark > > > > >>>>>>>>> params, > > > > >>>>>>>>>>>> what will happen? > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> @Martijn Thanks for your reply. I'm sorry that we are > not > > > > >>>>>>>> particularly > > > > >>>>>>>>>>> accurate, this hint is mainly for SQL, not table API. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> While the FLIP talks about watermark options for Table > > API & > > > > >>> SQL, > > > > >>>>>>>> I > > > > >>>>>>>>>> only > > > > >>>>>>>>>>>> see proposed syntax for SQL, not for the Table API. What > > is > > > > >>> your > > > > >>>>>>>>>> proposal > > > > >>>>>>>>>>>> for the Table API > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> @Jane Thanks for your reply. For the first question, If > the > > > > user > > > > >>>>>>>> uses > > > > >>>>>>>>>> this > > > > >>>>>>>>>>> hint on those sourse that does not implement the > > > > >>>>>>>>>> SupportsWatermarkPushDown > > > > >>>>>>>>>>> interface, it will be completely invalid. The task will > run > > > as > > > > >>>>>>>> normal > > > > >>>>>>>>> as > > > > >>>>>>>>>> if > > > > >>>>>>>>>>> the hint had not been used. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> What's the behavior if there are multiple table sources, > > > among > > > > >>>>>>>> which > > > > >>>>>>>>>>>> some do not support `SupportsWatermarkPushDown`? > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> @Jane feedback that 'WATERMARK_PARAMS' is difficult to > > > > remember, > > > > >>>>>>>>> perhaps > > > > >>>>>>>>>>> the naming issue can be put to the end of the discussion, > > > > >>> because > > > > >>>>>>>> more > > > > >>>>>>>>>>> people like @Martijn @Shuo are considering whether these > > > > >>>>>>>> configurations > > > > >>>>>>>>>>> should be put into the DDL or the 'OPTIONS' hint. Here's > > > what I > > > > >>>>>>>>>>> think, Putting these configs into DDL or putting them > into > > > > >>>>> 'OPTIONS' > > > > >>>>>>>>> hint > > > > >>>>>>>>>>> is actually the same thing, because the 'OPTIONS' hint is > > > > mainly > > > > >>>>>>>> used > > > > >>>>>>>>> to > > > > >>>>>>>>>>> configure the properties of conenctor. The reason why I > > want > > > to > > > > >>> use > > > > >>>>>>>> a > > > > >>>>>>>>> new > > > > >>>>>>>>>>> hint is to make sure the semantics clear, in my opinion > the > > > > >>>>>>>>> configuration > > > > >>>>>>>>>>> of watermark should not be mixed up with connector. > > However, > > > a > > > > >>> new > > > > >>>>>>>> hint > > > > >>>>>>>>>>> does make it more difficult to use to some extent, for > > > example, > > > > >>>>>>>> when a > > > > >>>>>>>>>> user > > > > >>>>>>>>>>> uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. For > > > this > > > > >>>>>>>> point, > > > > >>>>>>>>>> maby > > > > >>>>>>>>>>> it is more appropriate to use uniform 'OPTIONS' hint. > > > > >>>>>>>>>>> On the other hand, if we enrich more watermark option > keys > > in > > > > >>>>>>>> 'OPTIONS' > > > > >>>>>>>>>>> hints, The question will be what we treat the > definatrions > > of > > > > >>>>>>>>> 'OPTIONS' > > > > >>>>>>>>>>> hint, is this only specific to the connector options or > > could > > > > be > > > > >>>>>>>> more? > > > > >>>>>>>>>>> Maybe @Jark could share more insights here. In my opion, > > > > >>> 'OPTIONS' > > > > >>>>>>>> is > > > > >>>>>>>>>> only > > > > >>>>>>>>>>> related to the connector options, which is not like the > > > gernal > > > > >>>>>>>>> watermark > > > > >>>>>>>>>>> options. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 19:17写道: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>>> Hi Kui, > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> Thanks for driving the discussion. It's quite useful to > > > > >>> introduce > > > > >>>>>>>>>> Watermark > > > > >>>>>>>>>>>> options. I have some questions: > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> What kind of hints is "WATERMARK_PARAMS"? > > > > >>>>>>>>>>>> Currently, we have two kinds of hints in Flink: Dynamic > > > Table > > > > >>>>>>>>> Options & > > > > >>>>>>>>>>>> Query Hints. As described in the Flip, > "WATERMARK_PARAMS" > > is > > > > >>> more > > > > >>>>>>>>> like > > > > >>>>>>>>>>>> Dynamic Table Options. So two questions arise here: > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> 1) Are these watermark options to be exposed as > connector > > > > WITH > > > > >>>>>>>>>> options? Aa > > > > >>>>>>>>>>>> described in SQL Hints doc[1], "Dynamic Table Options > > allow > > > > to > > > > >>>>>>>>>> specify or > > > > >>>>>>>>>>>> override table options dynamically", which implies that > > > these > > > > >>>>>>>> options > > > > >>>>>>>>>> can > > > > >>>>>>>>>>>> also be configured in WITH options. > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> 2) Do we really need a new hint name like > > > 'WATERMARK_PARAMS', > > > > >>>>>>>> table > > > > >>>>>>>>>>>> options use "OPTIONS" as hint name, like '/*+ > > > > >>>>>>>>>>>> OPTIONS('csv.ignore-parse-errors'='true') */', maybe we > > can > > > > >>> enrich > > > > >>>>>>>>> more > > > > >>>>>>>>>>>> table option keys for watermark, e.g., /*+ > > > > >>>>>>>>>>>> OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> [1] > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>> On Wed, Feb 22, 2023 at 10:22 AM kui yuan < > > > catye...@gmail.com > > > > > > > > > >>>>>>>>> wrote: > > > > >>>>>>>>>>>> > > > > >>>>>>>>>>>>> Hi devs, > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> I'd like to start a discussion thread for FLIP-296[1]. > > This > > > > >>>>>>>> comes > > > > >>>>>>>>>> from an > > > > >>>>>>>>>>>>> offline discussion with @Yun Tang, and we hope to > enrich > > > > table > > > > >>>>>>>> API > > > > >>>>>>>>> & > > > > >>>>>>>>>> SQL > > > > >>>>>>>>>>>> to > > > > >>>>>>>>>>>>> support many watermark-related features which were only > > > > >>>>>>>> implemented > > > > >>>>>>>>>> at > > > > >>>>>>>>>>>> the > > > > >>>>>>>>>>>>> datastream API level. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Basically, we want to introduce watermark options in > > table > > > > >>> API & > > > > >>>>>>>>> SQL > > > > >>>>>>>>>> via > > > > >>>>>>>>>>>>> SQL hint named 'WATERMARK_PARAMS' to support features: > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> 1、Configurable watermark emit strategy > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> 2、Dealing with idle sources > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> 3、Watermark alignment > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Last but not least, thanks to Qingsheng and Jing Zhang > > for > > > > the > > > > >>>>>>>>>> initial > > > > >>>>>>>>>>>>> reviews. > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Looking forward to your thoughts and any feedback is > > > > >>>>>>>> appreciated! > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> [1] > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>> > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Best > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>>> Yuan Kui > > > > >>>>>>>>>>>>> > > > > >>>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>>>> > > > > >>> > > > > >> > > > > > > > > > > > > > > > > > > >