Re: Re:Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed
Congratulations, cool work!

Best,
Yun Tang

From: Feifan Wang
Sent: Wednesday, March 27, 2024 10:27
To: dev@flink.apache.org
Subject: Re:Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

Congratulations !

On 2024-03-22 12:04:39, "Hangxiang Yu" wrote:
> Congratulations! Thanks for the efforts.

On Fri, Mar 22, 2024 at 10:00 AM Yanfei Lei wrote:
> Congratulations!
> Best regards, Yanfei

Xuannan Su wrote on Fri, Mar 22, 2024, 09:21:
> Congratulations!
> Best regards, Xuannan

On Fri, Mar 22, 2024 at 9:17 AM Charles Zhang wrote:
> Congratulations!
> Best wishes, Charles Zhang, from Apache InLong

Jeyhun Karimov wrote on Fri, Mar 22, 2024, 04:16:
> Great news! Congratulations!
> Regards, Jeyhun

On Thu, Mar 21, 2024 at 2:00 PM Yuxin Tan wrote:
> Congratulations! Thanks for the efforts.
> Best, Yuxin

Samrat Deb wrote on Thu, Mar 21, 2024, 20:28:
> Congratulations !
> Bests, Samrat

On Thu, 21 Mar 2024 at 5:52 PM, Ahmed Hamdy wrote:
> Congratulations, great work and great news.
> Best Regards, Ahmed Hamdy

On Thu, 21 Mar 2024 at 11:41, Benchao Li wrote:
> Congratulations, and thanks for the great work!

Yuan Mei wrote on Thu, Mar 21, 2024, 18:31:
> Thanks for driving these efforts!
> Congratulations
> Best, Yuan

On Thu, Mar 21, 2024 at 4:35 PM Yu Li wrote:
> Congratulations and look forward to its further development!
> Best Regards, Yu

On Thu, 21 Mar 2024 at 15:54, ConradJam wrote:
> Congratulations!

Leonard Xu wrote on Wed, Mar 20, 2024, 21:36:
> Hi devs and users,
>
> We are thrilled to announce that the donation of Flink CDC as a
> sub-project of Apache Flink has completed. We invite you to explore the
> new resources available:
>
> - GitHub Repository: https://github.com/apache/flink-cdc
> - Flink CDC Documentation: https://nightlies.apache.org/flink/flink-cdc-docs-stable
>
> After the Flink community accepted this donation [1], we have completed
> software copyright signing, code repo migration, code cleanup, website
> migration, CI migration, and GitHub issues migration, etc. Here I am
> particularly grateful to Hang Ruan, Zhongqiang Gong, Qingsheng Ren,
> Jiabao Sun, LvYanquan, loserwang1024, and other contributors for their
> contributions and help during this process!
>
> For all previous contributors: The contribution process has slightly
> changed to align with the main Flink project. To report bugs or suggest
> new features, please open tickets in Apache Jira
> (https://issues.apache.org/jira). Note that we will no longer accept
> GitHub issues for these purposes.
>
> Welcome to explore the new repository and documentation. Your feedback
> and contributions are invaluable as we continue to improve Flink CDC.
>
> Thanks everyone for your support, and happy exploring Flink CDC!
>
> Best,
> Leonard
> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0
Hi devs,

It seems there have been no more concerns or suggestions for a while, so we would like to start a vote soon.

Best,
Zakelly
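One of the optimizations discussed in this thread is that a checkpoint should wait only for the currently RUNNING state requests and their callbacks, not for every buffered record to be fully processed. Here is a minimal Python sketch of that idea (all names are invented for illustration; this is not Flink's implementation):

```python
# Sketch: the checkpoint sync phase drains only the in-flight batch of
# state requests; requests still buffered are persisted rather than run.

class AsyncStateExecutor:
    def __init__(self):
        self.pending = []   # state requests not yet issued; may wait past a checkpoint
        self.running = []   # requests already issued; must finish before the sync phase

    def submit(self, request):
        self.pending.append(request)

    def issue_batch(self, n=2):
        # form a small batch (e.g. one multiGet against the state backend)
        batch, self.pending = self.pending[:n], self.pending[n:]
        self.running.extend(batch)

    def drain_running(self):
        # execute only the in-flight batch and its callbacks
        results = [req() for req in self.running]
        self.running = []
        return results

    def checkpoint(self):
        # sync phase: wait for RUNNING requests only; `pending` is
        # snapshotted instead of being executed before the checkpoint
        drained = self.drain_running()
        return {"drained": drained, "buffered_requests": len(self.pending)}

ex = AsyncStateExecutor()
for i in range(5):
    ex.submit(lambda i=i: ("GET", i))
ex.issue_batch(n=2)   # only 2 of the 5 submitted requests are running
snapshot = ex.checkpoint()
print(snapshot)       # -> {'drained': [('GET', 0), ('GET', 1)], 'buffered_requests': 3}
```

Even with many records in-flight, the sync phase here is bounded by the small running batch, which is the argument made above for why the checkpoint need not drain the whole buffer.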
[jira] [Created] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage
Junrui Li created FLINK-34945:

Summary: Support recover shuffle descriptor and partition metrics from tiered storage
Key: FLINK-34945
URL: https://issues.apache.org/jira/browse/FLINK-34945
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Junrui Li

--
This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0
Hi everyone,

Piotr and I had a long discussion about the checkpoint duration issue. We think that the lambda serialization approach I proposed in the last mail may bring in more issues; the most serious one is that users may not be able to modify the code in a serialized lambda to perform a bug fix.

Fortunately, we found a possible solution. By introducing a set of declarative APIs and a `declareProcess()` function that users implement in a newly introduced AbstractStreamOperator, we can obtain the declaration of record processing at runtime, broken down into requests and callbacks (lambdas) as FLIP-424 introduced. Thus we avoid the problem of lambda (de)serialization; instead we retrieve the callbacks each time before a task runs. The next step is to provide an API allowing users to assign a unique id to each state request and callback, or to assign one automatically by declaration order. With that id we can find the corresponding callback at runtime for each restored state request, and the whole pipeline can be resumed.

Note that all of the above is internal and won't be exposed to average users. Exposing this in the Stream APIs can be discussed later. I will prepare another FLIP (or FLIPs) describing the whole optimized checkpointing process; in the meantime, we can proceed with the current FLIPs. The new FLIP(s) are built on top of the current ones, and we can achieve this incrementally.

Best,
Zakelly

On Thu, Mar 21, 2024 at 12:31 AM Zakelly Lan wrote:
> Hi Piotrek,
>
> Thanks for your comments!
>
>> As we discussed off-line, you agreed that we can not checkpoint while
>> some records are in the middle of being processed, and that we would
>> have to drain the in-progress records before doing the checkpoint. You
>> also argued that this is not a problem, because the size of this buffer
>> can be configured.
>>
>> I'm really afraid of such a solution. I've seen plenty of times in the
>> past that whenever Flink has to drain some buffered records, that
>> eventually breaks timely checkpointing (and hence Flink's ability to
>> rescale in a timely manner). Even a single record with a `flatMap`-like
>> operator currently causes problems in Flink during back pressure.
>> That's especially true, for example, for processing watermarks. At the
>> same time, I don't see how this value could be configured even by
>> Flink's power users, let alone an average user. The size of that
>> in-flight buffer not only depends on a particular query/job, but the
>> "good" value also changes dynamically over time, and can change very
>> rapidly. Sudden spikes of records or backpressure, or some hiccup
>> during emitting watermarks, could in an instant change the
>> theoretically optimal buffer size from, let's say, "6000" records down
>> to "1". And when those changes happen, those are exactly the times when
>> timely checkpointing matters the most. If the load pattern suddenly
>> changes and checkpointing suddenly takes tens of minutes instead of a
>> couple of seconds, it means you cannot use rescaling and you are forced
>> to overprovision the resources. And there are also other issues if
>> checkpointing takes too long.
>
> Let me clarify some misunderstanding here. First of all, is the sync
> phase of checkpointing under the current plan longer than in the
> synchronous execution model? The answer is yes; it is a trade-off for
> the parallel execution model, and I think the cost is worth the
> improvement. Now the question is: how much longer are we talking about?
> The PoC result I provided is that it takes 3 seconds to drain 6000
> records of a simple job, and I said that is not a big deal. Even though
> you might say we may encounter long watermark/timer processing that
> makes the checkpoint wait, I have provided several ways to optimize
> this:
>
> 1. Instead of only controlling the in-flight records, we could control
>    the in-flight watermark.
> 2. Since we have broken down the record processing into several state
>    requests with at most one subsequent callback for each request, the
>    checkpoint can proceed after the currently RUNNING requests (NOT
>    records) and their callbacks finish. This means that even if we have
>    a lot of records in-flight (in 'processElement'), the checkpoint can
>    proceed once only a small group of state requests finishes. They
>    will form 1 or 2 multiGets to RocksDB, which takes less time.
>    Moreover, timer processing is also split into several requests, so
>    the checkpoint won't wait for a whole timer to finish. The picture
>    attached describes this idea.
>
> And the size of this buffer can be configured. I'm not counting on
> average users to configure it well; I'm just saying that we'd better not
> focus on absolute numbers from the PoC or specific cases, since we can
> always provide a conservative default value to make this acceptable for
> most cases. An adaptive buffer size is also worth a try if we provide a
> conservative strategy.
>
> Besides, I
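The id-based callback recovery described above can be made concrete with a minimal Python sketch (not Flink's actual API; `Declaration` and `declare_process` are invented names for illustration): callbacks are registered in declaration order, a checkpoint persists only the callback id and the pending argument, and after restore a fresh declaration yields the same ids, so the request resumes without (de)serializing any lambda.

```python
# Sketch: stable ids assigned by declaration order let a restored state
# request find its callback again in a freshly declared pipeline.

class Declaration:
    """Collects callbacks, assigning each an id by declaration order."""
    def __init__(self):
        self.callbacks = {}   # id -> callback
        self._next = 0

    def register(self, cb):
        cb_id = self._next    # id determined purely by declaration order
        self._next += 1
        self.callbacks[cb_id] = cb
        return cb_id

def declare_process(decl):
    # hypothetical user pipeline: GET a value from state, then add 1
    on_get = decl.register(lambda value: value + 1)
    return on_get

# First run: declare the pipeline; a checkpoint persists only the id and
# the pending argument of an in-flight request, never the lambda itself.
decl = Declaration()
cb_id = declare_process(decl)
persisted = (cb_id, 41)

# Restore: re-declaring produces fresh lambdas under the same ids, so the
# in-flight request can resume where it left off.
decl2 = Declaration()
declare_process(decl2)
restored_id, arg = persisted
print(decl2.callbacks[restored_id](arg))   # -> 42
```

This is the essence of why ids must either be user-assigned or deterministically derived from declaration order: any nondeterminism in id assignment would reconnect a restored request to the wrong callback.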
[jira] [Created] (FLINK-34944) Use Incremental Source Framework in Flink CDC OceanBase Source Connector
He Wang created FLINK-34944:

Summary: Use Incremental Source Framework in Flink CDC OceanBase Source Connector
Key: FLINK-34944
URL: https://issues.apache.org/jira/browse/FLINK-34944
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Reporter: He Wang
Fix For: cdc-3.1.0
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Thanks for the PoC and the update. The final syntax looks good to me; at least it is a nice and concise first step.

SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args => DESCRIPTOR(f1, f2));

Besides, which built-in models will we support in the FLIP? This might be important because it relates to what use cases can run with the new Flink version out of the box.

Best,
Jark

On Wed, 27 Mar 2024 at 01:10, Hao Li wrote:
> Hi Timo,
>
> Yeah, for `primary key` and `from table(...)`, those are explicitly
> matched in the parser: [1].
>
>> SELECT f1, f2, label FROM
>>   ML_PREDICT(
>>     input => `my_data`,
>>     model => `my_cat`.`my_db`.`classifier_model`,
>>     args => DESCRIPTOR(f1, f2));
>
> This named-argument syntax looks good to me. It can be supported
> together with
>
> SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
> `my_cat`.`my_db`.`classifier_model`, DESCRIPTOR(f1, f2));
>
> Sure. Will let you know once I have updated the FLIP.
>
> [1]
> https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814
>
> Thanks,
> Hao
>
> On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote:
>> Hi Hao,
>>
>>> `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't work
>>> since `TABLE` and `MODEL` are already keywords
>>
>> This argument doesn't count. The parser supports introducing keywords
>> that are still non-reserved. For example, this enables using "key" for
>> both a primary key and a column name:
>>
>> CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
>> WITH ('connector' = 'datagen');
>>
>> SELECT i AS key FROM t;
>>
>> I'm sure we will introduce `TABLE(my_data)` eventually, as this is what
>> the standard dictates. But for now, let's use the most compact syntax
>> possible, which is also in sync with Oracle.
>>
>> TL;DR: We allow identifiers as arguments for PTFs, which are expanded
>> with catalog and database if necessary. Those identifier arguments
>> translate to catalog lookups for tables and models. The ML_ functions
>> will make sure that the arguments are of the correct type, model or
>> table.
>>
>> SELECT f1, f2, label FROM
>>   ML_PREDICT(
>>     input => `my_data`,
>>     model => `my_cat`.`my_db`.`classifier_model`,
>>     args => DESCRIPTOR(f1, f2));
>>
>> So this will also allow us to use, in the future:
>>
>> SELECT * FROM poly_func(table1);
>>
>> Same support as Oracle [1]. Very concise.
>>
>> Let me know when you have updated the FLIP for a final review before
>> voting.
>>
>> Do others have additional objections?
>>
>> Regards,
>> Timo
>>
>> [1]
>> https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
>>
>> On 25.03.24 23:40, Hao Li wrote:
>>> Hi Timo,
>>>
>>>> Please double check if this is implementable with the current stack.
>>>> I fear the parser or validator might not like the "identifier"
>>>> argument?
>>>
>>> I checked this. Currently the validator throws an exception when
>>> trying to get the fully qualified name for `classifier_model`, but
>>> since `SqlValidatorImpl` is implemented in Flink, we should be able to
>>> fix this. The only caveat is that if the full model path is not
>>> provided, the qualifier is interpreted as a column. We should be able
>>> to handle this specially by rewriting the `ml_predict` function to add
>>> the catalog and database name in `FlinkCalciteSqlValidator`, though.
>>>
>>>> SELECT f1, f2, label FROM
>>>>   ML_PREDICT(
>>>>     TABLE `my_data`,
>>>>     my_cat.my_db.classifier_model,
>>>>     DESCRIPTOR(f1, f2))
>>>>
>>>> SELECT f1, f2, label FROM
>>>>   ML_PREDICT(
>>>>     input => TABLE `my_data`,
>>>>     model => my_cat.my_db.classifier_model,
>>>>     args => DESCRIPTOR(f1, f2))
>>>
>>> I verified these can be parsed. The problem is in the validator for
>>> the qualifier, as mentioned above.
>>>
>>>> So the safest option would be the long-term solution:
>>>>
>>>> SELECT f1, f2, label FROM
>>>>   ML_PREDICT(
>>>>     input => TABLE(my_data),
>>>>     model => MODEL(my_cat.my_db.classifier_model),
>>>>     args => DESCRIPTOR(f1, f2))
>>>
>>> `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't
>>> work, since `TABLE` and `MODEL` are already keywords in Calcite used
>>> by `CREATE TABLE` and `CREATE MODEL`. Changing to `model_name(...)`
>>> works and will be treated as a function.
>>>
>>> So I think
>>>
>>> SELECT f1, f2, label FROM
>>>   ML_PREDICT(
>>>     input => TABLE `my_data`,
>>>     model => my_cat.my_db.classifier_model,
>>>     args => DESCRIPTOR(f1, f2))
>>>
>>> should be fine for now.
>>>
>>> For the syntax part:
>>> 1). Sounds good. We can drop model task and model kind from the
>>> definition. They can be deduced from the options.
>>> 2). Sure. We can add temporary
Re: Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed
Congratulations! Thanks for the great work!

Best,
Jinzhong

On Wed, Mar 27, 2024 at 10:28 AM Feifan Wang wrote:
> Congratulations !
Re:Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed
Congratulations !

Leonard Xu wrote on Wed, Mar 20, 2024, 21:36:
> Hi devs and users,
>
> We are thrilled to announce that the donation of Flink CDC as a
> sub-project of Apache Flink has completed. We invite you to explore the
> new resources available:
>
> - GitHub Repository: https://github.com/apache/flink-cdc
> - Flink CDC Documentation: https://nightlies.apache.org/flink/flink-cdc-docs-stable
>
> After the Flink community accepted this donation [1], we have completed
> software copyright signing, code repo migration, code cleanup, website
> migration, CI migration, and GitHub issues migration, etc. Here I am
> particularly grateful to Hang Ruan, Zhongqiang Gong, Qingsheng Ren,
> Jiabao Sun, LvYanquan, loserwang1024, and other contributors for their
> contributions and help during this process!
>
> For all previous contributors: The contribution process has slightly
> changed to align with the main Flink project. To report bugs or suggest
> new features, please open tickets in Apache Jira
> (https://issues.apache.org/jira). Note that we will no longer accept
> GitHub issues for these purposes.
>
> Welcome to explore the new repository and documentation. Your feedback
> and contributions are invaluable as we continue to improve Flink CDC.
>
> Thanks everyone for your support, and happy exploring Flink CDC!
>
> Best,
> Leonard
> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
[jira] [Created] (FLINK-34943) Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
Zhongqiang Gong created FLINK-34943:

Summary: Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
Key: FLINK-34943
URL: https://issues.apache.org/jira/browse/FLINK-34943
Project: Flink
Issue Type: Improvement
Components: Connectors / JDBC
Reporter: Zhongqiang Gong
[jira] [Created] (FLINK-34942) Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
Sergey Nuyanzin created FLINK-34942:

Summary: Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
Key: FLINK-34942
URL: https://issues.apache.org/jira/browse/FLINK-34942
Project: Flink
Issue Type: Bug
Components: Connectors / Opensearch
Affects Versions: 3.1.0
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin

Currently it fails with a similar issue as FLINK-33493.
[jira] [Created] (FLINK-34941) Cannot convert org.apache.flink.streaming.api.CheckpointingMode to org.apache.flink.core.execution.CheckpointingMode
Sergey Nuyanzin created FLINK-34941:
---
Summary: Cannot convert org.apache.flink.streaming.api.CheckpointingMode to org.apache.flink.core.execution.CheckpointingMode
Key: FLINK-34941
URL: https://issues.apache.org/jira/browse/FLINK-34941
Project: Flink
Issue Type: Bug
Components: API / Core, Connectors / ElasticSearch, Runtime / Checkpointing
Affects Versions: 1.20.0
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin

After the change in FLINK-34516, the elasticsearch connector for 1.20-SNAPSHOT starts failing with
{noformat}
Error: /home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-e2e-tests/flink-connector-elasticsearch-e2e-tests-common/src/main/java/org/apache/flink/streaming/tests/ElasticsearchSinkE2ECaseBase.java:[75,5] method does not override or implement a method from a supertype
Error: /home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-e2e-tests/flink-connector-elasticsearch-e2e-tests-common/src/main/java/org/apache/flink/streaming/tests/ElasticsearchSinkE2ECaseBase.java:[85,84] incompatible types: org.apache.flink.streaming.api.CheckpointingMode cannot be converted to org.apache.flink.core.execution.CheckpointingMode
{noformat}
https://github.com/apache/flink-connector-elasticsearch/actions/runs/8436631571/job/23104522666#step:15:12668

Set as blocker since every build of the elasticsearch connector against 1.20-SNAPSHOT now fails; the same issue probably affects the opensearch connector.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
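The incompatibility above boils down to two enums with identical constant names living in different packages. A minimal sketch of a name-based bridge, using local stand-in enums (the real classes are the two Flink `CheckpointingMode` types named in the report; these copies and the `CheckpointingModeBridge` class are for illustration only, not Flink's actual fix):

```java
// Stand-ins mirroring org.apache.flink.streaming.api.CheckpointingMode (legacy)
// and org.apache.flink.core.execution.CheckpointingMode (new); both expose the
// same constants, which is what makes a name-based bridge possible.
enum LegacyCheckpointingMode { EXACTLY_ONCE, AT_LEAST_ONCE }

enum CoreCheckpointingMode { EXACTLY_ONCE, AT_LEAST_ONCE }

public class CheckpointingModeBridge {

    // Convert by constant name, so connector code compiled against the old
    // enum can hand values to APIs that expect the new one.
    public static CoreCheckpointingMode toCore(LegacyCheckpointingMode legacy) {
        return CoreCheckpointingMode.valueOf(legacy.name());
    }

    public static void main(String[] args) {
        System.out.println(toCore(LegacyCheckpointingMode.EXACTLY_ONCE));
    }
}
```

Such a shim would let a connector compile against both 1.19 and 1.20-SNAPSHOT while the APIs diverge.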
Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format
Thanks Anupam! Looking forward to it.

On Thu, Mar 14, 2024 at 1:50 AM Anupam Aggarwal wrote:
> Hi Kevin,
>
> Thanks, these are some great points.
> Just to clarify, I do agree that the subject should be an option (like in
> the case of RegistryAvroFormatFactory). We could fall back to subject and
> auto-register schemas, if schema-Id is not provided explicitly.
> In general, I think it would be good to be more explicit about the schemas
> used (
> https://docs.confluent.io/platform/current/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-registration
> ). This would also help prevent us from overriding the ids in incompatible
> ways.
>
> Under the current implementation of FlinkToProtoSchemaConverter we might
> end up overwriting the field-Ids. If we are able to locate a prior schema,
> the approach you outlined makes a lot of sense. Let me explore this a bit
> further and get back (in terms of feasibility).
>
> Thanks again!
> - Anupam
>
> On Wed, Mar 13, 2024 at 2:28 AM Kevin Lam wrote:
> > Hi Anupam,
> >
> > Thanks again for your work on contributing this feature back.
> >
> > Sounds good re: the refactoring/re-organizing.
> >
> > Regarding the schema-id, in my opinion this should NOT be a configuration
> > option on the format. We should be able to deterministically map the
> > Flink type to the ProtoSchema and register that with the Schema Registry.
> >
> > I think it can make sense to provide the `subject` as a parameter, so
> > that the serialization format can look up existing schemas. This would be
> > used in something related that I mentioned above:
> >
> > > Another topic I had is Protobuf's field ids. Ideally in Flink it would
> > > be nice if we are idiomatic about not renumbering them in incompatible
> > > ways, similar to what's discussed on the Schema Registry issue here:
> > > https://github.com/confluentinc/schema-registry/issues/2551
> >
> > When we construct the ProtobufSchema from the Flink LogicalType, we
> > shouldn't renumber the field ids in an incompatible way, so one approach
> > would be to use the subject to look up the most recent version, and use
> > that to evolve the field ids correctly.
> >
> > On Tue, Mar 12, 2024 at 2:33 AM Anupam Aggarwal wrote:
> > > Hi Kevin,
> > >
> > > Thanks for starting the discussion on this. I will be working on
> > > contributing this feature back (this was developed by Dawid Wysakowicz
> > > and others at Confluent).
> > >
> > > I have opened a (very initial) draft PR here
> > > https://github.com/apache/flink/pull/24482 with our current
> > > implementation. Thanks for the feedback on the PR, I haven't gotten
> > > around to re-organizing/refactoring the classes yet, but it would be
> > > in line with some of your comments.
> > >
> > > On the overall approach there are some slight variations from the
> > > initial proposal. Our implementation relies on an explicit schema-id
> > > being passed through the config. This could help in cases where one
> > > Flink type could potentially map to multiple proto types. We could
> > > make the schema-Id optional and fall back to deriving it from the
> > > rowType (during serialization) if not present?
> > >
> > > The message index handling is still TBD. I am thinking of replicating
> > > the logic in AbstractKafkaProtobufSerializer.java
> > > https://github.com/confluentinc/schema-registry/blob/342c8a9d3854d4253d785214f5dcfb1b6cc59a06/protobuf-serializer/src/main/java/io/confluent/kafka/serializers/protobuf/AbstractKafkaProtobufSerializer.java#L157
> > > (|Deserializer). Please let me know if this makes sense, or in case
> > > you have any other feedback.
> > >
> > > Thanks
> > > Anupam
> > >
> > > On Thu, Feb 29, 2024 at 8:54 PM Kevin Lam wrote:
> > > > Hey Robert,
> > > >
> > > > Awesome, thanks, that timeline works for me. Sounds good re: deciding
> > > > on a FLIP once we have the PR, and thanks for looking into the field
> > > > ids.
> > > >
> > > > Looking forward to it!
> > > >
> > > > On Thu, Feb 29, 2024 at 5:09 AM Robert Metzger wrote:
> > > > > Hey Kevin,
> > > > >
> > > > > Thanks a lot. Then let's contribute the Confluent implementation
> > > > > to apache/flink. We can't start working on this immediately
> > > > > because of a team event next week, but within the next two weeks,
> > > > > we will start working on this. It probably makes sense for us to
> > > > > open a pull request of what we have already, so that you can start
> > > > > reviewing and maybe also contributing to the PR. I hope this
> > > > > timeline works for you!
> > > > >
> > > > > Let's also decide if we need a FLIP once the code is public.
> > > > > We will look
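The field-id concern discussed in this thread can be sketched as a small assignment rule: reuse numbers from the prior schema version, and give fields that are new in this version numbers past the previous maximum, so existing ids are never renumbered incompatibly. This is a hypothetical illustration of the idea (class and method names are invented here), not `FlinkToProtoSchemaConverter`'s actual logic:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldIdAssigner {

    // prior: field name -> field number from the latest registered schema
    // version; fields: field names of the new schema, in declaration order.
    // Known fields keep their old numbers; new fields continue after the max.
    public static LinkedHashMap<String, Integer> assign(
            Map<String, Integer> prior, List<String> fields) {
        int next = prior.values().stream().mapToInt(Integer::intValue).max().orElse(0) + 1;
        LinkedHashMap<String, Integer> out = new LinkedHashMap<>();
        for (String field : fields) {
            if (prior.containsKey(field)) {
                out.put(field, prior.get(field)); // preserve existing id
            } else {
                out.put(field, next++); // brand-new field gets a fresh id
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> prior = Map.of("id", 1, "name", 2);
        // "email" is new, so it gets 3; "id" and "name" keep 1 and 2.
        System.out.println(assign(prior, List.of("id", "email", "name")));
    }
}
```

Looking up the latest subject version before converting, as suggested above, is what makes `prior` available.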
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Timo, Yeah. For `primary key` and `from table(...)` those are explicitly matched in parser: [1]. > SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); This named argument syntax looks good to me. It can be supported together with SELECT f1, f2, label FROM ML_PREDICT(`my_data`, `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); Sure. Will let you know once updated the FLIP. [1] https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 Thanks, Hao On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote: > Hi Hao, > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > > work since `TABLE` and `MODEL` are already key words > > This argument doesn't count. The parser supports introducing keywords > that are still non-reserved. For example, this enables using "key" for > both primary key and a column name: > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > WITH ('connector' = 'datagen'); > > SELECT i AS key FROM t; > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what > the standard dictates. But for now, let's use the most compact syntax > possible which is also in sync with Oracle. > > TLDR: We allow identifiers as arguments for PTFs which are expanded with > catalog and database if necessary. Those identifier arguments translate > to catalog lookups for table and models. The ML_ functions will make > sure that the arguments are of correct type model or table. > > SELECT f1, f2, label FROM >ML_PREDICT( > input => `my_data`, > model => `my_cat`.`my_db`.`classifier_model`, > args => DESCRIPTOR(f1, f2)); > > So this will allow us to also use in the future: > > SELECT * FROM poly_func(table1); > > Same support as Oracle [1]. Very concise. > > Let me know when you updated the FLIP for a final review before voting. > > Do others have additional objections? 
> > Regards, > Timo > > [1] > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html > > > > On 25.03.24 23:40, Hao Li wrote: > > Hi Timo, > > > >> Please double check if this is implementable with the current stack. I > > fear the parser or validator might not like the "identifier" argument? > > > > I checked this, currently the validator throws an exception trying to get > > the full qualifier name for `classifier_model`. But since > > `SqlValidatorImpl` is implemented in Flink, we should be able to fix > this. > > The only caveat is that if the full model path is not provided, > > the qualifier is interpreted as a column. We should be able to special-case > > this by rewriting the `ml_predict` function to add the catalog and > > database name in `FlinkCalciteSqlValidator` though. > > > >> SELECT f1, f2, label FROM > > ML_PREDICT( > > TABLE `my_data`, > > my_cat.my_db.classifier_model, > > DESCRIPTOR(f1, f2)) > > > > SELECT f1, f2, label FROM > > ML_PREDICT( > > input => TABLE `my_data`, > > model => my_cat.my_db.classifier_model, > > args => DESCRIPTOR(f1, f2)) > > > > I verified these can be parsed. The problem is in the validator for the > > qualifier, as mentioned above. > > > >> So the safest option would be the long-term solution: > > > > SELECT f1, f2, label FROM > > ML_PREDICT( > > input => TABLE(my_data), > > model => MODEL(my_cat.my_db.classifier_model), > > args => DESCRIPTOR(f1, f2)) > > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't work > > since `TABLE` and `MODEL` are already keywords in Calcite used by `CREATE > > TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be > > treated as a function. > > > > So I think > > > > SELECT f1, f2, label FROM > > ML_PREDICT( > > input => TABLE `my_data`, > > model => my_cat.my_db.classifier_model, > > args => DESCRIPTOR(f1, f2)) > > should be fine for now. > > > > For the syntax part: > > 1). Sounds good.
We can drop model task and model kind from the > definition. > > They can be deduced from the options. > > > > 2). Sure. We can add temporary model > > > > 3). Make sense. We can use `show create model ` to display all > > information and `describe model ` to show input/output schema > > > > Thanks, > > Hao > > > > On Mon, Mar 25, 2024 at 3:21 PM Hao Li wrote: > > > >> Hi Ahmed, > >> > >> Looks like the feature freeze time for 1.20 release is June 15th. We can > >> definitely get the model DDL into 1.20. For predict and evaluate > functions, > >> if we can't get into the 1.20 release, we can get them into the 1.21 > >> release for sure. > >> > >> Thanks, > >> Hao > >> > >> > >> > >> On Mon, Mar 25, 2024 at 1:25 AM Timo Walther > wrote: > >> > >>> Hi Jark and Hao, > >>> > >>> thanks for the information, Jark! Great that the Calcite community > >>> already fixed the problem for us. +1 to adopt the
[jira] [Created] (FLINK-34940) LeaderContender implementations handle invalid state
Matthias Pohl created FLINK-34940: - Summary: LeaderContender implementations handle invalid state Key: FLINK-34940 URL: https://issues.apache.org/jira/browse/FLINK-34940 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Matthias Pohl Currently, LeaderContender implementations (e.g. see [ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219]) tolerate two leader events of the same type arriving in a row, which shouldn't be possible. Two subsequent leadership grants indicate that the instance receiving the second grant missed the leadership revocation event in between, leaving the overall deployment in an invalid state (i.e. a split-brain scenario). We should fail fatally in these scenarios rather than handling them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
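The fail-fast behavior proposed here can be sketched as a small guard that insists grants and revocations alternate. This is an illustrative stand-in (the `LeadershipGuard` class is invented for this sketch), not the actual `LeaderContender` API:

```java
public class LeadershipGuard {

    private boolean leader = false;

    // Grants and revocations must strictly alternate; a repeated event of
    // the same kind means a notification was missed (potential split brain)
    // and is treated as fatal rather than silently tolerated.
    public void grantLeadership() {
        if (leader) {
            throw new IllegalStateException(
                    "Duplicate leadership grant: a revocation event was missed.");
        }
        leader = true;
    }

    public void revokeLeadership() {
        if (!leader) {
            throw new IllegalStateException(
                    "Leadership revocation without a preceding grant.");
        }
        leader = false;
    }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard();
        guard.grantLeadership();   // ok
        guard.revokeLeadership();  // ok
        guard.grantLeadership();   // alternating sequence is valid
    }
}
```

In a real contender, the exception would be routed to the fatal-error handler so the process terminates instead of continuing in an inconsistent state.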
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Hao, > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't > work since `TABLE` and `MODEL` are already keywords This argument doesn't count. The parser supports introducing keywords that are still non-reserved. For example, this enables using "key" for both a primary key and a column name: CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) WITH ('connector' = 'datagen'); SELECT i AS key FROM t; I'm sure we will introduce `TABLE(my_data)` eventually as this is what the standard dictates. But for now, let's use the most compact syntax possible, which is also in sync with Oracle. TLDR: We allow identifiers as arguments for PTFs which are expanded with catalog and database if necessary. Those identifier arguments translate to catalog lookups for tables and models. The ML_ functions will make sure that the arguments are of the correct type (model or table). SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); So this will allow us to also use in the future: SELECT * FROM poly_func(table1); Same support as Oracle [1]. Very concise. Let me know when you have updated the FLIP for a final review before voting. Do others have additional objections? Regards, Timo [1] https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html On 25.03.24 23:40, Hao Li wrote: Hi Timo, Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument? I checked this, currently the validator throws an exception trying to get the full qualifier name for `classifier_model`. But since `SqlValidatorImpl` is implemented in Flink, we should be able to fix this. The only caveat is that if the full model path is not provided, the qualifier is interpreted as a column. We should be able to special-case this by rewriting the `ml_predict` function to add the catalog and database name in `FlinkCalciteSqlValidator` though.
SELECT f1, f2, label FROM ML_PREDICT( TABLE `my_data`, my_cat.my_db.classifier_model, DESCRIPTOR(f1, f2)) SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2)) I verified these can be parsed. The problem is in validator for qualifier as mentioned above. So the safest option would be the long-term solution: SELECT f1, f2, label FROM ML_PREDICT( input => TABLE(my_data), model => MODEL(my_cat.my_db.classifier_model), args => DESCRIPTOR(f1, f2)) `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't work since `TABLE` and `MODEL` are already key words in calcite used by `CREATE TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be treated as a function. So I think SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2)) should be fine for now. For the syntax part: 1). Sounds good. We can drop model task and model kind from the definition. They can be deduced from the options. 2). Sure. We can add temporary model 3). Make sense. We can use `show create model ` to display all information and `describe model ` to show input/output schema Thanks, Hao On Mon, Mar 25, 2024 at 3:21 PM Hao Li wrote: Hi Ahmed, Looks like the feature freeze time for 1.20 release is June 15th. We can definitely get the model DDL into 1.20. For predict and evaluate functions, if we can't get into the 1.20 release, we can get them into the 1.21 release for sure. Thanks, Hao On Mon, Mar 25, 2024 at 1:25 AM Timo Walther wrote: Hi Jark and Hao, thanks for the information, Jark! Great that the Calcite community already fixed the problem for us. +1 to adopt the simplified syntax asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if upgrading Calcite is too much work right now? > Is `DESCRIPTOR` a must in the syntax? 
Yes, we should still stick to the standard as much as possible, and all vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal arguments. So the final syntax of this discussion would be: SELECT f1, f2, label FROM ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2)) SELECT * FROM ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2)) Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument? Make sure that also these variations are supported: SELECT f1, f2, label FROM ML_PREDICT( TABLE `my_data`, my_cat.my_db.classifier_model, DESCRIPTOR(f1, f2)) SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2)) It might be safer and more future-proof to wrap a MODEL()
[DISCUSS] FLIP-XXX: Introduce Flink SQL variables
Hello devs,

I would like to start a discussion about FLIP-XXX: Introduce Flink SQL variables [1]. The main motivation behind this change is to be able to abstract Flink SQL from environment-specific configuration and provide a way to carry jobs between environments (e.g. dev-stage-prod) without the need to make changes in the code. It can also be a way to decouple sensitive information from the job code, or help with redundant literals.

The main decision regarding the proposed solution is to handle the variable resolution as early as possible on the given string statement, so the whole operation is an easy and lightweight string replace. But this approach introduces some limitations as well:

- The executed SQL will always be the unresolved, raw string, so in case of secrets a DESC operation would show them.
- Changing the value of a variable can break code that uses that variable.

For more details, please check the FLIP [1]. There is also a stale Jira about this [2]. Looking forward to any comments and opinions!

Thanks,
Ferenc

[1] https://docs.google.com/document/d/1-eUz-PBCdqNggG_irDT0X7fdL61ysuHOaWnrkZHb5Hc/edit?usp=sharing
[2] https://issues.apache.org/jira/browse/FLINK-17377
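The "lightweight string replace" approach described above could look roughly like the following sketch. The `${name}` placeholder syntax and the `SqlVariableResolver` class are assumptions made for illustration; the FLIP document defines the actual syntax and behavior:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlVariableResolver {

    // Assumed placeholder shape: ${identifier}
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-zA-Z_][a-zA-Z0-9_]*)}");

    // Resolve placeholders before the statement ever reaches the parser, so
    // the whole operation is a plain string replacement on the raw statement.
    public static String resolve(String statement, Map<String, String> vars) {
        Matcher m = VAR.matcher(statement);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE t (x INT) WITH ('topic' = '${kafka_topic}')";
        System.out.println(resolve(sql, Map.of("kafka_topic", "orders-prod")));
    }
}
```

This also makes the stated limitation concrete: whatever string the resolver produces (or, if resolution is deferred, the raw one) is what statement-introspection commands will surface.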
[jira] [Created] (FLINK-34939) Harden TestingLeaderElection
Matthias Pohl created FLINK-34939: - Summary: Harden TestingLeaderElection Key: FLINK-34939 URL: https://issues.apache.org/jira/browse/FLINK-34939 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The {{TestingLeaderElection}} implementation does not follow the interface contract of {{LeaderElection}} in all of its facets (e.g. leadership acquire and revocation events should be alternating). This issue is about hardening {{LeaderElection}} contract in the test implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34938) Incorrect behaviour for comparison functions
Dawid Wysakowicz created FLINK-34938:
Summary: Incorrect behaviour for comparison functions
Key: FLINK-34938
URL: https://issues.apache.org/jira/browse/FLINK-34938
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
Fix For: 1.20.0

There are a few issues with comparison functions. Some versions throw:
{code}
Incomparable types: TIMESTAMP_LTZ(3) NOT NULL and TIMESTAMP(3)
{code}
Results of some depend on the comparison order, because the least restrictive precision is not calculated correctly. E.g.
{code}
final Instant ltz3 = Instant.ofEpochMilli(1_123);
final Instant ltz0 = Instant.ofEpochMilli(1_000);

TestSetSpec.forFunction(BuiltInFunctionDefinitions.EQUALS)
    .onFieldsWithData(ltz3, ltz0)
    .andDataTypes(TIMESTAMP_LTZ(3), TIMESTAMP_LTZ(0))
    // compare same type, but different precision, should always adjust to the higher precision
    .testResult($("f0").isEqual($("f1")), "f0 = f1", false, DataTypes.BOOLEAN())
    .testResult($("f1").isEqual($("f0")), "f1 = f0", true /* but should be false */, DataTypes.BOOLEAN())
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
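The order dependence reported here can be reproduced with plain `java.time` arithmetic, outside Flink: truncating the finer value to the coarser precision (instead of widening both operands to the higher precision) makes the two instants compare equal. A small sketch of that effect (the demo class is invented for illustration):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class PrecisionCompareDemo {

    // Correct direction: compare at the finer precision (milliseconds).
    public static boolean equalAtMillis(Instant a, Instant b) {
        return a.equals(b);
    }

    // Buggy direction: truncate both operands to the coarser precision
    // (seconds), which collapses 1.123s and 1.000s onto the same value.
    public static boolean equalAtSeconds(Instant a, Instant b) {
        return a.truncatedTo(ChronoUnit.SECONDS)
                .equals(b.truncatedTo(ChronoUnit.SECONDS));
    }

    public static void main(String[] args) {
        Instant ltz3 = Instant.ofEpochMilli(1_123); // TIMESTAMP_LTZ(3) value
        Instant ltz0 = Instant.ofEpochMilli(1_000); // TIMESTAMP_LTZ(0) value
        System.out.println(equalAtMillis(ltz3, ltz0));  // false: values differ
        System.out.println(equalAtSeconds(ltz3, ltz0)); // true: precision lost
    }
}
```

Whether the planner widens or truncates evidently depends on which operand drives the common-type calculation, which is why `f0 = f1` and `f1 = f0` disagree.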
[jira] [Created] (FLINK-34937) Apache Infra GHA policy update
Matthias Pohl created FLINK-34937: - Summary: Apache Infra GHA policy update Key: FLINK-34937 URL: https://issues.apache.org/jira/browse/FLINK-34937 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There is a policy update [announced in the infra ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced which I wasn't aware of when working on the action workflow. This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Bug report for reading Hive table as streaming source.
Hi,

I found a weird bug when reading a Hive table as a streaming source. In summary, if the first partition is not time-related, then the Hive table cannot be read as a streaming source.

e.g. I have a Hive table with the definition:
```
CREATE TABLE article (
  id BIGINT,
  edition STRING,
  dt STRING,
  hh STRING
) PARTITIONED BY (edition, dt, hh) USING orc;
```
Then I try to query it as a streaming source:
```
INSERT INTO kafka_sink
SELECT id FROM article
/*+ OPTIONS(
  'streaming-source.enable' = 'true',
  'streaming-source.partition-order' = 'partition-name',
  'streaming-source.consume-start-offset' = 'edition=en_US/dt=2024-03-26/hh=00') */
```
And I see no output in the `kafka_sink`.

Then I defined an external table pointing to the same path, but which has no `edition` partition:
```
CREATE TABLE en_article (
  id BIGINT,
  edition STRING,
  dt STRING,
  hh STRING
) PARTITIONED BY (edition, dt, hh) LOCATION 's3://xxx/article/edition=en_US' USING orc;
```
And insert with the following statement:
```
INSERT INTO kafka_sink
SELECT id FROM en_article
/*+ OPTIONS(
  'streaming-source.enable' = 'true',
  'streaming-source.partition-order' = 'partition-name',
  'streaming-source.consume-start-offset' = 'dt=2024-03-26/hh=00') */
```
This time, the data is delivered to the sink.
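One plausible way to see why a non-time leading partition breaks this mode: with 'streaming-source.partition-order' = 'partition-name', ordering is lexicographic on the full partition name, so the `edition` prefix dominates the date. A tiny sketch of that string comparison (an illustration of the suspected mechanism, not Flink's implementation):

```java
public class PartitionOrderDemo {

    // With partition-name ordering, "earlier" simply means lexicographically
    // smaller partition-name string.
    public static boolean namedBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String older = "edition=en_US/dt=2024-03-26/hh=00";
        String newer = "edition=de_DE/dt=2024-03-27/hh=01";
        // The *newer* partition sorts before the older one because
        // "de_DE" < "en_US", so a start offset expressed as a partition name
        // cannot cleanly separate old data from new data.
        System.out.println(namedBefore(newer, older));
    }
}
```

When the time columns (`dt`, `hh`) lead the partition name, as in the `en_article` workaround, lexicographic order and time order coincide again.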
[jira] [Created] (FLINK-34936) Register shared state files to FileMergingSnapshotManager
Jinzhong Li created FLINK-34936: --- Summary: Register shared state files to FileMergingSnapshotManager Key: FLINK-34936 URL: https://issues.apache.org/jira/browse/FLINK-34936 Project: Flink Issue Type: New Feature Reporter: Jinzhong Li Fix For: 1.20.0 The shared state files should be registered into the FileMergingSnapshotManager, so that these files can be properly cleaned up when checkpoint aborted/subsumed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
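The bookkeeping described in this ticket can be sketched as simple reference counting: a shared file stays alive while any live checkpoint references it, and dropping the last reference on abort/subsumption makes the file eligible for deletion. A hypothetical sketch (the `SharedFileRefCounter` class is invented here and is not `FileMergingSnapshotManager`'s actual API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SharedFileRefCounter {

    // checkpoint id -> files that checkpoint references
    private final Map<Long, Set<String>> byCheckpoint = new HashMap<>();
    // file -> reference count across all live checkpoints
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Register a shared state file under a checkpoint, so it survives as
    // long as at least one live checkpoint still references it.
    public void register(long checkpointId, String file) {
        byCheckpoint.computeIfAbsent(checkpointId, k -> new HashSet<>()).add(file);
        refCounts.merge(file, 1, Integer::sum);
    }

    // When a checkpoint is aborted or subsumed, files whose last reference
    // is dropped become deletable; shared files keep living.
    public Set<String> discardCheckpoint(long checkpointId) {
        Set<String> deletable = new HashSet<>();
        for (String file : byCheckpoint.getOrDefault(checkpointId, Set.of())) {
            if (refCounts.merge(file, -1, Integer::sum) == 0) {
                refCounts.remove(file);
                deletable.add(file);
            }
        }
        byCheckpoint.remove(checkpointId);
        return deletable;
    }

    public static void main(String[] args) {
        SharedFileRefCounter counter = new SharedFileRefCounter();
        counter.register(1L, "fileA");
        counter.register(2L, "fileA"); // shared between checkpoints 1 and 2
        System.out.println(counter.discardCheckpoint(1L)); // fileA still referenced
        System.out.println(counter.discardCheckpoint(2L)); // fileA now deletable
    }
}
```

Without such registration, a file shared across merged checkpoints could be deleted while a later checkpoint still needs it, or leak forever after an abort.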
Re: [DISCUSS] Flink Website Menu Adjustment
+1 for the proposal. Best, Hang Hangxiang Yu 于2024年3月26日周二 13:40写道: > Thanks Zhongqiang for driving this. > +1 for the proposal. > > On Tue, Mar 26, 2024 at 1:36 PM Shawn Huang wrote: > > > +1 for the proposal > > > > Best, > > Shawn Huang > > > > > > Hongshun Wang 于2024年3月26日周二 11:56写道: > > > > > +1 for the proposal > > > > > > Best Regards, > > > Hongshun Wang > > > > > > On Tue, Mar 26, 2024 at 11:37 AM gongzhongqiang < > > gongzhongqi...@apache.org > > > > > > > wrote: > > > > > > > Hi Martijn, > > > > > > > > Thank you for your feedback. > > > > > > > > I agree with your point that we should make a one-time update to the > > > menu, > > > > rather than continuously updating it. This will be done unless some > > > > sub-projects are moved or archived. > > > > > > > > Best regards, > > > > > > > > Zhongqiang Gong > > > > > > > > > > > > Martijn Visser 于2024年3月25日周一 23:35写道: > > > > > > > > > Hi Zhongqiang Gong, > > > > > > > > > > Are you suggesting to continuously update the menu based on the > > number > > > of > > > > > releases, or just this one time? I wouldn't be in favor of > > continuously > > > > > updating: returning customers expect a certain order in the menu, > > and I > > > > > don't see a lot of value in continuously changing that. I do think > > that > > > > the > > > > > order that you have currently proposed is better then the one we > have > > > > right > > > > > now, so I would +1 a one-time update but not a continuously > updating > > > > order. > > > > > > > > > > Best regards, > > > > > > > > > > Martijn > > > > > > > > > > On Mon, Mar 25, 2024 at 4:15 PM Yanquan Lv > > > wrote: > > > > > > > > > > > +1 for this proposal. 
gongzhongqiang 于2024年3月25日周一 15:49写道:

Hi everyone,

I'd like to start a discussion on adjusting the Flink website [1] menu to improve accuracy and usability. While migrating the Flink CDC documentation to the website, I found outdated links, and we need to review and update the menus to surface the most relevant information for our users.

Proposal:

- Remove Paimon [2] from the "Getting Started" and "Documentation" menus: Paimon [2] is now an independent top project of the ASF. CC: jingsong lees

- Sort the projects in the subdirectory by the activity of the projects. Here I list the number of releases for each project in the past year:

Flink Kubernetes Operator: 7
Flink CDC: 5
Flink ML: 2
Flink Stateful Functions: 1

Expected Outcome:

- Menu "Getting Started"

Before:
With Flink
With Flink Stateful Functions
With Flink ML
With Flink Kubernetes Operator
With Paimon (incubating) (formerly Flink Table Store)
With Flink CDC
Training Course

After:
With Flink
With Flink Kubernetes Operator
With Flink CDC
With Flink ML
With Flink Stateful Functions
Training Course

- Menu "Documentation" will be the same as "Getting Started"

I look forward to hearing your thoughts and suggestions on this proposal.

[1] https://flink.apache.org/
[2] https://github.com/apache/incubator-paimon
[3] https://github.com/apache/flink-statefun

Best regards,

Zhongqiang Gong

--
Best,
Hangxiang.