Re: Re: Re: [ANNOUNCE] Donation of Flink CDC into Apache Flink has Completed

2024-03-26 Thread Yun Tang
Congratulations, cool work!

Best
Yun Tang



From: Feifan Wang 
Sent: Wednesday, March 27, 2024 10:27
To: dev@flink.apache.org 
Subject: Re: Re: [ANNOUNCE] Donation of Flink CDC into Apache Flink has Completed

Congratulations !


On 2024-03-22 12:04:39, "Hangxiang Yu" wrote:
>Congratulations!
>Thanks for the efforts.
>
>On Fri, Mar 22, 2024 at 10:00 AM Yanfei Lei  wrote:
>
>> Congratulations!
>>
>> Best regards,
>> Yanfei
>>
>> Xuannan Su wrote on Fri, Mar 22, 2024 at 09:21:
>> >
>> > Congratulations!
>> >
>> > Best regards,
>> > Xuannan
>> >
>> > On Fri, Mar 22, 2024 at 9:17 AM Charles Zhang 
>> wrote:
>> > >
>> > > Congratulations!
>> > >
>> > > Best wishes,
>> > > Charles Zhang
>> > > from Apache InLong
>> > >
>> > >
>> > > Jeyhun Karimov wrote on Fri, Mar 22, 2024 at 04:16:
>> > >
>> > > > Great news! Congratulations!
>> > > >
>> > > > Regards,
>> > > > Jeyhun
>> > > >
>> > > > On Thu, Mar 21, 2024 at 2:00 PM Yuxin Tan 
>> wrote:
>> > > >
>> > > > > Congratulations! Thanks for the efforts.
>> > > > >
>> > > > >
>> > > > > Best,
>> > > > > Yuxin
>> > > > >
>> > > > >
>> > > > > Samrat Deb wrote on Thu, Mar 21, 2024 at 20:28:
>> > > > >
>> > > > > > Congratulations !
>> > > > > >
>> > > > > > Bests
>> > > > > > Samrat
>> > > > > >
>> > > > > > On Thu, 21 Mar 2024 at 5:52 PM, Ahmed Hamdy <
>> hamdy10...@gmail.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Congratulations, great work and great news.
>> > > > > > > Best Regards
>> > > > > > > Ahmed Hamdy
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, 21 Mar 2024 at 11:41, Benchao Li > >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Congratulations, and thanks for the great work!
>> > > > > > > >
>> > > > > > > > Yuan Mei wrote on Thu, Mar 21, 2024 at 18:31:
>> > > > > > > > >
>> > > > > > > > > Thanks for driving these efforts!
>> > > > > > > > >
>> > > > > > > > > Congratulations
>> > > > > > > > >
>> > > > > > > > > Best
>> > > > > > > > > Yuan
>> > > > > > > > >
>> > > > > > > > > On Thu, Mar 21, 2024 at 4:35 PM Yu Li 
>> wrote:
>> > > > > > > > >
>> > > > > > > > > > Congratulations and look forward to its further
>> development!
>> > > > > > > > > >
>> > > > > > > > > > Best Regards,
>> > > > > > > > > > Yu
>> > > > > > > > > >
>> > > > > > > > > > On Thu, 21 Mar 2024 at 15:54, ConradJam <
>> jam.gz...@gmail.com>
>> > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Congratulations!
>> > > > > > > > > > >
>> > > > > > > > > > > Leonard Xu wrote on Wed, Mar 20, 2024 at 21:36:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi devs and users,
>> > > > > > > > > > > >
>> > > > > > > > > > > > We are thrilled to announce that the donation of
>> Flink CDC
>> > > > > as a
>> > > > > > > > > > > > sub-project of Apache Flink has completed. We invite
>> you to
>> > > > > > > explore
>> > > > > > > > > > the new
>> > > > > > > > > > > > resources available:
>> > > > > > > > > > > >
>> > > > > > > > > > > > - GitHub Repository:
>> https://github.com/apache/flink-cdc
>> > > > > > > > > > > > - Flink CDC Documentation:
>> > > > > > > > > > > >
>> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>> > > > > > > > > > > >
>> > > > > > > > > > > > After Flink community accepted this donation[1], we
>> have
>> > > > > > > completed
>> > > > > > > > > > > > software copyright signing, code repo migration, code
>> > > > > cleanup,
>> > > > > > > > website
>> > > > > > > > > > > > migration, CI migration and github issues migration
>> etc.
>> > > > > > > > > > > > Here I am particularly grateful to Hang Ruan,
>> Zhongqiang
>> > > > > Gong,
>> > > > > > > > > > Qingsheng
>> > > > > > > > > > > > Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
>> > > > > > contributors
>> > > > > > > > for
>> > > > > > > > > > their
>> > > > > > > > > > > > contributions and help during this process!
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > For all previous contributors: The contribution
>> process has
>> > > > > > > > slightly
>> > > > > > > > > > > > changed to align with the main Flink project. To
>> report
>> > > > bugs
>> > > > > or
>> > > > > > > > > > suggest new
>> > > > > > > > > > > > features, please open tickets in
>> > > > > > > > > > > > Apache Jira (https://issues.apache.org/jira).  Note
>> that
>> > > > we
>> > > > > > will
>> > > > > > > > no
>> > > > > > > > > > > > longer accept GitHub issues for these purposes.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Welcome to explore the new repository and
>> documentation.
>> > > > Your
>> > > > > > > > feedback
>> > > > > > > > > > and
>> > > > > > > > > > > > contributions are invaluable as we continue to
>> improve
>> > > > Flink
>> > > > > > CDC.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks everyone for your support and happy exploring
>> Flink
>> > > > > CDC!
>> > > > > > > > > > > >
>> > > > > > > > > > > > Best,
>> > > > > > > > > > > > Leonard
>> > > > > > > > > > > > [1]
>> > > > > > > >
>> 

Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-26 Thread Zakelly Lan
Hi devs,

It seems there has been no further concern or suggestion for a while. We'd
like to start voting soon.


Best,
Zakelly


[jira] [Created] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage

2024-03-26 Thread Junrui Li (Jira)
Junrui Li created FLINK-34945:
-

 Summary: Support recover shuffle descriptor and partition metrics 
from tiered storage
 Key: FLINK-34945
 URL: https://issues.apache.org/jira/browse/FLINK-34945
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Junrui Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-26 Thread Zakelly Lan
Hi everyone,

Piotr and I had a long discussion about the checkpoint duration issue. We
think that the lambda serialization approach I proposed in the last mail may
bring in more issues; the most serious one is that users may not be able to
modify the code in a serialized lambda to perform a bug fix.

But fortunately we found a possible solution. By introducing a set of
declarative APIs and a `declareProcess()` function that users implement in a
newly introduced AbstractStreamOperator, we can obtain the declaration of
record processing at runtime, broken down into requests and callbacks
(lambdas) as FLIP-424 introduced. Thus we avoid the problem of lambda
(de)serialization; instead, we retrieve the callbacks every time before a
task runs. The next step is to provide an API allowing users to assign a
unique id to each state request and callback, or to assign one automatically
by declaration order. We can then find the corresponding callback at runtime
for each restored state request based on its id, and the whole pipeline can
be resumed.
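
Rendered as a rough sketch (hypothetical class and method names — not the
actual Flink or FLIP-424 APIs), the id-to-callback matching described above
could look like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: callbacks are re-declared before every task run and keyed
// by a stable id (user-assigned, or assigned automatically by declaration order),
// so restored state requests can be matched back to code without ever
// serializing the lambdas themselves.
class CallbackDeclarations {
    private final Map<String, Consumer<Object>> byId = new LinkedHashMap<>();
    private int nextAutoId = 0;

    /** Declare a callback under an explicit, user-assigned id. */
    void declare(String id, Consumer<Object> callback) {
        byId.put(id, callback);
    }

    /** Declare a callback; an id is assigned automatically by declaration order. */
    String declare(Consumer<Object> callback) {
        String id = "auto-" + nextAutoId++;
        byId.put(id, callback);
        return id;
    }

    /** On restore, resume a checkpointed state request by invoking its callback. */
    void resume(String restoredRequestId, Object stateResult) {
        byId.get(restoredRequestId).accept(stateResult);
    }
}
```

Because the declarations are rebuilt on every task start, a bug fix in a
callback body takes effect after restore — the property the serialized-lambda
approach could not provide.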

Note that all of the above is internal and won't be exposed to average
users. Exposing this in the Stream APIs can be discussed later. I will
prepare another FLIP (or FLIPs) describing the whole optimized checkpointing
process, and in the meantime we can proceed with the current FLIPs. The new
FLIP(s) are built on top of the current ones, so we can achieve this
incrementally.


Best,
Zakelly

On Thu, Mar 21, 2024 at 12:31 AM Zakelly Lan  wrote:

> Hi Piotrek,
>
> Thanks for your comments!
>
> As we discussed off-line, you agreed that we can not checkpoint while some
>> records are in the middle of being
>> processed. That we would have to drain the in-progress records before
>> doing
>> the checkpoint. You also argued
>> that this is not a problem, because the size of this buffer can be
>> configured.
>>
>> I'm really afraid of such a solution. I've seen in the past plenty of
>> times, that whenever Flink has to drain some
>> buffered records, eventually that always breaks timely checkpointing (and
>> hence ability for Flink to rescale in
>> a timely manner). Even a single record with a `flatMap` like operator
>> currently in Flink causes problems during
>> back pressure. That's especially true for example for processing
>> watermarks. At the same time, I don't see how
>> this value could be configured by even Flink's power users, let alone an
>> average user. The size of that in-flight
>> buffer not only depends on a particular query/job, but also the "good"
>> value changes dynamically over time,
>> and can change very rapidly. Sudden spikes of records or backpressure,
>> some
>> hiccup during emitting watermarks,
>> all of those could change in an instant the theoretically optimal buffer
>> size of let's say "6000" records, down to "1".
>> And when those changes happen, those are the exact times when timely
>> checkpointing matters the most.
>> If the load pattern suddenly changes, and checkpointing takes suddenly
>> tens
>> of minutes instead of a couple of
>> seconds, it means you can not use rescaling and you are forced to
>> overprovision the resources. And there are also
>> other issues if checkpointing takes too long.
>>
>
> Let me clarify some misunderstandings here. First of all, is the sync
> phase of checkpointing longer under the current plan than under the
> synchronous execution model? The answer is yes; it is a trade-off for the
> parallel execution model, and I think the cost is worth the improvement.
> Now the question is, how much longer are we talking about? The PoC result
> I provided is that it takes 3 seconds to drain 6000 records of a simple
> job, which I said is not a big deal. You might argue that long
> watermark/timer processing could make the checkpoint wait, so I propose
> several ways to optimize this:
>
>    1. Instead of only controlling the in-flight records, we could control
>    the in-flight watermark.
>    2. Since we have broken record processing down into several state
>    requests, with at most one subsequent callback per request, the checkpoint
>    can proceed after the currently RUNNING requests (NOT records) and their
>    callbacks finish. This means that even if many records are in flight (in
>    'processElement'), the checkpoint can proceed once only a small group of
>    state requests finishes. These requests form 1 or 2 multiGets to RocksDB,
>    which take less time. Moreover, timer processing is also split into
>    several requests, so the checkpoint won't wait for the whole timer to
>    finish. The attached picture describes this idea.
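
Point 2 can be illustrated with a toy model (a sketch under assumed
semantics, not the actual runtime code): the checkpoint barrier drains only
the state requests already issued, while the rest of the buffered work waits
until after the checkpoint.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model: each record breaks down into state requests; a checkpoint waits only
// for the currently RUNNING requests (and their callbacks), not for every
// buffered record to be fully processed.
class RequestQueues {
    final Queue<Runnable> running = new ArrayDeque<>();  // issued to the state backend
    final Queue<Runnable> buffered = new ArrayDeque<>(); // not yet issued
    private final int maxRunning;

    RequestQueues(int maxRunning) {
        this.maxRunning = maxRunning;
    }

    void submit(Runnable request) {
        if (running.size() < maxRunning) {
            running.add(request);
        } else {
            buffered.add(request);
        }
    }

    /** Checkpoint barrier: complete only the in-flight requests; report how many. */
    int drainForCheckpoint() {
        int drained = 0;
        for (Runnable r; (r = running.poll()) != null; drained++) {
            r.run(); // run the request's callback to completion
        }
        return drained;
    }
}
```

With `maxRunning = 2` and five submitted requests, a checkpoint drains just
the two in-flight ones; the remaining three are issued after the checkpoint.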
>
> And the size of this buffer can be configured. I'm not counting on average
> users to configure it well; I'm just saying that we'd better not focus on
> the absolute numbers of the PoC or on specific cases, since we can always
> provide a conservative default value that makes this acceptable for most
> cases. An adaptive buffer size is also worth a try if we provide a
> conservative strategy.
>
> Besides, I 

[jira] [Created] (FLINK-34944) Use Incremental Source Framework in Flink CDC OceanBase Source Connector

2024-03-26 Thread He Wang (Jira)
He Wang created FLINK-34944:
---

 Summary: Use Incremental Source Framework in Flink CDC OceanBase 
Source Connector
 Key: FLINK-34944
 URL: https://issues.apache.org/jira/browse/FLINK-34944
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: He Wang
 Fix For: cdc-3.1.0








Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Jark Wu
Thanks for the PoC and the update.

The final syntax looks good to me, at least it is a nice and concise first
step.

SELECT f1, f2, label FROM
   ML_PREDICT(
 input => `my_data`,
 model => `my_cat`.`my_db`.`classifier_model`,
 args => DESCRIPTOR(f1, f2));

Besides, what built-in models will we support in the FLIP? This might be
important because it relates to what use cases can run with the new Flink
version out of the box.

Best,
Jark

On Wed, 27 Mar 2024 at 01:10, Hao Li  wrote:

> Hi Timo,
>
> Yeah. `primary key` and `from table(...)` are explicitly matched
> in the parser: [1].
>
> > SELECT f1, f2, label FROM
>ML_PREDICT(
>  input => `my_data`,
>  model => `my_cat`.`my_db`.`classifier_model`,
>  args => DESCRIPTOR(f1, f2));
>
> This named argument syntax looks good to me. It can be supported together
> with
>
> SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
> `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2));
>
> Sure. Will let you know once updated the FLIP.
>
> [1]
>
> https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814
>
> Thanks,
> Hao
>
> On Tue, Mar 26, 2024 at 4:15 AM Timo Walther  wrote:
>
> > Hi Hao,
> >
> >  > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't
> >  > work since `TABLE` and `MODEL` are already keywords
> >
> > This argument doesn't count. The parser supports introducing keywords
> > that are still non-reserved. For example, this enables using "key" for
> > both primary key and a column name:
> >
> > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
> > WITH ('connector' = 'datagen');
> >
> > SELECT i AS key FROM t;
> >
> > I'm sure we will introduce `TABLE(my_data)` eventually as this is what
> > the standard dictates. But for now, let's use the most compact syntax
> > possible which is also in sync with Oracle.
> >
> > TLDR: We allow identifiers as arguments for PTFs which are expanded with
> > catalog and database if necessary. Those identifier arguments translate
> > to catalog lookups for table and models. The ML_ functions will make
> > sure that the arguments are of correct type model or table.
> >
> > SELECT f1, f2, label FROM
> >ML_PREDICT(
> >  input => `my_data`,
> >  model => `my_cat`.`my_db`.`classifier_model`,
> >  args => DESCRIPTOR(f1, f2));
> >
> > So this will allow us to also use in the future:
> >
> > SELECT * FROM poly_func(table1);
> >
> > Same support as Oracle [1]. Very concise.
> >
> > Let me know when you updated the FLIP for a final review before voting.
> >
> > Do others have additional objections?
> >
> > Regards,
> > Timo
> >
> > [1]
> >
> >
> https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
> >
> >
> >
> > On 25.03.24 23:40, Hao Li wrote:
> > > Hi Timo,
> > >
> > >> Please double check if this is implementable with the current stack. I
> > > fear the parser or validator might not like the "identifier" argument?
> > >
> > > I checked this, currently the validator throws an exception trying to
> get
> > > the full qualifier name for `classifier_model`. But since
> > > `SqlValidatorImpl` is implemented in Flink, we should be able to fix
> > this.
> > > The only caveat is that if the full model path is not provided,
> > > the qualifier is interpreted as a column. We should be able to special
> > > handle this by rewriting the `ml_predict` function to add the catalog
> and
> > > database name in `FlinkCalciteSqlValidator` though.
> > >
> > >> SELECT f1, f2, label FROM
> > > ML_PREDICT(
> > >   TABLE `my_data`,
> > >   my_cat.my_db.classifier_model,
> > >   DESCRIPTOR(f1, f2))
> > >
> > > SELECT f1, f2, label FROM
> > > ML_PREDICT(
> > >   input => TABLE `my_data`,
> > >   model => my_cat.my_db.classifier_model,
> > >   args => DESCRIPTOR(f1, f2))
> > >
> > > I verified these can be parsed. The problem is in validator for
> qualifier
> > > as mentioned above.
> > >
> > >> So the safest option would be the long-term solution:
> > >
> > > SELECT f1, f2, label FROM
> > > ML_PREDICT(
> > >   input => TABLE(my_data),
> > >   model => MODEL(my_cat.my_db.classifier_model),
> > >   args => DESCRIPTOR(f1, f2))
> > >
> > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't
> work
> > > since `TABLE` and `MODEL` are already keywords in Calcite used by
> > `CREATE
> > > TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be
> > > treated as a function.
> > >
> > > So I think
> > >
> > > SELECT f1, f2, label FROM
> > > ML_PREDICT(
> > >   input => TABLE `my_data`,
> > >   model => my_cat.my_db.classifier_model,
> > >   args => DESCRIPTOR(f1, f2))
> > > should be fine for now.
> > >
> > > For the syntax part:
> > > 1). Sounds good. We can drop model task and model kind from the
> > definition.
> > > They can be deduced from the options.
> > >
> > > 2). Sure. We can add temporary 

Re: Re: [ANNOUNCE] Donation of Flink CDC into Apache Flink has Completed

2024-03-26 Thread Jinzhong Li
Congratulations! Thanks for the great work!

Best,
Jinzhong

On Wed, Mar 27, 2024 at 10:28 AM Feifan Wang  wrote:

> Congratulations !

Re: Re: [ANNOUNCE] Donation of Flink CDC into Apache Flink has Completed

2024-03-26 Thread Feifan Wang
Congratulations !



[jira] [Created] (FLINK-34943) Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector

2024-03-26 Thread Zhongqiang Gong (Jira)
Zhongqiang Gong created FLINK-34943:
---

 Summary: Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
 Key: FLINK-34943
 URL: https://issues.apache.org/jira/browse/FLINK-34943
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Zhongqiang Gong








[jira] [Created] (FLINK-34942) Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector

2024-03-26 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34942:
---

 Summary: Support Flink 1.19, 1.20-SNAPSHOT for OpenSearch connector
 Key: FLINK-34942
 URL: https://issues.apache.org/jira/browse/FLINK-34942
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Opensearch
Affects Versions: 3.1.0
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin


Currently it fails with a similar issue to FLINK-33493.





[jira] [Created] (FLINK-34941) Cannot convert org.apache.flink.streaming.api.CheckpointingMode to org.apache.flink.core.execution.CheckpointingMode

2024-03-26 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34941:
---

 Summary: Cannot convert 
org.apache.flink.streaming.api.CheckpointingMode  to 
org.apache.flink.core.execution.CheckpointingMode
 Key: FLINK-34941
 URL: https://issues.apache.org/jira/browse/FLINK-34941
 Project: Flink
  Issue Type: Bug
  Components: API / Core, Connectors / ElasticSearch, Runtime / 
Checkpointing
Affects Versions: 1.20.0
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin


After the change in FLINK-34516, the elasticsearch connector built against 1.20-SNAPSHOT starts 
failing with
{noformat}
 Error:  
/home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-e2e-tests/flink-connector-elasticsearch-e2e-tests-common/src/main/java/org/apache/flink/streaming/tests/ElasticsearchSinkE2ECaseBase.java:[75,5]
 method does not override or implement a method from a supertype
Error:  
/home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-e2e-tests/flink-connector-elasticsearch-e2e-tests-common/src/main/java/org/apache/flink/streaming/tests/ElasticsearchSinkE2ECaseBase.java:[85,84]
 incompatible types: org.apache.flink.streaming.api.CheckpointingMode cannot be 
converted to org.apache.flink.core.execution.CheckpointingMode
{noformat}
https://github.com/apache/flink-connector-elasticsearch/actions/runs/8436631571/job/23104522666#step:15:12668

Set as a blocker since every build of the elasticsearch connector against 
1.20-SNAPSHOT is now failing.
The same issue probably affects the opensearch connector.





Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-03-26 Thread Kevin Lam
Thanks Anupam! Looking forward to it.
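A minimal sketch of the field-id evolution idea from the quoted discussion below: reuse the field numbers recorded in the latest registered schema for fields that already exist, and hand out fresh numbers only for genuinely new fields. The helper name and shape here are hypothetical, not part of the actual PR.

```python
def evolve_field_ids(prior_fields, new_field_names):
    """Assign protobuf field numbers to new_field_names, reusing the numbers
    recorded in prior_fields (a name -> number map from the latest registered
    schema) so that existing fields are never renumbered incompatibly."""
    used = set(prior_fields.values())
    next_id = max(used, default=0) + 1
    assigned = {}
    for name in new_field_names:
        if name in prior_fields:
            assigned[name] = prior_fields[name]  # keep the existing number
        else:
            while next_id in used:
                next_id += 1
            assigned[name] = next_id  # fresh number for a brand-new field
            used.add(next_id)
    return assigned
```

Dropping a field simply retires its number — it is never reused — which mirrors the compatibility guidance in the linked schema-registry issue.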

On Thu, Mar 14, 2024 at 1:50 AM Anupam Aggarwal 
wrote:

> Hi Kevin,
>
> Thanks, these are some great points.
> Just to clarify, I do agree that the subject should be an option (like in
> the case of RegistryAvroFormatFactory).
> We could fallback to subject and auto-register schemas, if schema-Id not
> provided explicitly.
> In general, I think it would be good to be more explicit about the schemas
> used (
>
> https://docs.confluent.io/platform/current/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-registration
> ).
> This would also help prevent us from overriding the ids in incompatible
> ways.
>
> Under the current implementation of FlinkToProtoSchemaConverter we might
> end up overwriting the field-Ids.
> If we are able to locate a prior schema, the approach you outlined makes a
> lot of sense.
> Let me explore this a bit further and get back (in terms of feasibility).
>
> Thanks again!
> - Anupam
>
> On Wed, Mar 13, 2024 at 2:28 AM Kevin Lam 
> wrote:
>
> > Hi Anupam,
> >
> > Thanks again for your work on contributing this feature back.
> >
> > Sounds good re: the refactoring/re-organizing.
> >
> > Regarding the schema-id, in my opinion this should NOT be a configuration
> > option on the format. We should be able to deterministically map the
> Flink
> > type to the ProtoSchema and register that with the Schema Registry.
> >
> > I think it can make sense to provide the `subject` as a parameter, so
> that
> > the serialization format can look up existing schemas.
> >
> > This would be used in something I mentioned something related above
> >
> > > Another topic I had is Protobuf's field ids. Ideally in Flink it would
> be
> > > nice if we are idiomatic about not renumbering them in incompatible
> ways,
> > > similar to what's discussed on the Schema Registry issue here:
> > > https://github.com/confluentinc/schema-registry/issues/2551
> >
> >
> > When we construct the ProtobufSchema from the Flink LogicalType, we
> > shouldn't renumber the field ids in an incompatible way, so one approach
> > would be to use the subject to look up the most recent version, and use
> > that to evolve the field ids correctly.
> >
> >
> > On Tue, Mar 12, 2024 at 2:33 AM Anupam Aggarwal <
> anupam.aggar...@gmail.com
> > >
> > wrote:
> >
> > > Hi Kevin,
> > >
> > > Thanks for starting the discussion on this.
> > > I will be working on contributing this feature back (This was developed
> > by
> > > Dawid Wysakowicz and others at Confluent).
> > >
> > > I have opened a (very initial) draft PR here
> > > https://github.com/apache/flink/pull/24482 with our current
> > > implementation.
> > > Thanks for the feedback on the PR, I haven’t gotten around to
> > > re-organizing/refactoring the classes yet, but it would be inline with
> > some
> > > of your comments.
> > >
> > > On the overall approach there are some slight variations from the
> initial
> > > proposal.
> > > Our implementation relies on an explicit schema-id being passed through
> > the
> > > config. This could help in cases where one Flink type could potentially
> > map
> > > to multiple proto types.
> > > We could make the schema-Id optional and fall back to deriving it from
> > the
> > > rowType (during serialization) if not present?
> > >
> > > The message index handling is still TBD. I am thinking of replicating
> > logic
> > > in AbstractKafkaProtobufSerializer.java
> > > <
> > >
> >
> https://github.com/confluentinc/schema-registry/blob/342c8a9d3854d4253d785214f5dcfb1b6cc59a06/protobuf-serializer/src/main/java/io/confluent/kafka/serializers/protobuf/AbstractKafkaProtobufSerializer.java#L157
> > > >
> > >  (|Deserializer).
> > > Please let me know if this makes sense / or in case you have any other
> > > feedback.
> > >
> > > Thanks
> > > Anupam
> > >
> > > On Thu, Feb 29, 2024 at 8:54 PM Kevin Lam
>  > >
> > > wrote:
> > >
> > > > Hey Robert,
> > > >
> > > > Awesome thanks, that timeline works for me. Sounds good re: deciding
> on
> > > > FLIP once we have the PR, and thanks for looking into the field ids.
> > > >
> > > > Looking forward to it!
> > > >
> > > > On Thu, Feb 29, 2024 at 5:09 AM Robert Metzger 
> > > > wrote:
> > > >
> > > > > Hey Kevin,
> > > > >
> > > > > Thanks a lot. Then let's contribute the Confluent implementation to
> > > > > apache/flink. We can't start working on this immediately because
> of a
> > > > team
> > > > > event next week, but within the next two weeks, we will start
> working
> > > on
> > > > > this.
> > > > > It probably makes sense for us to open a pull request of what we
> have
> > > > > already, so that you can start reviewing and maybe also
> contributing
> > to
> > > > the
> > > > > PR.
> > > > > I hope this timeline works for you!
> > > > >
> > > > > Let's also decide if we need a FLIP once the code is public.
> > > > > We will look 

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Hao Li
Hi Timo,

Yeah. For `primary key` and `from table(...)`, those are explicitly matched
in the parser: [1].

> SELECT f1, f2, label FROM
   ML_PREDICT(
 input => `my_data`,
 model => `my_cat`.`my_db`.`classifier_model`,
 args => DESCRIPTOR(f1, f2));

This named argument syntax looks good to me. It can be supported together
with

SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
`my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2));

Sure. Will let you know once updated the FLIP.

[1]
https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814

Thanks,
Hao

On Tue, Mar 26, 2024 at 4:15 AM Timo Walther  wrote:

> Hi Hao,
>
>  > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
>  > work since `TABLE` and `MODEL` are already key words
>
> This argument doesn't count. The parser supports introducing keywords
> that are still non-reserved. For example, this enables using "key" for
> both primary key and a column name:
>
> CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
> WITH ('connector' = 'datagen');
>
> SELECT i AS key FROM t;
>
> I'm sure we will introduce `TABLE(my_data)` eventually as this is what
> the standard dictates. But for now, let's use the most compact syntax
> possible which is also in sync with Oracle.
>
> TLDR: We allow identifiers as arguments for PTFs which are expanded with
> catalog and database if necessary. Those identifier arguments translate
> to catalog lookups for table and models. The ML_ functions will make
> sure that the arguments are of correct type model or table.
>
> SELECT f1, f2, label FROM
>ML_PREDICT(
>  input => `my_data`,
>  model => `my_cat`.`my_db`.`classifier_model`,
>  args => DESCRIPTOR(f1, f2));
>
> So this will allow us to also use in the future:
>
> SELECT * FROM poly_func(table1);
>
> Same support as Oracle [1]. Very concise.
>
> Let me know when you updated the FLIP for a final review before voting.
>
> Do others have additional objections?
>
> Regards,
> Timo
>
> [1]
>
> https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
>
>
>
> On 25.03.24 23:40, Hao Li wrote:
> > Hi Timo,
> >
> >> Please double check if this is implementable with the current stack. I
> > fear the parser or validator might not like the "identifier" argument?
> >
> > I checked this; currently, the validator throws an exception trying to get
> > the full qualifier name for `classifier_model`. But since
> > `SqlValidatorImpl` is implemented in Flink, we should be able to fix
> this.
> > The only caveat is that if the full model path is not provided,
> > the qualifier is interpreted as a column. We should be able to special
> > handle this by rewriting the `ml_predict` function to add the catalog and
> > database name in `FlinkCalciteSqlValidator` though.
> >
> >> SELECT f1, f2, label FROM
> > ML_PREDICT(
> >   TABLE `my_data`,
> >   my_cat.my_db.classifier_model,
> >   DESCRIPTOR(f1, f2))
> >
> > SELECT f1, f2, label FROM
> > ML_PREDICT(
> >   input => TABLE `my_data`,
> >   model => my_cat.my_db.classifier_model,
> >   args => DESCRIPTOR(f1, f2))
> >
> > I verified these can be parsed. The problem is in validator for qualifier
> > as mentioned above.
> >
> >> So the safest option would be the long-term solution:
> >
> > SELECT f1, f2, label FROM
> > ML_PREDICT(
> >   input => TABLE(my_data),
> >   model => MODEL(my_cat.my_db.classifier_model),
> >   args => DESCRIPTOR(f1, f2))
> >
> > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't work
> > since `TABLE` and `MODEL` are already key words in calcite used by
> `CREATE
> > TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be
> > treated as a function.
> >
> > So I think
> >
> > SELECT f1, f2, label FROM
> > ML_PREDICT(
> >   input => TABLE `my_data`,
> >   model => my_cat.my_db.classifier_model,
> >   args => DESCRIPTOR(f1, f2))
> > should be fine for now.
> >
> > For the syntax part:
> > 1). Sounds good. We can drop model task and model kind from the
> definition.
> > They can be deduced from the options.
> >
> > 2). Sure. We can add temporary model
> >
> > 3). Makes sense. We can use `show create model ` to display all
> > information and `describe model ` to show input/output schema
> >
> > Thanks,
> > Hao
> >
> > On Mon, Mar 25, 2024 at 3:21 PM Hao Li  wrote:
> >
> >> Hi Ahmed,
> >>
> >> Looks like the feature freeze time for 1.20 release is June 15th. We can
> >> definitely get the model DDL into 1.20. For predict and evaluate
> functions,
> >> if we can't get into the 1.20 release, we can get them into the 1.21
> >> release for sure.
> >>
> >> Thanks,
> >> Hao
> >>
> >>
> >>
> >> On Mon, Mar 25, 2024 at 1:25 AM Timo Walther 
> wrote:
> >>
> >>> Hi Jark and Hao,
> >>>
> >>> thanks for the information, Jark! Great that the Calcite community
> >>> already fixed the problem for us. +1 to adopt the 

[jira] [Created] (FLINK-34940) LeaderContender implementations handle invalid state

2024-03-26 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34940:
-

 Summary: LeaderContender implementations handle invalid state
 Key: FLINK-34940
 URL: https://issues.apache.org/jira/browse/FLINK-34940
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Reporter: Matthias Pohl


Currently, LeaderContender implementations (e.g. see 
[ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219])
 allow the handling of two leader events of the same type happening after each 
other, which shouldn't be the case.

Two subsequent leadership grants indicate that the instance which received the 
second grant missed the leadership revocation event in between, leaving the 
overall deployment in an invalid state (i.e. a split-brain scenario). 
We should fail fatally in these scenarios rather than handling them.
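A rough Python sketch of the fail-fast contract described above (an illustration of the intended behaviour, not Flink code): grant and revocation events must strictly alternate, and a repeated event of the same kind is treated as fatal.

```python
class LeaderEventChecker:
    """Enforces that leadership grant/revocation events alternate.

    Two grants in a row mean the contender missed a revocation (a potential
    split-brain), so we fail fatally instead of handling the event."""

    def __init__(self):
        self.is_leader = False

    def on_grant(self):
        if self.is_leader:
            raise RuntimeError("second leadership grant without a revocation in between")
        self.is_leader = True

    def on_revoke(self):
        if not self.is_leader:
            raise RuntimeError("leadership revocation without a preceding grant")
        self.is_leader = False
```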





Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Timo Walther

Hi Hao,

> `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
> work since `TABLE` and `MODEL` are already key words

This argument doesn't count. The parser supports introducing keywords 
that are still non-reserved. For example, this enables using "key" for 
both primary key and a column name:


CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
WITH ('connector' = 'datagen');

SELECT i AS key FROM t;

I'm sure we will introduce `TABLE(my_data)` eventually as this is what 
the standard dictates. But for now, let's use the most compact syntax 
possible which is also in sync with Oracle.


TLDR: We allow identifiers as arguments for PTFs which are expanded with 
catalog and database if necessary. Those identifier arguments translate 
to catalog lookups for table and models. The ML_ functions will make 
sure that the arguments are of correct type model or table.


SELECT f1, f2, label FROM
  ML_PREDICT(
input => `my_data`,
model => `my_cat`.`my_db`.`classifier_model`,
args => DESCRIPTOR(f1, f2));

So this will allow us to also use in the future:

SELECT * FROM poly_func(table1);

Same support as Oracle [1]. Very concise.

Let me know when you updated the FLIP for a final review before voting.

Do others have additional objections?

Regards,
Timo

[1] 
https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html




On 25.03.24 23:40, Hao Li wrote:

Hi Timo,


Please double check if this is implementable with the current stack. I

fear the parser or validator might not like the "identifier" argument?

I checked this; currently, the validator throws an exception trying to get
the full qualifier name for `classifier_model`. But since
`SqlValidatorImpl` is implemented in Flink, we should be able to fix this.
The only caveat is that if the full model path is not provided,
the qualifier is interpreted as a column. We should be able to special
handle this by rewriting the `ml_predict` function to add the catalog and
database name in `FlinkCalciteSqlValidator` though.


SELECT f1, f2, label FROM

ML_PREDICT(
  TABLE `my_data`,
  my_cat.my_db.classifier_model,
  DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
ML_PREDICT(
  input => TABLE `my_data`,
  model => my_cat.my_db.classifier_model,
  args => DESCRIPTOR(f1, f2))

I verified these can be parsed. The problem is in validator for qualifier
as mentioned above.


So the safest option would be the long-term solution:


SELECT f1, f2, label FROM
ML_PREDICT(
  input => TABLE(my_data),
  model => MODEL(my_cat.my_db.classifier_model),
  args => DESCRIPTOR(f1, f2))

`TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't work
since `TABLE` and `MODEL` are already key words in calcite used by `CREATE
TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be
treated as a function.

So I think

SELECT f1, f2, label FROM
ML_PREDICT(
  input => TABLE `my_data`,
  model => my_cat.my_db.classifier_model,
  args => DESCRIPTOR(f1, f2))
should be fine for now.

For the syntax part:
1). Sounds good. We can drop model task and model kind from the definition.
They can be deduced from the options.

2). Sure. We can add temporary model

3). Makes sense. We can use `show create model ` to display all
information and `describe model ` to show input/output schema

Thanks,
Hao

On Mon, Mar 25, 2024 at 3:21 PM Hao Li  wrote:


Hi Ahmed,

Looks like the feature freeze time for 1.20 release is June 15th. We can
definitely get the model DDL into 1.20. For predict and evaluate functions,
if we can't get into the 1.20 release, we can get them into the 1.21
release for sure.

Thanks,
Hao



On Mon, Mar 25, 2024 at 1:25 AM Timo Walther  wrote:


Hi Jark and Hao,

thanks for the information, Jark! Great that the Calcite community
already fixed the problem for us. +1 to adopt the simplified syntax
asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if
upgrading Calcite is too much work right now?

  > Is `DESCRIPTOR` a must in the syntax?

Yes, we should still stick to the standard as much as possible and all
> vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal
arguments. So the final syntax of this discussion would be:


SELECT f1, f2, label FROM
ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2))

SELECT * FROM
ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2))

Please double check if this is implementable with the current stack. I
fear the parser or validator might not like the "identifier" argument?

Make sure that also these variations are supported:

SELECT f1, f2, label FROM
ML_PREDICT(
  TABLE `my_data`,
  my_cat.my_db.classifier_model,
  DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
ML_PREDICT(
  input => TABLE `my_data`,
  model => my_cat.my_db.classifier_model,
  args => DESCRIPTOR(f1, f2))

It might be safer and more future proof to wrap a MODEL() 

[DISCUSS] FLIP-XXX: Introduce Flink SQL variables

2024-03-26 Thread Ferenc Csaky
Hello devs,

I would like to start a discussion about FLIP-XXX: Introduce Flink SQL 
variables [1].

The main motivation behind this change is to be able to abstract Flink SQL from
environment-specific configuration and provide a way to carry jobs between
environments (e.g. dev-stage-prod) without the need to make changes in the code.
It can also be a way to decouple sensitive information from the job code, or help
with redundant literals.

The main decision regarding the proposed solution is to handle the variable
resolution as early as possible on the given string statement, so the whole
operation is a simple and lightweight string replacement. But this approach
introduces some limitations as well:

- The executed SQL will always be the resolved string, so in case of secrets
a DESC operation would show them.
- Changing the value of a variable can break code that uses that variable.

For more details, please check the FLIP [1]. There is also a stale Jira about 
this [2].
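As an illustration of the early-resolution idea, here is a rough sketch using a made-up `${...}` placeholder syntax; the FLIP defines the actual syntax, and a real implementation would likely fail on unknown variables rather than leave them in place.

```python
import re

def resolve_variables(statement, variables):
    """Substitute ${name} placeholders in a raw SQL string before it is
    parsed, making the resolution a plain string replacement."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders untouched in this sketch.
        return variables.get(name, match.group(0))
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}", substitute, statement)
```

Resolving `'topic' = '${kafka.topic}'` against `{"kafka.topic": "orders"}` yields `'topic' = 'orders'`, which also demonstrates the first limitation above: the resolved value ends up in the executed statement.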

Looking forward to any comments and opinions!

Thanks,
Ferenc

[1] 
https://docs.google.com/document/d/1-eUz-PBCdqNggG_irDT0X7fdL61ysuHOaWnrkZHb5Hc/edit?usp=sharing
[2] https://issues.apache.org/jira/browse/FLINK-17377

[jira] [Created] (FLINK-34939) Harden TestingLeaderElection

2024-03-26 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34939:
-

 Summary: Harden TestingLeaderElection
 Key: FLINK-34939
 URL: https://issues.apache.org/jira/browse/FLINK-34939
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


The {{TestingLeaderElection}} implementation does not follow the interface 
contract of {{LeaderElection}} in all of its facets (e.g. leadership 
acquisition and revocation events should alternate).

This issue is about hardening the {{LeaderElection}} contract in the test 
implementation.





[jira] [Created] (FLINK-34938) Incorrect behaviour for comparison functions

2024-03-26 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-34938:


 Summary: Incorrect behaviour for comparison functions
 Key: FLINK-34938
 URL: https://issues.apache.org/jira/browse/FLINK-34938
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


There are a few issues with comparison functions.

Some versions throw:
{code}
Incomparable types: TIMESTAMP_LTZ(3) NOT NULL and TIMESTAMP(3)
{code}

The results of some comparisons depend on the argument order, because the least 
restrictive precision is not calculated correctly.

E.g.

{code}
final Instant ltz3 = Instant.ofEpochMilli(1_123);
final Instant ltz0 = Instant.ofEpochMilli(1_000);

TestSetSpec.forFunction(BuiltInFunctionDefinitions.EQUALS)
.onFieldsWithData(ltz3, ltz0)
.andDataTypes(TIMESTAMP_LTZ(3), TIMESTAMP_LTZ(0))
// compare same type, but different precision, should 
always adjust to the higher precision
.testResult($("f0").isEqual($("f1")), "f0 = f1", false, 
DataTypes.BOOLEAN())
.testResult($("f1").isEqual($("f0")), "f1 = f0", true 
/* but should be false */, DataTypes.BOOLEAN())
{code}
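The expected semantics can be sketched with plain integer milliseconds: comparing at the higher precision keeps the sub-second part, while truncating to the lower precision (which one comparison order appears to do) incorrectly equates the two instants. A sketch, not Flink code:

```python
def equals_at_higher_precision(a_millis, b_millis):
    # Correct: adjust both sides to the higher precision (milliseconds).
    return a_millis == b_millis

def equals_at_lower_precision(a_millis, b_millis):
    # Buggy: truncate both sides to seconds before comparing.
    return a_millis // 1000 == b_millis // 1000

ltz3 = 1_123  # TIMESTAMP_LTZ(3) instant, 1.123s since epoch
ltz0 = 1_000  # TIMESTAMP_LTZ(0) instant, 1s since epoch

assert equals_at_higher_precision(ltz3, ltz0) is False  # expected result
assert equals_at_lower_precision(ltz3, ltz0) is True    # the observed "true"
```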





[jira] [Created] (FLINK-34937) Apache Infra GHA policy update

2024-03-26 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34937:
-

 Summary: Apache Infra GHA policy update
 Key: FLINK-34937
 URL: https://issues.apache.org/jira/browse/FLINK-34937
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


There is a policy update [announced in the infra 
ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which 
asked Apache projects to limit the number of runners per job. Additionally, the 
[GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced 
which I wasn't aware of when working on the action workflow.

This issue is about applying the policy to the Flink GHA workflows.





Bug report for reading Hive table as streaming source.

2024-03-26 Thread Xiaolong Wang
Hi,

I found a weird bug when reading a Hive table as a streaming source.

In summary, if the first partition key is not time-related, then the Hive table
cannot be read as a streaming source.

e.g.

I've a Hive table in the definition of

```
CREATE TABLE article (
id BIGINT,
edition STRING,
dt STRING,
hh STRING
)
PARTITIONED BY (edition, dt, hh)
USING orc;
```
Then I try to query it as a streaming source:

```
INSERT INTO kafka_sink
SELECT id
FROM article /*+ OPTIONS('streaming-source.enable' = 'true',
'streaming-source.partition-order' = 'partition-name',
'streaming-source.consume-start-offset' =
'edition=en_US/dt=2024-03-26/hh=00') */
```

And I see no output in the `kafka_sink`.

Then I defined an external table pointing to the same path but has no
`edition` partition,

```
CREATE TABLE en_article (
id BIGINT,
edition STRING,
dt STRING,
hh STRING
)
PARTITIONED BY (edition, dt, hh)
LOCATION 's3://xxx/article/edition=en_US'
USING orc;
```

And insert with the following statement:

```
INSERT INTO kafka_sink
SELECT id
FROM en_article /*+ OPTIONS('streaming-source.enable' = 'true',
'streaming-source.partition-order' = 'partition-name',
'streaming-source.consume-start-offset' = 'dt=2024-03-26/hh=00') */
```

The data is written to the sink as expected.
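One plausible explanation (an assumption about the mechanism, not a confirmed diagnosis) is that 'partition-name' ordering compares partition path strings lexicographically, and a leading non-time key dominates that order, so the consume-start-offset filter no longer tracks time:

```python
# Hypothetical partition names; the leading key is not time-related.
partitions = [
    "edition=de_DE/dt=2024-03-27/hh=00",
    "edition=en_US/dt=2024-03-26/hh=00",
    "edition=en_US/dt=2024-03-27/hh=00",
]
start = "edition=en_US/dt=2024-03-26/hh=00"

# Lexicographic comparison against the start offset, as 'partition-name'
# ordering does.
consumed = [p for p in partitions if p > start]

# The de_DE partition from the *later* day sorts before the offset and is
# skipped, because "edition" dominates the comparison.
assert consumed == ["edition=en_US/dt=2024-03-27/hh=00"]
```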


[jira] [Created] (FLINK-34936) Register shared state files to FileMergingSnapshotManager

2024-03-26 Thread Jinzhong Li (Jira)
Jinzhong Li created FLINK-34936:
---

 Summary: Register shared state files to FileMergingSnapshotManager
 Key: FLINK-34936
 URL: https://issues.apache.org/jira/browse/FLINK-34936
 Project: Flink
  Issue Type: New Feature
Reporter: Jinzhong Li
 Fix For: 1.20.0


The shared state files should be registered into the 
FileMergingSnapshotManager, so that these files can be properly cleaned up 
when a checkpoint is aborted or subsumed.
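The cleanup idea can be sketched as simple reference counting (an illustration of the concept, not Flink's actual FileMergingSnapshotManager API): a shared physical file may only be deleted once every checkpoint that registered it has been aborted or subsumed.

```python
class SharedFileRegistry:
    """Toy reference-counting registry for merged snapshot files."""

    def __init__(self):
        self.ref_counts = {}
        self.deleted = []

    def register(self, path):
        # Called when a checkpoint references the physical file.
        self.ref_counts[path] = self.ref_counts.get(path, 0) + 1

    def release(self, path):
        # Called when a referencing checkpoint is aborted or subsumed.
        self.ref_counts[path] -= 1
        if self.ref_counts[path] == 0:
            del self.ref_counts[path]
            self.deleted.append(path)  # now safe to delete physically
```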





Re: [DISCUSS] Flink Website Menu Adjustment

2024-03-26 Thread Hang Ruan
+1 for the proposal.

Best,
Hang

Hangxiang Yu  于2024年3月26日周二 13:40写道:

> Thanks Zhongqiang for driving this.
> +1 for the proposal.
>
> On Tue, Mar 26, 2024 at 1:36 PM Shawn Huang  wrote:
>
> > +1 for the proposal
> >
> > Best,
> > Shawn Huang
> >
> >
> > Hongshun Wang  于2024年3月26日周二 11:56写道:
> >
> > > +1 for the proposal
> > >
> > > Best Regards,
> > > Hongshun Wang
> > >
> > > On Tue, Mar 26, 2024 at 11:37 AM gongzhongqiang <
> > gongzhongqi...@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > Thank you for your feedback.
> > > >
> > > > I agree with your point that we should make a one-time update to the
> > > menu,
> > > > rather than continuously updating it. This will be done unless some
> > > > sub-projects are moved or archived.
> > > >
> > > > Best regards,
> > > >
> > > > Zhongqiang Gong
> > > >
> > > >
> > > > Martijn Visser  于2024年3月25日周一 23:35写道:
> > > >
> > > > > Hi Zhongqiang Gong,
> > > > >
> > > > > Are you suggesting to continuously update the menu based on the
> > number
> > > of
> > > > > releases, or just this one time? I wouldn't be in favor of
> > continuously
> > > > > updating: returning customers expect a certain order in the menu,
> > and I
> > > > > don't see a lot of value in continuously changing that. I do think
> > that
> > > > the
> > > > > order that you have currently proposed is better then the one we
> have
> > > > right
> > > > > now, so I would +1 a one-time update but not a continuously
> updating
> > > > order.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > On Mon, Mar 25, 2024 at 4:15 PM Yanquan Lv 
> > > wrote:
> > > > >
> > > > > > +1 for this proposal.
> > > > > >
> > > > > > gongzhongqiang  于2024年3月25日周一
> 15:49写道:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I'd like to start a discussion on adjusting the Flink website
> [1]
> > > > menu
> > > > > to
> > > > > > > improve accuracy and usability. While migrating Flink CDC
> > > > documentation
> > > > > > > to the website, I found outdated links, need to review and
> update
> > > > menus
> > > > > > > for the most relevant information for our users.
> > > > > > >
> > > > > > >
> > > > > > > Proposal:
> > > > > > >
> > > > > > > - Remove Paimon [2] from the "Getting Started" and
> > "Documentation"
> > > > > menus:
> > > > > > > Paimon [2] is now an independent top-level project of the ASF. CC:
> jingsong
> > > > lees
> > > > > > >
> > > > > > > - Sort the projects in the subdirectory by the activity of the
> > > > > projects.
> > > > > > > Here I list the number of releases for each project in the past
> > > year.
> > > > > > >
> > > > > > > Flink Kubernetes Operator : 7
> > > > > > > Flink CDC : 5
> > > > > > > Flink ML  : 2
> > > > > > > Flink Stateful Functions : 1
> > > > > > >
> > > > > > >
> > > > > > > Expected Outcome :
> > > > > > >
> > > > > > > - Menu "Getting Started"
> > > > > > >
> > > > > > > Before:
> > > > > > >
> > > > > > > With Flink
> > > > > > >
> > > > > > > With Flink Stateful Functions
> > > > > > >
> > > > > > > With Flink ML
> > > > > > >
> > > > > > > With Flink Kubernetes Operator
> > > > > > >
> > > > > > > With Paimon(incubating) (formerly Flink Table Store)
> > > > > > >
> > > > > > > With Flink CDC
> > > > > > >
> > > > > > > Training Course
> > > > > > >
> > > > > > >
> > > > > > > After:
> > > > > > >
> > > > > > > With Flink
> > > > > > > With Flink Kubernetes Operator
> > > > > > >
> > > > > > > With Flink CDC
> > > > > > >
> > > > > > > With Flink ML
> > > > > > >
> > > > > > > With Flink Stateful Functions
> > > > > > >
> > > > > > > Training Course
> > > > > > >
> > > > > > >
> > > > > > > - Menu "Documentation" will same with "Getting Started"
> > > > > > >
> > > > > > >
> > > > > > > I look forward to hearing your thoughts and suggestions on this
> > > > > proposal.
> > > > > > >
> > > > > > > [1] https://flink.apache.org/
> > > > > > > [2] https://github.com/apache/incubator-paimon
> > > > > > > [3] https://github.com/apache/flink-statefun
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Zhongqiang Gong
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best,
> Hangxiang.
>