Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Zakelly Lan
Congratulations!


Best,
Zakelly

On Thu, Mar 21, 2024 at 12:05 PM weijie guo 
wrote:

> Congratulations! Well done.
>
>
> Best regards,
>
> Weijie
>
>
> Feng Jin  于2024年3月21日周四 11:40写道:
>
>> Congratulations!
>>
>>
>> Best,
>> Feng
>>
>>
>> On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:
>>
>> > Congratulations!
>> >
>> > Best,
>> > Ron
>> >
>> > Jark Wu  于2024年3月21日周四 10:46写道:
>> >
>> > > Congratulations and welcome!
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
>> > >
>> > > > Congratulations!
>> > > >
>> > > > Best,
>> > > > Rui
>> > > >
>> > > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
>> > > wrote:
>> > > >
>> > > > > Congrattulations!
>> > > > >
>> > > > > Best,
>> > > > > Hang
>> > > > >
>> > > > > Lincoln Lee  于2024年3月21日周四 09:54写道:
>> > > > >
>> > > > >>
>> > > > >> Congrats, thanks for the great work!
>> > > > >>
>> > > > >>
>> > > > >> Best,
>> > > > >> Lincoln Lee
>> > > > >>
>> > > > >>
>> > > > >> Peter Huang  于2024年3月20日周三 22:48写道:
>> > > > >>
>> > > > >>> Congratulations
>> > > > >>>
>> > > > >>>
>> > > > >>> Best Regards
>> > > > >>> Peter Huang
>> > > > >>>
>> > > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang > >
>> > > > wrote:
>> > > > >>>
>> > > > 
>> > > >  Congratulations
>> > > > 
>> > > > 
>> > > > 
>> > > >  Best,
>> > > >  Huajie Wang
>> > > > 
>> > > > 
>> > > > 
>> > > >  Leonard Xu  于2024年3月20日周三 21:36写道:
>> > > > 
>> > > > > Hi devs and users,
>> > > > >
>> > > > > We are thrilled to announce that the donation of Flink CDC as
>> a
>> > > > > sub-project of Apache Flink has completed. We invite you to
>> > explore
>> > > > the new
>> > > > > resources available:
>> > > > >
>> > > > > - GitHub Repository: https://github.com/apache/flink-cdc
>> > > > > - Flink CDC Documentation:
>> > > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
>> > > > >
>> > > > > After Flink community accepted this donation[1], we have
>> > completed
>> > > > > software copyright signing, code repo migration, code cleanup,
>> > > > website
>> > > > > migration, CI migration and github issues migration etc.
>> > > > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
>> > > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
>> > > > contributors
>> > > > > for their contributions and help during this process!
>> > > > >
>> > > > >
>> > > > > For all previous contributors: The contribution process has
>> > > slightly
>> > > > > changed to align with the main Flink project. To report bugs
>> or
>> > > > suggest new
>> > > > > features, please open tickets
>> > > > > Apache Jira (https://issues.apache.org/jira).  Note that we
>> will
>> > > no
>> > > > > longer accept GitHub issues for these purposes.
>> > > > >
>> > > > >
>> > > > > Welcome to explore the new repository and documentation. Your
>> > > > feedback
>> > > > > and contributions are invaluable as we continue to improve
>> Flink
>> > > CDC.
>> > > > >
>> > > > > Thanks everyone for your support and happy exploring Flink
>> CDC!
>> > > > >
>> > > > > Best,
>> > > > > Leonard
>> > > > > [1]
>> > > https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread weijie guo
Congratulations! Well done.


Best regards,

Weijie


Feng Jin wrote on Thu, Mar 21, 2024 at 11:40:

> Congratulations!
>
>
> Best,
> Feng
>
>
> On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:
>
> > Congratulations!
> >
> > Best,
> > Ron
> >
> > Jark Wu  于2024年3月21日周四 10:46写道:
> >
> > > Congratulations and welcome!
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
> > > wrote:
> > > >
> > > > > Congrattulations!
> > > > >
> > > > > Best,
> > > > > Hang
> > > > >
> > > > > Lincoln Lee  于2024年3月21日周四 09:54写道:
> > > > >
> > > > >>
> > > > >> Congrats, thanks for the great work!
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Lincoln Lee
> > > > >>
> > > > >>
> > > > >> Peter Huang  于2024年3月20日周三 22:48写道:
> > > > >>
> > > > >>> Congratulations
> > > > >>>
> > > > >>>
> > > > >>> Best Regards
> > > > >>> Peter Huang
> > > > >>>
> > > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> > > > wrote:
> > > > >>>
> > > > 
> > > >  Congratulations
> > > > 
> > > > 
> > > > 
> > > >  Best,
> > > >  Huajie Wang
> > > > 
> > > > 
> > > > 
> > > >  Leonard Xu  于2024年3月20日周三 21:36写道:
> > > > 
> > > > > Hi devs and users,
> > > > >
> > > > > We are thrilled to announce that the donation of Flink CDC as a
> > > > > sub-project of Apache Flink has completed. We invite you to
> > explore
> > > > the new
> > > > > resources available:
> > > > >
> > > > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > > > - Flink CDC Documentation:
> > > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > > > >
> > > > > After Flink community accepted this donation[1], we have
> > completed
> > > > > software copyright signing, code repo migration, code cleanup,
> > > > website
> > > > > migration, CI migration and github issues migration etc.
> > > > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> > > > contributors
> > > > > for their contributions and help during this process!
> > > > >
> > > > >
> > > > > For all previous contributors: The contribution process has
> > > slightly
> > > > > changed to align with the main Flink project. To report bugs or
> > > > suggest new
> > > > > features, please open tickets
> > > > > Apache Jira (https://issues.apache.org/jira).  Note that we
> will
> > > no
> > > > > longer accept GitHub issues for these purposes.
> > > > >
> > > > >
> > > > > Welcome to explore the new repository and documentation. Your
> > > > feedback
> > > > > and contributions are invaluable as we continue to improve
> Flink
> > > CDC.
> > > > >
> > > > > Thanks everyone for your support and happy exploring Flink CDC!
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > > [1]
> > > https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-34900) Check compatibility for classes in flink-core-api that skip japicmp

2024-03-20 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34900:
--

 Summary: Check compatibility for classes in flink-core-api that 
skip japicmp
 Key: FLINK-34900
 URL: https://issues.apache.org/jira/browse/FLINK-34900
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Weijie Guo
Assignee: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34899) Remove all classes that skip the japicmp check for flink-core-api

2024-03-20 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-34899:
--

 Summary: Remove all classes that skip the japicmp check for 
flink-core-api
 Key: FLINK-34899
 URL: https://issues.apache.org/jira/browse/FLINK-34899
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Feng Jin
Congratulations!


Best,
Feng


On Thu, Mar 21, 2024 at 11:37 AM Ron liu  wrote:

> Congratulations!
>
> Best,
> Ron
>
> Jark Wu  于2024年3月21日周四 10:46写道:
>
> > Congratulations and welcome!
> >
> > Best,
> > Jark
> >
> > On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Congratulations!
> > >
> > > Best,
> > > Rui
> > >
> > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
> > wrote:
> > >
> > > > Congrattulations!
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Lincoln Lee  于2024年3月21日周四 09:54写道:
> > > >
> > > >>
> > > >> Congrats, thanks for the great work!
> > > >>
> > > >>
> > > >> Best,
> > > >> Lincoln Lee
> > > >>
> > > >>
> > > >> Peter Huang  于2024年3月20日周三 22:48写道:
> > > >>
> > > >>> Congratulations
> > > >>>
> > > >>>
> > > >>> Best Regards
> > > >>> Peter Huang
> > > >>>
> > > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> > > wrote:
> > > >>>
> > > 
> > >  Congratulations
> > > 
> > > 
> > > 
> > >  Best,
> > >  Huajie Wang
> > > 
> > > 
> > > 
> > >  Leonard Xu  于2024年3月20日周三 21:36写道:
> > > 
> > > > Hi devs and users,
> > > >
> > > > We are thrilled to announce that the donation of Flink CDC as a
> > > > sub-project of Apache Flink has completed. We invite you to
> explore
> > > the new
> > > > resources available:
> > > >
> > > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > > - Flink CDC Documentation:
> > > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > > >
> > > > After Flink community accepted this donation[1], we have
> completed
> > > > software copyright signing, code repo migration, code cleanup,
> > > website
> > > > migration, CI migration and github issues migration etc.
> > > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> > > contributors
> > > > for their contributions and help during this process!
> > > >
> > > >
> > > > For all previous contributors: The contribution process has
> > slightly
> > > > changed to align with the main Flink project. To report bugs or
> > > suggest new
> > > > features, please open tickets
> > > > Apache Jira (https://issues.apache.org/jira).  Note that we will
> > no
> > > > longer accept GitHub issues for these purposes.
> > > >
> > > >
> > > > Welcome to explore the new repository and documentation. Your
> > > feedback
> > > > and contributions are invaluable as we continue to improve Flink
> > CDC.
> > > >
> > > > Thanks everyone for your support and happy exploring Flink CDC!
> > > >
> > > > Best,
> > > > Leonard
> > > > [1]
> > https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Ron liu
Congratulations!

Best,
Ron

Jark Wu wrote on Thu, Mar 21, 2024 at 10:46:

> Congratulations and welcome!
>
> Best,
> Jark
>
> On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:
>
> > Congratulations!
> >
> > Best,
> > Rui
> >
> > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan 
> wrote:
> >
> > > Congrattulations!
> > >
> > > Best,
> > > Hang
> > >
> > > Lincoln Lee  于2024年3月21日周四 09:54写道:
> > >
> > >>
> > >> Congrats, thanks for the great work!
> > >>
> > >>
> > >> Best,
> > >> Lincoln Lee
> > >>
> > >>
> > >> Peter Huang  于2024年3月20日周三 22:48写道:
> > >>
> > >>> Congratulations
> > >>>
> > >>>
> > >>> Best Regards
> > >>> Peter Huang
> > >>>
> > >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> > wrote:
> > >>>
> > 
> >  Congratulations
> > 
> > 
> > 
> >  Best,
> >  Huajie Wang
> > 
> > 
> > 
> >  Leonard Xu  于2024年3月20日周三 21:36写道:
> > 
> > > Hi devs and users,
> > >
> > > We are thrilled to announce that the donation of Flink CDC as a
> > > sub-project of Apache Flink has completed. We invite you to explore
> > the new
> > > resources available:
> > >
> > > - GitHub Repository: https://github.com/apache/flink-cdc
> > > - Flink CDC Documentation:
> > > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> > >
> > > After Flink community accepted this donation[1], we have completed
> > > software copyright signing, code repo migration, code cleanup,
> > website
> > > migration, CI migration and github issues migration etc.
> > > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> > contributors
> > > for their contributions and help during this process!
> > >
> > >
> > > For all previous contributors: The contribution process has
> slightly
> > > changed to align with the main Flink project. To report bugs or
> > suggest new
> > > features, please open tickets
> > > Apache Jira (https://issues.apache.org/jira).  Note that we will
> no
> > > longer accept GitHub issues for these purposes.
> > >
> > >
> > > Welcome to explore the new repository and documentation. Your
> > feedback
> > > and contributions are invaluable as we continue to improve Flink
> CDC.
> > >
> > > Thanks everyone for your support and happy exploring Flink CDC!
> > >
> > > Best,
> > > Leonard
> > > [1]
> https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> > >
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-20 Thread Jacky Lau
Congratulations!

Best,
Jacky Lau

Hongshun Wang wrote on Thu, Mar 21, 2024 at 10:32:

> Congratulations!
>
> Best,
> Hongshun
>
> On Tue, Mar 19, 2024 at 3:12 PM Shawn Huang  wrote:
>
> > Congratulations!
> >
> > Best,
> > Shawn Huang
> >
> >
> > Xuannan Su  于2024年3月19日周二 14:40写道:
> >
> > > Congratulations! Thanks for all the great work!
> > >
> > > Best regards,
> > > Xuannan
> > >
> > > On Tue, Mar 19, 2024 at 1:31 PM Yu Li  wrote:
> > > >
> > > > Congrats and thanks all for the efforts!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > On Tue, 19 Mar 2024 at 11:51, gongzhongqiang <
> > gongzhongqi...@apache.org>
> > > wrote:
> > > > >
> > > > > Congrats! Thanks to everyone involved!
> > > > >
> > > > > Best,
> > > > > Zhongqiang Gong
> > > > >
> > > > > Lincoln Lee  于2024年3月18日周一 16:27写道:
> > > > >>
> > > > >> The Apache Flink community is very happy to announce the release
> of
> > > Apache
> > > > >> Flink 1.19.0, which is the fisrt release for the Apache Flink 1.19
> > > series.
> > > > >>
> > > > >> Apache Flink® is an open-source stream processing framework for
> > > > >> distributed, high-performing, always-available, and accurate data
> > > streaming
> > > > >> applications.
> > > > >>
> > > > >> The release is available for download at:
> > > > >> https://flink.apache.org/downloads.html
> > > > >>
> > > > >> Please check out the release blog post for an overview of the
> > > improvements
> > > > >> for this bugfix release:
> > > > >>
> > >
> >
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > > >>
> > > > >> The full release notes are available in Jira:
> > > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > > >>
> > > > >> We would like to thank all contributors of the Apache Flink
> > community
> > > who
> > > > >> made this release possible!
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >> Yun, Jing, Martijn and Lincoln
> > >
> >
>


Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

2024-03-20 Thread Feng Jin
Hi Ron and Lincoln

Thanks for driving this discussion.  I believe it will greatly improve the
convenience of managing user real-time pipelines.

I have some questions.

*Regarding Limitations of Dynamic Table:*

> Does not support modifying the select statement after the dynamic table
is created.

Although currently we restrict users from modifying the query, I wonder if
we can provide a better way to help users rebuild it without affecting
downstream OLAP queries.


*Regarding the management of background jobs:*

1. From the documentation, the definitions SQL and job information are
stored in the Catalog. Does this mean that if a system needs to adapt to
Dynamic Tables, it also needs to store Flink's job information in the
corresponding system?
For example, does MySQL's Catalog need to store flink job information as
well?


2. Users still need to consider how much memory is being used, how large
the concurrency is, which type of state backend is being used, and may need
to set TTL expiration.


*Regarding the Refresh Part:*

> If the refresh mode is continuous and a background job is running,
caution should be taken with the refresh command as it can lead to
inconsistent data.

When we submit a refresh command, can we help users detect if there are any
running jobs and automatically stop them before executing the refresh
command? Then wait for it to complete before restarting the background
streaming job?

Best,
Feng

On Tue, Mar 19, 2024 at 9:40 PM Lincoln Lee  wrote:

> Hi Yun,
>
> Thank you very much for your valuable input!
>
> Incremental mode is indeed an attractive idea, we have also discussed
> this, but in the current design,
>
> we first provided two refresh modes: CONTINUOUS and
> FULL. Incremental mode can be introduced
>
> once the execution layer has the capability.
>
> My answer for the two questions:
>
> 1.
> Yes, cascading is a good question.  Current proposal provides a
> freshness that defines a dynamic
> table relative to the base table’s lag. If users need to consider the
> end-to-end freshness of multiple
> cascaded dynamic tables, he can manually split them for now. Of
> course, how to let multiple cascaded
>  or dependent dynamic tables complete the freshness definition in a
> simpler way, I think it can be
> extended in the future.
>
> 2.
> Cascading refresh is also a part we focus on discussing. In this flip,
> we hope to focus as much as
> possible on the core features (as it already involves a lot things),
> so we did not directly introduce related
>  syntax. However, based on the current design, combined with the
> catalog and lineage, theoretically,
> users can also finish the cascading refresh.
>
>
> Best,
> Lincoln Lee
>
>
> Yun Tang  于2024年3月19日周二 13:45写道:
>
> > Hi Lincoln,
> >
> > Thanks for driving this discussion, and I am so excited to see this topic
> > being discussed in the Flink community!
> >
> > From my point of view, instead of the work of unifying streaming and
> batch
> > in DataStream API [1], this FLIP actually could make users benefit from
> one
> > engine to rule batch & streaming.
> >
> > If we treat this FLIP as an open-source implementation of Snowflake's
> > dynamic tables [2], we still lack an incremental refresh mode to make the
> > ETL near real-time with a much cheaper computation cost. However, I think
> > this could be done under the current design by introducing another
> refresh
> > mode in the future. Although the extra work of incremental view
> maintenance
> > would be much larger.
> >
> > For the FLIP itself, I have several questions below:
> >
> > 1. It seems this FLIP does not consider the lag of refreshes across ETL
> > layers from ODS ---> DWD ---> APP [3]. We currently only consider the
> > scheduler interval, which means we cannot use lag to automatically
> schedule
> > the upfront micro-batch jobs to do the work.
> > 2. To support the automagical refreshes, we should consider the lineage
> in
> > the catalog or somewhere else.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
> > [2] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
> > [3] https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh
> >
> > Best
> > Yun Tang
> >
> >
> > 
> > From: Lincoln Lee 
> > Sent: Thursday, March 14, 2024 14:35
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for
> > Simplifying Data Pipelines
> >
> > Hi Jing,
> >
> > Thanks for your attention to this flip! I'll try to answer the following
> > questions.
> >
> > > 1. How to define query of dynamic table?
> > > Use flink sql or introducing new syntax?
> > > If use flink sql, how to handle the difference in SQL between streaming
> > and
> > > batch processing?
> > > For example, a query including window aggregate based on processing
> time?
> > > or a query including global order by?
> >
> > Similar to `CREATE TABLE AS query`, 
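
For readers following the FLIP-435 thread above, here is a minimal sketch of how a dynamic table with an explicit freshness target could be declared, based on the refresh modes (CONTINUOUS and FULL) and the freshness-as-lag idea discussed in the replies. It is illustrative only: the keywords (DYNAMIC TABLE, FRESHNESS, REFRESH_MODE) are assumptions drawn from the discussion rather than the FLIP's final syntax, and the table and column names are invented.

```sql
-- Illustrative sketch only; keywords and names are assumed, not final FLIP-435 syntax.
-- A derived table that is kept at most 3 minutes behind its base tables.
CREATE DYNAMIC TABLE dwd_orders
FRESHNESS = INTERVAL '3' MINUTE      -- target lag relative to the base tables
REFRESH_MODE = CONTINUOUS            -- the alternative discussed above is FULL (periodic batch refresh)
AS SELECT o.order_id, o.user_id, p.amount
   FROM ods_orders AS o
   JOIN ods_payments AS p ON o.order_id = p.order_id;
```

As noted in Lincoln's reply, chaining several such tables currently means picking each table's freshness by hand; defining end-to-end freshness across a cascade is left as a future extension.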

Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Jark Wu
Congratulations and welcome!

Best,
Jark

On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote:

> Congratulations!
>
> Best,
> Rui
>
> On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan  wrote:
>
> > Congrattulations!
> >
> > Best,
> > Hang
> >
> > Lincoln Lee  于2024年3月21日周四 09:54写道:
> >
> >>
> >> Congrats, thanks for the great work!
> >>
> >>
> >> Best,
> >> Lincoln Lee
> >>
> >>
> >> Peter Huang  于2024年3月20日周三 22:48写道:
> >>
> >>> Congratulations
> >>>
> >>>
> >>> Best Regards
> >>> Peter Huang
> >>>
> >>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang 
> wrote:
> >>>
> 
>  Congratulations
> 
> 
> 
>  Best,
>  Huajie Wang
> 
> 
> 
>  Leonard Xu  于2024年3月20日周三 21:36写道:
> 
> > Hi devs and users,
> >
> > We are thrilled to announce that the donation of Flink CDC as a
> > sub-project of Apache Flink has completed. We invite you to explore
> the new
> > resources available:
> >
> > - GitHub Repository: https://github.com/apache/flink-cdc
> > - Flink CDC Documentation:
> > https://nightlies.apache.org/flink/flink-cdc-docs-stable
> >
> > After Flink community accepted this donation[1], we have completed
> > software copyright signing, code repo migration, code cleanup,
> website
> > migration, CI migration and github issues migration etc.
> > Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> > Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other
> contributors
> > for their contributions and help during this process!
> >
> >
> > For all previous contributors: The contribution process has slightly
> > changed to align with the main Flink project. To report bugs or
> suggest new
> > features, please open tickets
> > Apache Jira (https://issues.apache.org/jira).  Note that we will no
> > longer accept GitHub issues for these purposes.
> >
> >
> > Welcome to explore the new repository and documentation. Your
> feedback
> > and contributions are invaluable as we continue to improve Flink CDC.
> >
> > Thanks everyone for your support and happy exploring Flink CDC!
> >
> > Best,
> > Leonard
> > [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
> >
> >
>


Re:Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Xuyang
Cheers!




--

Best!
Xuyang

On 2024-03-21 10:28:45, "Rui Fan" <1996fan...@gmail.com> wrote:
>Congratulations!
>
>Best,
>Rui
>
>On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan  wrote:
>
>> Congrattulations!
>>
>> Best,
>> Hang
>>
>> Lincoln Lee  于2024年3月21日周四 09:54写道:
>>
>>>
>>> Congrats, thanks for the great work!
>>>
>>>
>>> Best,
>>> Lincoln Lee
>>>
>>>
>>> Peter Huang  于2024年3月20日周三 22:48写道:
>>>
 Congratulations


 Best Regards
 Peter Huang

 On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang  wrote:

>
> Congratulations
>
>
>
> Best,
> Huajie Wang
>
>
>
> Leonard Xu  于2024年3月20日周三 21:36写道:
>
>> Hi devs and users,
>>
>> We are thrilled to announce that the donation of Flink CDC as a
>> sub-project of Apache Flink has completed. We invite you to explore the 
>> new
>> resources available:
>>
>> - GitHub Repository: https://github.com/apache/flink-cdc
>> - Flink CDC Documentation:
>> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>>
>> After Flink community accepted this donation[1], we have completed
>> software copyright signing, code repo migration, code cleanup, website
>> migration, CI migration and github issues migration etc.
>> Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
>> Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other 
>> contributors
>> for their contributions and help during this process!
>>
>>
>> For all previous contributors: The contribution process has slightly
>> changed to align with the main Flink project. To report bugs or suggest 
>> new
>> features, please open tickets
>> Apache Jira (https://issues.apache.org/jira).  Note that we will no
>> longer accept GitHub issues for these purposes.
>>
>>
>> Welcome to explore the new repository and documentation. Your feedback
>> and contributions are invaluable as we continue to improve Flink CDC.
>>
>> Thanks everyone for your support and happy exploring Flink CDC!
>>
>> Best,
>> Leonard
>> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>>
>>


Re: [NOTICE] Flink CDC is importing GitHub issues into Apache Jira

2024-03-20 Thread Hongshun Wang
Congratulations. Thanks to Qingsheng for driving this job forward.

Best,
Hongshun

On Wed, Mar 20, 2024 at 9:08 PM Qingsheng Ren  wrote:

> FYI: The auto-import has completed. Totally 137 issues are migrated from
> GitHub issue to Apache Jira. Sorry again for flushing your inbox!
>
> Best,
> Qingsheng
>
> On Wed, Mar 20, 2024 at 3:47 PM Qingsheng Ren  wrote:
>
> > Hi devs,
> >
> > We are in the process of finalizing the donation of Flink CDC as a
> > sub-project of Apache Flink. Given that all issues in the Flink project
> are
> > tracked via Jira, we will be migrating open tickets for Flink CDC from
> > GitHub issues to Jira.
> >
> > We plan to trigger the auto-import around 5pm (UTC+8), Mar 20, 2024.
> > During this import process, you may receive automated emails regarding
> the
> > creation of Jira tickets. We apologize for any inconvenience and
> potential
> > overflow in your inbox!
> >
> > We have made efforts to eliminate 500+ outdated and invalid issues.
> > However, we estimate that there are still around 100 issues that are
> > significant and worth retaining.
> >
> > Thanks your understanding and support! And special thanks to community
> > volunteers helping with issue cleanup: Hang Ruan, Zhongqiang Gong, He
> Wang,
> > Xin Gong, Yanquan Lyu and Jiabao Sun
> >
> > Best,
> > Qingsheng
> >
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Rui Fan
Congratulations!

Best,
Rui

On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan  wrote:

> Congrattulations!
>
> Best,
> Hang
>
> Lincoln Lee  于2024年3月21日周四 09:54写道:
>
>>
>> Congrats, thanks for the great work!
>>
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Peter Huang  于2024年3月20日周三 22:48写道:
>>
>>> Congratulations
>>>
>>>
>>> Best Regards
>>> Peter Huang
>>>
>>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang  wrote:
>>>

 Congratulations



 Best,
 Huajie Wang



 Leonard Xu  于2024年3月20日周三 21:36写道:

> Hi devs and users,
>
> We are thrilled to announce that the donation of Flink CDC as a
> sub-project of Apache Flink has completed. We invite you to explore the 
> new
> resources available:
>
> - GitHub Repository: https://github.com/apache/flink-cdc
> - Flink CDC Documentation:
> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>
> After Flink community accepted this donation[1], we have completed
> software copyright signing, code repo migration, code cleanup, website
> migration, CI migration and github issues migration etc.
> Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
> Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors
> for their contributions and help during this process!
>
>
> For all previous contributors: The contribution process has slightly
> changed to align with the main Flink project. To report bugs or suggest 
> new
> features, please open tickets
> Apache Jira (https://issues.apache.org/jira).  Note that we will no
> longer accept GitHub issues for these purposes.
>
>
> Welcome to explore the new repository and documentation. Your feedback
> and contributions are invaluable as we continue to improve Flink CDC.
>
> Thanks everyone for your support and happy exploring Flink CDC!
>
> Best,
> Leonard
> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>
>


Re: [VOTE] FLIP-433: State Access on DataStream API V2

2024-03-20 Thread Rui Fan
+1(binding)

Thanks to Weijie for driving this proposal, which solves the problem that I
raised in FLIP-359.

Best,
Rui

On Thu, Mar 21, 2024 at 10:10 AM Hangxiang Yu  wrote:

> +1 (binding)
>
> On Thu, Mar 21, 2024 at 10:04 AM Xintong Song 
> wrote:
>
> > +1 (binding)
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Wed, Mar 20, 2024 at 8:30 PM weijie guo 
> > wrote:
> >
> > > Hi everyone,
> > >
> > >
> > > Thanks for all the feedback about the FLIP-433: State Access on
> > > DataStream API V2 [1]. The discussion thread is here [2].
> > >
> > >
> > > The vote will be open for at least 72 hours unless there is an
> > > objection or insufficient votes.
> > >
> > >
> > > Best regards,
> > >
> > > Weijie
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2
> > >
> > > [2] https://lists.apache.org/thread/8ghzqkvt99p4k7hjqxzwhqny7zb7xrwo
> > >
> >
>
>
> --
> Best,
> Hangxiang.
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-20 Thread Hongshun Wang
Congratulations!

Best,
Hongshun

On Tue, Mar 19, 2024 at 3:12 PM Shawn Huang  wrote:

> Congratulations!
>
> Best,
> Shawn Huang
>
>
> Xuannan Su  于2024年3月19日周二 14:40写道:
>
> > Congratulations! Thanks for all the great work!
> >
> > Best regards,
> > Xuannan
> >
> > On Tue, Mar 19, 2024 at 1:31 PM Yu Li  wrote:
> > >
> > > Congrats and thanks all for the efforts!
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On Tue, 19 Mar 2024 at 11:51, gongzhongqiang <
> gongzhongqi...@apache.org>
> > wrote:
> > > >
> > > > Congrats! Thanks to everyone involved!
> > > >
> > > > Best,
> > > > Zhongqiang Gong
> > > >
> > > > Lincoln Lee  于2024年3月18日周一 16:27写道:
> > > >>
> > > >> The Apache Flink community is very happy to announce the release of
> > Apache
> > > >> Flink 1.19.0, which is the fisrt release for the Apache Flink 1.19
> > series.
> > > >>
> > > >> Apache Flink® is an open-source stream processing framework for
> > > >> distributed, high-performing, always-available, and accurate data
> > streaming
> > > >> applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > improvements
> > > >> for this bugfix release:
> > > >>
> >
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > >>
> > > >> The full release notes are available in Jira:
> > > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink
> community
> > who
> > > >> made this release possible!
> > > >>
> > > >>
> > > >> Best,
> > > >> Yun, Jing, Martijn and Lincoln
> >
>


Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-20 Thread Yanquan Lv
Congratulations! Thanks to all contributors.

Best,
Yanquan

Shawn Huang wrote on Tue, Mar 19, 2024 at 15:12:

> Congratulations!
>
> Best,
> Shawn Huang
>
>
> Xuannan Su  于2024年3月19日周二 14:40写道:
>
> > Congratulations! Thanks for all the great work!
> >
> > Best regards,
> > Xuannan
> >
> > On Tue, Mar 19, 2024 at 1:31 PM Yu Li  wrote:
> > >
> > > Congrats and thanks all for the efforts!
> > >
> > > Best Regards,
> > > Yu
> > >
> > > On Tue, 19 Mar 2024 at 11:51, gongzhongqiang <
> gongzhongqi...@apache.org>
> > wrote:
> > > >
> > > > Congrats! Thanks to everyone involved!
> > > >
> > > > Best,
> > > > Zhongqiang Gong
> > > >
> > > > Lincoln Lee  于2024年3月18日周一 16:27写道:
> > > >>
> > > >> The Apache Flink community is very happy to announce the release of
> > Apache
> > > >> Flink 1.19.0, which is the fisrt release for the Apache Flink 1.19
> > series.
> > > >>
> > > >> Apache Flink® is an open-source stream processing framework for
> > > >> distributed, high-performing, always-available, and accurate data
> > streaming
> > > >> applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > improvements
> > > >> for this bugfix release:
> > > >>
> >
> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
> > > >>
> > > >> The full release notes are available in Jira:
> > > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink
> community
> > who
> > > >> made this release possible!
> > > >>
> > > >>
> > > >> Best,
> > > >> Yun, Jing, Martijn and Lincoln
> >
>


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Hang Ruan
Congratulations!

Best,
Hang

Lincoln Lee wrote on Thu, Mar 21, 2024 at 09:54:

>
> Congrats, thanks for the great work!
>
>
> Best,
> Lincoln Lee
>
>
> Peter Huang  于2024年3月20日周三 22:48写道:
>
>> Congratulations
>>
>>
>> Best Regards
>> Peter Huang
>>
>> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang  wrote:
>>
>>>
>>> Congratulations
>>>
>>>
>>>
>>> Best,
>>> Huajie Wang
>>>
>>>
>>>
>>> Leonard Xu  于2024年3月20日周三 21:36写道:
>>>
 Hi devs and users,

 We are thrilled to announce that the donation of Flink CDC as a
 sub-project of Apache Flink has completed. We invite you to explore the new
 resources available:

 - GitHub Repository: https://github.com/apache/flink-cdc
 - Flink CDC Documentation:
 https://nightlies.apache.org/flink/flink-cdc-docs-stable

 After Flink community accepted this donation[1], we have completed
 software copyright signing, code repo migration, code cleanup, website
 migration, CI migration and github issues migration etc.
 Here I am particularly grateful to Hang Ruan, Zhongqaing Gong,
 Qingsheng Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors
 for their contributions and help during this process!


 For all previous contributors: The contribution process has slightly
 changed to align with the main Flink project. To report bugs or suggest new
 features, please open tickets
 Apache Jira (https://issues.apache.org/jira).  Note that we will no
 longer accept GitHub issues for these purposes.


 Welcome to explore the new repository and documentation. Your feedback
 and contributions are invaluable as we continue to improve Flink CDC.

 Thanks everyone for your support and happy exploring Flink CDC!

 Best,
 Leonard
 [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob




Re: [VOTE] FLIP-433: State Access on DataStream API V2

2024-03-20 Thread Hangxiang Yu
+1 (binding)

On Thu, Mar 21, 2024 at 10:04 AM Xintong Song  wrote:

> +1 (binding)
>
> Best,
>
> Xintong
>
>
>
> On Wed, Mar 20, 2024 at 8:30 PM weijie guo 
> wrote:
>
> > Hi everyone,
> >
> >
> > Thanks for all the feedback about the FLIP-433: State Access on
> > DataStream API V2 [1]. The discussion thread is here [2].
> >
> >
> > The vote will be open for at least 72 hours unless there is an
> > objection or insufficient votes.
> >
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2
> >
> > [2] https://lists.apache.org/thread/8ghzqkvt99p4k7hjqxzwhqny7zb7xrwo
> >
>


-- 
Best,
Hangxiang.


Re: [VOTE] FLIP-433: State Access on DataStream API V2

2024-03-20 Thread Xintong Song
+1 (binding)

Best,

Xintong



On Wed, Mar 20, 2024 at 8:30 PM weijie guo 
wrote:

> Hi everyone,
>
>
> Thanks for all the feedback about the FLIP-433: State Access on
> DataStream API V2 [1]. The discussion thread is here [2].
>
>
> The vote will be open for at least 72 hours unless there is an
> objection or insufficient votes.
>
>
> Best regards,
>
> Weijie
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2
>
> [2] https://lists.apache.org/thread/8ghzqkvt99p4k7hjqxzwhqny7zb7xrwo
>


[jira] [Created] (FLINK-34898) Cannot create named STRUCT with a single field

2024-03-20 Thread Chloe He (Jira)
Chloe He created FLINK-34898:


 Summary: Cannot create named STRUCT with a single field
 Key: FLINK-34898
 URL: https://issues.apache.org/jira/browse/FLINK-34898
 Project: Flink
  Issue Type: Bug
Reporter: Chloe He


I'm trying to create named structs using Flink SQL and I found a previous 
ticket https://issues.apache.org/jira/browse/FLINK-9161 that mentions the use 
of the following syntax:

```sql

SELECT CAST(('a', 1) as ROW) AS row1;

```

However, my named struct has a single field, so effectively it should look 
something like `{"a": 1}`. I can't seem to find a way to construct 
this. I have experimented with a few different syntaxes, and each either throws 
a parsing error or a casting error:

```

Cast function cannot convert value of type INTEGER to type 
RecordType(VARCHAR(2147483647) a)

```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
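
To make the comparison in this ticket concrete, here is a sketch of the multi-field form cited from FLINK-9161 next to a single-field attempt. The angle-bracket row type was stripped from the ticket text above, so the field names and types below are illustrative placeholders, and the single-field statement is only one possible attempt rather than a confirmed reproduction of the reported error.

```sql
-- Multi-field named struct, following the FLINK-9161 syntax cited above
-- (field names and types are illustrative placeholders):
SELECT CAST(('a', 1) AS ROW<a STRING, b INT>) AS row1;

-- Single-field case raised in this ticket: with only one element, the
-- parenthesised expression is treated as a plain scalar rather than a row
-- constructor, so an attempt like the one below does not produce the
-- intended single-field row {"a": 1}:
SELECT CAST(('a') AS ROW<a STRING>) AS row1;
```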


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Lincoln Lee
Congrats, thanks for the great work!


Best,
Lincoln Lee


Peter Huang wrote on Wed, Mar 20, 2024 at 22:48:

> Congratulations
>
>
> Best Regards
> Peter Huang
>
> On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang  wrote:
>
>>
>> Congratulations
>>
>>
>>
>> Best,
>> Huajie Wang
>>
>>
>>
>> Leonard Xu  于2024年3月20日周三 21:36写道:
>>
>>> Hi devs and users,
>>>
>>> We are thrilled to announce that the donation of Flink CDC as a
>>> sub-project of Apache Flink has completed. We invite you to explore the new
>>> resources available:
>>>
>>> - GitHub Repository: https://github.com/apache/flink-cdc
>>> - Flink CDC Documentation:
>>> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>>>
>>> After Flink community accepted this donation[1], we have completed
>>> software copyright signing, code repo migration, code cleanup, website
>>> migration, CI migration and github issues migration etc.
>>> Here I am particularly grateful to Hang Ruan, Zhongqaing Gong, Qingsheng
>>> Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors for their
>>> contributions and help during this process!
>>>
>>>
>>> For all previous contributors: The contribution process has slightly
>>> changed to align with the main Flink project. To report bugs or suggest new
>>> features, please open tickets
>>> Apache Jira (https://issues.apache.org/jira).  Note that we will no
>>> longer accept GitHub issues for these purposes.
>>>
>>>
>>> Welcome to explore the new repository and documentation. Your feedback
>>> and contributions are invaluable as we continue to improve Flink CDC.
>>>
>>> Thanks everyone for your support and happy exploring Flink CDC!
>>>
>>> Best,
>>> Leonard
>>> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>>>
>>>


Re: [VOTE] Apache Flink Kubernetes Operator Release 1.8.0, release candidate #1

2024-03-20 Thread Mate Czagany
Hi,

+1 (non-binding)

- Verified checksums
- Verified signatures
- Verified no binaries in source distribution
- Verified Apache License and NOTICE files
- Executed tests
- Built container image
- Verified chart version and appVersion matches
- Verified Helm chart can be installed with default values
- Verified that the RC repo works as a Helm repo

Best Regards,
Mate

Alexander Fedulov wrote on Tue, Mar 19, 2024 at 23:10:

> Hi Max,
>
> +1
>
> - Verified SHA checksums
> - Verified GPG signatures
> - Verified that the source distributions do not contain binaries
> - Verified built-in tests (mvn clean verify)
> - Verified build with Java 11 (mvn clean install -DskipTests -T 1C)
> - Verified that Helm and operator files contain Apache licenses (rg -L
> --files-without-match "http://www.apache.org/licenses/LICENSE-2.0" .).
>  I am not sure we need to
> include ./examples/flink-beam-example/dependency-reduced-pom.xml
> and ./flink-autoscaler-standalone/dependency-reduced-pom.xml though
> - Verified that chart and appVersion matches the target release (91d67d9)
> - Verified that Helm chart can be installed from the local Helm folder
> without overriding any parameters
> - Verified that Helm chart can be installed from the RC repo without
> overriding any parameters (
>
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1
> )
> - Verified docker container build
>
> Best,
> Alex
>
>
> On Mon, 18 Mar 2024 at 20:50, Maximilian Michels  wrote:
>
> > @Rui @Gyula Thanks for checking the release!
> >
> > >A minor correction is that [3] in the email should point to:
> > >ghcr.io/apache/flink-kubernetes-operator:91d67d9 . But the helm chart
> and
> > > everything is correct. It's a typo in the vote email.
> >
> > Good catch. Indeed, for the linked Docker image 8938658 points to
> > HEAD^ of the rc branch, 91d67d9 is the HEAD. There are no code changes
> > between those two commits, except for updating the version. So the
> > votes are not impacted, especially because votes are casted against
> > the source release which, as you pointed out, contains the correct
> > image ref.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Mar 18, 2024 at 9:54 AM Gyula Fóra  wrote:
> > >
> > > Hi Max!
> > >
> > > +1 (binding)
> > >
> > >  - Verified source release, helm chart + checkpoints / signatures
> > >  - Helm points to correct image
> > >  - Deployed operator, stateful example and executed upgrade + savepoint
> > > redeploy
> > >  - Verified logs
> > >  - Flink web PR looks good +1
> > >
> > > A minor correction is that [3] in the email should point to:
> > > ghcr.io/apache/flink-kubernetes-operator:91d67d9 . But the helm chart
> > and
> > > everything is correct. It's a typo in the vote email.
> > >
> > > Thank you for preparing the release!
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Mon, Mar 18, 2024 at 8:26 AM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Thanks Max for driving this release!
> > > >
> > > > +1(non-binding)
> > > >
> > > > - Downloaded artifacts from dist ( svn co
> > > >
> > > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1/
> > > > )
> > > > - Verified SHA512 checksums : ( for i in *.tgz; do echo $i; sha512sum
> > > > --check $i.sha512; done )
> > > > - Verified GPG signatures : ( $ for i in *.tgz; do echo $i; gpg
> > --verify
> > > > $i.asc $i )
> > > > - Build the source with java-11 and java-17 ( mvn -T 20 clean install
> > > > -DskipTests )
> > > > - Verified the license header during build the source
> > > > - Verified that chart and appVersion matches the target release (less
> > the
> > > > index.yaml and Chart.yaml )
> > > > - RC repo works as Helm repo( helm repo add
> > flink-operator-repo-1.8.0-rc1
> > > >
> > > >
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.8.0-rc1/
> > > > )
> > > > - Verified Helm chart can be installed  ( helm install
> > > > flink-kubernetes-operator
> > > > flink-operator-repo-1.8.0-rc1/flink-kubernetes-operator --set
> > > > webhook.create=false )
> > > > - Submitted the autoscaling demo, the autoscaler works well with
> > *memory
> > > > tuning *(kubectl apply -f autoscaling.yaml)
> > > >- job.autoscaler.memory.tuning.enabled: "true"
> > > > - Download Autoscaler standalone: wget
> > > >
> > > >
> >
> https://repository.apache.org/content/repositories/orgapacheflink-1710/org/apache/flink/flink-autoscaler-standalone/1.8.0/flink-autoscaler-standalone-1.8.0.jar
> > > > - Ran Autoscaler standalone locally, it works well with rescale api
> and
> > > > JDBC state store/event handler
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Fri, Mar 15, 2024 at 1:45 AM Maximilian Michels 
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Please review and vote on the release candidate #1 for the version
> > > > > 1.8.0 of the Apache Flink Kubernetes Operator, as follows:
> > > > >
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do 

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-20 Thread Gyula Fóra
Sorry for the late reply Kevin.

I think what you are suggesting makes sense; it would basically be a
`last-state` startup mode. This would also help in cases where the current
last-state mechanism fails to locate the HA metadata (and the state).

This is somewhat of a tricky feature to implement:
 1. The operator will need FS plugins and access to the different user envs
(this will not work in many prod environments unfortunately)
 2. Flink doesn't expose a good way to detect the latest checkpoint just by
looking at the FS so we need to figure out something here. Probably some
changes are necessary on Flink core side as well

Gyula


Re: Proposal to modify Flink’s startup sequence for MINIMUM downtime across deployments

2024-03-20 Thread Sergio Chong Loo
Hi Asimansu,

Thanks for chiming in!

Yes, this is pretty much what we’re trying to accomplish, a few notes:

- With 2 instances constantly running, it is hard to control who writes to the sink 
and when (only the active one should); we can only write to 1 sink and don’t 
want duplicate records going out.
- Your suggestion of “keep note of the offsets” is what we’re trying to 
accomplish via the Checkpoint “handover” portion.
- The start point for the “new” deployment should be right after this 
checkpoint (state) is loaded, and yes, by then the Ref. Data should already be 
loaded so the job can just go.

Did I misunderstand anything from your suggestion?

Sergio


> On Mar 19, 2024, at 1:55 PM, Asimansu Bera  wrote:
> 
> Resending ->
> 
> Hello Sergio,
> I'm a newbie(not an expert) and give it a try.
> 
> If I presume you've only a reference state on Flink and no other states and 
> Kafka is the source, you may see below approach as an option , should test.
> 
> a. Create two applications - Primary and Secondary (blue and green)
> b. Create two S3 buckets - Primary and secondary(S3/A and S3/B)
> c. S3/A and S3/B  -both are in sync with reference data only
> d. Refresh the reference data on both  Primary and Secondary buckets everyday 
> - separate Flink job(refresh reference data) with separate topic
> e. Just shut down the primary cluster/instance . Note the latest offset value 
> of Kafka topic(s) of transaction events data till which primary able to 
> complete processing
> f. intermittently processing could be ignored as processing would start from 
> the latest successful offset at secondary side
> f. start the secondary cluster with newer deployment with Kafka - offset 
> setting latest. No gap of offset value of Kafka topic between Primary side 
> process where it stopped vs secondary where it starts
> 
> Logically it should work with very minimum downtime. Downside is, you need to 
> maintain two S3 buckets.
> 
> I leave it to Flink committers and experts to comment as I may have 
> completely misunderstood the problem and internal operations.
> 
> -A
> 
> On Tue, Mar 19, 2024 at 1:52 PM Asimansu Bera  > wrote:
>> Hello Sergio,
>> I'm a newbie(not an expert) and give it a try.
>> 
>> If I presume you've only a reference state on Flink and no other states and 
>> Kafka is the source, you may see below approach as an option , should test.
>> 
>> a. Create two applications - Primary and Secondary (blue and green)
>> b. Create two S3 buckets - Primary and secondary(S3/A and S3/B)
>> c. S3/A and S3/B  -both are in sync with reference data only
>> d. Refresh the reference data on both  Primary and Secondary buckets 
>> everyday - separate Flink job(refresh reference data) with separate topic
>> e. Just shut down the primary cluster/instance . Note the latest offset 
>> value of Kafka topic(s) of transaction events data till which primary able 
>> to complete processing
>> f. intermittently processing could be ignored as processing would start from 
>> the latest successful offset at secondary side
>> f. start the secondary cluster with newer deployment with Kafka - offset 
>> setting latest. No gap of offset value of Kafka topic between Primary side 
>> process where it stopped vs secondary where it starts
>> 
>> Logically it should work with very minimum downtime. Down side is, you need 
>> to maintain two S3 buckets.
>> 
>> I leave it to Flink committers and experts to comments as I may have 
>> completely misunderstood the problem and internal operations.
>> 
>> -A
>> 
>> On Tue, Mar 19, 2024 at 11:29 AM Sergio Chong Loo 
>>  wrote:
>>> Hello Flink Community,
>>> 
>>> We have a particular challenging scenario, which we’d like to run by the 
>>> rest of the community experts and check
>>> 
>>> 1. If anything can be done with existing functionality that we’re 
>>> overlooking, or
>>> 2. The feasibility of this proposal.
>>> 
>>> I tried to keep it concise in a 1-pager type format.
>>> 
>>> Thanks in advance,
>>> Sergio Chong
>>> 
>>> —
>>> 
>>> Goal
>>> 
>>> We have a particular use case with a team evaluating and looking to adopt 
>>> Flink. Their pipelines have a specially intricate and long bootstrapping 
>>> sequence.
>>> 
>>> The main objective: to have a minimum downtime (~1 minute as a starting 
>>> point). The situation that would affect this downtime the most for this 
>>> team is a (re)deployment. All changes proposed here are the minimum 
>>> necessary within a Flink job’s lifecycle boundaries only.
>>> 
>>> 
>>> Non-goal
>>> 
>>> Anything that requires coordination of 2 jobs or more belongs in the 
>>> orchestration layer and is outside of the scope of this document; no 
>>> inter-job awareness/communication should be considered (initially). 
>>> However, any necessary “hooks” should be provided to make that integration 
>>> smoother.
>>> 
>>> 
>>> Scenario
>>> 
>>> The (re)deployments are particularly of concern here. Given the stateful 
>>> nature of Flink we 

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-20 Thread Mingge Deng
Thanks Jark for all the insightful comments.

We have updated the proposal per our offline discussions:
1. Model will be treated as a new relation in Flink SQL.
2. The common ML predict and evaluate functions will be included in
open-source Flink to complete the user journey, and we should be able to
extend Calcite's SqlTableFunction to support these two ML functions.

Best,
Mingge
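
As a concrete reference for the table-valued-function mechanism mentioned in point 2, the first statement below shows how an existing Flink SQL window TVF (CUMULATE, backed by the SqlCumulateTableFunction referenced in the quoted discussion below) is invoked today; the Bid table and its columns are placeholder names. The second statement is a purely hypothetical sketch of a predict-style function exposed through the same extension point, with invented names, and is not the FLIP's final syntax.

```sql
-- Existing Flink SQL window TVF (CUMULATE); table and column names are placeholders.
SELECT window_start, window_end, SUM(price) AS total_price
FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;

-- Hypothetical sketch only: a predict-style table function taking a model argument,
-- following the TVF shape above (ML_PREDICT, my_model, input_data, feature_col are
-- invented names, not confirmed FLIP-437 syntax).
SELECT *
FROM TABLE(ML_PREDICT(TABLE input_data, MODEL my_model, DESCRIPTOR(feature_col)));
```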

On Mon, Mar 18, 2024 at 7:05 PM Jark Wu  wrote:

> Hi Hao,
>
> > I meant how the table name
> in window TVF gets translated to `SqlCallingBinding`. Probably we need to
> fetch the table definition from the catalog somewhere. Do we treat those
> window TVF specially in parser/planner so that catalog is looked up when
> they are seen?
>
> The table names are resolved and validated by Calcite SqlValidator.  We
> don' need to fetch from catalog manually.
> The specific checking logic of cumulate window happens in
> SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes.
> The return type of SqlCumulateTableFunction is defined in
> #getRowTypeInference() method.
> Both are public interfaces provided by Calcite and it seems it's not
> specially handled in parser/planner.
>
> I didn't try that, but my gut feeling is that the framework is ready to
> extend a customized TVF.
>
> > For what model is, I'm wondering if it has to be datatype or relation.
> Can
> it be another kind of citizen parallel to datatype/relation/function/db?
> Redshift also supports `show models` operation, so it seems it's treated
> specially as well?
>
> If it is an entity only used in catalog scope (e.g., show xxx, create xxx,
> drop xxx), it is fine to introduce it.
> We have introduced such one before, called Module: "load module", "show
> modules" [1].
> But if we want to use Model in TVF parameters, it means it has to be a
> relation or datatype, because
> that is what it only accepts now.
>
> Thanks for sharing the reason of preferring TVF instead of Redshift way. It
> sounds reasonable to me.
>
> Best,
> Jark
>
>  [1]:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
>
> On Fri, 15 Mar 2024 at 13:41, Hao Li  wrote:
>
> > Hi Jark,
> >
> > Thanks for the pointer. Sorry for the confusion: I meant how the table
> name
> > in window TVF gets translated to `SqlCallingBinding`. Probably we need to
> > fetch the table definition from the catalog somewhere. Do we treat those
> > window TVF specially in parser/planner so that catalog is looked up when
> > they are seen?
> >
> > For what model is, I'm wondering if it has to be datatype or relation.
> Can
> > it be another kind of citizen parallel to datatype/relation/function/db?
> > Redshift also supports `show models` operation, so it seems it's treated
> > specially as well? The reasons I don't like Redshift's syntax are:
> > 1. It's a bit verbose, you need to think of a model name as well as a
> > function name and the function name also needs to be unique.
> > 2. More importantly, prediction function isn't the only function that can
> > operate on models. There could be a set of inference functions [1] and
> > evaluation functions [2] which can operate on models. It's hard to
> specify
> > all of them in model creation.
> >
> > [1]:
> >
> >
> https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
> > [2]:
> >
> >
> https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate
> >
> > Thanks,
> > Hao
> >
> > On Thu, Mar 14, 2024 at 8:18 PM Jark Wu  wrote:
> >
> > > Hi Hao,
> > >
> > > > Can you send me some pointers
> > > where the function gets the table information?
> > >
> > > Here is the code of cumulate window type checking [1].
> > >
> > > > Also is it possible to support  in
> > > window functions in addiction to table?
> > >
> > > Yes. It is not allowed in TVF.
> > >
> > > Thanks for the syntax links of other systems. The reason I prefer the
> > > Redshift way is
> > > that it avoids introducing Model as a relation or datatype (referenced
> > as a
> > > parameter in TVF).
> > > Model is not a relation because it can be queried directly (e.g.,
> SELECT
> > *
> > > FROM model).
> > > I'm also confused about making Model as a datatype, because I don't
> know
> > > what class the
> > > model parameter of the eval method of TableFunction/ScalarFunction
> should
> > > be. By defining
> > > the function with the model, users can directly invoke the function
> > without
> > > reference to the model name.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53
> > >
> > > On Fri, 15 Mar 2024 at 02:48, Hao Li  wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > Thanks for the pointers. It's very helpful.
> > > >
> > > > 1. Looks like `tumble`, `hopping` are keywords in calcite parser. And
> > the
> > > > syntax 

Re: [VOTE] FLIP-439: Externalize Kudu Connector from Bahir

2024-03-20 Thread Őrhidi Mátyás
+1 (binding)

On Wed, Mar 20, 2024 at 8:37 AM Gabor Somogyi 
wrote:

> +1 (binding)
>
> G
>
>
> On Wed, Mar 20, 2024 at 3:59 PM Gyula Fóra  wrote:
>
> > +1 (binding)
> >
> > Thanks!
> > Gyula
> >
> > On Wed, Mar 20, 2024 at 3:36 PM Mate Czagany  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thank you,
> > > Mate
> > >
> > > Ferenc Csaky  ezt írta (időpont: 2024.
> márc.
> > > 20., Sze, 15:11):
> > >
> > > > Hello devs,
> > > >
> > > > I would like to start a vote about FLIP-439 [1]. The FLIP is about to
> > > > externalize the Kudu
> > > > connector from the recently retired Apache Bahir project [2] to keep
> it
> > > > maintainable and
> > > > make it up to date as well. Discussion thread [3].
> > > >
> > > > The vote will be open for at least 72 hours (until 2024 March 23
> 14:03
> > > > UTC) unless there
> > > > are any objections or insufficient votes.
> > > >
> > > > Thanks,
> > > > Ferenc
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-439%3A+Externalize+Kudu+Connector+from+Bahir
> > > > [2] https://attic.apache.org/projects/bahir.html
> > > > [3] https://lists.apache.org/thread/oydhcfkco2kqp4hdd1glzy5vkw131rkz
> > >
> >
>


Re: [VOTE] FLIP-439: Externalize Kudu Connector from Bahir

2024-03-20 Thread Gabor Somogyi
+1 (binding)

G


On Wed, Mar 20, 2024 at 3:59 PM Gyula Fóra  wrote:

> +1 (binding)
>
> Thanks!
> Gyula
>
> On Wed, Mar 20, 2024 at 3:36 PM Mate Czagany  wrote:
>
> > +1 (non-binding)
> >
> > Thank you,
> > Mate
> >
> > Ferenc Csaky  ezt írta (időpont: 2024. márc.
> > 20., Sze, 15:11):
> >
> > > Hello devs,
> > >
> > > I would like to start a vote about FLIP-439 [1]. The FLIP is about to
> > > externalize the Kudu
> > > connector from the recently retired Apache Bahir project [2] to keep it
> > > maintainable and
> > > make it up to date as well. Discussion thread [3].
> > >
> > > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > > UTC) unless there
> > > are any objections or insufficient votes.
> > >
> > > Thanks,
> > > Ferenc
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-439%3A+Externalize+Kudu+Connector+from+Bahir
> > > [2] https://attic.apache.org/projects/bahir.html
> > > [3] https://lists.apache.org/thread/oydhcfkco2kqp4hdd1glzy5vkw131rkz
> >
>


Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-20 Thread David Radley
Thank you very much for your feedback, Mark. I have made the changes in the 
latest Google document. On reflection, I agree with you that the 
globalIdPlacement format configuration should apply to deserialization as 
well, so it is declarative. I am also going to add a new configuration option 
to work with content IDs as well as global IDs, in line with the Apicurio 
SerDes IdHandler and headerHandlers.

Kind regards, David.


On 2024/03/20 15:18:37 Mark Nuttall wrote:
> +1 to this
> 
> A few small comments: 
> 
> Currently, if users have Avro schemas in an Apicurio Registry (an open source 
> Apache 2 licensed schema registry), then the natural way to work with those 
> Avro flows is to use the schemas in the Apicurio Repository.
> 'those Avro flows' ... this is the first reference to flows.
> 
> The new format will use the global Id to look up the Avro schema that the 
> message was written during deserialization.
> I get the point, phrasing is awkward. Probably you're more interested in 
> content than word polish at this point though.
> 
> The Avro Schema Registry (apicurio-avro) format
> The Confluent format is called avro-confluent; this should be avro-apicurio
> 
> How to create tables with Apicurio-avro format
> s/Apicurio-avro/avro-apicurio/g
> 
> HEADER – globalId is put in the header
> LEGACY– global Id is put in the message as a long
> CONFLUENT - globalId is put in the message as an int.
> Please could we specify 'four-byte int' and 'eight-byte long' ?
> 
> For a Kafka source the globalId will be looked for in this order:
> - In the header
> - After a magic byte as an int
> - After a magic byte as a long.
> but apicurio-avro.globalid-placement has a default value of HEADER : why do 
> we have a search order as well? Isn't apicurio-avro.globalid-placement 
> enough? Don't the two mechanisms conflict?
> 
> In addition to the types listed there, Flink supports reading/writing 
> nullable types. Flink maps nullable types to Avro union(something, null), 
> where something is the Avro type converted from Flink type.
> Is that definitely the right way round? I know we've had multiple 
> conversations about how unions work with Flink
> 
>  This is because the writer schema is expanded, but this could not complete 
> if there are circularities.
> I understand your meaning but the sentence is awkward.
> 
> The registered schema will be created or if it exists be updated.
> same again
> 
> At some stage the lowest Flink level supported by the Kafka connector will 
> contain the additionalProperties methods in code flink.
> wording
> 
> There existing Kafka deserialization for the writer schema passes down the 
> message body to be deserialised.
> wording
> 
> @Override
> public void deserialize(ConsumerRecord message, Collector 
> out)
>   throws IOException {
>   Map additionalPropertiesMap =  new HashMap<>();
>   for (Header header : message.additionalProperties()) {
>   headersMap.put(header.key(), header.value());
>   }
>   deserializationSchema.deserialize(message.value(), headersMap, out);
> }
> This fails to compile at headersMap.
> 
> The input stream and additionalProperties will be sent so the Apicurio 
> SchemaCoder which will try getting the globalId from the headers, then 4 
> bytes from the payload then 8 bytes from the payload.
> I'm still stuck on apicurio-avro.globalid-placement having a default value of 
> HEADER . Should we try all three, or fail if this config param has a wrong 
> value?
> 
> Other considerations
> The implementation does not use the Apicurio deser libraries,
> Please can we refer to them as SerDes; this is the term used within the 
> documentation that you link to
> 
> 
> On 2024/03/20 10:09:08 David Radley wrote:
> > Hi,
> > As per the FLIP process I would like to raise a FLIP, but do not have 
> > authority, so have created a google doc for the Flip to introduce a new 
> > Apicurio Avro format. The document is 
> > https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> > 
> > I have prototyped a lot of the content to prove that this approach is 
> > feasible. I look forward to the discussion,
> >   Kind regards, David.
> > 
> > 
> > 
> > Unless otherwise stated above:
> > 
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> > 
> 

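For readers following the placement discussion above, here is a small, hypothetical Java sketch of a declarative resolver for the three globalId placements (header, legacy long, Confluent-style int after a magic byte). The Placement enum, the header name, and the method itself are invented for illustration and are not the Apicurio SerDes API; only the Kafka Headers and ByteBuffer calls are real.

```java
import java.nio.ByteBuffer;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;

// Illustrative only: resolves the globalId according to a configured placement
// instead of probing all three locations.
final class GlobalIdResolver {

    enum Placement { HEADER, LEGACY, CONFLUENT }

    // Assumed header name; the real Apicurio SerDes constant may differ.
    private static final String GLOBAL_ID_HEADER = "apicurio.value.globalId";
    private static final byte MAGIC_BYTE = 0x0;

    static long resolve(Placement placement, Headers headers, byte[] payload) {
        if (placement != Placement.HEADER && payload[0] != MAGIC_BYTE) {
            throw new IllegalStateException("Expected magic byte at payload offset 0");
        }
        switch (placement) {
            case HEADER:     // eight-byte long carried in a Kafka record header
                Header header = headers.lastHeader(GLOBAL_ID_HEADER);
                return ByteBuffer.wrap(header.value()).getLong();
            case LEGACY:     // magic byte followed by an eight-byte long in the payload
                return ByteBuffer.wrap(payload, 1, Long.BYTES).getLong();
            case CONFLUENT:  // magic byte followed by a four-byte int in the payload
                return ByteBuffer.wrap(payload, 1, Integer.BYTES).getInt();
            default:
                throw new IllegalArgumentException("Unknown placement: " + placement);
        }
    }
}
```

Whether the deserializer should instead probe all three locations, as the review asks, is exactly the open question; a declarative option like this avoids having to guess between a four-byte and an eight-byte id in the payload.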

Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-20 Thread Mark Nuttall
+1 to this

A few small comments: 

Currently, if users have Avro schemas in an Apicurio Registry (an open source 
Apache 2 licensed schema registry), then the natural way to work with those 
Avro flows is to use the schemas in the Apicurio Repository.
'those Avro flows' ... this is the first reference to flows.

The new format will use the global Id to look up the Avro schema that the 
message was written during deserialization.
I get the point, phrasing is awkward. Probably you're more interested in 
content than word polish at this point though.

The Avro Schema Registry (apicurio-avro) format
The Confluent format is called avro-confluent; this should be avro-apicurio

How to create tables with Apicurio-avro format
s/Apicurio-avro/avro-apicurio/g

HEADER – globalId is put in the header
LEGACY– global Id is put in the message as a long
CONFLUENT - globalId is put in the message as an int.
Please could we specify 'four-byte int' and 'eight-byte long' ?

For a Kafka source the globalId will be looked for in this order:
-   In the header
-   After a magic byte as an int
-   After a magic byte as a long.
but apicurio-avro.globalid-placement has a default value of HEADER : why do we 
have a search order as well? Isn't apicurio-avro.globalid-placement enough? 
Don't the two mechanisms conflict?

In addition to the types listed there, Flink supports reading/writing nullable 
types. Flink maps nullable types to Avro union(something, null), where 
something is the Avro type converted from Flink type.
Is that definitely the right way round? I know we've had multiple conversations 
about how unions work with Flink

 This is because the writer schema is expanded, but this could not complete if 
there are circularities.
I understand your meaning but the sentence is awkward.

The registered schema will be created or if it exists be updated.
same again

At some stage the lowest Flink level supported by the Kafka connector will 
contain the additionalProperties methods in code flink.
wording

There existing Kafka deserialization for the writer schema passes down the 
message body to be deserialised.
wording

@Override
public void deserialize(ConsumerRecord message, Collector 
out)
throws IOException {
Map additionalPropertiesMap =  new HashMap<>();
for (Header header : message.additionalProperties()) {
headersMap.put(header.key(), header.value());
}
deserializationSchema.deserialize(message.value(), headersMap, out);
}
This fails to compile at headersMap.

The input stream and additionalProperties will be sent so the Apicurio 
SchemaCoder which will try getting the globalId from the headers, then 4 bytes 
from the payload then 8 bytes from the payload.
I'm still stuck on apicurio-avro.globalid-placement having a default value of 
HEADER . Should we try all three, or fail if this config param has a wrong 
value?

Other considerations
The implementation does not use the Apicurio deser libraries,
Please can we refer to them as SerDes; this is the term used within the 
documentation that you link to


On 2024/03/20 10:09:08 David Radley wrote:
> Hi,
> As per the FLIP process I would like to raise a FLIP, but do not have 
> authority, so have created a google doc for the Flip to introduce a new 
> Apicurio Avro format. The document is 
> https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing
> 
> I have prototyped a lot of the content to prove that this approach is 
> feasible. I look forward to the discussion,
>   Kind regards, David.
> 
> 
> 
> Unless otherwise stated above:
> 
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> 


Re: [VOTE] FLIP-439: Externalize Kudu Connector from Bahir

2024-03-20 Thread Gyula Fóra
+1 (binding)

Thanks!
Gyula

On Wed, Mar 20, 2024 at 3:36 PM Mate Czagany  wrote:

> +1 (non-binding)
>
> Thank you,
> Mate
>
> Ferenc Csaky  ezt írta (időpont: 2024. márc.
> 20., Sze, 15:11):
>
> > Hello devs,
> >
> > I would like to start a vote about FLIP-439 [1]. The FLIP is about to
> > externalize the Kudu
> > connector from the recently retired Apache Bahir project [2] to keep it
> > maintainable and
> > make it up to date as well. Discussion thread [3].
> >
> > The vote will be open for at least 72 hours (until 2024 March 23 14:03
> > UTC) unless there
> > are any objections or insufficient votes.
> >
> > Thanks,
> > Ferenc
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-439%3A+Externalize+Kudu+Connector+from+Bahir
> > [2] https://attic.apache.org/projects/bahir.html
> > [3] https://lists.apache.org/thread/oydhcfkco2kqp4hdd1glzy5vkw131rkz
>


[jira] [Created] (FLINK-34897) JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again

2024-03-20 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34897:
-

 Summary: 
JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip
 needs to be enabled again
 Key: FLINK-34897
 URL: https://issues.apache.org/jira/browse/FLINK-34897
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0
Reporter: Matthias Pohl


While working on FLINK-34672 I noticed that 
{{JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip}}
 is disabled without a reason.

It looks like I disabled it accidentally as part of FLINK-31783.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Peter Huang
Congratulations


Best Regards
Peter Huang

On Wed, Mar 20, 2024 at 6:56 AM Huajie Wang  wrote:

>
> Congratulations
>
>
>
> Best,
> Huajie Wang
>
>
>
> Leonard Xu  于2024年3月20日周三 21:36写道:
>
>> Hi devs and users,
>>
>> We are thrilled to announce that the donation of Flink CDC as a
>> sub-project of Apache Flink has completed. We invite you to explore the new
>> resources available:
>>
>> - GitHub Repository: https://github.com/apache/flink-cdc
>> - Flink CDC Documentation:
>> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>>
>> After Flink community accepted this donation[1], we have completed
>> software copyright signing, code repo migration, code cleanup, website
>> migration, CI migration and github issues migration etc.
>> Here I am particularly grateful to Hang Ruan, Zhongqaing Gong, Qingsheng
>> Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors for their
>> contributions and help during this process!
>>
>>
>> For all previous contributors: The contribution process has slightly
>> changed to align with the main Flink project. To report bugs or suggest new
>> features, please open tickets
>> Apache Jira (https://issues.apache.org/jira).  Note that we will no
>> longer accept GitHub issues for these purposes.
>>
>>
>> Welcome to explore the new repository and documentation. Your feedback
>> and contributions are invaluable as we continue to improve Flink CDC.
>>
>> Thanks everyone for your support and happy exploring Flink CDC!
>>
>> Best,
>> Leonard
>> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>>
>>


Re: [VOTE] FLIP-439: Externalize Kudu Connector from Bahir

2024-03-20 Thread Mate Czagany
+1 (non-binding)

Thank you,
Mate

Ferenc Csaky  ezt írta (időpont: 2024. márc.
20., Sze, 15:11):

> Hello devs,
>
> I would like to start a vote about FLIP-439 [1]. The FLIP is about to
> externalize the Kudu
> connector from the recently retired Apache Bahir project [2] to keep it
> maintainable and
> make it up to date as well. Discussion thread [3].
>
> The vote will be open for at least 72 hours (until 2024 March 23 14:03
> UTC) unless there
> are any objections or insufficient votes.
>
> Thanks,
> Ferenc
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-439%3A+Externalize+Kudu+Connector+from+Bahir
> [2] https://attic.apache.org/projects/bahir.html
> [3] https://lists.apache.org/thread/oydhcfkco2kqp4hdd1glzy5vkw131rkz


[VOTE] FLIP-439: Externalize Kudu Connector from Bahir

2024-03-20 Thread Ferenc Csaky
Hello devs,

I would like to start a vote about FLIP-439 [1]. The FLIP is about to 
externalize the Kudu
connector from the recently retired Apache Bahir project [2] to keep it 
maintainable and
make it up to date as well. Discussion thread [3].

The vote will be open for at least 72 hours (until 2024 March 23 14:03 UTC) 
unless there
are any objections or insufficient votes.

Thanks,
Ferenc

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-439%3A+Externalize+Kudu+Connector+from+Bahir
[2] https://attic.apache.org/projects/bahir.html
[3] https://lists.apache.org/thread/oydhcfkco2kqp4hdd1glzy5vkw131rkz

Re: [DISCUSS] FLIP-423 ~FLIP-428: Introduce Disaggregated State Storage and Management in Flink 2.0

2024-03-20 Thread Piotr Nowojski
Hey Zakelly!

Sorry for the late reply. I still have concerns about the proposed
solution, with my main concerns coming from
the implications of the asynchronous state access API on the checkpointing
and responsiveness of Flink.

>> What also worries me a lot in this fine grained model is the effect on
the checkpointing times.
>
> Your concerns are very reasonable. Faster checkpointing is always a core
advantage of disaggregated state,
> but only for the async phase. There will be some complexity introduced by
in-flight requests, but I'd suggest
> a checkpoint containing those in-flight state requests as part of the
state, to accelerate the sync phase by
> skipping the buffer draining. This makes the buffer size have little
impact on checkpoint time. And all the
> changes keep within the execution model we proposed while the checkpoint
barrier alignment or handling
> will not be touched in our proposal, so I guess the complexity is
relatively controllable. I have faith in that :)

As we discussed off-line, you agreed that we can not checkpoint while some
records are in the middle of being
processed. That we would have to drain the in-progress records before doing
the checkpoint. You also argued
that this is not a problem, because the size of this buffer can be
configured.

I'm really afraid of such a solution. I've seen plenty of times in the past
that whenever Flink has to drain some
buffered records, eventually that always breaks timely checkpointing (and
hence Flink's ability to rescale in
a timely manner). Even a single record with a `flatMap` like operator
currently in Flink causes problems during
back pressure. That's especially true for example for processing
watermarks. At the same time, I don't see how
this value could be configured by even Flink's power users, let alone an
average user. The size of that in-flight
buffer not only depends on a particular query/job, but also the "good"
value changes dynamically over time,
and can change very rapidly. Sudden spikes of records or backpressure, some
hiccup during emitting watermarks,
all of those could change in an instant the theoretically optimal buffer
size of let's say "6000" records, down to "1".
And when those changes happen, those are the exact times when timely
checkpointing matters the most.
If the load pattern suddenly changes, and checkpointing takes suddenly tens
of minutes instead of a couple of
seconds, it means you can not use rescaling and you are forced to
overprovision the resources. And there are also
other issues if checkpointing takes too long.

At the same time, I still don't understand why we can not implement things
incrementally? First
let's start with the current API, without the need to rewrite all of the
operators, we can asynchronously fetch whole
state for a given record using its key. That should already vastly improve
many things, and this way we could
perform a checkpoint without a need of draining the in-progress/in-flight
buffer. We could roll that version out,
test it out in practice, and then we could see if the fine grained state
access is really needed. Otherwise it sounds
to me like a premature optimization, that requires us to not only rewrite a
lot of the code, but also to later maintain
it, even if it ultimately proves to be not needed. Which I of course can
not be certain but I have a feeling that it
might be the case.

Best,
Piotrek

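To illustrate the incremental alternative described in the paragraph above, here is a toy Java sketch (not an existing Flink API): the complete state for a record's key is fetched asynchronously in one round trip, and the record is then processed against that in-memory copy, so a checkpoint only has to wait for whole records rather than a buffer of fine-grained state requests. RemoteStateStore and fetchAll() are invented names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Toy sketch only; nothing here is a real Flink interface.
final class CoarseGrainedStateAccess {

    interface RemoteStateStore<K, S> {
        // One asynchronous round trip fetching the whole state for a key.
        CompletableFuture<S> fetchAll(K key);
    }

    static <K, S> CompletableFuture<Void> processRecord(
            RemoteStateStore<K, S> store, K key, BiConsumer<K, S> userLogic) {
        // The user function runs only once the full state has arrived, so there are
        // no partially processed records to drain before a checkpoint barrier.
        return store.fetchAll(key).thenAccept(state -> userLogic.accept(key, state));
    }
}
```
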
wt., 19 mar 2024 o 10:42 Zakelly Lan  napisał(a):

> Hi everyone,
>
> Thanks for your valuable feedback!
>
> Our discussions have been going on for a while and are nearing a
> consensus. So I would like to start a vote after 72 hours.
>
> Please let me know if you have any concerns, thanks!
>
>
> Best,
> Zakelly
>
> On Tue, Mar 19, 2024 at 3:37 PM Zakelly Lan  wrote:
>
> > Hi Yunfeng,
> >
> > Thanks for the suggestion!
> >
> > I will reorganize the FLIP-425 accordingly.
> >
> >
> > Best,
> > Zakelly
> >
> > On Tue, Mar 19, 2024 at 3:20 PM Yunfeng Zhou <
> flink.zhouyunf...@gmail.com>
> > wrote:
> >
> >> Hi Xintong and Zakelly,
> >>
> >> > 2. Regarding Strictly-ordered and Out-of-order of Watermarks
> >> I agree with it that watermarks can use only out-of-order mode for
> >> now, because there is still not a concrete example showing the
> >> correctness risk about it. However, the strictly-ordered mode should
> >> still be supported as the default option for non-record event types
> >> other than watermark, at least for checkpoint barriers.
> >>
> >> I noticed that this information has already been documented in "For
> >> other non-record events, such as RecordAttributes ...", but it's at
> >> the bottom of the "Watermark" section, which might not be very
> >> obvious. Thus it might be better to reorganize the FLIP to better
> >> claim that the two order modes are designed for all non-record events,
> >> and which mode this FLIP would choose for each type of event.
> >>
> >> Best,
> >> Yunfeng
> >>
> >> On Tue, Mar 19, 2024 at 1:09 PM Xintong Song 
> >> wrote:
> >> >
> >> > 

Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Huajie Wang
Congratulations



Best,
Huajie Wang



Leonard Xu  于2024年3月20日周三 21:36写道:

> Hi devs and users,
>
> We are thrilled to announce that the donation of Flink CDC as a
> sub-project of Apache Flink has completed. We invite you to explore the new
> resources available:
>
> - GitHub Repository: https://github.com/apache/flink-cdc
> - Flink CDC Documentation:
> https://nightlies.apache.org/flink/flink-cdc-docs-stable
>
> After Flink community accepted this donation[1], we have completed
> software copyright signing, code repo migration, code cleanup, website
> migration, CI migration and github issues migration etc.
> Here I am particularly grateful to Hang Ruan, Zhongqaing Gong, Qingsheng
> Ren, Jiabao Sun, LvYanquan, loserwang1024 and other contributors for their
> contributions and help during this process!
>
>
> For all previous contributors: The contribution process has slightly
> changed to align with the main Flink project. To report bugs or suggest new
> features, please open tickets
> Apache Jira (https://issues.apache.org/jira).  Note that we will no
> longer accept GitHub issues for these purposes.
>
>
> Welcome to explore the new repository and documentation. Your feedback and
> contributions are invaluable as we continue to improve Flink CDC.
>
> Thanks everyone for your support and happy exploring Flink CDC!
>
> Best,
> Leonard
> [1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob
>
>


[ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Leonard Xu
Hi devs and users,

We are thrilled to announce that the donation of Flink CDC as a sub-project of 
Apache Flink has completed. We invite you to explore the new resources 
available:

- GitHub Repository: https://github.com/apache/flink-cdc
- Flink CDC Documentation: 
https://nightlies.apache.org/flink/flink-cdc-docs-stable

After Flink community accepted this donation[1], we have completed software 
copyright signing, code repo migration, code cleanup, website migration, CI 
migration and github issues migration etc. 
Here I am particularly grateful to Hang Ruan, Zhongqaing Gong, Qingsheng Ren, 
Jiabao Sun, LvYanquan, loserwang1024 and other contributors for their 
contributions and help during this process!


For all previous contributors: The contribution process has slightly changed to 
align with the main Flink project. To report bugs or suggest new features, 
please open tickets 
Apache Jira (https://issues.apache.org/jira).  Note that we will no longer 
accept GitHub issues for these purposes.


Welcome to explore the new repository and documentation. Your feedback and 
contributions are invaluable as we continue to improve Flink CDC.

Thanks everyone for your support and happy exploring Flink CDC!

Best,
Leonard
[1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob



Re: [VOTE] FLIP-402: Extend ZooKeeper Curator configurations

2024-03-20 Thread Ferenc Csaky
+1 (non-binding), thanks for driving this!

Best,
Ferenc


On Wednesday, March 20th, 2024 at 10:57, Yang Wang  
wrote:

> 
> 
> +1 (binding) since ZK HA is still widely used.
> 
> 
> Best,
> Yang
> 
> On Thu, Mar 14, 2024 at 6:27 PM Matthias Pohl
> matthias.p...@aiven.io.invalid wrote:
> 
> > Nothing to add from my side. Thanks, Alex.
> > 
> > +1 (binding)
> > 
> > On Thu, Mar 7, 2024 at 4:09 PM Alex Nitavsky alexnitav...@gmail.com
> > wrote:
> > 
> > > Hi everyone,
> > > 
> > > I'd like to start a vote on FLIP-402 [1]. It introduces new configuration
> > > options for Apache Flink's ZooKeeper integration for high availability by
> > > reflecting existing Apache Curator configuration options. It has been
> > > discussed in this thread [2].
> > > 
> > > I would like to start a vote. The vote will be open for at least 72
> > > hours
> > > (until March 10th 18:00 GMT) unless there is an objection or
> > > insufficient votes.
> > > 
> > > [1]
> > 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-402%3A+Extend+ZooKeeper+Curator+configurations
> > 
> > > [2] https://lists.apache.org/thread/gqgs2jlq6bmg211gqtgdn8q5hp5v9l1z
> > > 
> > > Thanks
> > > Alex


[jira] [Created] (FLINK-34896) Mi

2024-03-20 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34896:
---

 Summary: Mi
 Key: FLINK-34896
 URL: https://issues.apache.org/jira/browse/FLINK-34896
 Project: Flink
  Issue Type: Sub-task
Reporter: Sergey Nuyanzin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [NOTICE] Flink CDC is importing GitHub issues into Apache Jira

2024-03-20 Thread Qingsheng Ren
FYI: The auto-import has completed. In total, 137 issues were migrated from
GitHub issues to Apache Jira. Sorry again for flooding your inbox!

Best,
Qingsheng

On Wed, Mar 20, 2024 at 3:47 PM Qingsheng Ren  wrote:

> Hi devs,
>
> We are in the process of finalizing the donation of Flink CDC as a
> sub-project of Apache Flink. Given that all issues in the Flink project are
> tracked via Jira, we will be migrating open tickets for Flink CDC from
> GitHub issues to Jira.
>
> We plan to trigger the auto-import around 5pm (UTC+8), Mar 20, 2024.
> During this import process, you may receive automated emails regarding the
> creation of Jira tickets. We apologize for any inconvenience and potential
> overflow in your inbox!
>
> We have made efforts to eliminate 500+ outdated and invalid issues.
> However, we estimate that there are still around 100 issues that are
> significant and worth retaining.
>
> Thanks your understanding and support! And special thanks to community
> volunteers helping with issue cleanup: Hang Ruan, Zhongqiang Gong, He Wang,
> Xin Gong, Yanquan Lyu and Jiabao Sun
>
> Best,
> Qingsheng
>


[jira] [Created] (FLINK-34895) Migrate FlinkRewriteSubQueryRule

2024-03-20 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34895:
---

 Summary: Migrate FlinkRewriteSubQueryRule
 Key: FLINK-34895
 URL: https://issues.apache.org/jira/browse/FLINK-34895
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-426: Grouping Remote State Access

2024-03-20 Thread Jinzhong Li
Hi Yue,

Thanks for your comments.
I get your point. I think there are two possible ways we could support
multiGet in the synchronous model in the future:
1) Implement the multiGet interface in the existing RocksDB state
interface, then integrate this interface into the synchronous operators layer.
2) Integrate the ForStStateBackend interface into the synchronous model/operators.

No matter which approach we choose in the future, I don't think this
FLIP's proposal will be a blocker for extending the use of multiGet to
synchronous models. And although ForStStateBackend will first be used for
asynchronous models, it will preserve the extensibility needed by the
synchronous model. We don't need to worry too much about this :-)

Best,
Jinzhong

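As a rough illustration of the grouping idea (not Flink code, and not the AEC design itself), the sketch below buffers single asynchronous get requests and flushes them as one multiGet against the remote store once a configurable batch size is reached; RemoteStore and all other names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Toy grouping layer: individual async gets are coalesced into one multiGet.
final class GroupingStateAccess<K, V> {

    interface RemoteStore<K, V> {
        CompletableFuture<Map<K, V>> multiGet(List<K> keys);
    }

    private final RemoteStore<K, V> store;
    private final int batchSize; // configurable, as mentioned above
    private final List<K> pendingKeys = new ArrayList<>();
    private final List<CompletableFuture<V>> pendingResults = new ArrayList<>();

    GroupingStateAccess(RemoteStore<K, V> store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    CompletableFuture<V> get(K key) {
        CompletableFuture<V> result = new CompletableFuture<>();
        pendingKeys.add(key);
        pendingResults.add(result);
        if (pendingKeys.size() >= batchSize) {
            flush();
        }
        return result;
    }

    void flush() {
        List<K> keys = new ArrayList<>(pendingKeys);
        List<CompletableFuture<V>> results = new ArrayList<>(pendingResults);
        pendingKeys.clear();
        pendingResults.clear();
        // One remote round trip answers every buffered request.
        store.multiGet(keys).thenAccept(values -> {
            for (int i = 0; i < keys.size(); i++) {
                results.get(i).complete(values.get(keys.get(i)));
            }
        });
    }
}
```
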
On Wed, Mar 20, 2024 at 4:37 PM yue ma  wrote:

> hi Jinzhong
> Thanks for your reply
> The reason why I mentioned this point is because according to the official
> Rocksdb documentation
> https://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html. if we
> turn on async_io and use multiGet, it can improve the performance of point
> lookups by 100%.  Moreover, especially in Flink SQL tasks, there are many
> ways to access state through mini batch, so I believe this feature also
> greatly optimizes the synchronous access method and is worth doing.
> If we first support batch access for asynchronous models, I think it would
> be okay. My point is, should we consider whether it can be easily extended
> if we support synchronous models in the future
>
> Jinzhong Li  于2024年3月19日周二 20:59写道:
>
> > Hi Yue,
> >
> > Thanks for your feedback!
> >
> > > 1. Does Grouping Remote State Access only support asynchronous
> > interfaces?
> > >--If it is: IIUC, MultiGet can also greatly improve performance for
> > > synchronous access modes. Do we need to support it ?
> >
> > Yes. If we want to support MultiGet on existing synchronous access mode,
> we
> > have to introduce a grouping component akin to the AEC described in
> > FLIP-425[1].
> > I think such a change would introduce additional complexity to the
> current
> > synchronous model, and the extent of performance gains remains uncertain.
> > Therefore, I recommend only asynchronous interfaces support "Grouping
> > Remote State Access", which is designed to efficiently minimize latency
> in
> > accessing remote state storage.
> >
> > > 2. Can a simple example be added to FLip on how to use Batch to access
> > > states and obtain the results of states on the API?
> >
> > Sure. I have added a code example in the Flip[2]. Note that the multiget
> in
> > this Flip is an internal interface, not a user-facing interface.
> >
> > > 3. I also agree with XiaoRui's viewpoint. Is there a corresponding
> Config
> > > to control the  state access batch strategy?
> >
> > Yes, we would offer some configurable options that allow users to adjust
> > the behavior of batching and grouping state access (eg. batching size,
> > etc.).
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-425%3A+Asynchronous+Execution+Model
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-426%3A+Grouping+Remote+State+Access#FLIP426:GroupingRemoteStateAccess-CodeExampleonHowtoAccessStateUsingBatch
> >
> > Best,
> > Jinzhong Li
> >
> >
> > On Tue, Mar 19, 2024 at 5:52 PM yue ma  wrote:
> >
> > > Hi Jinzhong,
> > >
> > > Thanks for the FLIP.  I have the following questions:
> > >
> > > 1. Does Grouping Remote State Access only support asynchronous
> > interfaces?
> > > --If it is: IIUC, MultiGet can also greatly improve performance for
> > > synchronous access modes. Do we need to support it ?
> > > --If not, how can we distinguish between using Grouping State
> Access
> > in
> > > asynchronous and synchronous modes?
> > > 2.  Can a simple example be added to FLip on how to use Batch to access
> > > states and obtain the results of states on the API?
> > > 3. I also agree with XiaoRui's viewpoint. Is there a corresponding
> Config
> > > to control the  state access batch strategy?
> > >
> > > --
> > > Best,
> > > Yue
> > >
> >
>
>
> --
> Best,
> Yue
>


Re: [DISCUSS] FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State

2024-03-20 Thread Jinzhong Li
Hi Yue,

Thanks for your feedback!

> 1. If we choose Option-3 for ForSt , how would we handle Manifest File
> ? Should we take a snapshot of the Manifest during the synchronization
phase?

IIUC, the GetLiveFiles() API in Option-3 can also capture the fileInfo of
Manifest files, and this API also returns the manifest file size, which
means this API could take a snapshot of the Manifest FileInfo (filename +
fileSize) during the synchronization phase.
You could refer to the rocksdb source code[1] to verify this.


 > However, many distributed storage systems do not support the
> ability of Fast Duplicate (such as HDFS). But ForSt has the ability to
> directly read and write remote files. Can we not copy or Fast duplicate
> these files, but instand of directly reuse and. reference these remote
> files? I think this can reduce file download time and may be more useful
> for most users who use HDFS (do not support Fast Duplicate)?

Firstly, as far as I know, most remote file systems support
FastDuplicate, e.g. S3 copyObject/Azure Blob Storage copyBlob/OSS
copyObject, while HDFS indeed does not support FastDuplicate.

Actually, we have considered a design that reuses remote files. And that
is what we want to implement in the near future, where both checkpoints
and restores can reuse existing files residing on the remote state storage.
However, this design conflicts with the current file management system in
Flink.  At present, remote state files are managed by the ForStDB
(TaskManager side), while checkpoint files are managed by the JobManager,
which is a major hindrance to file reuse. For example, issues could arise
if a TM reuses a checkpoint file that is subsequently deleted by the JM.
Therefore, as mentioned in FLIP-423[2], our roadmap is to first integrate
checkpoint/restore mechanisms with existing framework  at milestone-1.
Then, at milestone-2, we plan to introduce TM State Ownership and Faster
Checkpointing mechanisms, which will allow both checkpointing and restoring
to directly reuse remote files, thus achieving faster checkpointing and
restoring.

[1]
https://github.com/facebook/rocksdb/blob/6ddfa5f06140c8d0726b561e16dc6894138bcfa0/db/db_filesnapshot.cc#L77
[2]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046855#FLIP423:DisaggregatedStateStorageandManagement(UmbrellaFLIP)-RoadMap+LaunchingPlan

Best,
Jinzhong

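For reference, here is a minimal sketch with the RocksJava API showing what the synchronous phase could pin: getLiveFiles() returns both the live file names (SSTs plus MANIFEST/CURRENT) and the valid MANIFEST length at that point in time. How ForSt exposes the equivalent call is an assumption here; only the RocksJava signatures below are real.

```java
import java.util.List;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

final class LiveFileSnapshotSketch {

    /** Capture the (fileName, size) view of the DB in the synchronous checkpoint phase. */
    static void captureSyncPhase(RocksDB db) throws RocksDBException {
        // flushMemtable = true makes the returned file list self-contained.
        RocksDB.LiveFiles liveFiles = db.getLiveFiles(true);
        List<String> files = liveFiles.files;                  // e.g. /000042.sst, /MANIFEST-000007
        long manifestValidLength = liveFiles.manifestFileSize; // valid MANIFEST length right now
        // The async phase would then upload or fast-duplicate these files, truncating
        // the MANIFEST copy to manifestValidLength so it stays consistent.
        System.out.println(files.size() + " live files, manifest length " + manifestValidLength);
    }
}
```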






On Wed, Mar 20, 2024 at 4:01 PM yue ma  wrote:

> Hi Jinzhong
>
> Thank you for initiating this FLIP.
>
> I have just some minor question:
>
> 1. If we choice Option-3 for ForSt , how would we handle Manifest File
> ? Should we take snapshot of the Manifest during the synchronization phase?
> Otherwise, may the Manifest and MetaInfo information be inconsistent during
> recovery?
> 2. For the Restore Operation , we need Fast Duplicate  Checkpoint Files to
> Working Dir . However, many distributed storage systems do not support the
> ability of Fast Duplicate (such as HDFS). But ForSt has the ability to
> directly read and write remote files. Can we not copy or Fast duplicate
> these files, but instand of directly reuse and. reference these remote
> files? I think this can reduce file download time and may be more useful
> for most users who use HDFS (do not support Fast Duplicate)?
>
> --
> Best,
> Yue
>


[jira] [Created] (FLINK-34894) Migrate JoinDependentConditionDerivationRule

2024-03-20 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34894:
---

 Summary: Migrate JoinDependentConditionDerivationRule
 Key: FLINK-34894
 URL: https://issues.apache.org/jira/browse/FLINK-34894
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34893) Bump Checkstyle to 9+

2024-03-20 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-34893:
---

 Summary: Bump Checkstyle to 9+
 Key: FLINK-34893
 URL: https://issues.apache.org/jira/browse/FLINK-34893
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin


The issue with the current Checkstyle version is that the Checkstyle IntelliJ IDEA
plugin recently dropped Checkstyle 8 support [1].

At the same time we can not move to Checkstyle 10 since 10.x requires java 11+

[1] https://github.com/jshiell/checkstyle-idea/blob/main/CHANGELOG.md



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-433: State Access on DataStream API V2

2024-03-20 Thread weijie guo
Hi everyone,


Thanks for all the feedback about the FLIP-433: State Access on
DataStream API V2 [1]. The discussion thread is here [2].


The vote will be open for at least 72 hours unless there is an
objection or insufficient votes.


Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-433%3A+State+Access+on+DataStream+API+V2

[2] https://lists.apache.org/thread/8ghzqkvt99p4k7hjqxzwhqny7zb7xrwo


Re: [DISCUSS] Manual savepoint triggering in flink-kubernetes-operator

2024-03-20 Thread Yang Wang
Using a separate CR for managing savepoints is really a good idea.
Then managing savepoints will be easier and we will not leak any
unusable savepoints on the object storage.


Best,
Yang

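To make the idea discussed in this thread a bit more tangible, here is a purely hypothetical Java sketch of how a dedicated savepoint resource could be modelled with Fabric8, in the same style as the operator's existing CRs. None of these classes, fields, or the API version exist today; they only illustrate the spec/status split and the optional retention flag mentioned below.

```java
import io.fabric8.kubernetes.api.model.Namespaced;
import io.fabric8.kubernetes.client.CustomResource;
import io.fabric8.kubernetes.model.annotation.Group;
import io.fabric8.kubernetes.model.annotation.Version;

// Hypothetical CR: the group/version and all field names are illustrative only.
@Group("flink.apache.org")
@Version("v1beta1")
class FlinkSavepoint extends CustomResource<FlinkSavepointSpec, FlinkSavepointStatus>
        implements Namespaced {}

class FlinkSavepointSpec {
    // Name of the FlinkDeployment/FlinkSessionJob the savepoint should be taken from.
    public String jobRef;
    // If false, an optional finalizer could delete the physical savepoint when the CR is deleted.
    public boolean retainPhysicalSavepoint = true;
}

class FlinkSavepointStatus {
    public String state;    // e.g. TRIGGERED, COMPLETED, FAILED
    public String location; // reported savepoint path once completed
}
```
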
On Wed, Mar 13, 2024 at 4:40 AM Gyula Fóra  wrote:

> That would be great Mate! If you could draw up a FLIP for this that would
> be nice as this is a rather large change that will have a significant
> impact for existing users.
>
> If possible it would be good to provide some backward compatibility /
> transition period while we preserve the current content of the status so
> it's easy to migrate to the new savepoint CRs.
>
> Cheers,
> Gyula
>
> On Tue, Mar 12, 2024 at 9:22 PM Mate Czagany  wrote:
>
> > Hi,
> >
> > I really like this idea as well, I think it would be a great improvement
> > compared to how manual savepoints currently work, and suits Kubernetes
> > workflows a lot better.
> >
> > If there are no objections, I can investigate it during the next few
> weeks
> > and see how this could be implemented in the current code.
> >
> > Cheers,
> > Mate
> >
> > Gyula Fóra  ezt írta (időpont: 2024. márc. 12., K,
> > 16:01):
> >
> > > That's definitely a good improvement Robert and we should add it at
> some
> > > point. At the point in time when this was implemented we went with the
> > > current simpler / more lightweight approach.
> > > However if anyone is interested in working on this / contributing this
> > > improvement I would personally support it.
> > >
> > > Gyula
> > >
> > > On Tue, Mar 12, 2024 at 3:53 PM Robert Metzger 
> > > wrote:
> > >
> > > > Have you guys considered making savepoints a first class citizen in
> the
> > > > Kubernetes operator?
> > > > E.g. to trigger a savepoint, you create a "FlinkSavepoint" CR, the
> K8s
> > > > operator picks up that resource and tries to create a savepoint
> > > > indefinitely until the savepoint has been successfully created. We
> > report
> > > > the savepoint status and location in the "status" field.
> > > >
> > > > We could even add an (optional) finalizer to delete the physical
> > > savepoint
> > > > from the savepoint storage once the "FlinkSavepoint" CR has been
> > deleted.
> > > > optional: the savepoint spec could contain a field "retain
> > > > physical savepoint" or something, that controls the delete behavior.
> > > >
> > > >
> > > > On Thu, Mar 3, 2022 at 4:02 AM Yang Wang 
> > wrote:
> > > >
> > > > > I agree that we could start with the annotation approach and
> collect
> > > the
> > > > > feedback at the same time.
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Őrhidi Mátyás  于2022年3月2日周三 20:06写道:
> > > > >
> > > > > > Thank you for your feedback!
> > > > > >
> > > > > > The annotation on the
> > > > > >
> > > > > > @ControllerConfiguration(generationAwareEventProcessing = false)
> > > > > > FlinkDeploymentController
> > > > > >
> > > > > > already enables the event triggering based on metadata changes.
> It
> > > was
> > > > > set
> > > > > > earlier to support some failure scenarios. (It can be used for
> > > example
> > > > to
> > > > > > manually reenable the reconcile loop when it got stuck in an
> error
> > > > phase)
> > > > > >
> > > > > > I will go ahead and propose a PR using annotations then.
> > > > > >
> > > > > > Cheers,
> > > > > > Matyas
> > > > > >
> > > > > > On Wed, Mar 2, 2022 at 12:47 PM Yang Wang  >
> > > > wrote:
> > > > > >
> > > > > > > I also like the annotation approach since it is more natural.
> > > > > > > But I am not sure about whether the meta data change will
> trigger
> > > an
> > > > > > event
> > > > > > > in java-operator-sdk.
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Yang
> > > > > > >
> > > > > > > Gyula Fóra  于2022年3月2日周三 16:29写道:
> > > > > > >
> > > > > > > > Thanks Matyas,
> > > > > > > >
> > > > > > > > From a user perspective I think the annotation is pretty nice
> > and
> > > > > user
> > > > > > > > friendly so I personally prefer that approach.
> > > > > > > >
> > > > > > > > You said:
> > > > > > > >  "It seems, the java-operator-sdk handles the changes of the
> > > > > .metadata
> > > > > > > and
> > > > > > > > .spec fields of custom resources differently."
> > > > > > > >
> > > > > > > > What implications does this have on the above mentioned 2
> > > > approaches?
> > > > > > > Does
> > > > > > > > it make one more difficult than the other?
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > > Gyula
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 1, 2022 at 1:52 PM Őrhidi Mátyás <
> > > > > matyas.orh...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All!
> > > > > > > > >
> > > > > > > > > I'd like to start a quick discussion about the way we allow
> > > users
> > > > > to
> > > > > > > > > trigger savepoints manually in the operator [FLINK-26181]
> > > > > > > > > . There
> > are
> > > > > > > existing
> > > > > > > > > solutions already for 

[jira] [Created] (FLINK-34890) io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource :.java.sql.SQLException: ORA-01291: missing log file

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34890:
--

 Summary: 
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource 
:.java.sql.SQLException: ORA-01291: missing log file
 Key: FLINK-34890
 URL: https://issues.apache.org/jira/browse/FLINK-34890
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues](https://github.com/ververica/flink-cdc-connectors/issues) and found 
nothing similar.


### Flink version

1.14.2

### Flink CDC version

2.4.1

### Database and its version

oracle19c

### Minimal reproduce step

2023-11-03 13:37:44.675 WARN 
[debezium-oracleconnector-oracle_logminer-change-event-source-coordinator] 
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource : 
Failed to start Oracle LogMiner session, retrying...
2023-11-03 13:37:45.286 
ERROR[debezium-oracleconnector-oracle_logminer-change-event-source-coordinator] 
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource : 
Failed to start Oracle LogMiner after '5' attempts.

java.sql.SQLException: ORA-01291: missing log file
ORA-06512: at "SYS.DBMS_LOGMNR", line 72
ORA-06512: at line 1

at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:509) 
~[com.oracle.database.jdbc-ojdbc8-19.3.0.0.jar!/:19.3.0.0.0]
at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:461) 
~[com.oracle.database.jdbc-ojdbc8-19.3.0.0.jar!/:19.3.0.0.0]  

 

In addition, I clean the archive logs by executing commands from the rman tool 
command:

Crosscheck archivelog all;
Delete noprompt archivelog until time 'sysdate -1';


Through the above command, the archived logs were retained for one day. During 
the insertion, deletion, and modification of the table within one day, I also 
encountered the issue of losing the logs mentioned above.

May I ask how to solve this problem.

### What did you expect to see?

flink cdc normal run

### What did you see instead?


java.sql.SQLException: ORA-01291: missing log file
ORA-06512: at "SYS.DBMS_LOGMNR", line 72
ORA-06512: at line 1

 

 io.debezium.connector.common.BaseSourceTask : Going to restart connector after 
10 sec. after a retriable exception
2023-11-03 16:09:11.458 INFO [pool-18-thread-1] io.debezium.jdbc.JdbcConnection 
: Connection gracefully closed
2023-11-03 16:09:11.459 INFO [debezium-engine] 
io.debezium.embedded.EmbeddedEngine : Retrieable exception thrown, connector 
will be restarted
org.apache.kafka.connect.errors.RetriableException: An exception occurred in 
the change event producer. This connector will be restarted.
        at 
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:46) 
~[debezium-core-1.9.7.Final.jar!/:1.9.7.Final]


### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2615
Created by: https://github.com/jasonhewg
Labels: bug,
Created at: Mon Dec 18 19:07:44 CST 2023
State: open



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34888) How to do MySQL to MySQL pipeline

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34888:
--

 Summary: How to do MySQL to MySQL pipeline 
 Key: FLINK-34888
 URL: https://issues.apache.org/jira/browse/FLINK-34888
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


Hi, I just found the demo of MySQL to Doris, but I want to sync a whole 
MySQL database to another MySQL. flink-cdc-pipeline-connector-mysql is only 
a source. What if the sink is also MySQL?

Could anyone share some information on how to implement this?

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3140
Created by: [ysmintor|https://github.com/ysmintor]
Labels: 
Created at: Wed Mar 13 19:04:58 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34889) Flink CDC may occur binlog can't be find when dynamic added table repeatedly

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34889:
--

 Summary: Flink CDC may occur binlog can't be find when dynamic 
added table repeatedly
 Key: FLINK-34889
 URL: https://issues.apache.org/jira/browse/FLINK-34889
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


We met a strange problem with Flink CDC: when repeatedly adding tables to a 
Flink CDC job, it may fail and report that a very old GTID can't be found. We 
dug into the source code and found the reason below:

1. When the CDC full phase changes to the incremental phase, the binlog reader needs to 
pull the ending offset of every chunk, and it takes the minimum of these offsets as the 
starting offset of the incremental phase. The ending offset of each chunk is stored 
in the JM.

2. We added tables repeatedly, and each time we need to suspend the job, 
alter the config, and then resume from the latest checkpoint.

3. Normally, when finished adding tables, we pull the ending offset of each 
chunk. The pull process transfers a size between the JM and TM, which means 
when there are 100 tables in the JM and we have processed 80, we need to process 81 to 
pull the next offset.

4. There is one problem because the order of the splits in the JM and TM is not the 
same. The JM orders them by table name (such as a:0, a:1, b:0, b:1); when adding a 
table, we need to pull the ending offset of the newly added table, while the JM orders 
the splits by table name, and the newly added table may occur in the middle, so 
we may get the ending offset of a very old split.
https://github.com/apache/flink-cdc/assets/5321584/f8383f59-82d9-4d97-bad7-1aea54c6ac81


 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3141
Created by: [zlzhang0122|https://github.com/zlzhang0122]
Labels: 
Created at: Wed Mar 13 21:17:17 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34855) [Improve] When using miniCluster, don't check 'FLINK_HOME'.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34855:
--

 Summary: [Improve] When using miniCluster, don't check 
'FLINK_HOME'. 
 Key: FLINK-34855
 URL: https://issues.apache.org/jira/browse/FLINK-34855
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

When Flink CDC uses a mini-cluster, pipelines don't need to load Flink JARs, 
but instead use the local environment. So, when using mini cluster, we don't 
need to check 'FLINK_HOME'.

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2955
Created by: [joyCurry30|https://github.com/joyCurry30]
Labels: enhancement, 
Created at: Tue Jan 02 23:01:19 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34883) Error on Postgres-CDC using incremental snapshot with UUID column as PK

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34883:
--

 Summary: Error on Postgres-CDC using incremental snapshot with 
UUID column as PK
 Key: FLINK-34883
 URL: https://issues.apache.org/jira/browse/FLINK-34883
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


A majority of our Postgres databases use UUIDs as primary keys.
When we enable 'scan.incremental.snapshot.enabled = true', Flink-CDC will try 
to split into chunks.
The splitTableIntoChunks function relies on the queryMinMax function, which 
fails when trying to calculate the MIN(UUID) and MAX(UUID), as that is not 
supported in Postgres.

Is there a way around this?

When we convert our column to VARCHAR, rather than UUID, everything seems to 
work.
We did not find a way to cast our UUIDs to VARCHAR while splitting them into 
chunks without editing the source code or altering the source table.

Disabling incremental snapshots also fixes the issue, as we do not split into 
chunks anymore, but this would mean we get a global read lock on the data 
before snapshot reading, which we want to avoid.

Thanks in advance for the help!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3108
Created by: [olivier-derom|https://github.com/olivier-derom]
Labels: 
Created at: Wed Mar 06 16:55:03 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34871) Oracle add data has error

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34871:
--

 Summary: Oracle add data has error
 Key: FLINK-34871
 URL: https://issues.apache.org/jira/browse/FLINK-34871
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.18.1

### Flink CDC version

3.0.0

### Database and its version

oracle 11g

### Minimal reproduce step

I ran the example from the document at 
https://ververica.github.io/flink-cdc-connectors/release-3.0/content/connectors/oracle-cdc.html; when 
I add a row to the table, it reports an error:
   Properties debeziumProperties = new Properties();
debeziumProperties.setProperty("log.mining.strategy", "online_catalog");
debeziumProperties.setProperty("include.schema.changes", "true");
debeziumProperties.setProperty("value.converter.schemas.enable", 
"true");

OracleSourceBuilder.OracleIncrementalSource oracleChangeEventSource =
new OracleSourceBuilder()
.hostname("hostname")
.port(1521)
.databaseList("ORCL")
.schemaList("FLINKUSER")
.tableList("FLINKUSER.CREATETABLE")
.username("flinkuser")
.password("flinkpw")
.deserializer(new JsonDebeziumDeserializationSchema())
.includeSchemaChanges(true) // output the schema 
changes as well
.startupOptions(StartupOptions.initial())
.debeziumProperties(debeziumProperties)
.splitSize(2)
.build();

StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3000L);

DataStreamSource oracleChangeEventStream = env.fromSource(
oracleChangeEventSource,
WatermarkStrategy.noWatermarks(),
"OracleParallelSource")
.setParallelism(4)
.setParallelism(1);

oracleChangeEventStream.print();

https://github.com/ververica/flink-cdc-connectors/assets/43168824/14d5e0e2-daed-4d8a-bce5-1ed83ccd5c88


### What did you expect to see?

Is it my configuration problem

### What did you see instead?

null error

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3056
Created by: [ccczhouxin|https://github.com/ccczhouxin]
Labels: bug, 
Created at: Fri Feb 02 10:05:38 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34862) [Bug] oracle-cdc ORA-01371: Complete LogMiner dictionary not found

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34862:
--

 Summary: [Bug] oracle-cdc ORA-01371: Complete LogMiner dictionary 
not found
 Key: FLINK-34862
 URL: https://issues.apache.org/jira/browse/FLINK-34862
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

flink-1.15.2

### Flink CDC version

flink-connector-oracle-cdc-2.4.2

### Database and its version

oracle 12c cdb

### Minimal reproduce step

**Translation: There is no problem running the following code segment 
individually:**
Properties prop = new Properties();
prop.setProperty("log.mining.strategy", "redo_log_catalog");
prop.setProperty("log.mining.continuous.mine", "true");
prop.setProperty("decimal.handling.mode", "string");
prop.setProperty("interval.handling.mode", "string");
prop.setProperty("database.tablename.case.insensitive", "false");
prop.setProperty("snapshot.locking.mode", "none");
prop.setProperty("lob.enabled", "true");
prop.setProperty("database.history.store.only.captured.tables.ddl", 
"true");
prop.setProperty("database.dbname", "ORCL");
prop.setProperty("database.pdb.name", "ORCLPDB");

SourceFunction sourceFunction = OracleSource.builder()
.url("jdbc:oracle:thin:@//192.168.xxx.xxx:1521/ORCL")
.hostname("192.168.xxx.xxx")
.port(1521)
.database("ORCL")
.schemaList("C##DBZUSER")
.tableList("C##DBZUSER.TEST_TABLE_02")
.username("c##dbzuser")
.password("dbz")
.startupOptions(StartupOptions.initial())
.debeziumProperties(prop)
.deserializer(new JsonDebeziumDeserializationSchema()) // 
converts SourceRecord to JSON String
.build();


**Run the following code simultaneously, an error is reported:**
Properties prop = new Properties();
prop.setProperty("log.mining.strategy", "redo_log_catalog");
prop.setProperty("log.mining.continuous.mine", "true");
prop.setProperty("decimal.handling.mode", "string");
prop.setProperty("interval.handling.mode", "string");
prop.setProperty("database.tablename.case.insensitive", "false");
prop.setProperty("snapshot.locking.mode", "none");
prop.setProperty("lob.enabled", "true");
prop.setProperty("database.history.store.only.captured.tables.ddl", 
"true");
prop.setProperty("database.dbname", "ORCL");
prop.setProperty("database.pdb.name", "ORCLPDB");

SourceFunction sourceFunction = OracleSource.builder()
.url("jdbc:oracle:thin:@//192.168.xxx.xxx:1521/ORCL")
.hostname("192.168.xxx.xxx")
.port(1521)
.database("ORCL")
.schemaList("C##DBZUSER")
.tableList("C##DBZUSER.TEST_TABLE_01")
.username("c##dbzuser")
.password("dbz")
.startupOptions(StartupOptions.initial())
.debeziumProperties(prop)
.deserializer(new JsonDebeziumDeserializationSchema()) // 
converts SourceRecord to JSON String
                .build();

error info:
java.sql.SQLException: ORA-01371: Complete LogMiner dictionary not found
ORA-06512: at "SYS.DBMS_LOGMNR", line 58
ORA-06512: at line 1



### What did you expect to see?

able to run multiple Oracle CDC instances simultaneously.

### What did you see instead?


Error occurred when running multiple oracle-cdc instances.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2998
Created by: [Toroidals|https://github.com/Toroidals]
Labels: bug, 
Created at: Mon Jan 15 09:56:17 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34869) [Bug][mysql] Remove all previous table and add new added table will throw Exception.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34869:
--

 Summary: [Bug][mysql] Remove all previous table and add new added 
table will throw Exception.
 Key: FLINK-34869
 URL: https://issues.apache.org/jira/browse/FLINK-34869
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.18

### Flink CDC version

3.0.1

### Database and its version

anyone 

### Minimal reproduce step

1. Stop the job with a savepoint.
2. Set 'scan.incremental.snapshot.enabled' = 'true' and then set tableList with 
tables that were not included last time.
3. Then the assigner status will be chaotic.
Take a test case for example:
```java
public class NewlyAddedTableITCase extends MySqlSourceTestBase {
@Test
public void testRemoveAndAddTablesOneByOne() throws Exception {
testRemoveAndAddTablesOneByOne(
1, "address_hangzhou", "address_beijing", "address_shanghai");
}

private void testRemoveAndAddTablesOneByOne(int parallelism, String... 
captureAddressTables)
throws Exception {

MySqlConnection connection = getConnection();
// step 1: create mysql tables with all tables included
initialAddressTables(connection, captureAddressTables);

final TemporaryFolder temporaryFolder = new TemporaryFolder();
temporaryFolder.create();
final String savepointDirectory = 
temporaryFolder.newFolder().toURI().toString();

// get all expected data
List fetchedDataList = new ArrayList<>();

String finishedSavePointPath = null;
// test removing and adding table one by one
for (int round = 0; round < captureAddressTables.length; round++) {
String captureTableThisRound = captureAddressTables[round];
String cityName = captureTableThisRound.split("_")[1];
StreamExecutionEnvironment env =
getStreamExecutionEnvironment(finishedSavePointPath, 
parallelism);
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

String createTableStatement =
getCreateTableStatement(new HashMap<>(), 
captureTableThisRound);
tEnv.executeSql(createTableStatement);
tEnv.executeSql(
"CREATE TABLE sink ("
+ " table_name STRING,"
+ " id BIGINT,"
+ " country STRING,"
+ " city STRING,"
+ " detail_address STRING,"
+ " primary key (table_name,id) not enforced"
+ ") WITH ("
+ " 'connector' = 'values',"
+ " 'sink-insert-only' = 'false'"
+ ")");
TableResult tableResult = tEnv.executeSql("insert into sink select 
* from address");
JobClient jobClient = tableResult.getJobClient().get();

// this round's snapshot data
fetchedDataList.addAll(
Arrays.asList(
format(
"+I[%s, 416874195632735147, China, %s, %s 
West Town address 1]",
captureTableThisRound, cityName, cityName),
format(
"+I[%s, 416927583791428523, China, %s, %s 
West Town address 2]",
captureTableThisRound, cityName, cityName),
format(
"+I[%s, 417022095255614379, China, %s, %s 
West Town address 3]",
captureTableThisRound, cityName, 
cityName)));
waitForSinkSize("sink", fetchedDataList.size());
assertEqualsInAnyOrder(fetchedDataList, 
TestValuesTableFactory.getRawResults("sink"));

// only this round table's data is captured.
// step 3: make binlog data for all tables before this round(also 
includes this round)
for (int i = 0; i <= round; i++) {
String tableName = captureAddressTables[i];
makeBinlogForAddressTable(connection, tableName, round);
}
// this round's binlog data
fetchedDataList.addAll(
Arrays.asList(
format(
"-U[%s, 416874195632735147, China, %s, %s 
West Town address 1]",
captureTableThisRound, cityName, cityName),
format(
"+U[%s, 416874195632735147, CHINA_%s, %s, 
%s West Town 

[jira] [Created] (FLINK-34832) [Bug] BinlogOffset's comparative method may be cause incremental snapshot scan data lose

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34832:
--

 Summary: [Bug] BinlogOffset's comparative method may be cause 
incremental snapshot scan data lose
 Key: FLINK-34832
 URL: https://issues.apache.org/jira/browse/FLINK-34832
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.14.5

### Flink CDC version

2.4.0

### Database and its version

MariaDB 10.4.13

### Minimal reproduce step

1. create mysql test table
2. create a flink cdc sql task and set the startup mode to initial
3. while writing 1 records to the test table, start the flink task and 
check the number of records sent by the data.

### What did you expect to see?

The number of records is consistent with the mysql test table

### What did you see instead?

data loss or duplication

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
![932A614C-53BF-4592-9E6C-D7F87BD6CD6F|https://github.com/ververica/flink-cdc-connectors/assets/39044001/742cd5a2-1312-47e6-9f6b-14e250198c36]


 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2794
Created by: [EchoLee5|https://github.com/EchoLee5]
Labels: bug, 
Created at: Fri Dec 01 19:28:48 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] FLIP-XXX Apicurio-avro format

2024-03-20 Thread David Radley
Hi,
As per the FLIP process, I would like to raise a FLIP but do not have the 
authority, so I have created a Google doc for the FLIP to introduce a new 
Apicurio Avro format. The document is 
https://docs.google.com/document/d/14LWZPVFQ7F9mryJPdKXb4l32n7B0iWYkcOdEd1xTC7w/edit?usp=sharing

I have prototyped a lot of the content to prove that this approach is feasible. 
I look forward to the discussion,
  Kind regards, David.



Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-34878) [Feature][Pipeline] Flink CDC pipeline transform supports CASE WHEN

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34878:
--

 Summary: [Feature][Pipeline] Flink CDC pipeline transform supports 
CASE WHEN
 Key: FLINK-34878
 URL: https://issues.apache.org/jira/browse/FLINK-34878
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

To be supplemented.

### Solution

To be supplemented.

### Alternatives

None.

### Anything else?

To be supplemented.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3079
Created by: [aiwenmo|https://github.com/aiwenmo]
Labels: enhancement, 
Created at: Mon Feb 26 23:47:53 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34874) [MongoDB] Support initial.snapshotting.pipeline related configs in table api

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34874:
--

 Summary: [MongoDB] Support initial.snapshotting.pipeline related 
configs in table api
 Key: FLINK-34874
 URL: https://issues.apache.org/jira/browse/FLINK-34874
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

MongoDB's startup.mode.copy.existing.pipeline (a.k.a. 
initial.snapshotting.pipeline in mongo-cdc) is an array of JSON objects 
describing the pipeline operations to run when copying existing data, see 
[link|https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/startup/#std-label-source-configuration-startup].
This can improve the use of indexes by the copying manager and make copying 
more efficient, which is very important in some user scenarios. Besides, there 
are also related configs such as startup.mode.copy.existing.queue.size and 
startup.mode.copy.existing.max.threads.
Currently we only support these configs in the DataStream API; for the convenience 
of users, we should also support them in the Table API. A rough sketch of what 
such a DDL could look like is shown below.
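
As a non-authoritative illustration of the proposal, the sketch below passes the 
pipeline as a table option in a DDL issued through the Table API. Only 
'initial.snapshotting.pipeline' comes from this issue; the table name, schema, and 
connection options are made up, and the connector does not accept this option in 
DDL today.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MongoPipelineOptionSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical DDL: 'initial.snapshotting.pipeline' is the proposed option,
        // restricting the copy-existing scan to an index-friendly subset of documents.
        tEnv.executeSql(
                "CREATE TABLE orders_src ("
                        + "  _id STRING,"
                        + "  status STRING,"
                        + "  amount DECIMAL(10, 2),"
                        + "  PRIMARY KEY (_id) NOT ENFORCED"
                        + ") WITH ("
                        + "  'connector' = 'mongodb-cdc',"
                        + "  'hosts' = 'localhost:27017',"
                        + "  'database' = 'app_db',"
                        + "  'collection' = 'orders',"
                        + "  'initial.snapshotting.pipeline' ="
                        + "    '[{\"$match\": {\"status\": \"ACTIVE\"}}]'"
                        + ")");
    }
}
```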

### Solution

Support initial.snapshotting.pipeline related configs in table api

### Alternatives

_No response_

### Anything else?

Note that in 2.3.0 we removed these configs from the table API when adding 
incremental snapshot mode support for MongoDB in this 
[commit|https://github.com/ververica/flink-cdc-connectors/commit/301e5a8ab08f7b6c4414c0a81561b9a1bf7fab19],
since in incremental snapshot mode the semantics are inconsistent when the 
pipeline operations are used. The reason is that in the snapshot phase of incremental 
snapshot mode, the oplog is played back after each snapshot to compensate 
for changes, but the pipeline operations in copy.existing.pipeline are not 
applied to the replayed oplog, which makes the semantics of this config 
inconsistent.
In legacy debezium mode, however, the behaviour is correct, so we add these configs 
back in debezium mode for better forward compatibility, and notify users not to 
use them in incremental snapshot mode for the above reason.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3069
Created by: [herunkang2018|https://github.com/herunkang2018]
Labels: enhancement, 
Assignee: [herunkang2018|https://github.com/herunkang2018]
Created at: Tue Feb 20 10:28:11 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34866) [Bug] Meet NoClassDefFoundError when add new table in mysql cdc source

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34866:
--

 Summary: [Bug] Meet NoClassDefFoundError when add new table in 
mysql cdc source
 Key: FLINK-34866
 URL: https://issues.apache.org/jira/browse/FLINK-34866
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.18

### Flink CDC version

3.0.1

### Database and its version

source: mysql: 5.7
sink: file system

### Minimal reproduce step

1. create mysql cdc flink job
2. add new table in mysql source

 Met the following exception:

java.lang.RuntimeException: One or more fetchers have encountered exception
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:263)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:185)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:147)
at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:419)
at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NoClassDefFoundError: 
com/ververica/cdc/common/utils/StringUtils
at 
com.ververica.cdc.connectors.mysql.source.utils.RecordUtils.isTableChangeRecord(RecordUtils.java:395)
at 
com.ververica.cdc.connectors.mysql.debezium.reader.BinlogSplitReader.shouldEmit(BinlogSplitReader.java:258)
at 
com.ververica.cdc.connectors.mysql.debezium.reader.BinlogSplitReader.pollSplitRecords(BinlogSplitReader.java:165)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader.pollSplitRecords(MySqlSplitReader.java:125)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader.fetch(MySqlSplitReader.java:87)
at 
org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:165)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:117)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more



### What did you expect to see?

Should not encounter an exception when adding a new table to the MySQL CDC source.

### What did you see instead?

Met a java.lang.NoClassDefFoundError and the job failed over.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3035
Created by: [pengmide|https://github.com/pengmide]
Labels: bug, 
Created at: Fri Jan 26 09:20:19 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34825) [flink-cdc-pipeline-connectors] Add Implementation of DataSource in MongoDB

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34825:
--

 Summary: [flink-cdc-pipeline-connectors] Add Implementation of 
DataSource in MongoDB
 Key: FLINK-34825
 URL: https://issues.apache.org/jira/browse/FLINK-34825
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

After https://github.com/ververica/flink-cdc-connectors/pull/2638 merged, we 
can try to add implementation in MongoDB to simulate enterprise scenarios.

You may need to wait for 
https://github.com/ververica/flink-cdc-connectors/issues/2642 
https://github.com/ververica/flink-cdc-connectors/issues/2644 completed. 

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2648
Created by: [lvyanquan|https://github.com/lvyanquan]
Labels: enhancement, task, 【3.0】, 
Assignee: [Jiabao-Sun|https://github.com/Jiabao-Sun]
Created at: Tue Nov 07 11:18:18 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34828) sql server Limited date synchronization

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34828:
--

 Summary: sql server Limited date synchronization
 Key: FLINK-34828
 URL: https://issues.apache.org/jira/browse/FLINK-34828
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.16

### Flink CDC version

2.4.1

### Database and its version

sqlserver2014

### Minimal reproduce step

Support a filter expression such as SettlementTime >= CURRENT_DATE - INTERVAL '90' DAY;
the snapshot cannot currently start from the range defined by such an expression.

### What did you expect to see?

I expect the snapshot to start from the range defined by the expression, rather 
than from the starting position of all the data.

### What did you see instead?

I want to synchronize only the data from the last three months, but the program 
still starts from 2022. Although the older data is not synchronized, it is 
continuously scanned.
![image|https://github.com/ververica/flink-cdc-connectors/assets/8586973/8488159c-a95b-4528-9ff9-1438afac46d7]
The job runs without any problems, but this makes the first synchronization 
very slow.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2696
Created by: [ysq5202121|https://github.com/ysq5202121]
Labels: bug, 
Created at: Wed Nov 15 10:05:40 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34850) [test] Support to execute the (mysql) test cases with custom database deployment

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34850:
--

 Summary: [test] Support to execute the (mysql) test cases with 
custom database deployment
 Key: FLINK-34850
 URL: https://issues.apache.org/jira/browse/FLINK-34850
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Our IT workflows are built on GitHub now, but in some cases I need to execute 
some of the test cases locally against a local database deployment rather than a 
docker container. I think it would be better to add some configuration to support 
local execution of the test cases so that we can reuse the existing test cases 
easily. A rough sketch of such a switch is shown below.
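
As a non-binding illustration of the requested switch, the sketch below assumes a 
hypothetical FLINK_CDC_TEST_MYSQL_URL environment variable: when it is set, the 
test reuses the local deployment; otherwise it falls back to a Testcontainers-based 
MySQL like the existing IT cases.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

import org.testcontainers.containers.MySQLContainer;

final class TestDatabaseProvider {

    // Hypothetical switch: if the env var is present, run against a local MySQL
    // deployment instead of starting a docker container.
    static Connection openTestConnection() throws SQLException {
        String localUrl = System.getenv("FLINK_CDC_TEST_MYSQL_URL");
        if (localUrl != null && !localUrl.isEmpty()) {
            // e.g. jdbc:mysql://localhost:3306/inventory?user=root&password=...
            return DriverManager.getConnection(localUrl);
        }
        // Default path: the dockerized MySQL that the existing IT cases already use.
        MySQLContainer<?> mysql = new MySQLContainer<>("mysql:8.0");
        mysql.start();
        return DriverManager.getConnection(
                mysql.getJdbcUrl(), mysql.getUsername(), mysql.getPassword());
    }

    private TestDatabaseProvider() {}
}
```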

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2922
Created by: [whhe|https://github.com/whhe]
Labels: enhancement, 
Created at: Mon Dec 25 14:55:38 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34856) [Bug] timeout when Invocating Remote RPC.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34856:
--

 Summary: [Bug] timeout when Invocating Remote RPC.
 Key: FLINK-34856
 URL: https://issues.apache.org/jira/browse/FLINK-34856
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.18.0

### Flink CDC version

3.0

### Database and its version

Mysql 8.0

### Minimal reproduce step

```
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: inventory.\.*
  server-id: 5400-5404
  server-time-zone: Asia/Shanghai

sink:
  type: values

pipeline:
  name: Sync MySQL Database to Values
  parallelism: 2
```

### What did you expect to see?

running correctly.

### What did you see instead?

```
2024-01-03 11:20:14
java.lang.IllegalStateException: Failed to send request to coordinator: 
com.ververica.cdc.runtime.operators.schema.event.SchemaChangeRequest@f75b9e4
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.sendRequestToCoordinator(SchemaOperator.java:126)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.requestSchemaChange(SchemaOperator.java:110)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.handleSchemaChangeEvent(SchemaOperator.java:95)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.processElement(SchemaOperator.java:85)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at 
org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:309)
at 
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
at 
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:101)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlPipelineRecordEmitter.sendCreateTableEvent(MySqlPipelineRecordEmitter.java:125)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlPipelineRecordEmitter.processElement(MySqlPipelineRecordEmitter.java:109)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.emitRecord(MySqlRecordEmitter.java:82)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.emitRecord(MySqlRecordEmitter.java:55)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:160)
at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:419)
at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: 
java.util.concurrent.TimeoutException: Invocation of 
[RemoteRpcInvocation(JobMasterOperatorEventGateway.sendRequestToCoordinator(OperatorID,
 SerializedValue))] at recipient 
[pekko.tcp://flink@localhost:6123/user/rpc/jobmanager_11] timed out. This is 
usually caused by: 1) Pekko failed sending the message silently, due to 
problems like oversized payload or serialization failures. In that case, you 
should find detailed error information in the logs. 2) The recipient needs more 
time for responding, due to problems like slow machines or network jitters. In 
that case, you can try to increase pekko.ask.timeout.
at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at 

[jira] [Created] (FLINK-34860) [Bug] Jackson version conflicts among MySQL/TiDB/MongoDB connectors

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34860:
--

 Summary: [Bug] Jackson version conflicts among MySQL/TiDB/MongoDB 
connectors
 Key: FLINK-34860
 URL: https://issues.apache.org/jira/browse/FLINK-34860
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.14

### Flink CDC version

3.0.0

### Database and its version

MySQL 5.7

### Minimal reproduce step

Put `flink-sql-connector-tidb-cdc-3.0.0.jar`, 
`flink-sql-connector-mysql-cdc-3.0.0.jar` and 
`flink-sql-connector-mongodb-cdc-3.0.0.jar` into lib folder at Flink home 
(Flink 1.14 in my case), then create a MySQL CDC table and submit a simple 
query like `select * from mysql_cdc_table`; there would be an error:

`java.io.InvalidClassException: 
com.ververica.cdc.connectors.shaded.com.fasterxml.jackson.databind.cfg.MapperConfig;
 incompatible types for field _mapperFeatures`

It's caused by inconsistent jackson versions of the CDC connectors:

- `flink-sql-connector-mongodb-cdc-3.0.0.jar`: 2.10.2
- `flink-sql-connector-mysql-cdc-3.0.0.jar`: 2.13.2
- `flink-sql-connector-tidb-cdc-3.0.0.jar`: 2.12.3

### What did you expect to see?

All CDC connectors use the same jackson version.

### What did you see instead?

Inconsistent jackson versions among the CDC connectors which conflicts with 
each other.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2983
Created by: [link3280|https://github.com/link3280]
Labels: bug, 
Assignee: [link3280|https://github.com/link3280]
Created at: Wed Jan 10 10:28:58 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34861) [Bug] Inconsistent Kafka shading among cdc connectors

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34861:
--

 Summary: [Bug] Inconsistent Kafka shading among cdc connectors
 Key: FLINK-34861
 URL: https://issues.apache.org/jira/browse/FLINK-34861
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.14.3

### Flink CDC version

3.0.0

### Database and its version

MySQL 5.7
Mongo 4.4

### Minimal reproduce step

Put flink-sql-connector-tidb-cdc-3.0.0.jar, 
flink-sql-connector-mysql-cdc-3.0.0.jar and 
flink-sql-connector-mongodb-cdc-3.0.0.jar into lib folder at Flink home (Flink 
1.14 in my case), then create a MySQL CDC table and submit a simple query like 
`select * from mysql_cdc_table`; there would be an error:

`java.lang.NoClassDefFoundError: org/apache/kafka/connect/source/SourceRecord`

It's caused by inconsistent shading of  Kafka classes. Only MySQL CDC connector 
relocated Kafka classes, so if the classpath contains un-relocated classes from 
other CDC connectors the above error occurs.

### What did you expect to see?

The cdc connectors would not have class conflicts.

### What did you see instead?

The cdc connectors have class conflicts.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2984
Created by: [link3280|https://github.com/link3280]
Labels: bug, 
Created at: Wed Jan 10 11:27:16 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34849) Flink CDC3.0 Demo Mysql binlog to Starrocks Exception amount of AddColumnEvent is already existed[Bug]

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34849:
--

 Summary: Flink CDC3.0 Demo Mysql binlog to Starrocks Exception 
amount of AddColumnEvent is already existed[Bug] 
 Key: FLINK-34849
 URL: https://issues.apache.org/jira/browse/FLINK-34849
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.18

### Flink CDC version

3.0

### Database and its version

Starrocks  3.1.6

### Minimal reproduce step

I can run the Doris demo (MySQL to Doris), but the StarRocks demo throws an 
exception. Steps:
1. Syncing the MySQL database and tables to StarRocks passes.
2. INSERT INTO app_db.orders (id, price) VALUES (3, 100.00); passes.
3. ALTER TABLE app_db.orders ADD amount varchar(100) NULL; Flink CDC throws an 
exception at 2023-12-21 11:40:16:
java.lang.IllegalStateException: Failed to send request to coordinator: 
com.ververica.cdc.runtime.operators.schema.event.SchemaChangeRequest@ecbe495a 
...Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: amount of AddColumnEvent is already existed.
However, the amount column is added to the StarRocks orders table.


### What did you expect to see?

2023-12-21 11:40:16
java.lang.IllegalStateException: Failed to send request to coordinator: 
com.ververica.cdc.runtime.operators.schema.event.SchemaChangeRequest@ecbe495a
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.sendRequestToCoordinator(SchemaOperator.java:126)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.requestSchemaChange(SchemaOperator.java:110)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.handleSchemaChangeEvent(SchemaOperator.java:95)
at 
com.ververica.cdc.runtime.operators.schema.SchemaOperator.processElement(SchemaOperator.java:85)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at 
org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:309)
at 
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
at 
org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:101)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter$OutputCollector.collect(MySqlRecordEmitter.java:150)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at 
com.ververica.cdc.debezium.event.DebeziumEventDeserializationSchema.deserialize(DebeziumEventDeserializationSchema.java:92)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.emitElement(MySqlRecordEmitter.java:128)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.processElement(MySqlRecordEmitter.java:105)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlPipelineRecordEmitter.processElement(MySqlPipelineRecordEmitter.java:119)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.emitRecord(MySqlRecordEmitter.java:82)
at 
com.ververica.cdc.connectors.mysql.source.reader.MySqlRecordEmitter.emitRecord(MySqlRecordEmitter.java:55)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:160)
at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:419)
at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:750)

[jira] [Created] (FLINK-34875) [Feature] CDC Action supports update and delete record (logical deletion)

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34875:
--

 Summary: [Feature] CDC Action supports update and delete record 
(logical deletion) 
 Key: FLINK-34875
 URL: https://issues.apache.org/jira/browse/FLINK-34875
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation
There is a very common scenario in government departments (such as health 
insurance data reporting). When data is reported from the city to the province 
and from the province to the country, deleted and modified data must be 
retained to prevent data from being deleted at the next level.



### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3075
Created by: [melin|https://github.com/melin]
Labels: enhancement, 
Created at: Mon Feb 26 13:01:57 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34844) [3.1][pipeline-connectors] Add pipeline Sink for Pulsar.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34844:
--

 Summary: [3.1][pipeline-connectors] Add pipeline Sink for Pulsar.
 Key: FLINK-34844
 URL: https://issues.apache.org/jira/browse/FLINK-34844
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Add pipeline sink  support for [pulsar|https://github.com/apache/pulsar]

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2879
Created by: [lvyanquan|https://github.com/lvyanquan]
Labels: enhancement, 
Created at: Mon Dec 18 10:49:25 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34873) [Bug] After starting Streaming ELT from MySQL to StarRocks using Flink CDC 3.0, the newly created tables are not being synchronized.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34873:
--

 Summary: [Bug] After starting Streaming ELT from MySQL to 
StarRocks using Flink CDC 3.0, the newly created tables are not being 
synchronized.
 Key: FLINK-34873
 URL: https://issues.apache.org/jira/browse/FLINK-34873
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

image flink:scala_2.12-java8, consistent with the doc

### Flink CDC version

Both 3.0.0 and 3.0.1 have the same issue

### Database and its version

debezium/example-mysql:1.1,registry.starrocks.io/starrocks/allin1-ubuntu 
consistent with the doc

### Minimal reproduce step

Upon following the steps in the documentation, after completing the 'Submit job 
using FlinkCDC cli' step, I created a new table orders_1  and inserted columns 
into it
Here is the schema for the orders_1 table I created:
CREATE TABLE `orders_1` (
`id` INT NOT NULL,
`price` DECIMAL(10,2) NOT NULL,
PRIMARY KEY (`id`)
);

### What did you expect to see?

Table orders_1 should be synchronized to StarRocks

### What did you see instead?

A few days ago, while testing and inserting columns into 'orders_1', an error 
stating 'schema orders_1 does not exist' occurred. Now, I am unable to 
reproduce this error. However, the issue of CDC showing no response to the new 
table 'orders_1' persists.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3068
Created by: [tatianguiqu|https://github.com/tatianguiqu]
Labels: bug, 
Created at: Mon Feb 19 18:59:33 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34809) [mysql] Add notifications to Slack when the Snapshot phase ends or Binlog stream phase begins

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34809:
--

 Summary: [mysql] Add notifications to Slack when the Snapshot 
phase ends or Binlog stream phase begins
 Key: FLINK-34809
 URL: https://issues.apache.org/jira/browse/FLINK-34809
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

On our team, we use Flink CDC to perform MySql CDC.

Since [there is no Snapshot Only mode for MySql 
yet|https://github.com/ververica/flink-cdc-connectors/issues/1687], we had a 
need to be notified when a snapshot is completed and when the binlog stream is 
started.

To accomplish this, we **implemented a notification when a snapshot ends and 
when a binlog stream starts with  GTIDs.**

--- 

Here's the team's use case in more detail 
1. We set parallelism to 2 or more for large tables.
2. And we send the change event log to Kafka to use Debezium's JDBC Sink Connector, 
which supports [Schema 
Evolution|https://debezium.io/documentation/reference/stable/connectors/jdbc.html#jdbc-schema-evolution].
3. Sinking to Kafka is slower than the MySqlSource operator, so we give the sink 
operator more parallelism than MySqlSource.
4. In this case, the transfer from the source operator to the sink operator is 
done in rebalance mode, so the order for the same PK is not guaranteed when 
transferring binlogs.
5. So we restart the job based on GTIDs with parallelism equal to 1 at the end 
of the snapshot phase.

To do this, we needed (1) to be notified that the snapshot ended and the binlog 
stream started, and (2) to know from which GTIDs the binlog stream started.

---

This is a portion of our code. We **assume that we only capture one table per 
Flink CDC job.**
```scala
  def getMySQLSourceOperator(): MySqlSource[String] = {
MySqlSource.builder[String]()
  .hostname(mySqlConfig.host)
  .port(mySqlConfig.port)
  .serverTimeZone(mySqlConfig.timeZone)
  .databaseList(mySqlConfig.database)
  .tableList(mySqlConfig.table)
  .username(mySqlConfig.user)
  .serverId(mySqlConfig.serverIdRange)
  .password(mySqlConfig.password)
  .startupOptions(mySqlConfig.startupMode)
  .fetchSize(mySqlConfig.fetchSize)
  .splitSize(mySqlConfig.splitSize)
  .chunkKeyColumn(new ObjectPath(mySqlConfig.database, mySqlConfig.table), 
mySqlConfig.chunkKeyColumn)
  .connectionPoolSize(mySqlConfig.poolSize)
  .scanNewlyAddedTableEnabled(false)
  .includeSchemaChanges(false)
  .debeziumProperties(mySqlConfig.dbzProps)
  .closeIdleReaders(true)
  .notifySnapshotToBinlogSwitch("slack-hook-url") // here is what we 
implemented
  .deserializer(new JsonDebeziumDeserializationSchema(true, 
mySqlConfig.jsonConverterProps))
  .build()
  }
```

--- 

We can't share a real picture of the notification, because our company 
recommends using in-house tools rather than Slack and has some security 
policies.

But it looks something like the format below!
- Snapshot finished notification.
```
[SNAPSHOT FINISHED]
 Database: test_database
 Table: test_table
```
- Binlog stream start notification.
```
[BINLOG STREAM START]
 Database: test_database
 Table: test_table
 GTIDs: 
3bda59bb-2fc8-11eb-855f-fa163e2550e3:1-128377129,3c3a6a1b-c931-11ed-b0db-b4055dec129e:1-273703345,901d637c-8add-11eb-8e3f-b4055d3355a6:1-3641352422,b46b8251-5254-11ed-a648-d0946637df48:1-1069556331,db449f07-c53e-11e8-b8c2-d094663d3d1d:1-3543229125,e8d62f95-c77a-11e8-9270-d0946637df48:1-5715021533
```


### Solution

Our implementation and PR is here. 
https://github.com/ververica/flink-cdc-connectors/pull/2453

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2454
Created by: [SML0127|https://github.com/SML0127]
Labels: enhancement, 
Created at: Sat Sep 02 14:34:50 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34812) [Bug] During the snapshot phase, projection is not being performed according to the user defined schema.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34812:
--

 Summary: [Bug] During the snapshot phase, projection is not being 
performed according to the user defined schema.
 Key: FLINK-34812
 URL: https://issues.apache.org/jira/browse/FLINK-34812
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.17.0

### Flink CDC version

<=master

### Database and its version

Such as MySQL, Oracle, PostgreSQL, SQL Server; take Oracle 11g as an example.

### Minimal reproduce step

Assuming the ABC table contains three fields, A, B, and C.

**ddl:**
CREATE TABLE ABC (
 A BIGINT NOT NULL,
 B STRING,
 PRIMARY KEY(A) NOT ENFORCED
 ) WITH (
 'connector' = 'oracle-cdc',
 'hostname' = '192.168.xxx.xxx',
 'port' = '1521',
 'username' = 'xxx',
 'password' = 'xxx',
 'database-name' = 'xxx',
 'schema-name' = 'xxx',
 'table-name' = 'ABC',
 'debezium.log.mining.strategy' = 'online_catalog',
 'debezium.log.mining.continuous.mine' = 'true',
 'debezium.database.tablename.case.insensitive' = 'false');

**dml:**
select * from ABC;

### What did you expect to see?

**should be:**
![query with projection|https://github.com/ververica/flink-cdc-connectors/assets/57552918/b8a302e8-63dd-42ca-b2f2-67e4bd356bfa]
Projection is performed according to the schema definition, and it takes 2.534 
seconds.

### What did you see instead?

**snapshot phase:**
![query of all columns|https://github.com/ververica/flink-cdc-connectors/assets/57552918/6504278f-4a00-43ba-8818-558a732c3f82]
The screenshot indicates that `*` was used in the query, so projection was not 
performed as defined by the schema, and it took 3.532 seconds to execute this 
query.

### Anything else?

When dealing with a large amount of data and numerous partitions, this time can 
become quite significant.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2470
Created by: [hzjhjjyy|https://github.com/hzjhjjyy]
Labels: bug, 
Created at: Fri Sep 08 09:43:07 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34830) [3.1][cdc-composer] verify the options of Context for Factory.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34830:
--

 Summary: [3.1][cdc-composer] verify the options of Context for 
Factory.
 Key: FLINK-34830
 URL: https://issues.apache.org/jira/browse/FLINK-34830
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

[Factory 
interface|https://github.com/ververica/flink-cdc-connectors/blob/a26607a02767fcaba5eb3524fd0257973caac876/flink-cdc-common/src/main/java/com/ververica/cdc/common/factories/Factory.java#L54C1-L54C1]
 defines a set of ConfigOptions via the requiredOptions and optionalOptions methods; 
we need to check them when using a Factory.

We can do this before calling the createDataSource and createDataSink methods; for 
reference, see 
[FactoryUtil|https://github.com/apache/flink/blob/6c429c5450a003d6521693116e0fbb2dab543d6e/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/FactoryUtil.java#L1000C25-L1000C25].
A rough sketch of such a check is shown below.
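
To make the proposal concrete, here is a minimal sketch of such a check. It assumes 
the Factory and ConfigOption types from flink-cdc-common linked above; the helper 
itself and the raw 'userOptions' map are hypothetical, not existing code.

```java
// Hypothetical validation helper the composer could run before calling
// createDataSource / createDataSink. 'userOptions' is the raw key/value map
// taken from the pipeline definition.
static void validateFactoryOptions(Factory factory, java.util.Map<String, String> userOptions) {
    java.util.Set<String> required = new java.util.HashSet<>();
    factory.requiredOptions().forEach(o -> required.add(o.key()));

    java.util.Set<String> known = new java.util.HashSet<>(required);
    factory.optionalOptions().forEach(o -> known.add(o.key()));

    // every required option must be present
    for (String key : required) {
        if (!userOptions.containsKey(key)) {
            throw new IllegalArgumentException(
                    "Missing required option '" + key + "' for factory " + factory.identifier());
        }
    }
    // every provided option must be either required or optional
    for (String key : userOptions.keySet()) {
        if (!known.contains(key)) {
            throw new IllegalArgumentException(
                    "Unsupported option '" + key + "' for factory " + factory.identifier());
        }
    }
}
```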

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2752
Created by: [lvyanquan|https://github.com/lvyanquan]
Labels: enhancement, 
Created at: Sat Nov 25 17:42:41 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34886) Support Sybase database

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34886:
--

 Summary: Support Sybase database
 Key: FLINK-34886
 URL: https://issues.apache.org/jira/browse/FLINK-34886
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


Support Sybase database

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3133
Created by: [melin|https://github.com/melin]
Labels: 
Created at: Wed Mar 13 15:19:25 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34868) [Bug] oracle-cdc cannot read oracle multitenant pdb binlog: ORA-00942: table or view does not exist (LOG_MINING_FLUSH)

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34868:
--

 Summary: [Bug] oracle-cdc cannot read oracle multitenant pdb 
binlog: ORA-00942: table or view does not exist (LOG_MINING_FLUSH)
 Key: FLINK-34868
 URL: https://issues.apache.org/jira/browse/FLINK-34868
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.15.3

### Flink CDC version

2.4.0

### Database and its version

oracle 19c

### Minimal reproduce step

1. setup oracle env according to oracle cdc cdb-database doc: 
https://ververica.github.io/flink-cdc-connectors/release-2.4/content/connectors/oracle-cdc.html
2. using oracle cdc to read oracle log and print the data

### What did you expect to see?

When I insert one record into my Oracle table, I expect to see the record printed 
in the Flink log.
With oracle-cdc 2.3, it works fine.

### What did you see instead?

error message as below:
java.lang.RuntimeException: One or more fetchers have encountered exception
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130)
at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:385)
at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:807)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:756)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: SplitFetcher thread 0 received 
unexpected exception while polling the records
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: 
com.ververica.cdc.connectors.shaded.org.apache.kafka.connect.errors.ConnectException:
 An exception occurred in the change event producer. This connector will be 
stopped.
at 
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)
at 
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:261)
at 
com.ververica.cdc.connectors.oracle.source.reader.fetch.OracleStreamFetchTask$RedoLogSplitReadTask.execute(OracleStreamFetchTask.java:134)
at 
com.ververica.cdc.connectors.oracle.source.reader.fetch.OracleStreamFetchTask.execute(OracleStreamFetchTask.java:72)
at 
com.ververica.cdc.connectors.base.source.reader.external.IncrementalSourceStreamFetcher.lambda$submitTask$0(IncrementalSourceStreamFetcher.java:89)
... 5 more
Caused by: io.debezium.DebeziumException: Failed to flush Oracle LogWriter 
(LGWR) buffers to disk
at 
io.debezium.connector.oracle.logminer.logwriter.CommitLogWriterFlushStrategy.flush(CommitLogWriterFlushStrategy.java:89)
at 
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:208)
... 8 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not 
exist

at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:509)
at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:461)
at 

[jira] [Created] (FLINK-34829) When will opengauss or gaussdb be supported?

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34829:
--

 Summary: When will opengauss or gaussdb be supported?
 Key: FLINK-34829
 URL: https://issues.apache.org/jira/browse/FLINK-34829
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Opengauss and Huawei Cloud's gaussdb have been widely used in the financial 
industry in mainland China, and I hope that flink-cdc-connectors will have 
plans to support opengauss and gaussdb.

### Solution

I see a two-year-old project here; I am not sure whether it still works. Reference: 
https://github.com/dafei1288/flink-connector-opengauss

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2697
Created by: [wangbinhang|https://github.com/wangbinhang]
Labels: enhancement, 
Created at: Wed Nov 15 11:13:08 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34843) [Bug] BinlogSplitReader#pollSplitRecords return finishedSplit when exception occurs

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34843:
--

 Summary: [Bug] BinlogSplitReader#pollSplitRecords return 
finishedSplit when exception occurs
 Key: FLINK-34843
 URL: https://issues.apache.org/jira/browse/FLINK-34843
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


## Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


## Flink version

1.18

## Flink CDC version

3.0

## Database and its version

any

## Minimal reproduce step

### Current Code
In the current BinlogSplitReader#pollSplitRecords, when currentTaskRunning is 
false, it returns null, which is treated as a finished split.
See:
```java
//com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader#pollSplitRecords
return dataIt == null ? finishedSplit() : forRecords(dataIt);
```

### Problem occurs:
However, currentTaskRunning becomes false in four situations:
1. the bounded stream split is finished (covered later in issue: 
https://github.com/ververica/flink-cdc-connectors/issues/2867)
2. the stream split is paused for newly added tables (see 
MySqlSplitReader#suspendBinlogReaderIfNeed)
3. some exception occurs (see executorService.submit)
4. BinlogSplitReader#close is called
Only in the former two situations is the split finished; otherwise a problem 
will occur.

For example, consider an unbounded stream split:
![image|https://github.com/ververica/flink-cdc-connectors/assets/125648852/ca9dd77c-d111-47c2-afb6-fc13232339a5]
* t1, add this unbounded stream split and start a new thread to fetch the binlog.
* t2, BinlogSplitReader#pollSplitRecords checks that there is no exception at first.
* t3, some exception occurs in binlogSplitTask (network, data error, and more), 
which sets currentTaskRunning = false.
* t4, BinlogSplitReader#pollSplitRecords sees that currentTaskRunning is false, so 
it returns null, which is treated as a finished split. Then MysqlSourceReader moves 
to the next split.

Thus, when the task is not running, we also need to distinguish more carefully 
whether the split is finished. I have two ideas (a sketch of the second follows 
below):
1. add a lock (not a good choice)
2. when the stream split is paused, also add an END watermark to the queue. Only 
when an END watermark is received should BinlogSplitReader#pollSplitRecords 
return null; otherwise it should return an empty collection.
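
A minimal, self-contained sketch of idea 2 follows; the class and names are 
hypothetical and only illustrate the intended contract, not the actual 
BinlogSplitReader API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PausableSplitPoller {
    // Sentinel element enqueued only when the split is deliberately finished or paused.
    static final String END_WATERMARK = "__END__";

    private final Deque<String> queue = new ArrayDeque<>();

    void enqueue(String record) { queue.add(record); }

    // Called only for a deliberate finish/pause; an exception in the fetch task
    // never enqueues the END marker.
    void finishSplit() { queue.add(END_WATERMARK); }

    // Returns null only after the END watermark is drained; an exception or a pause
    // without the marker yields an empty list, so the caller keeps the split open.
    List<String> pollSplitRecords() {
        List<String> batch = new ArrayList<>();
        String next;
        while ((next = queue.poll()) != null) {
            if (END_WATERMARK.equals(next)) {
                return null; // split is really finished
            }
            batch.add(next);
        }
        return batch; // possibly empty, but the split stays assigned
    }
}
```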

## What did you expect to see?

 Only when an END watermark is received should BinlogSplitReader#pollSplitRecords 
return null; otherwise it should return an empty collection.

## What did you see instead?

[Bug] BinlogSplitReader#pollSplitRecords returns finishedSplit (null) when an 
exception occurs.

## Anything else?

_No response_

## Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2878
Created by: [loserwang1024|https://github.com/loserwang1024]
Labels: bug, 
Created at: Mon Dec 18 09:48:18 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34811) [Bug] [sqlserver cdc] the field "snapshot" is always "false"

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34811:
--

 Summary: [Bug]  [sqlserver cdc] the field "snapshot" is always 
"false" 
 Key: FLINK-34811
 URL: https://issues.apache.org/jira/browse/FLINK-34811
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.13.6

### Flink CDC version

2.5-SNAPSHOT

### Database and its version

sqlserver 2016

### Minimal reproduce step

execute demo like
`SqlServerSourceBuilder.SqlServerIncrementalSource<String> 
sqlServerSource = new SqlServerSourceBuilder()
.hostname("xxx")
.port(1433)
.databaseList("xxx") // monitor sqlserver database
.tableList("dbo.xxx") // monitor products table
.username("xxx")
.password("xxx")
.deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
.startupOptions(StartupOptions.initial())
.build();
DataStream<String> originDataStream = 
env.fromSource(sqlServerSource, WatermarkStrategy.noWatermarks(), "cdc");
originDataStream.print();
env.execute("Print SqlServer Snapshot + Change Stream");`



### What did you expect to see?

I can distinguish whether the data printed is snapshot or incremental by the 
'snapshot' field

### What did you see instead?

the field "snapshot" is always "false" 

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2465
Created by: [edmond-kk|https://github.com/edmond-kk]
Labels: bug, 
Created at: Wed Sep 06 17:31:20 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34835) [cdc-common] Add default value of pipeline parallelism.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34835:
--

 Summary: [cdc-common] Add default value of pipeline parallelism.
 Key: FLINK-34835
 URL: https://issues.apache.org/jira/browse/FLINK-34835
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Our current 'parallelism' parameter doesn't have a default value. In cases 
where users do not provide one, it leads to an NPE. Additionally, I believe 
users do not necessarily need to concern themselves with pipeline parallelism. 
In the absence of a specified value, the default behavior should be to execute 
tasks with parallelism 1.

### Solution

Add a default value for PipelineOptions#PIPELINE_PARALLELISM, for example as 
sketched below.
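
A minimal sketch of the proposed change, assuming the Flink-style ConfigOptions 
builder that cdc-common mirrors; the exact constant and description text are 
illustrative, not final.

```java
// Hypothetical definition of the option with a default of 1, so that a missing
// 'parallelism' entry no longer leads to an NPE.
public static final ConfigOption<Integer> PIPELINE_PARALLELISM =
        ConfigOptions.key("parallelism")
                .intType()
                .defaultValue(1)
                .withDescription("Parallelism of the pipeline; defaults to 1 when not specified.");
```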

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2855
Created by: [joyCurry30|https://github.com/joyCurry30]
Labels: enhancement, 
Created at: Tue Dec 12 15:08:57 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34822) [Bug] oracle clob

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34822:
--

 Summary: [Bug] oracle clob
 Key: FLINK-34822
 URL: https://issues.apache.org/jira/browse/FLINK-34822
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.12

### Flink CDC version

2.1.0

### Database and its version

oracle11

### Minimal reproduce step

In Oracle with Flink CDC, one of the fields is a CLOB. When modifying a data 
record, as long as the CLOB field remains unchanged, Flink CDC, during the 
incremental phase, will not include the CLOB field data in the received Oracle 
data updates. All other fields will be included. Only when modifying the CLOB 
field data will Flink CDC transmit it. Is this a bug? Why isn’t the CLOB field 
transmitted?

### What did you expect to see?

Flink CDC should transfer the CLOB field even when Oracle does not update it.

### What did you see instead?

1

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2573
Created by: [liquidmavis|https://github.com/liquidmavis]
Labels: bug, 
Created at: Mon Oct 23 10:40:22 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34806) [Feature][Postgres] Support automatically identify newly added tables

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34806:
--

 Summary: [Feature][Postgres] Support automatically identify newly 
added tables
 Key: FLINK-34806
 URL: https://issues.apache.org/jira/browse/FLINK-34806
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

When we start a job with a regular expression, if a new table whose name matches the 
regular expression is created, the events emitted by that table will not be captured 
and an IllegalArgumentException will be thrown, causing the entire Flink job to fail over.

### Solution

When instantiating the EventDispatcher, an InconsistentSchemaHandler is passed in, 
and all internal schema information is refreshed at that point.

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2397
Created by: [TyrantLucifer|https://github.com/TyrantLucifer]
Labels: enhancement, 
Created at: Tue Aug 15 19:35:17 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34877) [Feature][Pipeline] Flink CDC pipeline transform supports type conversion

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34877:
--

 Summary: [Feature][Pipeline] Flink CDC pipeline transform supports 
type conversion
 Key: FLINK-34877
 URL: https://issues.apache.org/jira/browse/FLINK-34877
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

To be supplemented.

### Solution

To be supplemented.

### Alternatives

None.

### Anything else?

To be supplemented.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3078
Created by: [aiwenmo|https://github.com/aiwenmo]
Labels: enhancement, 
Created at: Mon Feb 26 23:47:03 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34839) Flink CDC 3.1.0 Plan

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34839:
--

 Summary: Flink CDC 3.1.0 Plan
 Key: FLINK-34839
 URL: https://issues.apache.org/jira/browse/FLINK-34839
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


## Motivation

This is an umbrella issue for Flink CDC 3.1 version

## Solution

**[module] flink-cdc-common**
 * [x] ([#2857|https://github.com/apache/flink-cdc/issues/2857] | 
[FLINK-2857|https://issues.apache.org/jira/browse/FLINK-2857])
 * [x] ([#2936|https://github.com/apache/flink-cdc/issues/2936] | 
[FLINK-2936|https://issues.apache.org/jira/browse/FLINK-2936])
 * [x] ([#2943|https://github.com/apache/flink-cdc/issues/2943] | 
[FLINK-2943|https://issues.apache.org/jira/browse/FLINK-2943])

**[module] flink-cdc-cli**
 * [ ] ([#2934|https://github.com/apache/flink-cdc/issues/2934] | 
[FLINK-2934|https://issues.apache.org/jira/browse/FLINK-2934])
 * [x] ([#2940|https://github.com/apache/flink-cdc/issues/2940] | 
[FLINK-2940|https://issues.apache.org/jira/browse/FLINK-2940])

**[module] flink-cdc-composer**
* [ ] ([#2932|https://github.com/apache/flink-cdc/issues/2932] | 
[FLINK-2932|https://issues.apache.org/jira/browse/FLINK-2932])
* [ ] ([#2882|https://github.com/apache/flink-cdc/issues/2882] | 
[FLINK-2882|https://issues.apache.org/jira/browse/FLINK-2882])
* [ ] ([#2854|https://github.com/apache/flink-cdc/issues/2854] | 
[FLINK-2854|https://issues.apache.org/jira/browse/FLINK-2854])

**[module] flink-cdc-connect/flink-cdc-source-connectors**
* [ ] ([#2869|https://github.com/apache/flink-cdc/issues/2869] | 
[FLINK-2869|https://issues.apache.org/jira/browse/FLINK-2869])
* [x] ([#1747|https://github.com/apache/flink-cdc/issues/1747] | 
[FLINK-1747|https://issues.apache.org/jira/browse/FLINK-1747])
* [x] ([#2867|https://github.com/apache/flink-cdc/issues/2867] | 
[FLINK-2867|https://issues.apache.org/jira/browse/FLINK-2867])
* [x] ([#1152|https://github.com/apache/flink-cdc/issues/1152] | 
[FLINK-1152|https://issues.apache.org/jira/browse/FLINK-1152])
* [ ] ([#2941|https://github.com/apache/flink-cdc/issues/2941] | 
[FLINK-2941|https://issues.apache.org/jira/browse/FLINK-2941])

**[module] flink-cdc-connect/flink-cdc-pipeline-connectors**
* [ ] ([#2691|https://github.com/apache/flink-cdc/issues/2691] | 
[FLINK-2691|https://issues.apache.org/jira/browse/FLINK-2691])
* [ ] ([#2856|https://github.com/apache/flink-cdc/issues/2856] | 
[FLINK-2856|https://issues.apache.org/jira/browse/FLINK-2856])

**[module] flink-cdc-e2e-tests**
* [ ] ([#2859|https://github.com/apache/flink-cdc/issues/2859] | 
[FLINK-2859|https://issues.apache.org/jira/browse/FLINK-2859])

**[module] docs** 
* [x] ([#2935|https://github.com/apache/flink-cdc/issues/2935] | 
[FLINK-2935|https://issues.apache.org/jira/browse/FLINK-2935])
* [x] ([#2940|https://github.com/apache/flink-cdc/issues/2940] | 
[FLINK-2940|https://issues.apache.org/jira/browse/FLINK-2940])

**Bug fix**
* [x] ([#2865|https://github.com/apache/flink-cdc/issues/2865] | 
[FLINK-2865|https://issues.apache.org/jira/browse/FLINK-2865])
* [ ] ([#2853|https://github.com/apache/flink-cdc/issues/2853] | 
[FLINK-2853|https://issues.apache.org/jira/browse/FLINK-2853])
* [x] ([#2966|https://github.com/apache/flink-cdc/issues/2966] | 
[FLINK-2966|https://issues.apache.org/jira/browse/FLINK-2966]) 
* [x] ([#2905|https://github.com/apache/flink-cdc/issues/2905] | 
[FLINK-2905|https://issues.apache.org/jira/browse/FLINK-2905])

## Design docs
https://docs.google.com/document/d/1sC045l08hqZ8C9bVxeZdH5fY5A4Ljz__f5SqZ2Z8qrM/edit?usp=sharing

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2861
Created by: [leonardBang|https://github.com/leonardBang]
Labels: enhancement, 3.1, 
Milestone: V3.1.0 
Assignee: [leonardBang|https://github.com/leonardBang]
Created at: Wed Dec 13 13:58:30 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34892) Nightly AWS connectors build fails on running python tests

2024-03-20 Thread Aleksandr Pilipenko (Jira)
Aleksandr Pilipenko created FLINK-34892:
---

 Summary: Nightly AWS connectors build fails on running python tests
 Key: FLINK-34892
 URL: https://issues.apache.org/jira/browse/FLINK-34892
 Project: Flink
  Issue Type: Bug
  Components: Connectors / AWS
Affects Versions: aws-connector-4.2.0
Reporter: Aleksandr Pilipenko


 

Build for externalized python connector code fails: 
https://github.com/apache/flink-connector-aws/actions/runs/8351768294/job/22860710449
{code:java}
2024-03-20T00:14:35.5215863Z __ 
FlinkKinesisTest.test_kinesis_streams_sink __
2024-03-20T00:14:35.5216781Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/testing/test_case_utils.py:149:
 in setUp
2024-03-20T00:14:35.5217584Z self.env = 
StreamExecutionEnvironment.get_execution_environment()
2024-03-20T00:14:35.5218901Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/datastream/stream_execution_environment.py:876:
 in get_execution_environment
2024-03-20T00:14:35.5219751Z gateway = get_gateway()
2024-03-20T00:14:35.5220635Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/java_gateway.py:64: in 
get_gateway
2024-03-20T00:14:35.5221378Z _gateway = launch_gateway()
2024-03-20T00:14:35.5222111Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/java_gateway.py:110: in 
launch_gateway
2024-03-20T00:14:35.5222956Z p = launch_gateway_server_process(env, args)
2024-03-20T00:14:35.5223854Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/pyflink_gateway_server.py:262:
 in launch_gateway_server_process
2024-03-20T00:14:35.5224649Z java_executable = find_java_executable()
2024-03-20T00:14:35.5225583Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/pyflink_gateway_server.py:75:
 in find_java_executable
2024-03-20T00:14:35.5226449Z java_home = 
read_from_config(KEY_ENV_JAVA_HOME, None, flink_conf_file)
2024-03-20T00:14:35.5227099Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2024-03-20T00:14:35.5227450Z 
2024-03-20T00:14:35.5227774Z key = 'env.java.home', default_value = None
2024-03-20T00:14:35.5228925Z flink_conf_file = 
'/home/runner/work/flink-connector-aws/flink-connector-aws/flink-python/.tox/py310-cython/lib/python3.10/site-packages/pyflink/conf/flink-conf.yaml'
2024-03-20T00:14:35.5229778Z 
2024-03-20T00:14:35.5230010Z def read_from_config(key, default_value, 
flink_conf_file):
2024-03-20T00:14:35.5230581Z value = default_value
2024-03-20T00:14:35.5231236Z # get the realpath of tainted path value 
to avoid CWE22 problem that constructs a path or URI
2024-03-20T00:14:35.5232195Z # using the tainted value and might allow 
an attacker to access, modify, or test the existence
2024-03-20T00:14:35.5232940Z # of critical or sensitive files.
2024-03-20T00:14:35.5233417Z >   with 
open(os.path.realpath(flink_conf_file), "r") as f:
2024-03-20T00:14:35.5234874Z E   FileNotFoundError: [Errno 2] No such file 
or directory: 
'/home/runner/work/flink-connector-aws/flink-connector-aws/flink-python/.tox/py310-cython/lib/python3.10/site-packages/pyflink/conf/flink-conf.yaml'
2024-03-20T00:14:35.5235954Z 
2024-03-20T00:14:35.5236484Z 
.tox/py310-cython/lib/python3.10/site-packages/pyflink/pyflink_gateway_server.py:58:
 FileNotFoundError {code}
The failure started after the release of the apache-flink Python package for 1.19.0, due 
to a change in the default config file provided within the artifact.

 

 

The issue comes from an outdated copy of pyflink_gateway_server.py created as part of 
[https://github.com/apache/flink-connector-kafka/pull/69] (the same change is 
duplicated in the AWS connectors repository).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34880) [Feature][Pipeline] SchemeRegistry saves the routing rules used to restore the TransformDataOperator

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34880:
--

 Summary: [Feature][Pipeline] SchemeRegistry saves the routing 
rules used to restore the TransformDataOperator
 Key: FLINK-34880
 URL: https://issues.apache.org/jira/browse/FLINK-34880
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

To be supplemented.

### Solution

To be supplemented.

### Alternatives

None.

### Anything else?

To be supplemented.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3081
Created by: [aiwenmo|https://github.com/aiwenmo]
Labels: enhancement, 
Created at: Mon Feb 26 23:50:04 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34859) [Bug] Oracle cdc in table api does not support server-time-zone option

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34859:
--

 Summary: [Bug] Oracle cdc in table api does not support 
server-time-zone option
 Key: FLINK-34859
 URL: https://issues.apache.org/jira/browse/FLINK-34859
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.17.1

### Flink CDC version

3.0.0

### Database and its version

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production  
With the Partitioning, OLAP, Data Mining and Real Application Testing options

### Minimal reproduce step

## Create a cdc source in table api with `server-time-zone` option specified.

```java
StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createLocalEnvironment();
env.setParallelism(1);

Schema schema = Schema.newBuilder()
.column("NAME", DataTypes.STRING())
.column("ADDR", DataTypes.STRING())
.build();

String factoryIdentifier = new 
OracleTableSourceFactory().factoryIdentifier();
TableDescriptor tableDescriptor = 
TableDescriptor.forConnector(factoryIdentifier)
.schema(schema)
// .format(DebeziumJsonFormatFactory.IDENTIFIER)
.option(OracleSourceOptions.HOSTNAME, "my-oracle-host")
.option(OracleSourceOptions.PORT, 1521)
.option(OracleSourceOptions.USERNAME, "my-oracle-username")
.option(OracleSourceOptions.PASSWORD, "my-oracle-password")
.option(OracleSourceOptions.DATABASE_NAME, "my-oracle-database")
.option(OracleSourceOptions.SCHEMA_NAME, "my-oracle-schema")
.option(OracleSourceOptions.TABLE_NAME, "TEST")
.option(OracleSourceOptions.SCAN_STARTUP_MODE, "initial")
.option(OracleSourceOptions.SCAN_INCREMENTAL_SNAPSHOT_ENABLED, 
false)
.option(OracleSourceOptions.SCAN_SNAPSHOT_FETCH_SIZE, 10)
.option(OracleSourceOptions.SERVER_TIME_ZONE, "Asia/Shanghai")
.option("debezium.include.schema.changes", "false")

.option("debezium.database.history.store.only.captured.tables.ddl", "true")
.build();

StreamTableEnvironmentImpl tEnv = (StreamTableEnvironmentImpl) 
StreamTableEnvironmentImpl.create(env, 
EnvironmentSettings.newInstance().inStreamingMode().build());
Table table = tEnv.from(tableDescriptor);
tEnv.toChangelogStream(table).print();
env.execute();
```
## Exceptions are:
```text
Exception in thread "main" org.apache.flink.table.api.ValidationException: 
Unable to create a source for reading table '*anonymous_oracle-cdc$1*'.

Table options are:

'connector'='oracle-cdc'
'database-name'='my-oracle-database'
'debezium.database.history.store.only.captured.tables.ddl'='true'
'debezium.include.schema.changes'='false'
'hostname'='my-oracle-host'
'password'='**'
'port'='1521'
'scan.incremental.snapshot.enabled'='false'
'scan.snapshot.fetch.size'='10'
'scan.startup.mode'='initial'
'schema-name'='my-oracle-schema'
'server-time-zone'='Asia/Shanghai'
'table-name'='TEST'
'username'='my-oracle-username'
at 
org.apache.flink.table.factories.FactoryUtil.createDynamicTableSource(FactoryUtil.java:167)
at 
org.apache.flink.table.factories.FactoryUtil.createDynamicTableSource(FactoryUtil.java:192)
at 
org.apache.flink.table.planner.plan.schema.CatalogSourceTable.createDynamicTableSource(CatalogSourceTable.java:175)
at 
org.apache.flink.table.planner.plan.schema.CatalogSourceTable.toRel(CatalogSourceTable.java:115)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:357)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:158)
at 
org.apache.flink.table.operations.SourceQueryOperation.accept(SourceQueryOperation.java:86)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:155)
at 
org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:135)
at 
org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:92)
at 
org.apache.flink.table.operations.SourceQueryOperation.accept(SourceQueryOperation.java:86)
at 
org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.java:261)
at 
org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:289)
at 

[jira] [Created] (FLINK-34798) [Bug] flink cdc mysql, use datastream to synchronize multiple tables, read full amount of data in a loop

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34798:
--

 Summary: [Bug] flink cdc mysql, use datastream to synchronize 
multiple tables, read full amount of data in a loop
 Key: FLINK-34798
 URL: https://issues.apache.org/jira/browse/FLINK-34798
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.16

### Flink CDC version

2.4.0

### Database and its version

mysql 8

### Minimal reproduce step

Occasionally, when using the Flink CDC DataStream API to synchronize data from multiple 
MySQL tables, after the full snapshot phase is read the incremental phase does not start, 
and the full read keeps repeating in a loop.
There are no restarts or failures while the task is running.

### What did you expect to see?

After the full snapshot is read, incremental reading should be performed.

### What did you see instead?

After the full read finishes, it starts reading in full again.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2286
Created by: [JNSimba|https://github.com/JNSimba]
Labels: bug, 
Created at: Tue Jul 11 11:16:26 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34876) [Feature][Pipeline] Flink CDC pipeline transform supports UDF

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34876:
--

 Summary: [Feature][Pipeline] Flink CDC pipeline transform supports 
UDF
 Key: FLINK-34876
 URL: https://issues.apache.org/jira/browse/FLINK-34876
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

To be supplemented.

### Solution

To be supplemented.

### Alternatives

None.

### Anything else?

To be supplemented.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3077
Created by: [aiwenmo|https://github.com/aiwenmo]
Labels: enhancement, 
Created at: Mon Feb 26 23:44:59 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34848) refactor: some class in flink-connector-mysql-cdc do not inherit from the base class in flink-cdc-base

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34848:
--

 Summary: refactor: some class in flink-connector-mysql-cdc do not 
inherit from the base class in flink-cdc-base
 Key: FLINK-34848
 URL: https://issues.apache.org/jira/browse/FLINK-34848
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Currently, some classes in `flink-connector-mysql-cdc` do not inherit from the 
base classes in `flink-cdc-base`, making the code somewhat redundant and untidy, 
e.g. MySqlSourceEnumerator.

Seizing this opportunity for refactoring, much of the functionality implemented in 
`flink-connector-mysql-cdc` could be exposed in the form of interfaces, 
allowing other connectors to implement it.

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2907
Created by: [cobolbaby|https://github.com/cobolbaby]
Labels: enhancement, 
Created at: Thu Dec 21 14:59:30 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34853) [Feature] submit Flink CDC pipeline job to k8s cluster.

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34853:
--

 Summary: [Feature] submit Flink CDC pipeline job to k8s cluster.
 Key: FLINK-34853
 URL: https://issues.apache.org/jira/browse/FLINK-34853
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

Currently, there is no clear description of how to run a pipeline on Kubernetes.
If no code change is needed, please add some docs to guide users through 
submitting a job to a Kubernetes cluster.
If it's necessary to modify the code, you can submit a PR and add the 
corresponding doc to fix it.

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2934
Created by: [lvyanquan|https://github.com/lvyanquan]
Labels: enhancement, 
Created at: Wed Dec 27 11:01:17 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34867) [Bug] [StarRocks] [cdc master branch] Unsupported CDC data type BYTES/VARBINARY

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34867:
--

 Summary: [Bug] [StarRocks] [cdc master branch] Unsupported CDC 
data type BYTES/VARBINARY
 Key: FLINK-34867
 URL: https://issues.apache.org/jira/browse/FLINK-34867
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

Flink 1.18

### Flink CDC version

3.0.1 
master 

### Database and its version

mysql 5.7.44
StarRocks 3.2.2

### Minimal reproduce step

Synchronize the entire database using a regular expression to match tables: db.\.*


```log
2024-01-29 16:32:18
java.lang.UnsupportedOperationException: Unsupported CDC data type BYTES
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksUtils$CdcDataTypeTransformer.defaultMethod(StarRocksUtils.java:368)
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksUtils$CdcDataTypeTransformer.defaultMethod(StarRocksUtils.java:236)
at 
com.ververica.cdc.common.types.DataTypeDefaultVisitor.visit(DataTypeDefaultVisitor.java:49)
at 
com.ververica.cdc.common.types.VarBinaryType.accept(VarBinaryType.java:86)
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksUtils.toStarRocksDataType(StarRocksUtils.java:112)
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksUtils.toStarRocksTable(StarRocksUtils.java:86)
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksMetadataApplier.applyCreateTable(StarRocksMetadataApplier.java:87)
at 
com.ververica.cdc.connectors.starrocks.sink.StarRocksMetadataApplier.applySchemaChange(StarRocksMetadataApplier.java:70)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistryRequestHandler.applySchemaChange(SchemaRegistryRequestHandler.java:82)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistryRequestHandler.flushSuccess(SchemaRegistryRequestHandler.java:149)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistry.handleEventFromOperator(SchemaRegistry.java:123)
at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.handleEventFromOperator(OperatorCoordinatorHolder.java:204)
at 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.deliverOperatorEventToCoordinator(DefaultOperatorCoordinatorHandler.java:121)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.deliverOperatorEventToCoordinator(SchedulerBase.java:1062)
at 
org.apache.flink.runtime.jobmaster.JobMaster.sendOperatorEventToCoordinator(JobMaster.java:604)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229)
at org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)
at org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)
at org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)
at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241)
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
at 

[jira] [Created] (FLINK-34864) [enhancement] [ cdc-connector] [ignore no primary key table] When using regular expressions to synchronize the entire database with Flink CDC, skip tables that do not ha

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34864:
--

 Summary: [enhancement] [ cdc-connector] [ignore no primary key 
table] When using regular expressions to synchronize the entire database with 
Flink CDC, skip tables that do not have a primary key.
 Key: FLINK-34864
 URL: https://issues.apache.org/jira/browse/FLINK-34864
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [x] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

If regular expressions are used to match the tables to be synchronized, and 
some tables in the database do not have a primary key, when submitting a task 
with Flink CDC 3.0, an error occurs. Can we add an option to skip the 
synchronization of tables without a primary key, such as 
ignore-no-primary-key-table: true?


**Flink version**
1.18

**Flink CDC version**
3.0.0

**Database and its version**
Mysql 5.7 ,  StarRocks 3.2.2

### Solution

ignore-no-primary-key-table: true
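
A minimal, self-contained sketch of the intended behaviour (illustrative only, not existing Flink CDC code): with the switch on, tables without a primary key are dropped from the captured set instead of failing the whole job.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SkipNoPrimaryKeyTablesSketch {

    // Keep only tables with at least one primary-key column when the proposed
    // 'ignore-no-primary-key-table' switch is enabled; otherwise keep everything.
    static List<String> selectCapturedTables(
            Map<String, List<String>> primaryKeysByTable, boolean ignoreNoPrimaryKeyTable) {
        return primaryKeysByTable.entrySet().stream()
                .filter(e -> !ignoreNoPrimaryKeyTable || !e.getValue().isEmpty())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> tables =
                Map.of("db.orders", List.of("id"), "db.audit_log", List.of());
        // With the switch on, only db.orders is captured; db.audit_log is skipped.
        System.out.println(selectCapturedTables(tables, true));
    }
}
```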

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3006
Created by: [everhopingandwaiting|https://github.com/everhopingandwaiting]
Labels: enhancement, 
Created at: Thu Jan 18 11:41:12 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34882) mysql cdc to doris: TIME type converting fail

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34882:
--

 Summary: mysql cdc to doris:  TIME type converting fail
 Key: FLINK-34882
 URL: https://issues.apache.org/jira/browse/FLINK-34882
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


```txt
2024-03-05 18:23:42
java.lang.RuntimeException: Failed to schema change, 
CreateTableEvent{tableId=app.xxx, schema=columns={`id` BIGINT NOT 
NULL,`create_time` TIMESTAMP(6) NOT NULL,`end_time` TIME(0) NOT 
NULL,`start_time` TIME(0) NOT NULL,`update_time` TIMESTAMP(6)}, primaryKeys=id, 
options=()}, reason: Flink doesn't support converting type TIME(0) to Doris 
type yet.
at 
com.ververica.cdc.connectors.doris.sink.DorisMetadataApplier.applySchemaChange(DorisMetadataApplier.java:86)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistryRequestHandler.applySchemaChange(SchemaRegistryRequestHandler.java:82)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistryRequestHandler.flushSuccess(SchemaRegistryRequestHandler.java:149)
at 
com.ververica.cdc.runtime.operators.schema.coordinator.SchemaRegistry.handleEventFromOperator(SchemaRegistry.java:123)
at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.handleEventFromOperator(OperatorCoordinatorHolder.java:204)
at 
org.apache.flink.runtime.scheduler.DefaultOperatorCoordinatorHandler.deliverOperatorEventToCoordinator(DefaultOperatorCoordinatorHandler.java:121)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.deliverOperatorEventToCoordinator(SchedulerBase.java:1062)
at 
org.apache.flink.runtime.jobmaster.JobMaster.sendOperatorEventToCoordinator(JobMaster.java:604)
at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229)
at org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)
at org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)
at org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)
at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241)
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)


```

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3099
Created by: [laizuan|https://github.com/laizuan]
Labels: 
Created at: Tue Mar 05 18:30:52 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34885) mysql to sr pipeline, float value bug

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34885:
--

 Summary: mysql to  sr   pipeline, float  value  bug
 Key: FLINK-34885
 URL: https://issues.apache.org/jira/browse/FLINK-34885
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


env: Flink 1.18.1, Flink CDC 3.0.1, StarRocks 3.2.3, MySQL 5.6.*

xxx.yaml :
`pipeline: {name: 5411_total_info_to_sr, parallelism: 1}
route: null
sink: {jdbc-url: 'jdbc:mysql://172.16.100.40:9030', load-url: 
'172.16.100.40:8030',
  name: StarRocks Sink, password:***, 
properties.properties.max_filter_ratio: 0.9,
  properties.properties.strict_mode: false, 
table.create.properties.replication_num: 1,
  type: starrocks, username: **}
source: {hostname: mysql-cluster.internal.**.com, password:**, port: 
3306,
  scan.startup.mode: timestamp, scan.startup.timestamp-millis: 1710233721754, 
schema-change.enabled: false,
  server-id: 5411, server-time-zone: Asia/Shanghai, tables: 
'enlightent_daily.\.*_total_info,enlightent_daily.\.video_basic_info',
  type: mysql, username:**}`

Command: bin/flink-cdc.sh xxx.yaml

Bug: the fields of FLOAT type are wrong, while the fields of DOUBLE type are correct.

Then I tried mysql-cdc alone, and the values print correctly on the Flink SQL client screen.

Then I created Flink tables for the source table (MySQL) and the sink table (StarRocks) 
in the Flink SQL client.

`insert into master_cost_float_sr select * from master_cost_float/*+ 
OPTIONS('server-id'='5801-5804') */;

[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 14620b59ae166ea75a6ab3eda071872f
`
Submitted this way, the FLOAT fields are correct for CDC.




 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3132
Created by: [dickson-bit|https://github.com/dickson-bit]
Labels: 
Created at: Wed Mar 13 11:20:48 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34881) [Feature] Flink CDC Oracle support tables without primary keys and scan newly added tables

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34881:
--

 Summary: [Feature] Flink CDC Oracle support tables without primary 
keys and scan newly added tables
 Key: FLINK-34881
 URL: https://issues.apache.org/jira/browse/FLINK-34881
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation
In the incremental-snapshot-based DataStream source, I hope the following will be supported, thank you!
- Like MySQL CDC, support tables that do not have a primary key. To use a table 
without primary keys, you must configure the 
scan.incremental.snapshot.chunk.key-column option and specify one non-null field.
- A Scan Newly Added Tables feature that enables you to add new tables to monitor in an 
existing running pipeline; the newly added tables will first read their snapshot 
data and then read their changelog automatically.

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3095
Created by: [Fly365|https://github.com/Fly365]
Labels: enhancement, 
Created at: Tue Mar 05 10:34:53 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34879) [Feature][Pipeline] Flink CDC pipeline transform supports the strategy of schema evolution

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34879:
--

 Summary: [Feature][Pipeline] Flink CDC pipeline transform supports 
the strategy of schema evolution
 Key: FLINK-34879
 URL: https://issues.apache.org/jira/browse/FLINK-34879
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

To be supplemented.

### Solution

To be supplemented.

### Alternatives

None.

### Anything else?

To be supplemented.

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3080
Created by: [aiwenmo|https://github.com/aiwenmo]
Labels: enhancement, 
Created at: Mon Feb 26 23:48:38 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: Additional metadata available for Kafka serdes

2024-03-20 Thread David Radley
Hi Balint,
Excellent, I have just put up a discussion thread on the dev list for a new FLIP 
to add this; please review, give feedback, and +1 if this is what you are looking for.
At the moment, I have not got the topic name in there. I wonder where we would 
use the topic name during deserialization or serialization? The only place I 
could see it being used is in serialization, where we register a 
(potentially new) schema, but this may not be desired as the schema could be a 
nested schema.
WDYT?
Kind regards, David.
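
A minimal sketch of the kind of backwards-compatible extension being discussed below, i.e. handing the Kafka header content to the format as a map via a new default method; the interface and method names here are placeholders, not the actual FLIP proposal:

```java
import org.apache.flink.api.common.serialization.DeserializationSchema;

import java.io.IOException;
import java.util.Map;

/** Illustrative only: a header-aware variant that keeps existing formats working unchanged. */
public interface HeaderAwareDeserializationSchema<T> extends DeserializationSchema<T> {

    /**
     * Formats that need Kafka headers (e.g. to look up a schema id, as Apicurio does)
     * can override this; all other formats fall back to the header-less deserialize().
     */
    default T deserialize(byte[] message, Map<String, byte[]> headers) throws IOException {
        return deserialize(message);
    }
}
```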

From: Balint Bene 
Date: Thursday, 14 March 2024 at 20:16
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: Additional metadata available for Kafka serdes
Hi David!

I think passing the headers as a map (as opposed to
ConsumerRecord/ProducerRecord) is a great idea that should work. That way
the core Flink package doesn't have Kafka dependencies, it seems like
they're meant to be decoupled anyway. The one bonus that using the Record
objects has is that it also provides the topic name, which is a part of the
signature (but usually unused) for Kafka serdes. Do you think it's
worthwhile to also have the topic name included in the signature along with
the map?

Happy to test things out, provide feedback. I'm not working on an Apicurio
format myself, but the use case is very similar.

Thanks,
Balint

On Thu, Mar 14, 2024 at 12:41 PM David Radley 
wrote:

> Hi ,
> I am currently prototyping an Avro Apicurio format that I hope to raise as
> a FLIP very soon (hopefully by early  next week). In my prototyping , I am
> passing through the Kafka headers content as a map to the
> DeserializationSchema and have extended the SerializationSchema to pass
> back headers. I am using new default methods in the interface so as to be
> backwardly compatible. I have the deserialise working and the serialise is
> close.
>
> We did consider trying to use the Apicurio deser libraries but this is
> tricky due to the way the code is split.
>
> Let me know what you think – I hope this approach will meet your needs,
> Kind regards, David.
>
> From: Balint Bene 
> Date: Tuesday, 12 March 2024 at 22:18
> To: dev@flink.apache.org 
> Subject: [EXTERNAL] Additional metadata available for Kafka serdes
> Hello! Looking to get some guidance for a problem around the Flink formats
> used for Kafka.
>
> Flink currently uses common serdes interfaces across all formats. However,
> some data formats used in Kafka require headers for serdes.  It's the same
> problem for serialization and deserialization, so I'll just use
> DynamicKafkaDeserialationSchema
> <
> https://github.com/Shopify/shopify-flink-connector-kafka/blob/979791c4c71e944c16c51419cf9a84aa1f8fea4c/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/DynamicKafkaDeserializationSchema.java#L130
> >
> as
> an example. It has access to the Kafka record headers, but it can't pass
> them to the DeserializationSchema
> <
> https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/api/common/serialization/DeserializationSchema.java#L81
> >
> implemented
> by the format since the interface is generic.
>
> If it were possible to pass the headers, then open source formats such as
> Apicurio could be supported. Unlike the Confluent formats which store the
> metadata (schema ID) appended to the serialized bytes in the key and value,
> the Apicurio formats store their metadata in the record headers.
>
> I have bandwidth to work on this, but it would be great to have direction
> from the community. I have a simple working prototype that's able to load a
> custom version of the format with a modified interface that can accept the
> headers (I just put the entire Apache Kafka ConsumerRecord
> <
> https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html
> >
> /ProducerRecord
> <
> https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html
> >
> for simplicity). The issues I foresee is that the class-loader
> <
> https://github.com/apache/flink/blob/94b55d1ae61257f21c7bb511660e7497f269abc7/flink-core/src/main/java/org/apache/flink/core/classloading/ComponentClassLoader.java
> >
> exists in the Flink repo along with interfaces for the formats, but these
> changes are specific to Kafka. This solution could require migrating
> formats to the Flink-connector-kafka repo which is a decent amount of work.
>
> Feedback is appreciated!
> Thanks
> Balint
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-34870) [Bug] MongoCDC checkpoint error

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34870:
--

 Summary: [Bug] MongoCDC checkpoint error
 Key: FLINK-34870
 URL: https://issues.apache.org/jira/browse/FLINK-34870
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.15.2

### Flink CDC version

2.4.1

### Database and its version

5

### Minimal reproduce step

When MongoDB CDC starts from a checkpoint, if there have been no data changes 
(such as inserts, updates, or deletes) before the checkpoint, starting from the 
checkpoint will still initiate a capture of all data. Only after data changes 
have occurred will the initial snapshot data not be captured again.

### What did you expect to see?

After starting from a checkpoint, regardless of whether there have been data 
changes in the database, data will be captured starting from the offset.

### What did you see instead?

It captures all the data again.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3053
Created by: [illsiveblue|https://github.com/illsiveblue]
Labels: bug, 
Created at: Thu Feb 01 10:22:33 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34865) [enhancement] [StarRocks] When synchronizing tables using Flink CDC 3.x, is it possible to include the comments of the source table's fields when creating the target tab

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34865:
--

 Summary: [enhancement] [StarRocks] When synchronizing tables using 
Flink CDC 3.x, is it possible to include the comments of the source table's 
fields when creating the target table?
 Key: FLINK-34865
 URL: https://issues.apache.org/jira/browse/FLINK-34865
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Motivation

When synchronizing tables using Flink CDC 3.x, is it possible to include the 
comments of the source table's fields when creating the target table?

### Solution

_No response_

### Alternatives

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/3007
Created by: [everhopingandwaiting|https://github.com/everhopingandwaiting]
Labels: enhancement, 
Created at: Thu Jan 18 11:58:45 CST 2024
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-402: Extend ZooKeeper Curator configurations

2024-03-20 Thread Yang Wang
+1 (binding) since ZK HA is still widely used.


Best,
Yang

On Thu, Mar 14, 2024 at 6:27 PM Matthias Pohl
 wrote:

> Nothing to add from my side. Thanks, Alex.
>
> +1 (binding)
>
> On Thu, Mar 7, 2024 at 4:09 PM Alex Nitavsky 
> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a vote on FLIP-402 [1]. It introduces new configuration
> > options for Apache Flink's ZooKeeper integration for high availability by
> > reflecting existing Apache Curator configuration options. It has been
> > discussed in this thread [2].
> >
> > I would like to start a vote.  The vote will be open for at least 72
> hours
> > (until March 10th 18:00 GMT) unless there is an objection or
> > insufficient votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-402%3A+Extend+ZooKeeper+Curator+configurations
> > [2] https://lists.apache.org/thread/gqgs2jlq6bmg211gqtgdn8q5hp5v9l1z
> >
> > Thanks
> > Alex
> >
>


[jira] [Created] (FLINK-34814) [Bug] java.lang.NoClassDefFoundError: Could not initialize class org.apache.kafka.connect.json.JsonConverterConfig

2024-03-20 Thread Flink CDC Issue Import (Jira)
Flink CDC Issue Import created FLINK-34814:
--

 Summary: [Bug] java.lang.NoClassDefFoundError: Could not 
initialize class org.apache.kafka.connect.json.JsonConverterConfig
 Key: FLINK-34814
 URL: https://issues.apache.org/jira/browse/FLINK-34814
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Flink CDC Issue Import


### Search before asking

- [X] I searched in the 
[issues|https://github.com/ververica/flink-cdc-connectors/issues] and found 
nothing similar.


### Flink version

1.14.0

### Flink CDC version

2.3.0

### Database and its version

MySQL5.7

### Minimal reproduce step

// checkpoint-related settings
env.enableCheckpointing(36L);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
env.getCheckpointConfig().setCheckpointTimeout(36L);
env.getCheckpointConfig().setFailOnCheckpointingErrors(false);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

// CDC-related properties
Properties prop = new Properties();
prop.setProperty("snapshot.locking.mode", "none");

// use CDC to read the MySQL binlog
DebeziumSourceFunction<String> dataSource =
        MySqlSource.<String>builder()
                .hostname("")
                .port(3306)
                .username("rp")
                .password("u5")
                .databaseList("iap")
                .tableList("iap.cdc_test_1")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .startupOptions(StartupOptions.latest())
                .debeziumProperties(prop)
                .build();

SingleOutputStreamOperator<String> streamSource =
        env.addSource(dataSource).name("data-source");
streamSource.print("==>" + streamSource);
but it causes an error:
 java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.kafka.connect.json.JsonConverterConfig

### What did you expect to see?

run success

### What did you see instead?

success

### Anything else?

no

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

 Imported from GitHub 
Url: https://github.com/apache/flink-cdc/issues/2491
Created by: [ccc6|https://github.com/ccc6]
Labels: bug, 
Created at: Thu Sep 14 11:14:15 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   >