[jira] [Created] (FLINK-33789) Expose restart time as metric

2023-12-10 Thread Alexander Fedulov (Jira)
Alexander Fedulov created FLINK-33789:
-

 Summary: Expose restart time as metric
 Key: FLINK-33789
 URL: https://issues.apache.org/jira/browse/FLINK-33789
 Project: Flink
  Issue Type: New Feature
  Components: Autoscaler, Kubernetes Operator
Reporter: Alexander Fedulov
Assignee: Alexander Fedulov
 Fix For: kubernetes-operator-1.8.0


Currently the autoscaler uses a preconfigured restart time for the job. We 
should dynamically adjust this based on the observed restart times of scale 
operations.
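
A minimal sketch of the direction (the class, metric name, and wiring below are assumptions for illustration, not the final implementation): measure each observed restart and expose the latest duration through Flink's standard metric API (org.apache.flink.metrics.Gauge / MetricGroup).

{code:java}
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

// Hypothetical holder; the real change would live in the autoscaler module.
public class RestartTimeTracker {

    private volatile long lastRestartMs = -1L;

    public void register(MetricGroup group) {
        // Reporters poll Gauge#getValue; -1 means "no restart observed yet".
        group.gauge("LastRestartTimeMs", (Gauge<Long>) () -> lastRestartMs);
    }

    // Called with the measured duration of a scaling-triggered restart.
    public void onRestartObserved(long durationMs) {
        this.lastRestartMs = durationMs;
    }
}
{code}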





Re: [ANNOUNCE] Experimental Java 21 support now available on master

2023-12-10 Thread Sergey Nuyanzin
thanks for checking and creating the ticket

yes, that probably makes sense

On Thu, Nov 30, 2023 at 1:08 PM Yun Tang  wrote:

> Hi Sergey,
>
> I checked the CI [1], which was executed with Java 21, and noticed that the
> StatefulJobSnapshotMigrationITCase-related tests have passed, which proves
> what I guessed before: most checkpoints/savepoints should be restored
> successfully.
>
> I think we should introduce such snapshot migration tests, which restore
> snapshots containing Scala code. I also created a ticket focused on Java 17
> [2].
>
>
> [1]
> https://dev.azure.com/snuyanzin/flink/_build/results?buildId=2620&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1
> [2] https://issues.apache.org/jira/browse/FLINK-33707
>
>
> Best
> Yun Tang
> 
> From: Sergey Nuyanzin 
> Sent: Thursday, November 30, 2023 14:41
> To: dev@flink.apache.org 
> Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> master
>
> Thanks Yun Tang
>
> One question to clarify: since the scala version was also bumped for java
> 17, shouldn't there be a similar task for java 17?
>
> On Thu, Nov 30, 2023 at 3:43 AM Yun Tang  wrote:
>
> > Hi Sergey,
> >
> > You can leverage all tests extending SnapshotMigrationTestBase[1] to
> > verify the logic. I believe all binary _metadata existing in the
> resources
> > folder[2] were built by JDK8.
> >
> > I also created a ticket, FLINK-33699 [3], to track this.
> >
> > [1]
> >
> https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java
> > [2]
> >
> https://github.com/apache/flink/tree/master/flink-tests/src/test/resources
> > [3] https://issues.apache.org/jira/browse/FLINK-33699
> >
> > Best
> > Yun Tang
> > 
> > From: Sergey Nuyanzin 
> > Sent: Wednesday, November 29, 2023 22:56
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > master
> >
> > thanks for the response
> >
> >
> > >I have doubts about the conclusion that "don't try to load a savepoint from
> > a Java 8/11/17 build due to bumping to scala-2.12.18", since the snapshotted
> > state (operator/keyed state backends) and most key/value serializer
> > snapshots are generated by pure-Java code.
> > >The only part left is when the developer uses Scala UDFs or Scala types for
> > key/value types. However, since all user-facing Scala APIs have been
> > deprecated, I don't think we have that many cases. Maybe we can give
> > descriptions without such strong suggestions.
> >
> > That is the area where I feel I lack the knowledge to answer this
> > precisely.
> > My assumption was that the statement about Java 21 regarding this should be
> > similar to the one for Java 17, which is almost the same [1].
> > Sorry for the inaccuracy.
> > Based on your statements, I agree that the conclusion could be more
> > relaxed.
> >
> > I'm curious whether there are some tests or anything which could clarify
> > this?
> >
> > [1] https://lists.apache.org/thread/mz0m6wqjmqy8htob3w4469pjbg9305do
> >
> > On Wed, Nov 29, 2023 at 12:25 PM Yun Tang  wrote:
> >
> > > Thanks Sergey for the great work.
> > >
> > > I have doubts about the conclusion that "don't try to load a savepoint
> > > from a Java 8/11/17 build due to bumping to scala-2.12.18", since the
> > > snapshotted state (operator/keyed state backends) and most key/value
> > > serializer snapshots are generated by pure-Java code. The only part left
> > > is that the developer uses Scala UDFs or Scala types for key/value types.
> > > However, since all user-facing Scala APIs have been deprecated [1], I
> > > don't think we have that many cases. Maybe we can give descriptions
> > > without such strong suggestions.
> > >
> > > Please correct me if I am wrong.
> > >
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-29740
> > >
> > > Best
> > > Yun Tang
> > >
> > > 
> > > From: Rui Fan <1996fan...@gmail.com>
> > > Sent: Wednesday, November 29, 2023 16:43
> > > To: dev@flink.apache.org 
> > > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > > master
> > >
> > > Thanks Sergey for the great work!
> > >
> > > Best,
> > > Rui
> > >
> > > On Wed, Nov 29, 2023 at 4:42 PM Leonard Xu  wrote:
> > >
> > > > Cool !
> > > >
> > > > Thanks Sergey for the great effort and all involved.
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > > On Nov 29, 2023, at 4:31 PM, Swapnal Varma  wrote:
> > > > >
> > > > > Congratulations Sergey, and everyone involved!
> > > > >
> > > > > Excited to work with and on this!
> > > > >
> > > > > Best,
> > > > > Swapnal
> > > > >
> > > > >
> > > > > On Wed, 29 Nov 2023, 13:58 Sergey Nuyanzin, 
> > > wrote:
> > > > >
> > > > >> The master branch now builds and runs with Java 21 out-of-the-box.
> > > > >>
> > > > >> Notes:
> > > > >> - a nightly cron build was set up.
> > > > >> - In Java 21 builds, 

Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2023-12-10 Thread Sergey Nuyanzin
Hey everyone,

The vote for flink-shaded 18.0 is still open. Please test and vote for
rc1, so that we can release it.

On Thu, Nov 30, 2023 at 4:03 PM Jing Ge  wrote:

> +1(not binding)
>
> - validated checksum
> - validated hash
> - checked the release notes
> - verified that no binaries exist in the source archive
> - built the source with Maven 3.8.6 and JDK 11
> - checked repo
> - checked tag
> - verified web PR
>
> Best regards,
> Jing
>
> On Thu, Nov 30, 2023 at 11:39 AM Sergey Nuyanzin 
> wrote:
>
> > +1 (non-binding)
> >
> > - Downloaded all the resources
> > - Validated checksum hash
> > - Built the source with Maven and JDK 8
> > - Built Flink master with the new flink-shaded and checked that all the
> > tests are passing
> >
> > One minor thing that I noticed while releasing: CI uses Maven 3.8.6, but
> > at the same time the release profile has an enforcer plugin that checks
> > that the Maven version is less than 3.3.
> > I created a Jira issue [1] for that.
> > I made the release with Maven 3.2.5 (I suppose the previous release was
> > also done with 3.2.5 because of the same issue).
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33703
> >
> > On Wed, Nov 29, 2023 at 11:41 AM Matthias Pohl 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > * Downloaded all resources
> > > * Extracted the sources and compiled them
> > > * Diffed the git tag checkout against the downloaded sources
> > > * Verified SHA512 checksums & GPG certification
> > > * Checked that all POMs have the expected version
> > > * Generated diffs to compare pom file changes with NOTICE files: nothing
> > > suspicious found except for a minor (non-blocking) typo [1]
> > >
> > > Thanks for driving this effort, Sergey. :)
> > >
> > > [1] https://github.com/apache/flink-shaded/pull/126/files#r1409080162
> > >
> > > On Wed, Nov 29, 2023 at 10:25 AM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > >> Sorry, it's non-binding.
> > >>
> > >> On Wed, Nov 29, 2023 at 5:19 PM Rui Fan <1996fan...@gmail.com> wrote:
> > >>
> > >> > Thanks Matthias for the clarification!
> > >> >
> > >> > After I import the latest KEYS, it works fine.
> > >> >
> > >> > +1(binding)
> > >> >
> > >> > - Validated checksum hash
> > >> > - Verified signature
> > >> > - Verified that no binaries exist in the source archive
> > >> > - Built the source with Maven and JDK 8
> > >> > - Verified licenses
> > >> > - Verified web PRs, and left a comment
> > >> >
> > >> > Best,
> > >> > Rui
> > >> >
> > >> > On Wed, Nov 29, 2023 at 5:05 PM Matthias Pohl
> > >> >  wrote:
> > >> >
> > >> >> The key is the last key in the KEYS file. It's just having a different
> > >> >> format with spaces being added (due to different gpg versions?):
> > >> >> F752 9FAE 2481 1A5C 0DF3  CA74 1596 BBF0 7268 35D8
> > >> >>
> > >> >> On Wed, Nov 29, 2023 at 9:41 AM Rui Fan <1996fan...@gmail.com>
> > wrote:
> > >> >>
> > >> >> > Hey Sergey,
> > >> >> >
> > >> >> > Thank you for driving this release.
> > >> >> >
> > >> >> > I tried to check this signature; the whole key is
> > >> >> > F7529FAE24811A5C0DF3CA741596BBF0726835D8,
> > >> >> > which matches your 1596BBF0726835D8, but I cannot
> > >> >> > find it in the Flink KEYS [1].
> > >> >> >
> > >> >> > Please correct me if my operation is wrong, thanks~
> > >> >> >
> > >> >> > [1] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> >> >
> > >> >> > Best,
> > >> >> > Rui
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Nov 29, 2023 at 6:09 AM Sergey Nuyanzin <
> > snuyan...@gmail.com
> > >> >
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hi everyone,
> > >> >> > > Please review and vote on the release candidate #1 for the
> > version
> > >> >> 18.0,
> > >> >> > as
> > >> >> > > follows:
> > >> >> > > [ ] +1, Approve the release
> > >> >> > > [ ] -1, Do not approve the release (please provide specific
> > >> comments)
> > >> >> > >
> > >> >> > >
> > >> >> > > The complete staging area is available for your review, which
> > >> >> includes:
> > >> >> > > * JIRA release notes [1],
> > >> >> > > * the official Apache source release to be deployed to
> > >> >> dist.apache.org
> > >> >> > > [2],
> > >> >> > > which are signed with the key with fingerprint 1596BBF0726835D8
> > >> [3],
> > >> >> > > * all artifacts to be deployed to the Maven Central Repository
> > [4],
> > >> >> > > * source code tag "release-18.0-rc1" [5],
> > >> >> > > * website pull request listing the new release [6].
> > >> >> > >
> > >> >> > > The vote will be open for at least 72 hours. It is adopted by
> > >> majority
> > >> >> > > approval, with at least 3 PMC affirmative votes.
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > > Sergey
> > >> >> > >
> > >> >> > > [1]
> > >> https://issues.apache.org/jira/projects/FLINK/versions/12353081
> > >> >> > > [2]
> > >> >> https://dist.apache.org/repos/dist/dev/flink/flink-shaded-18.0-rc1
> > >> >> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > >> >> > > [4]
> > >> >> > >
> > >> >>
> > >>
> https://reposit

Re: [NOTICE] Hive connector externalization

2023-12-10 Thread yuxia
Thanks Sergey for the work. Happy to see we can finally externalize the Hive 
connector.

Best regards,
Yuxia

- Original Message -
From: "snuyanzin" 
To: "dev" 
Sent: Saturday, December 9, 2023, 6:24:35 AM
Subject: [NOTICE] Hive connector externalization

Hi everyone

We are getting close to the externalization of the Hive connector [1].
Since the externalized version is already passing tests against release-1.18
and release-1.19, I'm going to remove the Hive connector code from the Flink
main repo [2]. For that reason, I would kindly ask you to avoid merging
Hive-connector-related changes to the Flink main repo (master branch), in
order to make this smoother. Instead, please create/merge PRs against the
connector's repo [3].

Also huge shoutout to Yuxia Luo, Martijn Visser, Ryan Skraba for the review

[1] https://issues.apache.org/jira/browse/FLINK-30064
[2] https://issues.apache.org/jira/browse/FLINK-33786
[3] https://github.com/apache/flink-connector-hive

-- 
Best regards,
Sergey


Re: [DISCUSS] Release Flink 1.18.1

2023-12-10 Thread Yun Tang
Thanks Jing for driving 1.18.1 release, +1 for this.


Best
Yun Tang

From: Rui Fan <1996fan...@gmail.com>
Sent: Saturday, December 9, 2023 21:46
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] Release Flink 1.18.1

Thanks Jing for driving this release, +1

Best,
Rui

On Sat, Dec 9, 2023 at 7:33 AM Leonard Xu  wrote:

> Thanks Jing for driving this release, +1
>
> Best,
> Leonard
>
> > On Dec 9, 2023, at 1:23 AM, Danny Cranmer  wrote:
> >
> > +1
> >
> > Thanks for driving this
> >
> > On Fri, 8 Dec 2023, 12:05 Timo Walther,  wrote:
> >
> >> Thanks for taking care of this Jing.
> >>
> >> +1 to release 1.18.1 for this.
> >>
> >> Cheers,
> >> Timo
> >>
> >>
> >> On 08.12.23 10:00, Benchao Li wrote:
> >>> I've merged FLINK-33313 to release-1.18 branch.
> >>>
> >>> Péter Váry  wrote on Fri, Dec 8, 2023, at 16:56:
> 
>  Hi Jing,
>  Thanks for taking care of this!
>  +1 (non-binding)
>  Peter
> 
>  Sergey Nuyanzin  wrote (on Fri, Dec 8, 2023, at 9:36):
> 
> > Thanks Jing driving it
> > +1
> >
> > also +1 to include FLINK-33313 mentioned by Benchao Li
> >
> > On Fri, Dec 8, 2023 at 9:17 AM Benchao Li 
> >> wrote:
> >
> >> Thanks Jing for driving 1.18.1 releasing.
> >>
> >> I would like to include FLINK-33313 [1] in 1.18.1. It's just a bugfix,
> >> not a blocker, but it's already merged into master; I plan to merge it
> >> to the 1.18/1.17 branches today after the CI passes.
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-33313
> >>
> >> Jing Ge  wrote on Fri, Dec 8, 2023, at 16:06:
> >>>
> >>> Hi all,
> >>>
> >>> I would like to discuss creating a new 1.18 patch release (1.18.1). The
> >>> last 1.18 release is nearly two months old, and since then, 37 tickets
> >>> have been closed [1], of which 6 are blocker/critical [2]. Some of them
> >>> are quite important, such as FLINK-33598 [3].
> >>>
> >>> The most urgent and important one is FLINK-33523 [4], and according to the
> >>> discussion thread [5] on the ML, 1.18.1 should/must be released ASAP after
> >>> the breaking-change commit has been reverted.
> >>>
> >>> I am not aware of any other unresolved blockers, and there are no
> >>> in-progress tickets [6].
> >>> Please let me know if there are any issues you'd like included in
> >>> this release that are still not merged.
> >>>
> >>> If the community agrees to create this new patch release, I could
> >>> volunteer as the release manager.
> >>>
> >>> Best regards,
> >>> Jing
> >>>
> >>> [1]
> >>>
> >>
> >
> >>
> https://issues.apache.org/jira/browse/FLINK-33567?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20%20and%20resolution%20%20!%3D%20%20Unresolved%20order%20by%20priority%20DESC
> >>> [2]
> >>>
> >>
> >
> >>
> https://issues.apache.org/jira/browse/FLINK-33693?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.18.1%20and%20resolution%20%20!%3D%20Unresolved%20%20and%20priority%20in%20(Blocker%2C%20Critical)%20ORDER%20by%20priority%20%20DESC
> >>> [3] https://issues.apache.org/jira/browse/FLINK-33598
> >>> [4] https://issues.apache.org/jira/browse/FLINK-33523
> >>> [5]
> https://lists.apache.org/thread/m4c879y8mb7hbn2kkjh9h3d8g1jphh3j
> >>> [6]
> https://issues.apache.org/jira/projects/FLINK/versions/12353640
> >>> Thanks,
> >>
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
> >
> >
> > --
> > Best regards,
> > Sergey
> >
> >>>
> >>>
> >>>
> >>
> >>
>
>


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread Xin Gong


good news.

+1

Best,
gongxin
On 2023/12/07 03:24:59 Leonard Xu wrote:
> Dear Flink devs,
> 
> As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> Connectors for the Apache Flink project[1] to the Apache Flink community.
> 
> CDC Connectors for Apache Flink comprise a collection of source connectors 
> designed specifically for Apache Flink. These connectors[2] enable the 
> ingestion of changes from various databases using Change Data Capture (CDC), 
> most of these CDC connectors are powered by Debezium[3]. They support both 
> the DataStream API and the Table/SQL API, facilitating the reading of 
> database snapshots and continuous reading of transaction logs with 
> exactly-once processing, even in the event of failures.
> 
> 
> Additionally, in the latest version 3.0, we have introduced many long-awaited 
> features. Starting from CDC version 3.0, we've built a Streaming ELT 
> Framework available for streaming data integration. This framework allows 
> users to write their data synchronization logic in a simple YAML file, which 
> will automatically be translated into a Flink DataStreaming job. It 
> emphasizes optimizing the task submission process and offers advanced 
> functionalities such as whole database synchronization, merging sharded 
> tables, and schema evolution[4].
> 
> 
> I believe this initiative is a perfect match for both sides. For the Flink 
> community, it presents an opportunity to enhance Flink's competitive 
> advantage in streaming data integration, promoting the healthy growth and 
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> becoming a sub-project of Apache Flink means being part of a neutral 
> open-source community, which can attract a more diverse pool of contributors.
> 
> Please note that the aforementioned points represent only some of our 
> motivations and vision for this donation. Specific future operations need to 
> be further discussed in this thread. For example, the sub-project name after 
> the donation; we hope to name it Flink-CDC, aiming at streaming data 
> integration through Apache Flink, following the naming convention of 
> Flink-ML; And this project is managed by a total of 8 maintainers, including 
> 3 Flink PMC members and 1 Flink Committer. The remaining 4 maintainers are 
> also highly active contributors to the Flink community, donating this project 
> to the Flink community implies that their permissions might be reduced. 
> Therefore, we may need to bring up this topic for further discussion within 
> the Flink PMC. Additionally, we need to discuss how to migrate existing users 
> and documents. We have a user group of nearly 10,000 people and a 
> multi-version documentation site need to migrate. We also need to plan for 
> the migration of CI/CD processes and other specifics. 
> 
> 
> While there are many intricate details that require implementation, we are 
> committed to progressing and finalizing this donation process.
> 
> 
> Besides being Flink’s most active ecosystem project (as evaluated by GitHub 
> metrics), it also boasts a significant user base. However, I believe it's 
> essential to commence discussions on future operations only after the 
> community reaches a consensus on whether they desire this donation.
> 
> 
> Really looking forward to hearing what you think! 
> 
> 
> Best,
> Leonard (on behalf of the Flink CDC Connectors project maintainers)
> 
> [1] https://github.com/ververica/flink-cdc-connectors
> [2] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> [3] https://debezium.io
> [4] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread Zirui Peng
+1 for me,

As a contributor to Flink CDC, I've witnessed so much excellent work that the 
Flink CDC community has done over the past years. Apache InLong also benefits 
a lot from Flink CDC's ability to capture database changes. Much appreciation 
to the Flink CDC community and its maintainers.

It's a great pleasure to support this contribution.

Best,
Zirui.

On 2023/12/08 07:10:48 Yu Li wrote:
> +1
> 
> Thanks Leonard for the proposal, and all Flink CDC contributors for the
> existing great work. I believe these two projects are a perfect fit and
> the donation will benefit both.
> 
> Best Regards,
> Yu
> 
> 
> On Fri, Dec 8, 2023 at 2:49 PM bai_wentao1  wrote:
> 
> > +1 for this exciting work
> >
> >
> > Best regards,
> > WT
> >
> >
> > bai_wentao1
> > bai_went...@163.com
> >
> >  Replied Message 
> > From: ConradJam
> > Date: 12/8/2023 14:26
> > To: 
> > Subject: Re: [PROPOSAL] Contribute Flink CDC Connectors project to
> > Apache Flink
> > +1, best idea. Thanks to all the contributors of Flink CDC for the great
> > work.
> >
> > yuxia  wrote on Thu, Dec 7, 2023, at 20:46:
> >
> > +1 for this. Thanks all the contributors of Flink CDC for the great work.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Martijn Visser" 
> > To: "dev" 
> > Sent: Thursday, December 7, 2023, 8:14:36 PM
> > Subject: Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink
> >
> > Hi Leonard,
> >
> > +1 for this. I think the CDC connectors are a great result and example
> > of the Flink CDC community. Kudos to you all.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, Dec 7, 2023 at 10:40 AM Márton Balassi 
> > wrote:
> >
> > Hi Leonard,
> >
> > Thank you for the excellent work you and the team working on the CDC
> > connectors project have been doing so far. I am +1 of having them under
> > Flink's umbrella.
> >
> > On Thu, Dec 7, 2023 at 10:26 AM Etienne Chauchot 
> > wrote:
> >
> > Big +1, thanks this will be a very useful addition to Flink.
> >
> > Best
> >
> > Etienne
> >
> > On 07/12/2023 at 09:26, Hang Ruan wrote:
> > +1 for contributing CDC Connectors  to Apache Flink.
> >
> > Best,
> > Hang
> >
> > Yuxin Tan  wrote on Thu, Dec 7, 2023, at 16:05:
> >
> > Cool, +1 for contributing CDC Connectors to Apache Flink.
> >
> > Best,
> > Yuxin
> >
> >
> > Jing Ge  wrote on Thu, Dec 7, 2023, at 15:43:
> >
> > Awesome! +1
> >
> > Best regards,
> > Jing
> >
> > On Thu, Dec 7, 2023 at 8:34 AM Sergey Nuyanzin >
> > wrote:
> >
> > thanks for working on this and driving it
> >
> > +1
> >
> > On Thu, Dec 7, 2023 at 7:26 AM Feng Jin
> > wrote:
> >
> > This is incredibly exciting news, a big +1 for this.
> >
> > Thank you for the fantastic work on Flink CDC. We have created
> > thousands
> > of
> > real-time integration jobs using Flink CDC connectors.
> >
> >
> > Best,
> > Feng
> >
> > On Thu, Dec 7, 2023 at 1:45 PM gongzhongqiang <
> > gongzhongqi...@apache.org
> > wrote:
> >
> > It's very exciting to hear the news.
> > +1 for adding CDC Connectors  to Apache Flink !
> >
> >
> > Best,
> > Zhongqiang
> >
> > Leonard Xu  wrote on Thu, Dec 7, 2023, at 11:25:
> >
> > Dear Flink devs,
> >
> >
> > As you may have heard, we at Alibaba (Ververica) are planning
> > to
> > donate
> > CDC Connectors for the Apache Flink project
> > [1] to the Apache Flink community.
> >
> >
> >
> > CDC Connectors for Apache Flink comprise a collection of source
> > connectors designed specifically for Apache Flink. These
> > connectors
> > [2]
> > enable the ingestion of changes from various databases using
> > Change
> > Data Capture (CDC), most of these CDC connectors are powered by
> > Debezium
> > [3]. They support both the DataStream API and the Table/SQL API,
> > facilitating the reading of database snapshots and continuous
> > reading
> > of
> > transaction logs with exactly-once processing, even in the
> > event of
> > failures.
> >
> >
> > Additionally, in the latest version 3.0, we have introduced
> > many
> > long-awaited features. Starting from CDC version 3.0, we've
> > built a
> > Streaming ELT Framework available for streaming data
> > integration.
> > This
> > framework allows users to write their data synchronization logic
> > in a
> > simple YAML file, which will automatically be translated into a
> > Flink
> > DataStreaming job. It emphasizes optimizing the task submission
> > process
> > and
> > offers advanced functionalities such as whole database
> > synchronization,
> > merging sharded tables, and schema evolution
> > [4].
> >
> >
> >
> >
> > I believe this initiative is a perfect match for both sides.
> > For
> > the
> > Flink community, it presents an opportunity to enhance Flink's
> > competitive
> > advantage in streaming data integration, promoting the healthy
> > growth
> > and
> > prosperity of the Apache Flink ecosystem. For the CDC Connectors
> > project,
> > becoming a sub-project of Apache Flink means being part of a
> > neutral
> > open-source community, which can attract a more diverse poo

Re: [ANNOUNCE] Experimental Java 21 support now available on master

2023-12-10 Thread xiangyu feng
Thanks Sergey for the great work!

ZGC in JDK 21 is now a generational garbage collector. On JDK 17, we
tried using ZGC as the default garbage collector for streaming
computations, but found that while it reduced stop-the-world pauses, it had a
negative impact on job throughput, because a single-generation collector was
not as efficient as a generational one. This issue might be resolved in
JDK 21.
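
For anyone who wants to experiment: on JDK 21 the generational mode is opt-in
via the standard JVM flags from JEP 439, which could be passed to the
TaskManagers with the usual Flink option, e.g.

    env.java.opts.taskmanager: -XX:+UseZGC -XX:+ZGenerational

(a sketch of the configuration, not a tested recommendation).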

Looking forward to this!

Best Regards,
Xiangyu

Sergey Nuyanzin  wrote on Mon, Dec 11, 2023, at 07:06:

> thanks for checking and creating the ticket
>
> yes, that probably makes sense
>
> On Thu, Nov 30, 2023 at 1:08 PM Yun Tang  wrote:
>
> > Hi Sergey,
> >
> > I checked the CI [1], which was executed with Java 21, and noticed that the
> > StatefulJobSnapshotMigrationITCase-related tests have passed, which proves
> > what I guessed before: most checkpoints/savepoints should be restored
> > successfully.
> >
> > I think we should introduce such snapshot migration tests, which restore
> > snapshots containing Scala code. I also created a ticket focused on Java 17
> > [2].
> >
> >
> > [1]
> >
> https://dev.azure.com/snuyanzin/flink/_build/results?buildId=2620&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1
> > [2] https://issues.apache.org/jira/browse/FLINK-33707
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Sergey Nuyanzin 
> > Sent: Thursday, November 30, 2023 14:41
> > To: dev@flink.apache.org 
> > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > master
> >
> > Thanks Yun Tang
> >
> > One question to clarify: since the scala version was also bumped for java
> > 17, shouldn't there be a similar task for java 17?
> >
> > On Thu, Nov 30, 2023 at 3:43 AM Yun Tang  wrote:
> >
> > > Hi Sergey,
> > >
> > > You can leverage all tests extending SnapshotMigrationTestBase[1] to
> > > verify the logic. I believe all binary _metadata existing in the
> > resources
> > > folder[2] were built by JDK8.
> > >
> > > I also created a ticket, FLINK-33699 [3], to track this.
> > >
> > > [1]
> > >
> >
> https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/checkpointing/utils/SnapshotMigrationTestBase.java
> > > [2]
> > >
> >
> https://github.com/apache/flink/tree/master/flink-tests/src/test/resources
> > > [3] https://issues.apache.org/jira/browse/FLINK-33699
> > >
> > > Best
> > > Yun Tang
> > > 
> > > From: Sergey Nuyanzin 
> > > Sent: Wednesday, November 29, 2023 22:56
> > > To: dev@flink.apache.org 
> > > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > > master
> > >
> > > thanks for the response
> > >
> > >
> > > >I have doubts about the conclusion that "don't try to load a savepoint
> > > from a Java 8/11/17 build due to bumping to scala-2.12.18", since the
> > > snapshotted state (operator/keyed state backends) and most key/value
> > > serializer snapshots are generated by pure-Java code.
> > > >The only part left is when the developer uses Scala UDFs or Scala types
> > > for key/value types. However, since all user-facing Scala APIs have been
> > > deprecated, I don't think we have that many cases. Maybe we can give
> > > descriptions without such strong suggestions.
> > >
> > > That is the area where I feel I lack the knowledge to answer this
> > > precisely.
> > > My assumption was that the statement about Java 21 regarding this should
> > > be similar to the one for Java 17, which is almost the same [1].
> > > Sorry for the inaccuracy.
> > > Based on your statements, I agree that the conclusion could be more
> > > relaxed.
> > >
> > > I'm curious whether there are some tests or anything which could clarify
> > > this?
> > >
> > > [1] https://lists.apache.org/thread/mz0m6wqjmqy8htob3w4469pjbg9305do
> > >
> > > On Wed, Nov 29, 2023 at 12:25 PM Yun Tang  wrote:
> > >
> > > > Thanks Sergey for the great work.
> > > >
> > > > I have doubts about the conclusion that "don't try to load a savepoint
> > > > from a Java 8/11/17 build due to bumping to scala-2.12.18", since the
> > > > snapshotted state (operator/keyed state backends) and most key/value
> > > > serializer snapshots are generated by pure-Java code. The only part
> > > > left is that the developer uses Scala UDFs or Scala types for key/value
> > > > types. However, since all user-facing Scala APIs have been deprecated
> > > > [1], I don't think we have that many cases. Maybe we can give
> > > > descriptions without such strong suggestions.
> > > >
> > > > Please correct me if I am wrong.
> > > >
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-29740
> > > >
> > > > Best
> > > > Yun Tang
> > > >
> > > > 
> > > > From: Rui Fan <1996fan...@gmail.com>
> > > > Sent: Wednesday, November 29, 2023 16:43
> > > > To: dev@flink.apache.org 
> > > > Subject: Re: [ANNOUNCE] Experimental Java 21 support now available on
> > > > master
> > > >
> > > > Thanks Sergey for

Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread The Xia
+1

On 2023/12/11 02:33:24 Xin Gong wrote:
> 
> good news.
> 
> +1
> 
> Best,
> gongxin
> On 2023/12/07 03:24:59 Leonard Xu wrote:
> > Dear Flink devs,
> > 
> > As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> > Connectors for the Apache Flink project[1] to the Apache Flink community.
> > 
> > CDC Connectors for Apache Flink comprise a collection of source connectors 
> > designed specifically for Apache Flink. These connectors[2] enable the 
> > ingestion of changes from various databases using Change Data Capture 
> > (CDC), most of these CDC connectors are powered by Debezium[3]. They 
> > support both the DataStream API and the Table/SQL API, facilitating the 
> > reading of database snapshots and continuous reading of transaction logs 
> > with exactly-once processing, even in the event of failures.
> > 
> > 
> > Additionally, in the latest version 3.0, we have introduced many 
> > long-awaited features. Starting from CDC version 3.0, we've built a 
> > Streaming ELT Framework available for streaming data integration. This 
> > framework allows users to write their data synchronization logic in a 
> > simple YAML file, which will automatically be translated into a Flink 
> > DataStreaming job. It emphasizes optimizing the task submission process and 
> > offers advanced functionalities such as whole database synchronization, 
> > merging sharded tables, and schema evolution[4].
> > 
> > 
> > I believe this initiative is a perfect match for both sides. For the Flink 
> > community, it presents an opportunity to enhance Flink's competitive 
> > advantage in streaming data integration, promoting the healthy growth and 
> > prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> > becoming a sub-project of Apache Flink means being part of a neutral 
> > open-source community, which can attract a more diverse pool of 
> > contributors.
> > 
> > Please note that the aforementioned points represent only some of our 
> > motivations and vision for this donation. Specific future operations need 
> > to be further discussed in this thread. For example, the sub-project name 
> > after the donation; we hope to name it Flink-CDC, aiming at streaming data 
> > integration through Apache Flink, following the naming convention of 
> > Flink-ML; And this project is managed by a total of 8 maintainers, 
> > including 3 Flink PMC members and 1 Flink Committer. The remaining 4 
> > maintainers are also highly active contributors to the Flink community, 
> > donating this project to the Flink community implies that their permissions 
> > might be reduced. Therefore, we may need to bring up this topic for further 
> > discussion within the Flink PMC. Additionally, we need to discuss how to 
> > migrate existing users and documents. We have a user group of nearly 10,000 
> > people and a multi-version documentation site need to migrate. We also need 
> > to plan for the migration of CI/CD processes and other specifics. 
> > 
> > 
> > While there are many intricate details that require implementation, we are 
> > committed to progressing and finalizing this donation process.
> > 
> > 
> > Besides being Flink’s most active ecosystem project (as evaluated by 
> > GitHub metrics), it also boasts a significant user base. However, I believe 
> > it's essential to commence discussions on future operations only after the 
> > community reaches a consensus on whether they desire this donation.
> > 
> > 
> > Really looking forward to hearing what you think! 
> > 
> > 
> > Best,
> > Leonard (on behalf of the Flink CDC Connectors project maintainers)
> > 
> > [1] https://github.com/ververica/flink-cdc-connectors
> > [2] 
> > https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> > [3] https://debezium.io
> > [4] 
> > https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html
> 


Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-12-10 Thread Paul Lam
Hi Ferenc,

Sorry for my late reply. 

> Is any active work happening on this FLIP? As far as I can see, there
> are blockers regarding artifact distribution that need to be
> resolved first.

You’re right. There’s a blocker in K8s application mode, but none in
YARN application mode. I’m doing a POC on YARN application mode
before starting a vote thread.

I’ve been busy lately, but the FLIP is active for sure. The situation
will change in a couple of weeks.

Thank you for reaching out! I’ll let you know when the POC is completed.

Best,
Paul Lam

> On Nov 21, 2023, at 06:01, Ferenc Csaky  wrote:
> 
> Hello devs,
> 
> Is any active work happening on this FLIP? As far as I can see, there
> are blockers regarding artifact distribution that need to be
> resolved first.
> 
> Is this work halted completely, or are some efforts going into
> resolving the blockers first?
> 
> Our platform would benefit a lot from this feature. We have a kind of
> working custom implementation at the moment, but it is uniquely
> adapted to our app and platform.
> 
> I could help out to move this forward.
> 
> Best,
> Ferenc
> 
> 
> 
> On Friday, June 30th, 2023 at 04:53, Paul Lam  wrote:
> 
> 
>> 
>> 
>> Hi Jing,
>> 
>> Thanks for your input!
>> 
>>> Would you like to add
>>> one section to describe(better with script/code example) how to use it in
>>> these two scenarios from users' perspective?
>> 
>> 
>> OK. I’ll update the FLIP with the code snippet after I get the POC branch 
>> done.
>> 
>>> NIT: the pictures have transparent background when readers click on it. It
>>> would be great if you can replace them with pictures with white background.
>> 
>> 
>> Fixed. Thanks for pointing that out :)
>> 
>> Best,
>> Paul Lam
>> 
>>> 2023年6月27日 06:51,Jing Ge j...@ververica.com.INVALID 写道:
>>> 
>>> Hi Paul,
>>> 
>>> Thanks for driving it and thank you all for the informative discussion! The
>>> FLIP is in good shape now. As described in the FLIP, SQL Driver will be
>>> mainly used to run Flink SQLs in two scenarios: 1. SQL client/gateway in
>>> application mode and 2. external system integration. Would you like to add
>>> one section to describe(better with script/code example) how to use it in
>>> these two scenarios from users' perspective?
>>> 
>>> NIT: the pictures have transparent background when readers click on it. It
>>> would be great if you can replace them with pictures with white background.
>>> 
>>> Best regards,
>>> Jing
>>> 
>>> On Mon, Jun 26, 2023 at 1:31 PM Paul Lam  wrote:
>>> 
 Hi Shengkai,
 
> * How can we ship the json plan to the JobManager?
 
 The Flink K8s module should be responsible for file distribution. We could
 introduce
 an option like `kubernetes.storage.dir`. For each flink cluster, there
 would be a
 dedicated subdirectory, with the pattern like
 `${kubernetes.storage.dir}/${cluster-id}`.
 
 All resources-related options (e.g. pipeline jars, json plans) that are
 configured with scheme `file://` would be uploaded to the resource
 directory and downloaded to the jobmanager, before SQL Driver accesses
 the files with the original filenames.
 
> * Classloading strategy
 
 We could directly specify the SQL Gateway jar as the jar file in
 PackagedProgram.
 It would be treated like a normal user jar and the SQL Driver is loaded
 into the user
 classloader. WDYT?
 
> * Option `$internal.sql-gateway.driver.sql-config` is string type
> I think it's better to use Map type here
 
 By Map type configuration, do you mean a nested map that contains all
 configurations?
 
 I hope I've explained myself well, it’s a file that contains the extra SQL
 configurations, which would be shipped to the jobmanager.
 
> * PoC branch
 
 Sure. I’ll let you know once I get the job done.
 
 Best,
 Paul Lam
 
> On Jun 26, 2023, at 14:27, Shengkai Fang  wrote:
> 
> Hi, Paul.
> 
> Thanks for your update. I have a few questions about the new design:
> 
> * How can we ship the json plan to the JobManager?
> 
> The current design only exposes an option about the URL of the json
> plan. It seems the gateway is responsible to upload to an external 
> stroage.
> Can we reuse the PipelineOptions.JARS to ship to the remote filesystem?
> 
> * Classloading strategy
> 
> Currently, the Driver is in the sql-gateway package. It means the Driver
> is not in the JM's classpath directly. Because the sql-gateway jar is now
> in the opt directory rather than lib directory. It may need to add the
> external dependencies as Python does[1]. BTW, I think it's better to move
> the Driver into the flink-table-runtime package, which is much easier to
> find(Sorry for 
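
To make the quoted file-distribution scheme concrete, the configuration could
look roughly like this (the option name and per-cluster pattern come from the
discussion above; the bucket path and cluster id are made-up examples, not a
released API):

    kubernetes.storage.dir: s3://my-bucket/flink-resources
    # each cluster gets ${kubernetes.storage.dir}/${cluster-id},
    # e.g. s3://my-bucket/flink-resources/my-sql-cluster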

[jira] [Created] (FLINK-33790) Upsert statement filter unique key field column in mysql dialect

2023-12-10 Thread JingWei Li (Jira)
JingWei Li created FLINK-33790:
--

 Summary: Upsert statement filter unique key field column in mysql 
dialect 
 Key: FLINK-33790
 URL: https://issues.apache.org/jira/browse/FLINK-33790
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: JingWei Li


Example: `col2` and `col4` form the unique key of table `my_table`.

 
{code:java}
INSERT INTO `my_table`(`col1`, `col2`, `col3`, `col4`, `col5`) 
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE 
`col1`=VALUES(`col1`),
`col2`=VALUES(`col2`),
`col3`=VALUES(`col3`),
`col4`=VALUES(`col4`),
`col5`=VALUES(`col5`){code}
Expected result (the unique key columns are filtered out of the update clause):
{code:java}
INSERT INTO `my_table`(`col1`, `col2`, `col3`, `col4`, `col5`) 
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE 
`col1`=VALUES(`col1`),
`col3`=VALUES(`col3`),
`col5`=VALUES(`col5`) {code}
 





Re:Re: Re: Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-10 Thread Xuyang
Hi, Shengkai.


> I think we shouldn't remove the operator if we can not give a solution to 
> help users upgrade their jobs. But I think we can delay the discussion until 
> we need to remove the operator. 


+1 for it.







--

Best!
Xuyang




On 2023-12-08 19:22:40, "Shengkai Fang"  wrote:

Hi, Xuyang. Thanks for your response.


I just thought of another way to solve this problem instead of introducing a 
new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, 
the rewritten SQL can be `GROUP BY f0, window_start, window_end` (window_start 
and window_end are produced by the WINDOW TVF). We can use field indexes here 
and a new Calc node to alias them, to avoid field-name conflicts in the 
WindowAggregate node.
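
For illustration (my reading of the rewrite, not a committed design), a legacy 
query such as

    SELECT f0, TUMBLE_START(ts, INTERVAL '1' MINUTE), COUNT(*)
    FROM t
    GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), f0

could be rewritten onto the FLIP-145 TVF form, roughly

    SELECT f0, window_start, COUNT(*)
    FROM TABLE(TUMBLE(TABLE t, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY f0, window_start, window_end

with a new Calc node aliasing window_start/window_end back to the legacy 
output names.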


The solution is much better than before and I think it can solve the problem I 
mentioned before. 


What about using a config named "table.optimizer.window-rewrite-enabled"?


+1  


IIRC, compatibility across intermediate versions of SQL is currently not 
guaranteed. Should we add constraints on this part?


I think we shouldn't remove the operator if we can not give a solution to help 
users upgrade their jobs. But I think we can delay the discussion until we need 
to remove the operator. 


Best,
Shengkai


Xuyang  wrote on Fri, Dec 8, 2023, at 18:07:

Hi Martijn, thanks for sharing.


>On the topic of syntax for early/late fires, there is existing
>configuration for the legacy group windows:
>
>SET table.exec.emit.early-fire.enabled = true;
>SET table.exec.emit.early-fire.delay = 5s;
>SET table.exec.emit.late-fire.enabled = true;
>SET table.exec.emit.late-fire.delay = 0;
>SET table.exec.emit.allow-lateness = 5s;
>
>We should stick with the syntax for the TVFs, and not modify that.
Agree with you. We should follow the syntax defined in FLIP-145. As for how to 
let these options take effect on only a single window agg instead of all 
window aggs, we need to think of another way.
>On the topic of column naming, for other situations where a user wants
>to use a value that's already reserved, we require the user to include
>backticks to indicate that Flink should not use the reserved keyword
>implementation. Why isn't that sufficient in this case? I rather stay
>consistent with this behaviour instead of introducing new config
>options.
>
I just thought of another way to solve this problem instead of introducing a 
new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, 
the rewritten SQL can be `GROUP BY f0, window_start, window_end` (window_start 
and window_end are produced by the WINDOW TVF). We can use field indexes here 
and a new Calc node to alias them, to avoid field-name conflicts in the 
WindowAggregate node.
What about this idea, cc @Shengkai Fang?


>I would propose to rename the FLIP to
>something like "Add Missing Table Valued Functions Features to Replace
>Legacy Group Window Aggregation".
IMO, the original title with "deprecate" may be clearer. Although the doc 
already marks the legacy Group Window Aggregation syntax as deprecated, just 
as I said in this FLIP, the DEPRECATED tag in the doc was attached while doing 
FLIP-145. Let's review the content of FLIP-145 again: it said "The existing 
Grouped window functions, i.e. GROUP BY TUMBLE... are still supported, but 
will be deprecated". So, as follow-up work to FLIP-145, this FLIP officially 
deprecates the legacy window syntax at this time. WDYT?



--

Best!
Xuyang





At 2023-12-07 16:51:44, "Martijn Visser"  wrote:
>Hi Xuyang,
>
>Thanks a lot for starting this discussion.
>
>At first, I was a bit confused because the FLIP talks about
>deprecating the Legacy Group Window Aggregations, but they have
>already been marked as deprecated in the documentation [1].
>My understanding was that the big challenge was that we don't yet
>support SESSION windows in the Window TVF, and that the other features
>you've mentioned in the discussion threads are additional
>capabilities.
>
>However, when reading up on the actual FLIP (your discussion email
>didn't include a link [2] to it) I now understand the situation. The
>appendix table is actually the most valuable for me, because it gives
>me the overview of the missing capabilities between TVF implementation
>and the Legacy Group Window Aggregations.
>
>On the topic of syntax for early/late fires, there is existing
>configuration for the legacy group windows:
>
>SET table.exec.emit.early-fire.enabled = true;
>SET table.exec.emit.early-fire.delay = 5s;
>SET table.exec.emit.late-fire.enabled = true;
>SET table.exec.emit.late-fire.delay = 0;
>SET table.exec.emit.allow-lateness = 5s;
>
>We should stick with the syntax for the TVFs, and not modify that.
>
>On the topic of column naming, for other situations where a user wants
>to use a value that's already reserved, we require the user to include
>backticks to indicate that Flink should not use the reserved keyword
>implementation. Why isn't that sufficient in this case? I rather stay
>consistent with this behaviou

[jira] [Created] (FLINK-33791) Fix NPE when array is null in PostgresArrayConverter in flink-connector-jdbc

2023-12-10 Thread JingWei Li (Jira)
JingWei Li created FLINK-33791:
--

 Summary: Fix NPE when array is null in PostgresArrayConverter in 
flink-connector-jdbc
 Key: FLINK-33791
 URL: https://issues.apache.org/jira/browse/FLINK-33791
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Reporter: JingWei Li


{code:java}
private JdbcDeserializationConverter createPostgresArrayConverter(ArrayType arrayType) {
    // Since PGJDBC 42.2.15 (https://github.com/pgjdbc/pgjdbc/pull/1194) bytea[] is wrapped in
    // primitive byte arrays
    final Class<?> elementClass =
            LogicalTypeUtils.toInternalConversionClass(arrayType.getElementType());
    final JdbcDeserializationConverter elementConverter =
            createNullableInternalConverter(arrayType.getElementType());
    return val -> {
        @SuppressWarnings("unchecked")
        T pgArray = (T) val;
        Object[] in = (Object[]) pgArray.getArray();
        final Object[] array = (Object[]) Array.newInstance(elementClass, in.length);
        for (int i = 0; i < in.length; i++) {
            array[i] = elementConverter.deserialize(in[i]);
        }
        return new GenericArrayData(array);
    };
}
{code}
When this method is used and the array is null, pgArray.getArray() will throw an NPE.
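
A minimal sketch of a possible guard (an assumption, not the committed fix): short-circuit on a SQL NULL array before dereferencing it.
{code:java}
return val -> {
    if (val == null) {
        // A SQL NULL array must not be dereferenced; return null instead of throwing an NPE.
        return null;
    }
    @SuppressWarnings("unchecked")
    T pgArray = (T) val;
    Object[] in = (Object[]) pgArray.getArray();
    final Object[] array = (Object[]) Array.newInstance(elementClass, in.length);
    for (int i = 0; i < in.length; i++) {
        array[i] = elementConverter.deserialize(in[i]);
    }
    return new GenericArrayData(array);
};
{code}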





Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread Jiafeng Zhang
+1
Great! CDC is very important for real-time data integration, and this project 
is also widely used. I am optimistic about it.

Jiafeng.Zhang / 张家锋
Email: jiafengzh...@apache.org


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread wudi
Awesome, +1

Brs,

di.wu

> On Dec 7, 2023, at 11:24 AM, Leonard Xu  wrote:
> 
> Dear Flink devs,
> 
> As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> Connectors for the Apache Flink project[1] to the Apache Flink community.
> 
> CDC Connectors for Apache Flink comprise a collection of source connectors 
> designed specifically for Apache Flink. These connectors[2] enable the 
> ingestion of changes from various databases using Change Data Capture (CDC), 
> most of these CDC connectors are powered by Debezium[3]. They support both 
> the DataStream API and the Table/SQL API, facilitating the reading of 
> database snapshots and continuous reading of transaction logs with 
> exactly-once processing, even in the event of failures.
> 
> 
> Additionally, in the latest version 3.0, we have introduced many long-awaited 
> features. Starting from CDC version 3.0, we've built a Streaming ELT 
> Framework available for streaming data integration. This framework allows 
> users to write their data synchronization logic in a simple YAML file, which 
> will automatically be translated into a Flink DataStreaming job. It 
> emphasizes optimizing the task submission process and offers advanced 
> functionalities such as whole database synchronization, merging sharded 
> tables, and schema evolution[4].
> 
> 
> I believe this initiative is a perfect match for both sides. For the Flink 
> community, it presents an opportunity to enhance Flink's competitive 
> advantage in streaming data integration, promoting the healthy growth and 
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> becoming a sub-project of Apache Flink means being part of a neutral 
> open-source community, which can attract a more diverse pool of contributors.
> 
> Please note that the aforementioned points represent only some of our 
> motivations and vision for this donation. Specific future operations need to 
> be further discussed in this thread. For example, the sub-project name after 
> the donation; we hope to name it Flink-CDC, aiming at streaming data 
> integration through Apache Flink, following the naming convention of 
> Flink-ML; And this project is managed by a total of 8 maintainers, including 
> 3 Flink PMC members and 1 Flink Committer. The remaining 4 maintainers are 
> also highly active contributors to the Flink community, donating this project 
> to the Flink community implies that their permissions might be reduced. 
> Therefore, we may need to bring up this topic for further discussion within 
> the Flink PMC. Additionally, we need to discuss how to migrate existing users 
> and documents. We have a user group of nearly 10,000 people and a 
> multi-version documentation site need to migrate. We also need to plan for 
> the migration of CI/CD processes and other specifics. 
> 
> 
> While there are many intricate details that require implementation, we are 
> committed to progressing and finalizing this donation process.
> 
> 
> Besides being Flink’s most active ecosystem project (as evaluated by GitHub 
> metrics), it also boasts a significant user base. However, I believe it's 
> essential to commence discussions on future operations only after the 
> community reaches a consensus on whether they desire this donation.
> 
> 
> Really looking forward to hearing what you think! 
> 
> 
> Best,
> Leonard (on behalf of the Flink CDC Connectors project maintainers)
> 
> [1] https://github.com/ververica/flink-cdc-connectors
> [2] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> [3] https://debezium.io
> [4] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html



Re:Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-10 Thread Xuyang
Hi, Jim.
>As a clarification, since FLINK-24204 is finishing up work from
>FLIP-145[1], do we need to discuss anything before you work out the details
>of FLINK-24024 as a PR?
Which issue do you mean? It seems that FLINK-24204 [1] is the issue about the 
Table API & SQL type system.


> I've got a PR up [3] for moving at least one of the classes you are touching.
Nice work! Since we are not actually going to delete the legacy group window 
agg operator, the only compatibility issue
may be that, when using Flink SQL, the legacy group window agg operator will be 
rewritten into new operators. Will those tests be affected by this rewriting?




[1] https://issues.apache.org/jira/browse/FLINK-24204






--

Best!
Xuyang





At 2023-12-09 06:25:30, "Jim Hughes"  wrote:
>Hi Xuyang,
>
>As a clarification, since FLINK-24204 is finishing up work from
>FLIP-145[1], do we need to discuss anything before you work out the details
>of FLINK-24024 as a PR?
>
>Relatedly, as that goes up for a PR, as part of FLINK-33421 [2], Bonnie and
>I are working through migrating some of the JsonPlan Tests and ITCases to
>RestoreTests.  I've got a PR up [3] for moving at least one of the classes
>you are touching.  Let me know if I can share any details about that work.
>
>Cheers,
>
>Jim
>
>1.
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
>
>2. https://issues.apache.org/jira/browse/FLINK-33421
>3. https://github.com/apache/flink/pull/23886
>https://issues.apache.org/jira/browse/FLINK-33676
>
>On Tue, Nov 28, 2023 at 7:31 AM Xuyang  wrote:
>
>> Hi all.
>> I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
>> Window Aggregation.
>>
>>
>> Although the current Flink SQL Window Aggregation documentation[1]
>> indicates that the legacy Group Window Aggregation
>> syntax has been deprecated, the new Window TVF Aggregation syntax has not
>> fully covered all of the features of the legacy one.
>>
>>
> >> Compared to Group Window Aggregation, Window TVF Aggregation has several
>> advantages, such as two-stage optimization,
>> support for standard GROUPING SET syntax, and so on. However, it needs to
>> supplement and enrich the following features.
>>
>>
>> 1. Support for SESSION Window TVF Aggregation
>> 2. Support for consuming CDC stream
>> 3. Support for HOP window size with non-integer step length
>> 4. Support for configurations such as early fire, late fire and allow
>> lateness
>> (which are internal experimental configurations in Group Window
>> Aggregation and not public to users yet.)
>> 5. Unification of the Window TVF Aggregation operator in runtime at the
>> implementation layer
>> (In the long term, the cost to maintain the operators about Window TVF
>> Aggregation and Group Window Aggregation is too expensive.)
>>
>>
>> This flip aims to continue the unfinished work in FLIP-145[2], which is to
>> fully enable the capabilities of Window TVF Aggregation
>>  and officially deprecate the legacy syntax Group Window Aggregation, to
>> prepare for the removal of the legacy one in Flink 2.0.
>>
>>
>> I have already done some preliminary POC to validate the feasibility of
>> the related work in this flip as follows.
>> 1. POC for SESSION Window TVF Aggregation [3]
>> 2. POC for CUMULATE in Group Window Aggregation operator [4]
>> 3. POC for consuming CDC stream in Window Aggregation operator [5]
>>
>>
>> Looking forward to your feedback and thoughts!
>>
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
>>
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
>> [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
>> [4]
>> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
>> [5]
>> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread Yunqing Mo
So cool, Big +1 for this exciting work.

On 2023/12/07 03:24:59 Leonard Xu wrote:
> Dear Flink devs,
> 
> As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> Connectors for the Apache Flink project[1] to the Apache Flink community.
> 
> CDC Connectors for Apache Flink comprise a collection of source connectors 
> designed specifically for Apache Flink. These connectors[2] enable the 
> ingestion of changes from various databases using Change Data Capture (CDC), 
> most of these CDC connectors are powered by Debezium[3]. They support both 
> the DataStream API and the Table/SQL API, facilitating the reading of 
> database snapshots and continuous reading of transaction logs with 
> exactly-once processing, even in the event of failures.
> 
> 
> Additionally, in the latest version 3.0, we have introduced many long-awaited 
> features. Starting from CDC version 3.0, we've built a Streaming ELT 
> Framework available for streaming data integration. This framework allows 
> users to write their data synchronization logic in a simple YAML file, which 
> will automatically be translated into a Flink DataStreaming job. It 
> emphasizes optimizing the task submission process and offers advanced 
> functionalities such as whole database synchronization, merging sharded 
> tables, and schema evolution[4].
> 
> 
> I believe this initiative is a perfect match for both sides. For the Flink 
> community, it presents an opportunity to enhance Flink's competitive 
> advantage in streaming data integration, promoting the healthy growth and 
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> becoming a sub-project of Apache Flink means being part of a neutral 
> open-source community, which can attract a more diverse pool of contributors.
> 
> Please note that the aforementioned points represent only some of our 
> motivations and vision for this donation. Specific future operations need to 
> be further discussed in this thread. For example, the sub-project name after 
> the donation; we hope to name it Flink-CDC, aiming at streaming data 
> integration through Apache Flink, following the naming convention of 
> Flink-ML; And this project is managed by a total of 8 maintainers, including 
> 3 Flink PMC members and 1 Flink Committer. The remaining 4 maintainers are 
> also highly active contributors to the Flink community, donating this project 
> to the Flink community implies that their permissions might be reduced. 
> Therefore, we may need to bring up this topic for further discussion within 
> the Flink PMC. Additionally, we need to discuss how to migrate existing users 
> and documents. We have a user group of nearly 10,000 people and a 
> multi-version documentation site need to migrate. We also need to plan for 
> the migration of CI/CD processes and other specifics. 
> 
> 
> While there are many intricate details that require implementation, we are 
> committed to progressing and finalizing this donation process.
> 
> 
> Besides being Flink’s most active ecosystem project (as evaluated by GitHub 
> metrics), it also boasts a significant user base. However, I believe it's 
> essential to commence discussions on future operations only after the 
> community reaches a consensus on whether they desire this donation.
> 
> 
> Really looking forward to hearing what you think! 
> 
> 
> Best,
> Leonard (on behalf of the Flink CDC Connectors project maintainers)
> 
> [1] https://github.com/ververica/flink-cdc-connectors
> [2] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> [3] https://debezium.io
> [4] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html


Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread liu ron
+1

Best,
Ron

Yunqing Mo  wrote on Mon, Dec 11, 2023, at 12:01:

> So cool, Big +1 for this exciting work.
>
> On 2023/12/07 03:24:59 Leonard Xu wrote:
> > Dear Flink devs,
> >
> > As you may have heard, we at Alibaba (Ververica) are planning to donate
> CDC Connectors for the Apache Flink project[1] to the Apache Flink
> community.
> >
> > CDC Connectors for Apache Flink comprise a collection of source
> connectors designed specifically for Apache Flink. These connectors[2]
> enable the ingestion of changes from various databases using Change Data
> Capture (CDC), most of these CDC connectors are powered by Debezium[3].
> They support both the DataStream API and the Table/SQL API, facilitating
> the reading of database snapshots and continuous reading of transaction
> logs with exactly-once processing, even in the event of failures.
> >
> >
> > Additionally, in the latest version 3.0, we have introduced many
> long-awaited features. Starting from CDC version 3.0, we've built a
> Streaming ELT Framework available for streaming data integration. This
> framework allows users to write their data synchronization logic in a
> simple YAML file, which will automatically be translated into a Flink
> DataStreaming job. It emphasizes optimizing the task submission process and
> offers advanced functionalities such as whole database synchronization,
> merging sharded tables, and schema evolution[4].
> >
> >
> > I believe this initiative is a perfect match for both sides. For the
> Flink community, it presents an opportunity to enhance Flink's competitive
> advantage in streaming data integration, promoting the healthy growth and
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project,
> becoming a sub-project of Apache Flink means being part of a neutral
> open-source community, which can attract a more diverse pool of
> contributors.
> >
> > Please note that the aforementioned points represent only some of our
> motivations and vision for this donation. Specific future operations need
> to be further discussed in this thread. For example, the sub-project name
> after the donation; we hope to name it Flink-CDC, aiming at streaming data
> integration through Apache Flink, following the naming convention of
> Flink-ML; And this project is managed by a total of 8 maintainers,
> including 3 Flink PMC members and 1 Flink Committer. The remaining 4
> maintainers are also highly active contributors to the Flink community,
> donating this project to the Flink community implies that their permissions
> might be reduced. Therefore, we may need to bring up this topic for further
> discussion within the Flink PMC. Additionally, we need to discuss how to
> migrate existing users and documents. We have a user group of nearly 10,000
> people and a multi-version documentation site need to migrate. We also need
> to plan for the migration of CI/CD processes and other specifics.
> >
> >
> > While there are many intricate details that require implementation, we
> are committed to progressing and finalizing this donation process.
> >
> >
> > Besides being Flink’s most active ecosystem project (as evaluated by
> GitHub metrics), it also boasts a significant user base. However, I believe
> it's essential to commence discussions on future operations only after the
> community reaches a consensus on whether they desire this donation.
> >
> >
> > Really looking forward to hearing what you think!
> >
> >
> > Best,
> > Leonard (on behalf of the Flink CDC Connectors project maintainers)
> >
> > [1] https://github.com/ververica/flink-cdc-connectors
> > [2]
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> > [3] https://debezium.io
> > [4]
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html
>


[jira] [Created] (FLINK-33792) Generate the same code for the same logic

2023-12-10 Thread Dan Zou (Jira)
Dan Zou created FLINK-33792:
---

 Summary: Generate the same code for the same logic
 Key: FLINK-33792
 URL: https://issues.apache.org/jira/browse/FLINK-33792
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Dan Zou


Generate the same code for the same logic, so that we may reuse the generated 
code between different jobs. This is a precondition for FLINK-28691. The 
current issue is that we use a self-incrementing counter in 
CodeGenUtils#newName, which means we cannot get the same generated class for 
two queries even when they are exactly the same.
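
A toy illustration (not Flink's actual CodeGenUtils) of why a JVM-global counter defeats reuse, and what a deterministic, per-session scheme could look like:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class NameGenDemo {

    // Today's approach (simplified): a JVM-global counter, so compiling the
    // same query twice yields names at different offsets (field$1, field$2, ...)
    // and the generated classes never match byte for byte.
    private static final AtomicInteger GLOBAL = new AtomicInteger();

    static String newNameGlobal(String prefix) {
        return prefix + "$" + GLOBAL.incrementAndGet();
    }

    // A deterministic alternative (sketch): scope the counter to a single code
    // generation session, so two identical queries produce identical names.
    private final AtomicInteger perSession = new AtomicInteger();

    String newNamePerSession(String prefix) {
        return prefix + "$" + perSession.incrementAndGet();
    }
}
{code}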


