[jira] [Created] (FLINK-35648) Pipeline job doesn't support multiple routing
yux created FLINK-35648: --- Summary: Pipeline job doesn't support multiple routing Key: FLINK-35648 URL: https://issues.apache.org/jira/browse/FLINK-35648 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: yux

Currently, any upstream table can be routed at most once, which means that if we write a route definition such as:

routes:
  - source-table: db.(A|B)
    sink-table: terminal.one
  - source-table: db.(B|C)
    sink-table: terminal.two

any upstream schema / data changes from db.B will be sent to terminal.one *only*, not to terminal.two, since they have already been handled by the first route rule. This ticket suggests adding a route behavior option (FIRST_MATCH / COMPLETE) to configure whether all matching route rules should be applied or only the first matched rule (the latter for backwards compatibility).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
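The difference between the two proposed modes can be sketched outside of Flink. The following standalone Python sketch is an illustration only (the matching logic and function names are assumptions, not Flink CDC's implementation); it shows how a change on db.B would fan out under COMPLETE but stop at the first rule under FIRST_MATCH:

```python
import re

# Route rules from the ticket: regex source pattern -> sink table.
ROUTES = [
    (r"db\.(A|B)", "terminal.one"),
    (r"db\.(B|C)", "terminal.two"),
]

def route(table_id, behavior="FIRST_MATCH"):
    """Return the sink tables a change event for `table_id` is sent to.

    FIRST_MATCH reproduces today's behavior (stop at the first matching
    rule); COMPLETE is the proposed mode that applies every matching rule.
    """
    sinks = []
    for pattern, sink in ROUTES:
        if re.fullmatch(pattern, table_id):
            sinks.append(sink)
            if behavior == "FIRST_MATCH":
                break
    return sinks

# db.B matches both rules, but today only the first one wins:
assert route("db.B") == ["terminal.one"]
assert route("db.B", behavior="COMPLETE") == ["terminal.one", "terminal.two"]
```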
[jira] [Created] (FLINK-35647) Pipeline route rule supports placeholder replacement
yux created FLINK-35647: --- Summary: Pipeline route rule supports placeholder replacement Key: FLINK-35647 URL: https://issues.apache.org/jira/browse/FLINK-35647 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: yux

Currently, we must provide an explicit sink-table in pipeline route rules, which means that if we'd like to route all tables in a specific database and change their names following a pattern, there's no choice but to write all the rules separately, which is a chore. There's already an implementation (https://github.com/apache/flink-cdc/pull/2908) that uses a placeholder syntax like (db.source<> -> db.sink<>) to perform the replacement in batch, which could solve this problem.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
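The idea behind the `<>` placeholder can be sketched as follows; this is an illustration of the concept only, not the exact semantics implemented in PR #2908 (the function name and the use of a regex capture group are assumptions for demonstration):

```python
import re

def apply_route(source_pattern, sink_pattern, table_id):
    """Expand a route rule whose sink name contains a `<>` placeholder.

    `<>` in the sink pattern is replaced by whatever the capture group in
    the source pattern matched, so a single rule can rename all tables of
    a database in batch. (Illustrative semantics only.)
    """
    m = re.fullmatch(source_pattern, table_id)
    if m is None:
        return None  # rule does not apply to this table
    return sink_pattern.replace("<>", m.group(1))

# One rule routes every table of `db` while renaming it in a pattern:
assert apply_route(r"db\.source_(.*)", "db.sink_<>", "db.source_orders") == "db.sink_orders"
assert apply_route(r"db\.source_(.*)", "db.sink_<>", "other.table") is None
```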
[jira] [Created] (FLINK-35646) Add gateway refresh rest api doc
dalongliu created FLINK-35646: - Summary: Add gateway refresh rest api doc Key: FLINK-35646 URL: https://issues.apache.org/jira/browse/FLINK-35646 Project: Flink Issue Type: Sub-task Components: Documentation, Table SQL / Gateway Reporter: dalongliu Fix For: 1.20.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35645) Add materialized table quickstart doc
dalongliu created FLINK-35645: - Summary: Add materialized table quickstart doc Key: FLINK-35645 URL: https://issues.apache.org/jira/browse/FLINK-35645 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: dalongliu Fix For: 1.20.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35643) Add materialized table statement doc
dalongliu created FLINK-35643: - Summary: Add materialized table statement doc Key: FLINK-35643 URL: https://issues.apache.org/jira/browse/FLINK-35643 Project: Flink Issue Type: Sub-task Reporter: dalongliu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35644) Add workflow scheduler doc
dalongliu created FLINK-35644: - Summary: Add workflow scheduler doc Key: FLINK-35644 URL: https://issues.apache.org/jira/browse/FLINK-35644 Project: Flink Issue Type: Sub-task Reporter: dalongliu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35642) parsing the configuration file in flink, when configuring the s3 key, if there is contain # char, the correct key will be split
blackpighe created FLINK-35642: -- Summary: parsing the configuration file in flink, when configuring the s3 key, if there is contain # char, the correct key will be split Key: FLINK-35642 URL: https://issues.apache.org/jira/browse/FLINK-35642 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.19.1, 1.19.0, 1.14.3, 1.20.0 Reporter: blackpighe Fix For: 1.19.1 Attachments: image-2024-06-19-11-45-05-871.png parsing the configuration file in flink, when configuring the s3 key, if there is contain # char, the correct key will be split。 such as: s3.secret-key: Minio#dct@78 !image-2024-06-19-11-45-05-871.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
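The failure mode can be reproduced with a deliberately naive parser. The sketch below is a stand-in for illustration only (it is not Flink's actual configuration parser); it contrasts stripping everything after '#' with only treating whole-line comments as comments:

```python
def parse_naive(line):
    """Naive flink-conf style parser that strips everything after '#'.

    This mimics the reported failure mode: an inline '#' is treated as a
    comment delimiter even inside a value. (Stand-in code, not Flink's
    actual parser.)
    """
    line = line.split("#", 1)[0]          # bug: splits inside the value
    key, _, value = line.partition(":")
    return key.strip(), value.strip()

def parse_fixed(line):
    """Only treat a line whose first non-blank char is '#' as a comment."""
    if line.lstrip().startswith("#"):
        return None
    key, _, value = line.partition(":")
    return key.strip(), value.strip()

line = "s3.secret-key: Minio#dct@78"
assert parse_naive(line) == ("s3.secret-key", "Minio")        # secret truncated
assert parse_fixed(line) == ("s3.secret-key", "Minio#dct@78")  # secret intact
```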
Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released
Well done! Thanks Qingsheng for the release work and all contributors involved. Best, Zhongqiang Gong Qingsheng Ren 于2024年6月18日周二 23:51写道: > The Apache Flink community is very happy to announce the release of Apache > Flink CDC 3.1.1. > > Apache Flink CDC is a distributed data integration tool for real time data > and batch data, bringing the simplicity and elegance of data integration > via YAML to describe the data movement and transformation in a data > pipeline. > > Please check out the release blog post for an overview of the release: > > https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/ > > The release is available for download at: > https://flink.apache.org/downloads.html > > Maven artifacts for Flink CDC can be found at: > https://search.maven.org/search?q=g:org.apache.flink%20cdc > > The full release notes are available in Jira: > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354763 > > We would like to thank all contributors of the Apache Flink community who > made this release possible! > > Regards, > Qingsheng Ren >
Re: [VOTE] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join
+1 binding On Tue, Jun 18, 2024 at 11:54 AM Feng Jin wrote: > > +1 (non-binding) > > > Best, > Feng Jin > > > On Tue, Jun 18, 2024 at 10:24 AM Lincoln Lee wrote: > > > +1 (binding) > > > > > > Best, > > Lincoln Lee > > > > > > Xintong Song 于2024年6月17日周一 13:39写道: > > > > > +1 (binding) > > > > > > Best, > > > > > > Xintong > > > > > > > > > > > > On Mon, Jun 17, 2024 at 11:41 AM Zhanghao Chen < > > zhanghao.c...@outlook.com> > > > wrote: > > > > > > > +1 (unbinding) > > > > > > > > Best, > > > > Zhanghao Chen > > > > > > > > From: weijie guo > > > > Sent: Monday, June 17, 2024 10:13 > > > > To: dev > > > > Subject: [VOTE] FLIP-462: Support Custom Data Distribution for Input > > > > Stream of Lookup Join > > > > > > > > Hi everyone, > > > > > > > > > > > > Thanks for all the feedback about the FLIP-462: Support Custom Data > > > > Distribution for Input Stream of Lookup Join [1]. The discussion > > > > thread is here [2]. > > > > > > > > > > > > The vote will be open for at least 72 hours unless there is an > > > > objection or insufficient votes. > > > > > > > > > > > > Best, > > > > > > > > Weijie > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join > > > > > > > > > > > > [2] https://lists.apache.org/thread/kds2zrcdmykrz5lmn0hf9m4phdl60nfb > > > > > > > > >
Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released
Well done! Thanks a lot for your hard work! Best, Paul Lam > 2024年6月19日 09:47,Leonard Xu 写道: > > Congratulations! Thanks Qingsheng for the release work and all contributors > involved. > > Best, > Leonard > >> 2024年6月18日 下午11:50,Qingsheng Ren 写道: >> >> The Apache Flink community is very happy to announce the release of Apache >> Flink CDC 3.1.1. >> >> Apache Flink CDC is a distributed data integration tool for real time data >> and batch data, bringing the simplicity and elegance of data integration >> via YAML to describe the data movement and transformation in a data >> pipeline. >> >> Please check out the release blog post for an overview of the release: >> https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/ >> >> The release is available for download at: >> https://flink.apache.org/downloads.html >> >> Maven artifacts for Flink CDC can be found at: >> https://search.maven.org/search?q=g:org.apache.flink%20cdc >> >> The full release notes are available in Jira: >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354763 >> >> We would like to thank all contributors of the Apache Flink community who >> made this release possible! >> >> Regards, >> Qingsheng Ren >
Re: [ANNOUNCE] Apache Flink 1.19.1 released
Thanks Hong for driving this release! I clicked the library/flink docker images link[1] from 1.19.1-release-announcement[2]. And I cannot find the 1.19.1 images there, I'm not sure if it's caused by delay or some steps are missed. [1] https://hub.docker.com/_/flink/tags?page=1=1.19.1 [2] https://flink.apache.org/2024/06/14/apache-flink-1.19.1-release-announcement/ Best, Rui On Wed, Jun 19, 2024 at 9:49 AM Leonard Xu wrote: > Congratulations! Thanks Hong for the release work and all involved! > > Best, > Leonard > > > 2024年6月19日 上午4:20,Hong Liang 写道: > > > > The Apache Flink community is very happy to announce the release of > Apache > > Flink 1.19.1, which is the first bugfix release for the Apache Flink 1.19 > > series. > > > > Apache Flink® is an open-source stream processing framework for > > distributed, high-performing, always-available, and accurate data > streaming > > applications. > > > > The release is available for download at: > > https://flink.apache.org/downloads.html > > > > Please check out the release blog post for an overview of the > improvements > > for this bugfix release: > > > https://flink.apache.org/2024/06/14/apache-flink-1.19.1-release-announcement/ > > > > The full release notes are available in Jira: > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12354399=12315522 > > > > We would like to thank all contributors of the Apache Flink community who > > made this release possible! > > > > Feel free to reach out to the release managers (or respond to this > thread) > > with feedback on the release process. Our goal is to constantly improve > the > > release process. Feedback on what could be improved or things that didn't > > go so well are appreciated. > > > > Regards, > > Hong > >
Re: [ANNOUNCE] Apache Flink 1.19.1 released
Congratulations! Thanks Hong for the release work and all involved! Best, Leonard > 2024年6月19日 上午4:20,Hong Liang 写道: > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.19.1, which is the first bugfix release for the Apache Flink 1.19 > series. > > Apache Flink® is an open-source stream processing framework for > distributed, high-performing, always-available, and accurate data streaming > applications. > > The release is available for download at: > https://flink.apache.org/downloads.html > > Please check out the release blog post for an overview of the improvements > for this bugfix release: > https://flink.apache.org/2024/06/14/apache-flink-1.19.1-release-announcement/ > > The full release notes are available in Jira: > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12354399=12315522 > > We would like to thank all contributors of the Apache Flink community who > made this release possible! > > Feel free to reach out to the release managers (or respond to this thread) > with feedback on the release process. Our goal is to constantly improve the > release process. Feedback on what could be improved or things that didn't > go so well are appreciated. > > Regards, > Hong
Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released
Congratulations! Thanks Qingsheng for the release work and all contributors involved. Best, Leonard > 2024年6月18日 下午11:50,Qingsheng Ren 写道: > > The Apache Flink community is very happy to announce the release of Apache > Flink CDC 3.1.1. > > Apache Flink CDC is a distributed data integration tool for real time data > and batch data, bringing the simplicity and elegance of data integration > via YAML to describe the data movement and transformation in a data > pipeline. > > Please check out the release blog post for an overview of the release: > https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/ > > The release is available for download at: > https://flink.apache.org/downloads.html > > Maven artifacts for Flink CDC can be found at: > https://search.maven.org/search?q=g:org.apache.flink%20cdc > > The full release notes are available in Jira: > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354763 > > We would like to thank all contributors of the Apache Flink community who > made this release possible! > > Regards, > Qingsheng Ren
[jira] [Created] (FLINK-35641) ParquetSchemaConverter should correctly handle field optionality
Alex Sorokoumov created FLINK-35641: --- Summary: ParquetSchemaConverter should correctly handle field optionality Key: FLINK-35641 URL: https://issues.apache.org/jira/browse/FLINK-35641 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: Alex Sorokoumov

At the moment, [ParquetSchemaConverter|https://github.com/apache/flink/blob/99d6fd3c68f46daf0397a35566414e1d19774c3d/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/ParquetSchemaConverter.java#L64] marks all fields as optional. This is not correct in general, especially when it comes to handling maps. For example, [parquet-tools|https://pypi.org/project/parquet-tools/] breaks on the Parquet file produced by [ParquetRowDataWriterTest#complexTypeTest|https://github.com/apache/flink/blob/99d6fd3c68f46daf0397a35566414e1d19774c3d/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriterTest.java#L140-L151]:

{noformat}
parquet-tools inspect /var/folders/sc/k3hr87fj4x169rdq9n107whwgp/T/junit14646865447948471989/3b328592-7315-48c6-8fa9-38da4048fb4e
Traceback (most recent call last):
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/bin/parquet-tools", line 8, in <module>
    sys.exit(main())
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/cli.py", line 26, in main
    args.handler(args)
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 55, in _cli
    _execute_simple(
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 63, in _execute_simple
    pq_file: pq.ParquetFile = pq.ParquetFile(filename)
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 317, in __init__
    self.reader.open(
  File "pyarrow/_parquet.pyx", line 1492, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Map keys must be annotated as required.
{noformat}

The correct thing to do is to mark nullable fields as optional, otherwise required.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
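The rule the ticket proposes — derive the Parquet repetition from the Flink type's nullability instead of hard-coding OPTIONAL — can be sketched in isolation. The class and function names below are illustrative assumptions, not the real ParquetSchemaConverter API:

```python
from dataclasses import dataclass

@dataclass
class Field:
    """Minimal stand-in for a logical-type field with a nullability flag."""
    name: str
    nullable: bool

def repetition(field: Field) -> str:
    """Proposed rule: nullable fields become OPTIONAL, others REQUIRED."""
    return "OPTIONAL" if field.nullable else "REQUIRED"

# The Parquet format requires map keys to be annotated as required, so a
# correct converter must emit REQUIRED for the (never-null) key field
# instead of marking everything OPTIONAL:
map_key = Field("key", nullable=False)
map_value = Field("value", nullable=True)
assert repetition(map_key) == "REQUIRED"
assert repetition(map_value) == "OPTIONAL"
```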
[ANNOUNCE] Apache Flink 1.19.1 released
The Apache Flink community is very happy to announce the release of Apache Flink 1.19.1, which is the first bugfix release for the Apache Flink 1.19 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https://flink.apache.org/downloads.html Please check out the release blog post for an overview of the improvements for this bugfix release: https://flink.apache.org/2024/06/14/apache-flink-1.19.1-release-announcement/ The full release notes are available in Jira: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12354399=12315522 We would like to thank all contributors of the Apache Flink community who made this release possible! Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things that didn't go so well are appreciated. Regards, Hong
Re: [VOTE] FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler
+1 (binding) On Tue, Jun 18, 2024 at 11:38 AM Gabor Somogyi wrote: > +1 (binding) > > G > > > On Mon, Jun 17, 2024 at 10:24 AM Matthias Pohl wrote: > > > Hi everyone, > > the discussion in [1] about FLIP-461 [2] is kind of concluded. I am > > starting a vote on this one here. > > > > The vote will be open for at least 72 hours (i.e. until June 20, 2024; > > 8:30am UTC) unless there are any objections. The FLIP will be considered > > accepted if 3 binding votes (from active committers according to the > Flink > > bylaws [3]) are gathered by the community. > > > > Best, > > Matthias > > > > [1] https://lists.apache.org/thread/nnkonmsv8xlk0go2sgtwnphkhrr5oc3y > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler > > [3] > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals > > >
[jira] [Created] (FLINK-35640) Drop Flink 1.15 support for the operator
Mate Czagany created FLINK-35640: Summary: Drop Flink 1.15 support for the operator Key: FLINK-35640 URL: https://issues.apache.org/jira/browse/FLINK-35640 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Mate Czagany Fix For: kubernetes-operator-1.10.0 As the operator only supports the latest 4 stable minor Flink versions, 1.15 support should be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement
Hi Jeyhun, Timo,

I updated the FLIP to include the option to reorder the columns defined in the SELECT part. As Timo mentioned, this is equivalent to the Flink INSERT INTO statement and also to the Azure CTAS statement. For example:

> CREATE TABLE t1(a INT, b INT, c INT);
> CREATE TABLE s1(c, b, a) AS SELECT * FROM t1;
> DESCRIBE s1;
+-------------+-----------+----------+--------+
| Column Name | Data Type | Nullable | Extras |
+-------------+-----------+----------+--------+
| c           | INT       | NULL     |        |
| b           | INT       | NULL     |        |
| a           | INT       | NULL     |        |
+-------------+-----------+----------+--------+

Let me know your thoughts.

- Sergio

On Tue, Jun 18, 2024 at 7:34 AM Sergio Pena wrote: > Hi Ron, > > I think the primary key and unique values are handled by the engine. I > found this document [1] > that explains how Flink handles those records. When using CTAS, you can > specify in the > table options how you want the records to be merged if you define a > primary key. > > Regarding CTAS, this is just a wrapper that's similar to executing two > statements manually; a > create table followed by an insert select statements. If a user has a > primary key in the target > table and then execute the insert select, then the Flink engine will > handle unique records > depending on how the target table was configured. > > Hope this link [1] helps understand primary keys? It really helped me, I > wasn't sure how > it worked either :). > > - Sergio > > [1] > https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/ > > > On Mon, Jun 17, 2024 at 8:43 PM Ron Liu wrote: > >> Hi, Sergio >> >> Thanks for your reply. >> >> > PKEYS are not inherited from the select query. Also UNIQUE KEY is not >> supported on the create statements yet. >> All columns queried in the select part form a query schema with only the >> column name, type and null/not null properties. >> Based on this, I don't see there will be an issue with uniqueness. Users >> may define a pkey on the not null columns >> only. >> >> Sorry if I wasn't clear about this issue.
My question is, if a primary key >> is defined on column a in the target table of a CTAS, and the data fetched >> by the Select query can't guarantee uniqueness based on column a, then >> what >> is relied upon to ensure uniqueness? >> Is it something that the engine does, or is it dependent on the storage >> mechanism of the target table? >> >> Best, >> Ron >> >> >> Sergio Pena 于2024年6月18日周二 01:24写道: >> >> > Hi Ron, >> > Thanks for your feedback. >> > >> > Is it possible for you to list these systems and make it part of the >> > > Reference chapter? Which would make it easier for everyone to >> understand. >> > > >> > >> > I updated the flip to include the ones I found (mysql, postgres, >> oracle). >> > Though postgres/oracle semantics are different, they at least allow you >> > some schema changes in the create part. >> > I think spark sql supports it too, but I couldn't find a way to test >> it, so >> > I didn't include it in the list. >> > >> > One is whether Nullable columns are >> > >> > allowed to be referenced in the primary key definition, because all the >> > >> > columns deduced based on Select Query may be Nullable; >> > >> > >> > I updated the flip. But in short, CTAS will use the same rules and >> > validations as other CREATE statements. >> > Pkeys are not allowed on NULL columns, so CTAS would fail too. The CTAS >> > inherits the NULL and NOT NULL >> > constraints from the query schema, so PKEY can be used on the NOT NULL >> cols >> > only. >> > >> > if the UNIQUE KEY cannot be deduced based on Select Query, or the >> > > deduced UNIQUE KEY does not match the defined primary key, which will >> > lead >> > > to data duplication, the engine needs to what to ensure the >> uniqueness of >> > > the data? >> > > >> > >> > PKEYS are not inherited from the select query. Also UNIQUE KEY is not >> > supported on the create statements yet. 
>> > All columns queried in the select part form a query schema with only the >> > column name, type and null/not null properties. >> > Based on this, I don't see there will be an issue with uniqueness. Users >> > may define a pkey on the not null columns >> > only. >> > >> > - Sergio >> > >> > >> > On Sun, Jun 16, 2024 at 9:55 PM Ron Liu wrote: >> > >> > > Hi, Sergio >> > > >> > > Sorry for later joining this thread. >> > > >> > > Thanks for driving this proposal, it looks great. >> > > >> > > I have a few questions: >> > > >> > > 1. Many SQL-based data processing systems, however, support schema >> > > definitions within their CTAS statements >> > > >> > > Is it possible for you to list these systems and make it part of the >> > > Reference chapter? Which would make it easier for everyone to >> understand. >> > > >> > > >> > > 2. This proposal proposes to support defining primary keys in CTAS, >> then >> > > there are two possible issues
Re: [VOTE] Apache Flink Kubernetes Operator Release 1.9.0, release candidate #1
Hi, +1 (non-binding) Note: Using the Apache Flink KEYS file [1] to verify the signatures your key seems to be expired, so that file should be updated as well. - Verified checksums and signatures - Built source distribution - Verified all pom.xml versions are the same - Verified install from RC repo - Verified Chart.yaml and values.yaml contents - Submitted basic example with 1.17 and 1.19 Flink versions in native and standalone mode - Tested operator HA, added new watched namespace dynamically - Checked operator logs Regards, Mate [1] https://dist.apache.org/repos/dist/release/flink/KEYS Gyula Fóra ezt írta (időpont: 2024. jún. 18., K, 8:14): > Hi Everyone, > > Please review and vote on the release candidate #1 for the version 1.9.0 of > Apache Flink Kubernetes Operator, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > **Release Overview** > > As an overview, the release consists of the following: > a) Kubernetes Operator canonical source distribution (including the > Dockerfile), to be deployed to the release repository at dist.apache.org > b) Kubernetes Operator Helm Chart to be deployed to the release repository > at dist.apache.org > c) Maven artifacts to be deployed to the Maven Central Repository > d) Docker image to be pushed to dockerhub > > **Staging Areas to Review** > > The staging areas containing the above mentioned artifacts are as follows, > for your review: > * All artifacts for a,b) can be found in the corresponding dev repository > at dist.apache.org [1] > * All artifacts for c) can be found at the Apache Nexus Repository [2] > * The docker image for d) is staged on github [3] > > All artifacts are signed with the key 21F06303B87DAFF1 [4] > > Other links for your review: > * JIRA release notes [5] > * source code tag "release-1.9.0-rc1" [6] > * PR to update the website Downloads page to > include Kubernetes Operator links [7] > > **Vote Duration** > > The voting time will run 
for at least 72 hours. > It is adopted by majority approval, with at least 3 PMC affirmative votes. > > **Note on Verification** > > You can follow the basic verification guide here[8]. > Note that you don't need to verify everything yourself, but please make > note of what you have tested together with your +- vote. > > Cheers! > Gyula Fora > > [1] > > https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.9.0-rc1/ > [2] > https://repository.apache.org/content/repositories/orgapacheflink-1740/ > [3] ghcr.io/apache/flink-kubernetes-operator:17129ff > [4] https://dist.apache.org/repos/dist/release/flink/KEYS > [5] > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354417 > [6] > https://github.com/apache/flink-kubernetes-operator/tree/release-1.9.0-rc1 > [7] https://github.com/apache/flink-web/pull/747 > [8] > > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release >
Re: [ANNOUNCE] Flink 1.20 feature freeze
Hi Robert, Rui, Ufuk, and Weijie, I would like to raise the PR about merging `flink run` and `flink run-application` functionality [1] to get considered as part of the 1.20 release. The reason IMO is that the `run-application` CLI command should be removed in the same release when Per-Job mode gets removed. AFAIK when we deprecate a public API, it has to stay for 2 minor releases to give time for users to adapt. According to that, if `run-application` is deprecated in Flink 2.0, it can get removed in Flink 2.3. Currently the drop of per-job mode is blocked [2] and probably it will not be resolved for a while, but I could imagine it would be possible in 2.1 or 2.2. The change itself is rather small and concise, and Marton Balassi volunteered to review it ASAP. Pls. correct me if I am wrong about the deprecation process. Looking forward to your opinion! Thanks, Ferenc [1] https://issues.apache.org/jira/browse/FLINK-35625 [2] https://issues.apache.org/jira/browse/FLINK-26000 On Tuesday, 18 June 2024 at 11:27, weijie guo wrote: > > > Hi Zakelly, > > Thank you for informing us! > > After discussion, all RMs agreed that this was an important fix that should > be merged into 1.20. > > So feel free to merge it. > > Best regards, > > Weijie > > > Zakelly Lan zakelly@gmail.com 于2024年6月15日周六 16:29写道: > > > Hi Robert, Rui, Ufuk and Weijie, > > > > Thanks for the update! > > > > FYI: This PR[1] fixes & cleanup the left-over checkpoint directories for > > file-merging on TM exit. And the second commit fixes the wrong state handle > > usage. We encountered several unexpected CI fails, so we missed the feature > > freeze time. It is better to have this PR in 1.20 so I will merge this if > > you agree. Thanks. > > > > [1] https://github.com/apache/flink/pull/24933 > > > > Best, > > Zakelly > > > > On Sat, Jun 15, 2024 at 6:00 AM weijie guo guoweijieres...@gmail.com > > wrote: > > > > > Hi everyone, > > > > > > The feature freeze of 1.20 has started now. 
That means that no new > > > features > > > > > > or improvements should now be merged into the master branch unless you > > > ask > > > > > > the release managers first, which has already been done for PRs, or > > > pending > > > > > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > > > > - Cutting release branch > > > > > > Currently we have no blocker issues(beside tickets that used for > > > release-testing). > > > > > > We are planning to cut the release branch on next Friday (June 21) if > > > no new test instabilities, and we'll make another announcement in the > > > dev mailing list then. > > > > > > - Cross-team testing > > > > > > The release testing will start right after we cut the release branch, > > > which > > > > > > is expected to come in the next week. As a prerequisite of it, we have > > > created > > > > > > the corresponding instruction ticket in FLINK-35602 [1], please check > > > and complete the > > > > > > documentation and test instruction of your new feature and mark the > > > related JIRA > > > > > > issue in the 1.20 release wiki page [2] before we start testing, which > > > > > > would be quite helpful for other developers to validate your features. > > > > > > Best regards, > > > > > > Robert, Rui, Ufuk and Weijie > > > > > > [1]https://issues.apache.org/jira/browse/FLINK-35602 > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release
[ANNOUNCE] Apache Flink CDC 3.1.1 released
The Apache Flink community is very happy to announce the release of Apache Flink CDC 3.1.1. Apache Flink CDC is a distributed data integration tool for real time data and batch data, bringing the simplicity and elegance of data integration via YAML to describe the data movement and transformation in a data pipeline. Please check out the release blog post for an overview of the release: https://flink.apache.org/2024/06/18/apache-flink-cdc-3.1.1-release-announcement/ The release is available for download at: https://flink.apache.org/downloads.html Maven artifacts for Flink CDC can be found at: https://search.maven.org/search?q=g:org.apache.flink%20cdc The full release notes are available in Jira: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12354763 We would like to thank all contributors of the Apache Flink community who made this release possible! Regards, Qingsheng Ren
Re: [VOTE] FLIP-461: Synchronize rescaling with checkpoint creation to minimize reprocessing for the AdaptiveScheduler
+1 (binding) G On Mon, Jun 17, 2024 at 10:24 AM Matthias Pohl wrote: > Hi everyone, > the discussion in [1] about FLIP-461 [2] is kind of concluded. I am > starting a vote on this one here. > > The vote will be open for at least 72 hours (i.e. until June 20, 2024; > 8:30am UTC) unless there are any objections. The FLIP will be considered > accepted if 3 binding votes (from active committers according to the Flink > bylaws [3]) are gathered by the community. > > Best, > Matthias > > [1] https://lists.apache.org/thread/nnkonmsv8xlk0go2sgtwnphkhrr5oc3y > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler > [3] > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals >
[RESULT][VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
Hi everyone, I'm happy to announce that we have unanimously approved this release. There are 5 approving votes, 3 of which are binding: - Xiqian Yu (non-binding) - Yanquan Lv (non-binding) - Leonard Xu (binding) - Jark Wu (binding) - Lincoln Lee (binding) There are no disapproving votes. Thank you for verifying the release candidate. We will now proceed to finalize the release and announce it once everything is published. Best, Qingsheng
[jira] [Created] (FLINK-35639) upgrading to 1.19 with job in HA state with restart strategy crashes job manager
yazgoo created FLINK-35639: -- Summary: upgrading to 1.19 with job in HA state with restart strategy crashes job manager Key: FLINK-35639 URL: https://issues.apache.org/jira/browse/FLINK-35639 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.19.1 Environment: Download the 1.18 and 1.19 binary releases. Add the following to flink-1.19.0/conf/config.yaml and flink-1.18.1/conf/flink-conf.yaml:

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: localhost
high-availability.storageDir: file:///tmp/flink/recovery
```

Launch ZooKeeper: docker run --network host zookeeper:latest
Launch the 1.18 task manager: ./flink-1.18.1/bin/taskmanager.sh start-foreground
Launch the 1.18 job manager: ./flink-1.18.1/bin/jobmanager.sh start-foreground
Launch the following job:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import java.util.concurrent.TimeUnit;

public class FlinkJob {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setRestartStrategy(
                RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.of(20, TimeUnit.SECONDS)));
        env.fromElements("Hello World", "Hello Flink")
                .flatMap(new LineSplitter())
                .groupBy(0)
                .sum(1)
                .print();
    }

    // Note: the type parameters below were stripped when the original report
    // was rendered; they are restored here from the surrounding code.
    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            for (String word : value.split(" ")) {
                try {
                    Thread.sleep(12);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```

```xml
<!-- pom.xml reconstructed from the flattened report; the values are original,
     the tag structure is inferred -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flink</groupId>
  <artifactId>myflinkjob</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <flink.version>1.18.1</flink.version>
    <java.version>1.8</java.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>FlinkJob</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

Launch the job: ./flink-1.18.1/bin/flink run ../flink-job/target/myflinkjob-1.0-SNAPSHOT.jar
Job has been submitted with JobID 5f0898c964a93a47aa480427f3e2c6c0
Kill the job manager and task manager, then launch the 1.19.0 job manager: ./flink-1.19.0/bin/jobmanager.sh start-foreground

Root cause
==
It looks like the type of delayBetweenAttemptsInterval was changed in 1.19 (https://github.com/apache/flink/pull/22984/files#diff-d174f32ffdea69de610c4f37c545bd22a253b9846434f83397f1bbc2aaa399faR239), introducing an incompatibility which is not handled by Flink 1.19. In my opinion, the job manager should not crash when starting in that case.

Reporter: yazgoo

When trying to upgrade a Flink cluster from 1.18 to 1.19, with a 1.18 job in ZooKeeper HA state, I get a ClassCastException, see the log below:

```log
2024-06-18 16:58:14,401 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: JobMaster for job 5f0898c964a93a47aa480427f3e2c6c0 failed.
	at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:1484) ~[flink-dist-1.19.0.jar:1.19.0]
	at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:775) ~[flink-dist-1.19.0.jar:1.19.0]
	at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:738) ~[flink-dist-1.19.0.jar:1.19.0]
	at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$7(Dispatcher.java:693) ~[flink-dist-1.19.0.jar:1.19.0]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
	at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRunAsync$4(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
	at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-dist-1.19.0.jar:1.19.0]
	at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRunAsync(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
	at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:218) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
	at org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
```
Re: [ANNOUNCE] New Apache Flink Committer - Zhongqiang Gong
Congratulations Zhongqiang! Best, Swapnal On Tue, 18 Jun 2024, 07:05 Jacky Lau, wrote: > Congratulations Zhongqiang! > > Best Regards > Jacky Lau > > Yubin Li 于2024年6月18日周二 00:28写道: > > > Congratulations Zhongqiang! > > > > Best Regards > > Yubin > > > > On Tue, Jun 18, 2024 at 12:21 AM Muhammet Orazov > > wrote: > > > > > > Congratulations Zhongqiang! Well deserved! > > > > > > Best, > > > Muhammet > > > > > > > > >
[jira] [Created] (FLINK-35638) Modify OceanBase Docker container to make the test cases runnable on non-Linux systems
He Wang created FLINK-35638: --- Summary: Modify OceanBase Docker container to make the test cases runnable on non-Linux systems Key: FLINK-35638 URL: https://issues.apache.org/jira/browse/FLINK-35638 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: He Wang -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
+1 (binding) - verified hashes - verified signatures - build from source with JDK 8 & Maven 3.8.6 - the source distributions do not contain any binaries - all POM files point to the same version - checked release notes - reviewed the web PR Best, Lincoln Lee Jark Wu 于2024年6月18日周二 19:58写道: > +1 (binding) > > - Build and compile the source code locally: *OK* > - Verified signatures: *OK* > - Verified hashes: *OK* > - Checked no missing artifacts in the staging area: *OK* > - Reviewed the website release PR: *OK* > - Checked the licenses: *OK* > > Best, > Jark > > On Tue, 18 Jun 2024 at 18:14, Leonard Xu wrote: > > > +1 (binding) > > > > - verified signatures > > - verified hashsums > > - checked release notes > > - reviewed the web PR > > - tested Flink CDC works with Flink 1.19 > > - tested route、transform in MySQL to Doris Pipeline > > > > Best, > > Leonard > > > > >
Re: [DISCUSS] FLIP-463: Schema Definition in CREATE TABLE AS Statement
Hi Ron, I think the primary key and unique values are handled by the engine. I found this document [1] that explains how Flink handles those records. When using CTAS, you can specify in the table options how you want the records to be merged if you define a primary key. Regarding CTAS, this is just a wrapper that's similar to executing two statements manually; a CREATE TABLE followed by an INSERT SELECT statement. If a user has a primary key in the target table and then executes the INSERT SELECT, the Flink engine will handle unique records depending on how the target table was configured. Hope this link [1] helps with understanding primary keys. It really helped me, I wasn't sure how it worked either :). - Sergio [1] https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/concepts/primary-key-table/ On Mon, Jun 17, 2024 at 8:43 PM Ron Liu wrote: > Hi, Sergio > > Thanks for your reply. > > > PKEYS are not inherited from the select query. Also UNIQUE KEY is not > supported on the create statements yet. > All columns queried in the select part form a query schema with only the > column name, type and null/not null properties. > Based on this, I don't see there will be an issue with uniqueness. Users > may define a pkey on the not null columns > only. > > Sorry if I wasn't clear about this issue. My question is, if a primary key > is defined on column a in the target table of a CTAS, and the data fetched > by the Select query can't guarantee uniqueness based on column a, then what > is relied upon to ensure uniqueness? > Is it something that the engine does, or is it dependent on the storage > mechanism of the target table? > > Best, > Ron > > > Sergio Pena 于2024年6月18日周二 01:24写道: > > > Hi Ron, > > Thanks for your feedback. > > > > Is it possible for you to list these systems and make it part of the > > > Reference chapter? Which would make it easier for everyone to > understand. 
> > > > > > > I updated the flip to include the ones I found (mysql, postgres, oracle). > > Though postgres/oracle semantics are different, they at least allow you > > some schema changes in the create part. > > I think spark sql supports it too, but I couldn't find a way to test it, > so > > I didn't include it in the list. > > > > One is whether Nullable columns are > > > > allowed to be referenced in the primary key definition, because all the > > > > columns deduced based on Select Query may be Nullable; > > > > > > I updated the flip. But in short, CTAS will use the same rules and > > validations as other CREATE statements. > > Pkeys are not allowed on NULL columns, so CTAS would fail too. The CTAS > > inherits the NULL and NOT NULL > > constraints from the query schema, so PKEY can be used on the NOT NULL > cols > > only. > > > > if the UNIQUE KEY cannot be deduced based on Select Query, or the > > > deduced UNIQUE KEY does not match the defined primary key, which will > > lead > > > to data duplication, the engine needs to what to ensure the uniqueness > of > > > the data? > > > > > > > PKEYS are not inherited from the select query. Also UNIQUE KEY is not > > supported on the create statements yet. > > All columns queried in the select part form a query schema with only the > > column name, type and null/not null properties. > > Based on this, I don't see there will be an issue with uniqueness. Users > > may define a pkey on the not null columns > > only. > > > > - Sergio > > > > > > On Sun, Jun 16, 2024 at 9:55 PM Ron Liu wrote: > > > > > Hi, Sergio > > > > > > Sorry for later joining this thread. > > > > > > Thanks for driving this proposal, it looks great. > > > > > > I have a few questions: > > > > > > 1. Many SQL-based data processing systems, however, support schema > > > definitions within their CTAS statements > > > > > > Is it possible for you to list these systems and make it part of the > > > Reference chapter? 
Which would make it easier for everyone to > understand. > > > > > > > > > 2. This proposal proposes to support defining primary keys in CTAS, > then > > > there are two possible issues here. One is whether Nullable columns are > > > allowed to be referenced in the primary key definition, because all the > > > columns deduced based on Select Query may be Nullable; and the second > is > > > that if the UNIQUE KEY cannot be deduced based on Select Query, or the > > > deduced UNIQUE KEY does not match the defined primary key, which will > > lead > > > to data duplication, the engine needs to what to ensure the uniqueness > of > > > the data? > > > > > > > > > Best > > > Ron > > > > > > > > > Jeyhun Karimov 于2024年6月14日周五 01:51写道: > > > > > > > Thanks Sergio and Timo for your answers. > > > > Sounds good to me. > > > > Looking forward for this feature. > > > > > > > > Regards, > > > > Jeyhun > > > > > > > > On Thu, Jun 13, 2024 at 4:48 PM Sergio Pena > > > > > > > > > wrote: > > > > > > > > > Sure Yuxia, I just added the support for RTAS statements too. > > > > > > > > > > - Sergio > > > >
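Sergio's rule above — primary keys may only be placed on columns the SELECT-derived schema marks NOT NULL — can be sketched with a small stdlib-only check. The class and method names here are hypothetical illustrations, not Flink's actual CTAS validator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CtasPkeyCheck {
    // Hypothetical sketch of the validation rule from the thread: a CTAS
    // primary key may only reference columns whose derived schema marks
    // them NOT NULL. Returns the offending columns (empty list = valid).
    static List<String> invalidPkeyColumns(Map<String, Boolean> nullableByColumn, List<String> pkey) {
        List<String> bad = new ArrayList<>();
        for (String col : pkey) {
            // Columns absent from the derived schema are treated as
            // nullable, so they fail the check as well.
            if (nullableByColumn.getOrDefault(col, true)) {
                bad.add(col);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // "id" is derived NOT NULL, "name" is derived nullable.
        Map<String, Boolean> derivedSchema = Map.of("id", false, "name", true);
        System.out.println(invalidPkeyColumns(derivedSchema, List.of("id")));   // valid: []
        System.out.println(invalidPkeyColumns(derivedSchema, List.of("name"))); // invalid: [name]
    }
}
```

This mirrors the behavior described for other CREATE statements: the statement fails fast at validation time rather than letting a nullable key reach the sink.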
[jira] [Created] (FLINK-35637) ScalarFunctionCallGen does not handle complex argument type properly
Qinghui Xu created FLINK-35637: -- Summary: ScalarFunctionCallGen does not handle complex argument type properly Key: FLINK-35637 URL: https://issues.apache.org/jira/browse/FLINK-35637 Project: Flink Issue Type: Bug Components: Table SQL / Planner Reporter: Qinghui Xu When trying to use a UDF that expects an `Array` argument, an error is raised:

```
java.lang.ClassCastException: org.apache.flink.table.data.GenericRowData cannot be cast to org.apache.flink.table.data.RawValueData
	at org.apache.flink.table.data.GenericArrayData.getRawValue(GenericArrayData.java:223)
	at org.apache.flink.table.data.ArrayData.lambda$createElementGetter$95d74a6c$1(ArrayData.java:224)
	at org.apache.flink.table.data.util.DataFormatConverters.arrayDataToJavaArray(DataFormatConverters.java:1223)
	at org.apache.flink.table.data.util.DataFormatConverters.access$200(DataFormatConverters.java:106)
	at org.apache.flink.table.data.util.DataFormatConverters$ObjectArrayConverter.toExternalImpl(DataFormatConverters.java:1175)
	at org.apache.flink.table.data.util.DataFormatConverters$ObjectArrayConverter.toExternalImpl(DataFormatConverters.java:1115)
	at org.apache.flink.table.data.util.DataFormatConverters$DataFormatConverter.toExternal(DataFormatConverters.java:419)
	at StreamExecCalc$1560.processElement(Unknown Source)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317)
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411)
	at MyUDFExpectingRowDataArray$$anonfun$run$1.apply
	at MyUDFExpectingRowDataArray$$anonfun$run$1.apply
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at com.criteo.featureflow.flink.datadisco.test.JsonFileRowDataSource.run(TestBlinkGlupTableSource.scala:65)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
```

After digging into `ScalarFunctionCallGen`, it turns out it is trying to treat the argument as a `RAW` type while it should be a `ROW`. The root cause seems to be that the codegen relies solely on the `ScalarFunction` signature to infer the type, which is the "external type". It should instead take into consideration the type of the operand and bridge to the external type. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
+1 (binding) - Build and compile the source code locally: *OK* - Verified signatures: *OK* - Verified hashes: *OK* - Checked no missing artifacts in the staging area: *OK* - Reviewed the website release PR: *OK* - Checked the licenses: *OK* Best, Jark On Tue, 18 Jun 2024 at 18:14, Leonard Xu wrote: > +1 (binding) > > - verified signatures > - verified hashsums > - checked release notes > - reviewed the web PR > - tested Flink CDC works with Flink 1.19 > - tested route、transform in MySQL to Doris Pipeline > > Best, > Leonard > >
[jira] [Created] (FLINK-35636) Streaming File Sink s3 end-to-end test did not finish after 900 seconds
Weijie Guo created FLINK-35636: -- Summary: Streaming File Sink s3 end-to-end test did not finish after 900 seconds Key: FLINK-35636 URL: https://issues.apache.org/jira/browse/FLINK-35636 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.17.2 Reporter: Weijie Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource
xingbe created FLINK-35635: -- Summary: Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource Key: FLINK-35635 URL: https://issues.apache.org/jira/browse/FLINK-35635 Project: Flink Issue Type: Sub-task Components: Connectors / Hive Reporter: xingbe Assignee: xingbe Fix For: 1.20.0 Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35634) Add a CDC quickstart utility
yux created FLINK-35634: --- Summary: Add a CDC quickstart utility Key: FLINK-35634 URL: https://issues.apache.org/jira/browse/FLINK-35634 Project: Flink Issue Type: New Feature Components: Flink CDC Reporter: yux Currently, it's not very easy to initialize a CDC pipeline job from scratch, requiring users to configure lots of Flink settings manually. This ticket suggests creating an extra component like `tiup` and `rustup` to help users create and submit CDC jobs quickly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35633) Early verification and clearer error message of pipeline YAML definition
yux created FLINK-35633: --- Summary: Early verification and clearer error message of pipeline YAML definition Key: FLINK-35633 URL: https://issues.apache.org/jira/browse/FLINK-35633 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: yux Currently, few verifications are applied before submitting a YAML pipeline job to the Flink cluster, and errors in Transform / Route rules are exposed as runtime exceptions, which are hard to debug and investigate. This ticket suggests performing Transform / Route expression validation in the CDC CLI before submitting YAML jobs, providing clearer and more descriptive error messages early on. -- This message was sent by Atlassian Jira (v8.20.10#820010)
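The early-validation idea in the ticket can be illustrated with a stdlib-only sketch: check a route rule's source-table pattern at CLI time so a bad rule fails fast with a readable message instead of a runtime exception on the cluster. The helper below is hypothetical — Flink CDC's actual rule syntax and validation entry points are not shown here, and `java.util.regex` stands in for whatever pattern dialect the rules use:

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RouteRulePrecheck {
    // Hypothetical pre-submission check: returns an empty Optional when the
    // source-table pattern compiles, or a descriptive error message when it
    // does not, so the CLI can reject the YAML before job submission.
    static Optional<String> validateSourceTablePattern(String sourceTable) {
        try {
            Pattern.compile(sourceTable);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of(
                    "Invalid source-table pattern '" + sourceTable + "': " + e.getDescription());
        }
    }

    public static void main(String[] args) {
        System.out.println(validateSourceTablePattern("db\\.(A|B)")); // valid
        System.out.println(validateSourceTablePattern("db.(A|B"));    // unclosed group -> error
    }
}
```

Running every rule through a check like this before submission is what turns a late, opaque cluster-side exception into an immediate, pointed CLI error.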
[SUMMARY] Flink 1.20 Release Sync 06/18/2024
Dear devs,

This is the first meeting after the feature freeze of Flink 1.20. I'd like to share the information synced in the meeting.

- Feature Freeze
We announced the feature freeze of 1.20 on June 15. In the end, we have 13 completed features. Meanwhile, 3 features were moved to the next release, but none of them are must-haves for 1.20.

- Blockers:
  - FLINK-35629 - Performance regression in stringRead and stringWrite
    - We haven't found the root cause yet, but Zakelly will try JMH 1.19 to exclude the impact of an environment change.
  - FLINK-35587 - job fails with "The read buffer is null in credit-based input channel" on TPC-DS 10TB benchmark
    - This is a serious bug that affects our credit-based network protocol. Junrui has identified the commit that caused the problem. Weijie will investigate it with the highest priority.

- Cutting release branch
We are planning to cut the release branch next Monday (June 24). For the convenience of testing, we decided to put out an *rc0* release a few days after the branch cut. Please note that it is only for testing purposes and is not intended to be a final release. We'll make separate announcements in the dev mailing list for both of them.

- Release Testing
We have created an umbrella JIRA [1] for release testing of 1.20. Please check and complete the documentation and test instructions of your new feature and mark the related JIRA issue in the 1.20 release wiki page [2] before we start testing, which would be quite helpful for other developers to validate your features.

- Sync meeting [3]:
Due to a time conflict for some RMs, we decided to move the start of the release sync meeting forward by *half an hour*. The next meeting is 06/25/2024 at 9.30am (UTC+2) / 3.30pm (UTC+8); please feel free to join us.

Lastly, we encourage attendees to fill out the topics to be discussed at the bottom of the 1.20 wiki page [2] a day in advance, to make it easier for everyone to understand the background of the topics. Thanks! 
Best, Robert, Rui, Ufuk, Weijie [1] https://issues.apache.org/jira/browse/FLINK-35602 [2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release [3] https://meet.google.com/mtj-huez-apu
Re: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
+1 (binding) - verified signatures - verified hashsums - checked release notes - reviewed the web PR - tested Flink CDC works with Flink 1.19 - tested route, transform in MySQL to Doris Pipeline Best, Leonard
[jira] [Created] (FLINK-35632) The example provided in the kafkaSource documentation for topic regex subscription is incorrect
elon_X created FLINK-35632: -- Summary: The example provided in the kafkaSource documentation for topic regex subscription is incorrect Key: FLINK-35632 URL: https://issues.apache.org/jira/browse/FLINK-35632 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 1.20.0 Reporter: elon_X Attachments: image-2024-06-18-17-47-53-525.png, image-2024-06-18-17-48-38-080.png [https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/connectors/datastream/kafka/#topic-partition-subscription] The example provided in the document has issues and will result in a compilation error. The correct example should be: {code:java} KafkaSource.builder().setTopicPattern(Pattern.compile("topic.*")) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
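The corrected builder call above takes a compiled `java.util.regex.Pattern` rather than a plain String. The sketch below verifies the pattern itself with the stdlib only; the `KafkaSource` call is shown as a comment because it needs the Kafka connector on the classpath, and the helper name is an illustration, not connector API:

```java
import java.util.regex.Pattern;

public class TopicPatternExample {
    // The corrected documentation example passes a compiled Pattern:
    //
    //   KafkaSource.<String>builder()
    //       .setTopicPattern(Pattern.compile("topic.*"))
    //       ...
    //
    // Stdlib-only check of which topic names such a pattern would cover
    // (assuming full-match semantics against the topic name):
    static boolean subscribes(Pattern topicPattern, String topic) {
        return topicPattern.matcher(topic).matches();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("topic.*");
        System.out.println(subscribes(p, "topic-orders")); // true
        System.out.println(subscribes(p, "metrics"));      // false
    }
}
```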
Re: [ANNOUNCE] Flink 1.20 feature freeze
Hi Zakelly, Thank you for informing us! After discussion, all RMs agreed that this was an important fix that should be merged into 1.20. So feel free to merge it. Best regards, Weijie Zakelly Lan 于2024年6月15日周六 16:29写道: > Hi Robert, Rui, Ufuk and Weijie, > > Thanks for the update! > > FYI: This PR[1] fixes & cleanup the left-over checkpoint directories for > file-merging on TM exit. And the second commit fixes the wrong state handle > usage. We encountered several unexpected CI fails, so we missed the feature > freeze time. It is better to have this PR in 1.20 so I will merge this if > you agree. Thanks. > > > [1] https://github.com/apache/flink/pull/24933 > > Best, > Zakelly > > On Sat, Jun 15, 2024 at 6:00 AM weijie guo > wrote: > > > Hi everyone, > > > > > > The feature freeze of 1.20 has started now. That means that no new > features > > > > or improvements should now be merged into the master branch unless you > ask > > > > the release managers first, which has already been done for PRs, or > pending > > > > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > > > > > > - *Cutting release branch* > > > > > > Currently we have no blocker issues(beside tickets that used for > > release-testing). > > > > We are planning to cut the release branch on next Friday (June 21) if > > no new test instabilities, and we'll make another announcement in the > > dev mailing list then. > > > > > > > > - *Cross-team testing* > > > > > > The release testing will start right after we cut the release branch, > which > > > > is expected to come in the next week. As a prerequisite of it, we have > > created > > > > the corresponding instruction ticket in FLINK-35602 [1], please check > > and complete the > > > > documentation and test instruction of your new feature and mark the > > related JIRA > > > > issue in the 1.20 release wiki page [2] before we start testing, which > > > > would be quite helpful for other developers to validate your features. 
> > > > > > > > Best regards, > > > > Robert, Rui, Ufuk and Weijie > > > > > > [1]https://issues.apache.org/jira/browse/FLINK-35602 > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release > > >
[jira] [Created] (FLINK-35631) KafkaSource parameter partition.discovery.interval.ms with a default value of 5 minutes does not take effect
elon_X created FLINK-35631: -- Summary: KafkaSource parameter partition.discovery.interval.ms with a default value of 5 minutes does not take effect Key: FLINK-35631 URL: https://issues.apache.org/jira/browse/FLINK-35631 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.17.2 Reporter: elon_X When I start a streaming program to consume Kafka (flink-connector-kafka-3.1-SNAPSHOT), the Flink task does not automatically detect new partitions after Kafka adds partitions. *Reason* In the KafkaSourceBuilder, this parameter is checked to see if it has been overridden. Since I did not set this parameter, even though the source is CONTINUOUS_UNBOUNDED, it still sets partition.discovery.interval.ms = -1. In the KafkaSourceEnumerator, the value of partition.discovery.interval.ms is then -1 instead of the default value of 5 minutes, so automatic partition discovery does not work, and the default value of 5 minutes for partition.discovery.interval.ms is meaningless. A possible solution is to set partition.discovery.interval.ms = -1 only if boundedness == Boundedness.BOUNDED is true. -- This message was sent by Atlassian Jira (v8.20.10#820010)
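The fix proposed in the ticket — force the interval to -1 (disabled) only for bounded sources, so unbounded sources keep the 5-minute default — can be sketched as plain Java. The names below are hypothetical stand-ins, not the actual KafkaSourceBuilder code:

```java
public class PartitionDiscoveryDefault {
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    static final long DEFAULT_DISCOVERY_INTERVAL_MS = 5 * 60 * 1000L; // 5 minutes
    static final long DISABLED = -1L;

    // Hypothetical sketch of the proposed resolution rule: an explicit user
    // setting always wins; otherwise bounded sources disable discovery and
    // unbounded sources fall back to the 5-minute default.
    static long resolveDiscoveryInterval(Long userValue, Boundedness boundedness) {
        if (userValue != null) {
            return userValue;
        }
        return boundedness == Boundedness.BOUNDED ? DISABLED : DEFAULT_DISCOVERY_INTERVAL_MS;
    }

    public static void main(String[] args) {
        // Unbounded + unset: default kicks in, discovery stays on.
        System.out.println(resolveDiscoveryInterval(null, Boundedness.CONTINUOUS_UNBOUNDED)); // 300000
        // Bounded + unset: discovery disabled, as before.
        System.out.println(resolveDiscoveryInterval(null, Boundedness.BOUNDED)); // -1
    }
}
```

Under the behavior described in the report, the first branch effectively never fires for an unset value: the builder writes -1 regardless of boundedness, which is exactly what the sketch's bounded-only condition would correct.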
Re: [DISCUSS] FLIP-444: Native file copy support
Hi Lorenzo, > As I mentioned above, in the end I think hardcoding in Flink support for the `cpulimit` doesn't seem appropriate and at the same time it is not actually necessary. I agree. What's suggested is similar to the way Flink currently allows users to limit resource usage on the s3a filesystem. I appreciate the effort spent in looking into limiting s5cmd using cpu limit. > To use `s5cmd` with `cpulimit` via some bash script wrapping the `s5cmd`. Users instead of passing a path to the `s5cmd` via the `s3.s5cmd.path` config option, can pass a path to that wrapper bash script. That wrapper could then apply `cpulimit` to the `s5cmd` call (or use a completely different mechanism, like CGroups if desired). Would appreciate it if we have the CPU limit captured under rejected alternatives and also mention the option of implementing limits within the wrapper. It can be very useful for users who may want to implement said limiting. Best regards Keith Lee On Mon, Jun 17, 2024 at 5:50 PM Piotr Nowojski wrote: > Hi all! > > >> Do you know if there is a way to limit the CPU utilisation of s5cmd? I > see > >> worker and concurrency configuration but these do not map directly to > cap > >> in CPU usage. The experience for feature user in this case will be one > of > >> trial and error. > > > > Those are good points. As a matter of fact, shortly after publishing this > FLIP, we started experimenting with using `cpulimit` to achieve just that. > If everything will work out fine, we are planning to expose this as a > configuration option for the S3 file system. I've added that to the FLIP. > > Before finally putting this FLIP under vote, I would like to revisit/share > with you our production experience when it comes to preventing s5cmd from > overloading the machines. > > When trying out `cpulimit`, we had some problems on our particular setup to > make `cpulimit` behave the way we want. 
I think in general that's still a > viable option, but we have internally settled upon just limiting the number > of workers via: > > "s3.s5cmd.args": "-r 0 --numworkers 5" > > That seems to work reliably with a combination of FLINK-35501 [1], without > affecting the actual performance. But I will be curious to learn what will > eventually work best for others. > > Hence, I would propose also to modify the FLIP slightly, and remove > "native" support for cpulimit from this FLIP. In other words, I would > propose to drop the config option: > > public static final ConfigOption S5CMD_CPULIMIT = > ConfigOptions.key("s3.s5cmd.cpulimit") > .doubleType() > .withDescription( > "Optional cpulimit value to set for the s5cmd > to prevent TaskManager from overloading"); > > The thing is, that it seems a bit platform specific, and there is actually > no need for this dedicated config option. Instead I would propose to > document in some best practices section two solutions to limit the resource > usage of the s5cmd. > > 1. Adding ` --numworkers X` to the `s3.s5cmd.args` ConfigOption and/or > adjusting IO Thread Pool size (thanks to [1]) > 2. To use `s5cmd` with `cpulimit` via some bash script wrapping the > `s5cmd`. Users instead of passing a path to the `s5cmd` via the > `s3.s5cmd.path` config option, can pass a path to that wrapper bash script. > That wrapper could then apply `cpulimit` to the `s5cmd` call (or use a > completely different mechanism, like CGroups if desired). > > As I mentioned above, in the end I think hardcoding in Flink support for > the `cpulimit` doesn't seem appropriate and at the same time it is not > actually necessary. > > WDYT? > > Best, > Piotrek > > > [1] https://issues.apache.org/jira/browse/FLINK-35501 > > pt., 17 maj 2024 o 15:27 > napisał(a): > > > Perfectly agree with all your considerations. > > Wee said. > > > > Thank you! 
> > On May 16, 2024 at 10:53 +0200, Piotr Nowojski , > > wrote: > > > Hi Lorenzo, > > > > > > > • concerns about memory and CPU used out of Flink's control > > > > > > Please note that using AWS SDKv2 would have the same concerns. In both > > > cases Flink can control only to a certain extent what either the SDKv2 > > > library does under the hood or the s5cmd process, via configuration > > > parameters. SDKv2, if configured improperly, could also overload TMs > CPU, > > > and it would also use extra memory. To an extent that also applies to > the > > > current way we are downloading/uploading files from the S3. > > > > > > > • deployment concerns from the usability perspective (would need to > > > install s5cmd on all TMs prior to the job deploy) > > > > > > Yes, that's the downside. > > > > > > > Also, invoking an external binary would incur in some performance > > > degradation possibly. Might it be that using AWS SDK would not, given > its > > > Java implementation, and performance could be similar? > > > > > > We haven't run the benchmarks for very fast checkpointing and we > haven't > > >
Re: [DISCUSS] FLIP-444: Native file copy support
I meant to address Piotr on the above, apologies! Best regards Keith Lee On Tue, Jun 18, 2024 at 8:06 AM Keith Lee wrote: > Hi Lorenzo, > > > As I mentioned above, in the end I think hardcoding in Flink support for > the `cpulimit` doesn't seem appropriate and at the same time it is not > actually necessary. > > I agree. > > What's suggested is similar to the way where Flink currently allow users > to limit resource usage on s3a filesystem. > I appreciate the effort spent in looking into limiting s5cmd using cpu > limit. > > > To use `s5cmd` with `cpulimit` via some bash script wrapping the > `s5cmd`. Users instead of passing a path to the `s5cmd` via the > `s3.s5cmd.path` config option, can pass a path to that wrapper bash script. > That wrapper could then apply `cpulimit` to the `s5cmd` call (or use a > completely different mechanism, like CGroups if desired). > > Would appreciate if we have the CPU limit captured under rejected > alternatives and also mention the option of implementing limits within > wrapper. > It can be very useful for users who may want to implement said limiting. > > Best regards > Keith Lee > > > On Mon, Jun 17, 2024 at 5:50 PM Piotr Nowojski > wrote: > >> Hi all! >> >> >> Do you know if there is a way to limit the CPU utilisation of s5cmd? I >> see >> >> worker and concurrency configuration but these do not map directly to >> cap >> >> in CPU usage. The experience for feature user in this case will be one >> of >> >> trial and error. >> > >> > Those are good points. As a matter of fact, shortly after publishing >> this >> FLIP, we started experimenting with using `cpulimit` to achieve just that. >> If everything will work out fine, we are planning to expose this as a >> configuration option for the S3 file system. I've added that to the FLIP. >> >> Before finally putting this FLIP under vote, I would like to revisit/share >> with you our production experience when it comes to preventing s5cmd from >> overloading the machines. 
>> >> When trying out `cpulimit`, we had some problems on our particular setup >> to >> make `cpulimit` behave the way we want. I think in general that's still a >> viable option, but we have internally settled upon just limiting the >> number >> of workers via: >> >> "s3.s5cmd.args": "-r 0 --numworkers 5" >> >> That seems to work reliably with a combination of FLINK-35501 [1], without >> affecting the actual performance. But I will be curious to learn what will >> eventually work best for others. >> >> Hence, I would propose also to modify the FLIP slightly, and remove >> "native" support for cpulimit from this FLIP. In other words, I would >> propose to drop the config option: >> >> public static final ConfigOption S5CMD_CPULIMIT = >> ConfigOptions.key("s3.s5cmd.cpulimit") >> .doubleType() >> .withDescription( >> "Optional cpulimit value to set for the s5cmd >> to prevent TaskManager from overloading"); >> >> The thing is, that it seems a bit platform specific, and there is actually >> no need for this dedicated config option. Instead I would propose to >> document in some best practices section two solutions to limit the >> resource >> usage of the s5cmd. >> >> 1. Adding ` --numworkers X` to the `s3.s5cmd.args` ConfigOption and/or >> adjusting IO Thread Pool size (thanks to [1]) >> 2. To use `s5cmd` with `cpulimit` via some bash script wrapping the >> `s5cmd`. Users instead of passing a path to the `s5cmd` via the >> `s3.s5cmd.path` config option, can pass a path to that wrapper bash >> script. >> That wrapper could then apply `cpulimit` to the `s5cmd` call (or use a >> completely different mechanism, like CGroups if desired). >> >> As I mentioned above, in the end I think hardcoding in Flink support for >> the `cpulimit` doesn't seem appropriate and at the same time it is not >> actually necessary. >> >> WDYT? 
>> >> Best, >> Piotrek >> >> >> [1] https://issues.apache.org/jira/browse/FLINK-35501 >> >> pt., 17 maj 2024 o 15:27 >> napisał(a): >> >> > Perfectly agree with all your considerations. >> > Wee said. >> > >> > Thank you! >> > On May 16, 2024 at 10:53 +0200, Piotr Nowojski , >> > wrote: >> > > Hi Lorenzo, >> > > >> > > > • concerns about memory and CPU used out of Flink's control >> > > >> > > Please note that using AWS SDKv2 would have the same concerns. In both >> > > cases Flink can control only to a certain extent what either the SDKv2 >> > > library does under the hood or the s5cmd process, via configuration >> > > parameters. SDKv2, if configured improperly, could also overload TMs >> CPU, >> > > and it would also use extra memory. To an extent that also applies to >> the >> > > current way we are downloading/uploading files from the S3. >> > > >> > > > • deployment concerns from the usability perspective (would need to >> > > install s5cmd on all TMs prior to the job deploy) >> > > >> > > Yes, that's the downside. >> > > >> > > > Also, invoking an
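The wrapper approach discussed in this thread — pointing `s3.s5cmd.path` at a script that prepends `cpulimit` to the real s5cmd invocation — amounts to assembling a longer command line. A stdlib-only sketch of that assembly in Java (the real wrapper would typically be a few lines of shell; the `cpulimit` flag name and argument order here are assumptions to verify against your installed version):

```java
import java.util.ArrayList;
import java.util.List;

public class S5cmdWrapperSketch {
    // Hypothetical sketch: rather than a dedicated s3.s5cmd.cpulimit option,
    // build the command the wrapper script would run, with `cpulimit`
    // throttling the s5cmd process to the given CPU percentage.
    static List<String> wrapWithCpulimit(String s5cmdPath, int cpuPercent, List<String> s5cmdArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("cpulimit");
        cmd.add("--limit");
        cmd.add(String.valueOf(cpuPercent));
        cmd.add(s5cmdPath);
        cmd.addAll(s5cmdArgs);
        return cmd;
    }

    public static void main(String[] args) {
        // Mirrors the worker-limited invocation mentioned in the thread
        // ("-r 0 --numworkers 5"), capped at ~2 cores (200%).
        System.out.println(wrapWithCpulimit("/usr/local/bin/s5cmd", 200,
                List.of("-r", "0", "--numworkers", "5")));
    }
}
```

The list returned here is what a `ProcessBuilder` (or the wrapper script's `exec` line) would run; keeping Flink itself unaware of `cpulimit` is exactly the platform-neutrality argument made in the thread.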
[jira] [Created] (FLINK-35630) Kafka source may reset the consume offset to earliest when the partition leader changes
tanjialiang created FLINK-35630: --- Summary: Kafka source may reset the consume offset to earliest when the partition leader changes Key: FLINK-35630 URL: https://issues.apache.org/jira/browse/FLINK-35630 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: kafka-3.2.0 Reporter: tanjialiang A Kafka producer writes data to a topic with the acks=1 option. The Flink Kafka source starts up with the *scan.startup.mode=earliest-offset* option (the Kafka *auto.offset.reset* option is force-overridden to earliest). If the partition leader becomes unavailable, a follower may become the new leader, which may trigger log truncation. This can cause consumers to fetch an offset that is out of range and fall back to the *auto.offset.reset* strategy, resetting the offset to earliest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35629) Performance regression in stringRead and stringWrite
Rui Fan created FLINK-35629: --- Summary: Performance regression in stringRead and stringWrite Key: FLINK-35629 URL: https://issues.apache.org/jira/browse/FLINK-35629 Project: Flink Issue Type: Bug Components: Benchmarks Affects Versions: 1.20.0 Reporter: Rui Fan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[VOTE] Apache Flink Kubernetes Operator Release 1.9.0, release candidate #1
Hi Everyone,

Please review and vote on the release candidate #1 for the version 1.9.0 of Apache Flink Kubernetes Operator, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Kubernetes Operator canonical source distribution (including the Dockerfile), to be deployed to the release repository at dist.apache.org
b) Kubernetes Operator Helm Chart, to be deployed to the release repository at dist.apache.org
c) Maven artifacts to be deployed to the Maven Central Repository
d) Docker image to be pushed to dockerhub

**Staging Areas to Review**

The staging areas containing the above-mentioned artifacts are as follows, for your review:
* All artifacts for a) and b) can be found in the corresponding dev repository at dist.apache.org [1]
* All artifacts for c) can be found at the Apache Nexus Repository [2]
* The docker image for d) is staged on GitHub [3]

All artifacts are signed with the key 21F06303B87DAFF1 [4]

Other links for your review:
* JIRA release notes [5]
* Source code tag "release-1.9.0-rc1" [6]
* PR to update the website Downloads page to include Kubernetes Operator links [7]

**Vote Duration**

The voting time will run for at least 72 hours. The release is adopted by majority approval, with at least 3 PMC affirmative votes.

**Note on Verification**

You can follow the basic verification guide here [8]. Note that you don't need to verify everything yourself, but please make note of what you have tested together with your +1 vote.

Cheers!
Gyula Fora

[1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.9.0-rc1/
[2] https://repository.apache.org/content/repositories/orgapacheflink-1740/
[3] ghcr.io/apache/flink-kubernetes-operator:17129ff
[4] https://dist.apache.org/repos/dist/release/flink/KEYS
[5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354417
[6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.9.0-rc1
[7] https://github.com/apache/flink-web/pull/747
[8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
[jira] [Created] (FLINK-35628) Translate "DataGen Connector" of "DataStream Connectors" into Chinese
Hao Yu created FLINK-35628: -- Summary: Translate "DataGen Connector" of "DataStream Connectors" into Chinese Key: FLINK-35628 URL: https://issues.apache.org/jira/browse/FLINK-35628 Project: Flink Issue Type: Improvement Components: chinese-translation, Documentation Affects Versions: 1.18.0 Reporter: Hao Yu Fix For: 1.18.0 Attachments: image-2024-06-18-14-03-18-914.png The page URL is https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/datastream/datagen/ The markdown file is located at flink/docs/content.zh/docs/connectors/datastream/datagen.md -- This message was sent by Atlassian Jira (v8.20.10#820010)