[jira] [Created] (FLINK-35977) Missing an import in datastream.md
guluo created FLINK-35977: - Summary: Missing an import in datastream.md Key: FLINK-35977 URL: https://issues.apache.org/jira/browse/FLINK-35977 Project: Flink Issue Type: Bug Affects Versions: 1.20.0 Reporter: guluo In the document datastream.md, an import for OpenContext is missing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
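For reference, a minimal sketch of the import and usage the docs snippet presumably needs, assuming the {{OpenContext}} introduced by FLIP-344 in {{org.apache.flink.api.common.functions}}:
{code:java}
import org.apache.flink.api.common.functions.OpenContext;
import org.apache.flink.api.common.functions.RichMapFunction;

public class MyMapper extends RichMapFunction<String, String> {

    @Override
    public void open(OpenContext openContext) throws Exception {
        // initialization logic; open(OpenContext) replaces the deprecated open(Configuration)
    }

    @Override
    public String map(String value) {
        return value;
    }
}
{code}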
[jira] [Created] (FLINK-35976) StreamPhysicalOverAggregate should handle column name conflicts
lincoln lee created FLINK-35976: --- Summary: StreamPhysicalOverAggregate should handle column name conflicts Key: FLINK-35976 URL: https://issues.apache.org/jira/browse/FLINK-35976 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.19.1, 1.20.0 Reporter: lincoln lee Assignee: lincoln lee Fix For: 2.0.0 A duplicate column name exception occurs when using a nested over-aggregate query, e.g., this repro case:
{code}
@Test
def testNestedOverAgg(): Unit = {
  util.addTable(s"""
    |CREATE TEMPORARY TABLE src (
    |  a STRING,
    |  b STRING,
    |  ts TIMESTAMP_LTZ(3),
    |  watermark FOR ts as ts
    |) WITH (
    |  'connector' = 'values'
    |)
    |""".stripMargin)
  util.verifyExecPlan(s"""
    |SELECT *
    |FROM (
    |  SELECT
    |    *, count(*) OVER (PARTITION BY a ORDER BY ts) AS c2
    |  FROM (
    |    SELECT
    |      *, count(*) OVER (PARTITION BY a,b ORDER BY ts) AS c1
    |    FROM src
    |  )
    |)
    |""".stripMargin)
}
{code}
{code}
org.apache.flink.table.api.ValidationException: Field names must be unique. Found duplicates: [w0$o0]
    at org.apache.flink.table.types.logical.RowType.validateFields(RowType.java:273)
    at org.apache.flink.table.types.logical.RowType.<init>(RowType.java:158)
    at org.apache.flink.table.types.logical.RowType.of(RowType.java:298)
    at org.apache.flink.table.types.logical.RowType.of(RowType.java:290)
    at org.apache.flink.table.planner.calcite.FlinkTypeFactory$.toLogicalRowType(FlinkTypeFactory.scala:678)
    at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalOverAggregate.translateToExecNode(StreamPhysicalOverAggregate.scala:57)
    at org.apache.flink.table.planner.plan.nodes.physical.FlinkPhysicalRel.translateToExecNode(FlinkPhysicalRel.scala:53)
    at org.apache.flink.table.planner.plan.nodes.physical.FlinkPhysicalRel.translateToExecNode$(FlinkPhysicalRel.scala:52)
    at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalOverAggregateBase.translateToExecNode(StreamPhysicalOverAggregateBase.scala:35)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraphGenerator.generate(ExecNodeGraphGenerator.java:74)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraphGenerator.generate(ExecNodeGraphGenerator.java:54)
    at org.apache.flink.table.planner.delegation.PlannerBase.translateToExecNodeGraph(PlannerBase.scala:407)
    at org.apache.flink.table.planner.utils.TableTestUtilBase.assertPlanEquals(TableTestBase.scala:1076)
    at org.apache.flink.table.planner.utils.TableTestUtilBase.doVerifyPlan(TableTestBase.scala:920)
    at org.apache.flink.table.planner.utils.TableTestUtilBase.verifyExecPlan(TableTestBase.scala:675)
    at org.apache.flink.table.planner.plan.stream.sql.agg.OverAggregateTest.testNestedOverAgg(OverAggregateTest.scala:460)
{code}
This is similar to the case in https://issues.apache.org/jira/browse/FLINK-22121, but the fix missed the streaming over-aggregate scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35975) MongoFilterPushDownVisitorTest fails
Jiabao Sun created FLINK-35975: -- Summary: MongoFilterPushDownVisitorTest fails Key: FLINK-35975 URL: https://issues.apache.org/jira/browse/FLINK-35975 Project: Flink Issue Type: Bug Components: Connectors / MongoDB Reporter: Jiabao Sun Assignee: Jiabao Sun Fix For: mongodb-1.3.0 https://github.com/apache/flink-connector-mongodb/actions/runs/10127471893/job/28005305311 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35974) PyFlink YARN per-job on Docker test failed because docker-compose command not found
Weijie Guo created FLINK-35974: -- Summary: PyFlink YARN per-job on Docker test failed because docker-compose command not found Key: FLINK-35974 URL: https://issues.apache.org/jira/browse/FLINK-35974 Project: Flink Issue Type: Improvement Reporter: Weijie Guo -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35973) HeartbeatManagerTest.testHeartbeatTimeout failed
Weijie Guo created FLINK-35973: -- Summary: HeartbeatManagerTest.testHeartbeatTimeout failed Key: FLINK-35973 URL: https://issues.apache.org/jira/browse/FLINK-35973 Project: Flink Issue Type: Improvement Components: Build System / CI Affects Versions: 2.0.0 Reporter: Weijie Guo
Jul 26 03:37:22 java.lang.AssertionError
Jul 26 03:37:22     at org.junit.Assert.fail(Assert.java:87)
Jul 26 03:37:22     at org.junit.Assert.assertTrue(Assert.java:42)
Jul 26 03:37:22     at org.junit.Assert.assertFalse(Assert.java:65)
Jul 26 03:37:22     at org.junit.Assert.assertFalse(Assert.java:75)
Jul 26 03:37:22     at org.apache.flink.runtime.heartbeat.HeartbeatManagerTest.testHeartbeatTimeout(HeartbeatManagerTest.java:200)
Jul 26 03:37:22     at java.lang.reflect.Method.invoke(Method.java:498)
Jul 26 03:37:22     at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35972) Directly specify key and process runnable in async state execution model
Zakelly Lan created FLINK-35972: --- Summary: Directly specify key and process runnable in async state execution model Key: FLINK-35972 URL: https://issues.apache.org/jira/browse/FLINK-35972 Project: Flink Issue Type: Sub-task Reporter: Zakelly Lan Assignee: Zakelly Lan In the asynchronous execution model for state (FLIP-425), the key is managed by the framework and automatically assigned before {{processElement}} or a callback runs. However, in some cases there should be an internal interface that allows switching the key and starting a new processing run with that key. This is useful when implementing something like mini-batch, which triggers the real logic for a batch of gathered elements at one time instead of invoking {{processElement}} for each element. Additionally, processing invoked through this interface should have a higher priority than normal input, as the semantics of this processing method is to continue processing previously half-processed data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
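A minimal sketch of what such an internal hook could look like (the interface and method names here are hypothetical, not the actual FLIP-425 API):
{code:java}
/**
 * Hypothetical internal hook for the capability described above: run a
 * processing closure under an explicitly supplied key. Implementations would
 * schedule it ahead of normal input, since it continues half-processed data.
 */
public interface AsyncKeyedProcessing<K> {

    /** Switches the current key to {@code key} and runs {@code processing} under it. */
    void asyncProcessWithKey(K key, Runnable processing);
}
{code}
A mini-batch operator could then buffer elements per key and, on trigger, call {{asyncProcessWithKey(key, batchLogic)}} for each buffered key.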
[jira] [Created] (FLINK-35971) Lose precision in PostgresParallelSource
wuzhenhua created FLINK-35971: - Summary: Lose precision in PostgresParallelSource Key: FLINK-35971 URL: https://issues.apache.org/jira/browse/FLINK-35971 Project: Flink Issue Type: Bug Components: Flink CDC Environment: Flink version 1.14.0 Flink CDC version 2.4.1 Database and its version PostgreSQL 10.23 Reporter: wuzhenhua Steps to reproduce:
{code:java}
CREATE TABLE IF NOT EXISTS s1.t1 (
    id bigint NOT NULL,
    tm time without time zone,
    CONSTRAINT t1_pkey PRIMARY KEY (id)
)
{code}
{code:java}
INSERT INTO s1.t1 VALUES(1, '10:33:23.660863')
{code}
{code:java}
val prop = new Properties()
val pgSource = PostgresSourceBuilder.PostgresIncrementalSource.builder[String]
  .hostname("localhost")
  .port(5432)
  .database("cdc_test")
  .schemaList("s1")
  .tableList("s1.t1")
  .username("postgres")
  .password("postgres")
  .deserializer(new JsonDebeziumDeserializationSchema)
  .slotName("aaa")
  .decodingPluginName("pgoutput")
  .debeziumProperties(prop)
  .build()

env.enableCheckpointing(3000)
env.fromSource(pgSource, WatermarkStrategy.noWatermarks[String](), "PostgresParallelSource").print()
env.execute("Print Postgres Snapshot + WAL")
{code}
Expected output:
{code:java}
"after":{"id":1,"tm":38003660863}
{code}
Actual output:
{code:java}
"after":{"id":1,"tm":3800366}
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
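As a sanity check on the expected value (an editor's note, not from the report), {{10:33:23.660863}} expressed as microseconds of the day is:
{code:java}
// (10 h * 3600 + 33 min * 60 + 23 s) seconds, in microseconds, plus the fractional part
long expectedMicros = (10L * 3600 + 33 * 60 + 23) * 1_000_000L + 660_863L; // = 38_003_660_863
{code}
which matches the expected output above, while the observed {{3800366}} has dropped the trailing digits.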
[jira] [Created] (FLINK-35970) Update documentation about FLINK-26050 (small file compaction)
Roman Khachatryan created FLINK-35970: - Summary: Update documentation about FLINK-26050 (small file compaction) Key: FLINK-35970 URL: https://issues.apache.org/jira/browse/FLINK-35970 Project: Flink Issue Type: Improvement Components: Documentation, Runtime / State Backends Affects Versions: 1.20.0 Reporter: Roman Khachatryan Fix For: 2.0.0 This is a follow-up on FLINK-26050 to update the documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35969) Remove deprecated dataset based API from State Processor API
Gabor Somogyi created FLINK-35969: - Summary: Remove deprecated dataset based API from State Processor API Key: FLINK-35969 URL: https://issues.apache.org/jira/browse/FLINK-35969 Project: Flink Issue Type: Improvement Components: API / State Processor Affects Versions: 2.0.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35968) Remove flink-cdc-runtime dependency from connectors
Hongshun Wang created FLINK-35968: - Summary: Remove flink-cdc-runtime dependency from connectors Key: FLINK-35968 URL: https://issues.apache.org/jira/browse/FLINK-35968 Project: Flink Issue Type: Improvement Components: Flink CDC Affects Versions: cdc-3.1.1 Reporter: Hongshun Wang Fix For: cdc-3.2.0 Currently, flink-cdc-source-connectors and flink-cdc-pipeline-connectors depend on flink-cdc-runtime, which is not ideal for the design and is redundant. This issue aims to remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35967) Shade other Flink connectors in CDC Pipeline connectors
yux created FLINK-35967: --- Summary: Shade other Flink connectors in CDC Pipeline connectors Key: FLINK-35967 URL: https://issues.apache.org/jira/browse/FLINK-35967 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: yux Currently, CDC Pipeline connectors bundle other Flink source / sink connectors without shading / relocating them, which burdens users, since they have to ensure no incompatible connectors are loaded from Flink /lib or elsewhere. Shading these commonly used Flink connectors (including MySQL, StarRocks) would be a solution, just like what the Doris Pipeline connector does now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35966) Introduce the TASKS for TaskManagerLoadBalanceMode enum
RocMarshal created FLINK-35966: -- Summary: Introduce the TASKS for TaskManagerLoadBalanceMode enum Key: FLINK-35966 URL: https://issues.apache.org/jira/browse/FLINK-35966 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: RocMarshal -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35965) Add ENDSWITH function
Dylan He created FLINK-35965: Summary: Add ENDSWITH function Key: FLINK-35965 URL: https://issues.apache.org/jira/browse/FLINK-35965 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add ENDSWITH function.

Returns whether {{expr}} ends with {{endExpr}}.

Example:
{code:sql}
> SELECT ENDSWITH('SparkSQL', 'SQL');
true
> SELECT ENDSWITH('SparkSQL', 'sql');
false
{code}

Syntax:
{code:sql}
ENDSWITH(expr, endExpr)
{code}

Arguments:
* {{expr}}: A STRING or BINARY expression.
* {{endExpr}}: A STRING or BINARY expression.

Returns:
A BOOLEAN. {{expr}} and {{endExpr}} must have the same type. If {{expr}} or {{endExpr}} is NULL, the result is NULL. If {{endExpr}} is empty, the result is true.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/endswith.html]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35964) Add STARTSWITH function
Dylan He created FLINK-35964: Summary: Add STARTSWITH function Key: FLINK-35964 URL: https://issues.apache.org/jira/browse/FLINK-35964 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add STARTSWITH function.

Returns whether {{expr}} begins with {{startExpr}}.

Example:
{code:sql}
> SELECT STARTSWITH('SparkSQL', 'Spark');
true
> SELECT STARTSWITH('SparkSQL', 'spark');
false
{code}

Syntax:
{code:sql}
STARTSWITH(expr, startExpr)
{code}

Arguments:
* {{expr}}: A STRING or BINARY expression.
* {{startExpr}}: A STRING or BINARY expression.

Returns:
A BOOLEAN. {{expr}} and {{startExpr}} must have the same type. If {{expr}} or {{startExpr}} is NULL, the result is NULL. If {{startExpr}} is empty, the result is true.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/startswith.html]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35963) Add REGEXP_SUBSTR function
Dylan He created FLINK-35963: Summary: Add REGEXP_SUBSTR function Key: FLINK-35963 URL: https://issues.apache.org/jira/browse/FLINK-35963 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add REGEXP_SUBSTR function.

Returns the first substring in {{str}} that matches {{regex}}.

Example:
{code:sql}
> SELECT REGEXP_SUBSTR('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en');
Steven
> SELECT REGEXP_SUBSTR('Mary had a little lamb', 'Ste(v|ph)en');
NULL
{code}

Syntax:
{code:sql}
REGEXP_SUBSTR(str, regex)
{code}

Arguments:
* {{str}}: A STRING expression to be matched.
* {{regex}}: A STRING expression with a pattern.

Returns:
A STRING. If {{regex}} is malformed, the function returns an error. If either argument is NULL or the pattern is not found, the result is NULL.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/regexp_substr.html]
* [MySQL|https://dev.mysql.com/doc/refman/8.4/en/regexp.html#function_regexp-substr]
* [PostgreSQL|https://www.postgresql.org/docs/16/functions-matching.html#FUNCTIONS-POSIX-REGEXP]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35962) Add REGEXP_INSTR function
Dylan He created FLINK-35962: Summary: Add REGEXP_INSTR function Key: FLINK-35962 URL: https://issues.apache.org/jira/browse/FLINK-35962 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add REGEXP_INSTR function.

Returns the position of the first substring in {{str}} that matches {{regex}}.

Example:
{code:sql}
> SELECT REGEXP_INSTR('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en');
1
> SELECT REGEXP_INSTR('Mary had a little lamb', 'Ste(v|ph)en');
0
{code}

Syntax:
{code:sql}
REGEXP_INSTR(str, regex)
{code}

Arguments:
* {{str}}: A STRING expression to be matched.
* {{regex}}: A STRING expression with a pattern.

Returns:
An INTEGER. Result indexes begin at 1; 0 means there is no match. If {{regex}} is malformed, the function returns an error. If either argument is NULL, the result is NULL.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/regexp_instr.html]
* [MySQL|https://dev.mysql.com/doc/refman/8.4/en/regexp.html#function_regexp-instr]
* [PostgreSQL|https://www.postgresql.org/docs/16/functions-matching.html#FUNCTIONS-POSIX-REGEXP]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35959) CLONE - Updates the docs stable version
Weijie Guo created FLINK-35959: -- Summary: CLONE - Updates the docs stable version Key: FLINK-35959 URL: https://issues.apache.org/jira/browse/FLINK-35959 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Update docs to "stable" in {{docs/config.toml}} in the branch of the _just-released_ version:
* Change {{Version}} from {{x.y-SNAPSHOT}} to {{x.y.z}}, i.e. {{1.6-SNAPSHOT}} to {{1.6.0}}
* Change {{VersionTitle}} from {{x.y-SNAPSHOT}} to {{x.y}}, i.e. {{1.6-SNAPSHOT}} to {{1.6}}
* Change {{Branch}} from {{master}} to {{release-x.y}}, i.e. {{master}} to {{release-1.6}}
* Change {{baseURL}} from {{//ci.apache.org/projects/flink/flink-docs-master}} to {{//ci.apache.org/projects/flink/flink-docs-release-x.y}}
* Change {{javadocs_baseurl}} from {{//ci.apache.org/projects/flink/flink-docs-master}} to {{//ci.apache.org/projects/flink/flink-docs-release-x.y}}
* Change {{IsStable}} to {{true}}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35960) CLONE - Start End of Life discussion thread for now outdated Flink minor version
Weijie Guo created FLINK-35960: -- Summary: CLONE - Start End of Life discussion thread for now outdated Flink minor version Key: FLINK-35960 URL: https://issues.apache.org/jira/browse/FLINK-35960 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo The idea is to discuss in the community whether we should do a final release for the now-unsupported minor version. Such a release shouldn't be covered by the current minor version's release managers; their only responsibility is to trigger the discussion. The intention of a final patch release for the now-unsupported Flink minor version is to flush out all the fixes that didn't end up in the previous release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35957) CLONE - Other announcements
Weijie Guo created FLINK-35957: -- Summary: CLONE - Other announcements Key: FLINK-35957 URL: https://issues.apache.org/jira/browse/FLINK-35957 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: Lincoln Lee
h3. Recordkeeping

Use [reporter.apache.org|https://reporter.apache.org/addrelease.html?flink] to seed the information about the release into future project reports. (Note: Only PMC members have access to report releases. If you do not have access, ask on the mailing list for assistance.)

h3. Flink blog

Major or otherwise important releases should have a blog post. Write one if needed for this particular release. Minor releases that don't introduce major new functionality don't necessarily need to be blogged (see [flink-web PR #581 for Flink 1.15.3|https://github.com/apache/flink-web/pull/581] as an example of a minor release blog post).

Please make sure that the release notes of the documentation (see section "Review and update documentation") are linked from the blog post of a major release.

We usually include the names of all contributors in the announcement blog post. Use the following command to get the list of contributors:
{code}
# the first line is required to make sort put uppercase before lowercase
export LC_ALL=C
export FLINK_PREVIOUS_RELEASE_BRANCH=
export FLINK_CURRENT_RELEASE_BRANCH=
# e.g.
# export FLINK_PREVIOUS_RELEASE_BRANCH=release-1.17
# export FLINK_CURRENT_RELEASE_BRANCH=release-1.18
git log $(git merge-base master $FLINK_PREVIOUS_RELEASE_BRANCH)..$(git show-ref --hash ${FLINK_CURRENT_RELEASE_BRANCH}) --pretty=format:"%an%n%cn" | sort -u | paste -sd, | sed "s/\,/\, /g"
{code}

h3. Social media

Tweet, post on Facebook, LinkedIn, and other platforms. Ask other contributors to do the same.

h3. Flink Release Wiki page

Add a summary of things that went well or not so well during the release process. This can include feedback from contributors but also more generic things, like the release taking longer than initially anticipated (and why), to give a bit of context to the release process.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35958) CLONE - Update reference data for Migration Tests
Weijie Guo created FLINK-35958: -- Summary: CLONE - Update reference data for Migration Tests Key: FLINK-35958 URL: https://issues.apache.org/jira/browse/FLINK-35958 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Fix For: 1.20.0, 1.19.1 Update migration tests in master to cover migration from the new version. Since 1.18, this step can be done automatically with the following steps. For more information please refer to [this page|https://github.com/apache/flink/blob/master/flink-test-utils-parent/flink-migration-test-utils/README.md].
# {*}On the published release tag (e.g., release-1.16.0){*}, run:
{code}
$ mvn clean package -Pgenerate-migration-test-data -Dgenerate.version=1.16 -nsu -Dfast -DskipTests
{code}
The version (1.16 in the command above) should be replaced with the target one.
# Modify the content of the file [apache/flink:flink-test-utils-parent/flink-migration-test-utils/src/main/resources/most_recently_published_version|https://github.com/apache/flink/blob/master/flink-test-utils-parent/flink-migration-test-utils/src/main/resources/most_recently_published_version] to the latest version (it would be "v1_16" if sticking to the example where 1.16.0 was released).
# Commit the modifications from steps 1 and 2 with the message "{_}[release] Generate reference data for state migration tests based on release-1.xx.0{_}" to the corresponding release branch (e.g. {{release-1.16}} in our example), replacing "xx" with the actual version (in this example "16"). Use the Jira issue ID instead of [release] as the commit message prefix if you have a dedicated Jira issue for this task.
# Cherry-pick the commit to the master branch.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35954) CLONE - Merge website pull request
Weijie Guo created FLINK-35954: -- Summary: CLONE - Merge website pull request Key: FLINK-35954 URL: https://issues.apache.org/jira/browse/FLINK-35954 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Merge the website pull request to [list the release|http://flink.apache.org/downloads.html]. Make sure to regenerate the website as well, as it isn't built automatically. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35955) CLONE - Remove outdated versions
Weijie Guo created FLINK-35955: -- Summary: CLONE - Remove outdated versions Key: FLINK-35955 URL: https://issues.apache.org/jira/browse/FLINK-35955 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo
h4. dist.apache.org

For a new major release remove all release files older than 2 versions, e.g., when releasing 1.7, remove all releases <= 1.5. For a new bugfix version remove all release files for previous bugfix releases in the same series, e.g., when releasing 1.7.1, remove the 1.7.0 release.
# If you have not already, check out the Flink section of the {{release}} repository on [dist.apache.org|http://dist.apache.org/] via Subversion. In a fresh directory:
{code}
svn checkout https://dist.apache.org/repos/dist/release/flink --depth=immediates
cd flink
{code}
# Remove files for outdated releases and commit the changes:
{code}
svn remove flink-<version-to-remove>
svn commit
{code}
# Verify that the files are [removed|https://dist.apache.org/repos/dist/release/flink].
(!) Remember to remove the corresponding download links from the website.

h4. CI

Disable the cron job for the now-unsupported version in [tools/azure-pipelines/build-apache-repo.yml|https://github.com/apache/flink/blob/master/tools/azure-pipelines/build-apache-repo.yml] on the respective branch.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35961) CLONE - Build 1.19 docs in GitHub Action and mark 1.19 as stable in docs
Weijie Guo created FLINK-35961: -- Summary: CLONE - Build 1.19 docs in GitHub Action and mark 1.19 as stable in docs Key: FLINK-35961 URL: https://issues.apache.org/jira/browse/FLINK-35961 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35953) CLONE - Update japicmp configuration
Weijie Guo created FLINK-35953: -- Summary: CLONE - Update japicmp configuration Key: FLINK-35953 URL: https://issues.apache.org/jira/browse/FLINK-35953 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Fix For: 1.20.0, 1.19.1 Update the japicmp reference version and wipe exclusions / enable API compatibility checks for {{@PublicEvolving}} APIs on the corresponding SNAPSHOT branch with the {{update_japicmp_configuration.sh}} script (see below). For a new major release (x.y.0), run the same command also on the master branch to update the japicmp reference version and remove outdated exclusions in the japicmp configuration.

Make sure that all Maven artifacts are already pushed to Maven Central. Otherwise, there's a risk that CI fails due to missing reference artifacts.
{code:bash}
tools $ NEW_VERSION=$RELEASE_VERSION releasing/update_japicmp_configuration.sh
tools $ cd ..
$ git add *
$ git commit -m "Update japicmp configuration for $RELEASE_VERSION"
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35956) CLONE - Apache mailing lists announcements
Weijie Guo created FLINK-35956: -- Summary: CLONE - Apache mailing lists announcements Key: FLINK-35956 URL: https://issues.apache.org/jira/browse/FLINK-35956 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Announce on the {{dev@}} mailing list that the release has been finished. Announce the release on the {{user@}} mailing list, listing major improvements and contributions. Announce the release on the [annou...@apache.org|mailto:annou...@apache.org] mailing list.
{panel}
From: Release Manager
To: dev@flink.apache.org, u...@flink.apache.org, user...@flink.apache.org, annou...@apache.org
Subject: [ANNOUNCE] Apache Flink 1.2.3 released

The Apache Flink community is very happy to announce the release of Apache Flink 1.2.3, which is the third bugfix release for the Apache Flink 1.2 series.

Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements for this bugfix release:
<blog post link>

The full release notes are available in Jira:
<release notes link>

We would like to thank all contributors of the Apache Flink community who made this release possible!

Feel free to reach out to the release managers (or respond to this thread) with feedback on the release process. Our goal is to constantly improve the release process. Feedback on what could be improved or things that didn't go so well is appreciated.

Regards,
Release Manager
{panel}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35952) Promote release 1.20
Weijie Guo created FLINK-35952: -- Summary: Promote release 1.20 Key: FLINK-35952 URL: https://issues.apache.org/jira/browse/FLINK-35952 Project: Flink Issue Type: New Feature Affects Versions: 1.18.0 Reporter: Weijie Guo Assignee: lincoln lee Once the release has been finalized (FLINK-32920), the last step of the process is to promote the release within the project and beyond. Please wait for 24h after finalizing the release, in accordance with the [ASF release policy|http://www.apache.org/legal/release-policy.html#release-announcements].

*Final checklist to declare this issue resolved:*
# Website pull request to [list the release|http://flink.apache.org/downloads.html] merged.
# Release announced on the user@ mailing list.
# Blog post published, if applicable.
# Release recorded in [reporter.apache.org|https://reporter.apache.org/addrelease.html?flink].
# Release announced on social media.
# Completion declared on the dev@ mailing list.
# Homebrew updated: [https://docs.brew.sh/How-To-Open-a-Homebrew-Pull-Request] (seems to be done automatically, at least for minor releases).
# japicmp configuration updated:
** corresponding SNAPSHOT branch japicmp reference version set to the just-released version, and API compatibility checks for {{@PublicEvolving}} enabled
** (minor version release only) master branch japicmp reference version set to the just-released version
** (minor version release only) master branch japicmp exclusions cleared
# The list of previous versions in {{docs/config.toml}} on the master branch updated.
# {{show_outdated_warning: true}} set in {{docs/config.toml}} in the branch of the _now deprecated_ Flink version (i.e. 1.16 if 1.18.0 is released).
# Stable and master alias updated in [https://github.com/apache/flink/blob/master/.github/workflows/docs.yml].
# Discussion thread opened for End of Life for the unsupported version (i.e. 1.16).
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35951) Flink CDC startup exception using jar package
huxx created FLINK-35951: Summary: Flink CDC startup exception using jar package Key: FLINK-35951 URL: https://issues.apache.org/jira/browse/FLINK-35951 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.1.1 Environment: jdk 17 flink 1.18 flinkcdc 3.0.0 Reporter: huxx Attachments: image-2024-08-01-15-01-25-795.png, image-2024-08-01-15-10-35-824.png I integrated Flink with Spring Boot in my project because I only use a small part of Flink's functionality, namely the data extraction feature of Flink CDC. I don't want to build and maintain a separate Flink environment; I want to integrate this feature with other business logic and manage it through Spring. Packaging and starting the project from the IDEA development tool works normally. When I package with the spring-boot-maven-plugin, start it via java -jar, and execute StreamExecutionEnvironment.execute(), it reports that a class cannot be found. However, this class does exist and is packaged. !image-2024-08-01-15-10-35-824.png!
{code:java}
org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
    at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
    ... 3 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
    at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:114)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    ... 3 more
Caused by: java.lang.ClassNotFoundException: org.apache.flink.api.common.ExecutionConfig
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
    at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2045)
    at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1909)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2235)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1744)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:514)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:472)
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:539)
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:527)
    at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:67)
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:101)
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356
[jira] [Created] (FLINK-35950) CLONE - Publish the Dockerfiles for the new release
Weijie Guo created FLINK-35950: -- Summary: CLONE - Publish the Dockerfiles for the new release Key: FLINK-35950 URL: https://issues.apache.org/jira/browse/FLINK-35950 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Note: the official Dockerfiles fetch the binary distribution of the target Flink version from an Apache mirror. After publishing the binary release artifacts, mirrors can take some hours to start serving the new artifacts, so you may want to wait to do this step until you are ready to continue with the "Promote the release" steps in the follow-up Jira. Follow the [release instructions in the flink-docker repo|https://github.com/apache/flink-docker#release-workflow] to build the new Dockerfiles and send an updated manifest to Docker Hub so the new images are built and published. h3. Expectations * Dockerfiles in [flink-docker|https://github.com/apache/flink-docker] updated for the new Flink release and pull request opened on the Docker official-images with an updated manifest -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35948) CLONE - Deploy artifacts to Maven Central Repository
Weijie Guo created FLINK-35948: -- Summary: CLONE - Deploy artifacts to Maven Central Repository Key: FLINK-35948 URL: https://issues.apache.org/jira/browse/FLINK-35948 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Use the [Apache Nexus repository|https://repository.apache.org/] to release the staged binary artifacts to the Maven Central repository. In the Staging Repositories section, find the relevant release candidate orgapacheflink-XXX entry and click Release. Drop all other release candidates that are not being released.

h3. Deploy source and binary releases to dist.apache.org

Copy the source and binary releases from the dev repository to the release repository at [dist.apache.org|http://dist.apache.org/] using Subversion.
{code:java}
$ svn move -m "Release Flink ${RELEASE_VERSION}" https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM} https://dist.apache.org/repos/dist/release/flink/flink-${RELEASE_VERSION}
{code}
(Note: Only PMC members have access to the release repository. If you do not have access, ask on the mailing list for assistance.)

h3. Remove old release candidates from [dist.apache.org|http://dist.apache.org/]

Remove the old release candidates from [https://dist.apache.org/repos/dist/dev/flink] using Subversion.
{code:java}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
$ cd flink
$ svn remove flink-${RELEASE_VERSION}-rc*
$ svn commit -m "Remove old release candidates for Apache Flink ${RELEASE_VERSION}"
{code}

h3. Expectations
* Maven artifacts released and indexed in the [Maven Central Repository|https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.flink%22] (usually takes about a day to show up)
* Source & binary distributions available in the release repository at [https://dist.apache.org/repos/dist/release/flink/]
* Dev repository [https://dist.apache.org/repos/dist/dev/flink/] is empty
* Website contains links to the new release binaries and sources on the download page
* (for minor version updates) the front page references the correct new major release version and directs to the correct link
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35949) CLONE - Create Git tag and mark version as released in Jira
Weijie Guo created FLINK-35949: -- Summary: CLONE - Create Git tag and mark version as released in Jira Key: FLINK-35949 URL: https://issues.apache.org/jira/browse/FLINK-35949 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee Create and push a new Git tag for the released version by copying the tag for the final release candidate, as follows:
{code:java}
$ git tag -s "release-${RELEASE_VERSION}" refs/tags/${TAG}^{} -m "Release Flink ${RELEASE_VERSION}"
$ git push <remote> refs/tags/release-${RELEASE_VERSION}
{code}
In JIRA, inside [version management|https://issues.apache.org/jira/plugins/servlet/project-config/FLINK/versions], hover over the current release and a settings menu will appear. Click Release, and select today's date. (Note: Only PMC members have access to the project administration. If you do not have access, ask on the mailing list for assistance.)

If PRs have been merged to the release branch after the last release candidate was tagged, make sure that the corresponding Jira tickets have the correct Fix Version set.

h3. Expectations
* Release tagged in the source code repository
* Release version finalized in JIRA. (Note: Not all committers have administrator access to JIRA. If you end up getting permission errors, ask on the mailing list for assistance.)
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35947) CLONE - Deploy Python artifacts to PyPI
Weijie Guo created FLINK-35947: -- Summary: CLONE - Deploy Python artifacts to PyPI Key: FLINK-35947 URL: https://issues.apache.org/jira/browse/FLINK-35947 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo Assignee: lincoln lee The release manager should create a PyPI account and ask the PMC to add this account to the pyflink collaborator list with the Maintainer role (the PyPI admin account info can be found here. NOTE: only visible to PMC members) in order to deploy the Python artifacts to PyPI. The artifacts can be uploaded using twine ([https://pypi.org/project/twine/]). To install twine, just run:
{code:java}
pip install --upgrade twine==1.12.0
{code}
Download the python artifacts from dist.apache.org and upload them to pypi.org:
{code:java}
svn checkout https://dist.apache.org/repos/dist/dev/flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
cd flink-${RELEASE_VERSION}-rc${RC_NUM}
cd python

# upload wheels
for f in *.whl; do twine upload --repository-url https://upload.pypi.org/legacy/ $f $f.asc; done

# upload source packages
twine upload --repository-url https://upload.pypi.org/legacy/ apache-flink-libraries-${RELEASE_VERSION}.tar.gz apache-flink-libraries-${RELEASE_VERSION}.tar.gz.asc
twine upload --repository-url https://upload.pypi.org/legacy/ apache-flink-${RELEASE_VERSION}.tar.gz apache-flink-${RELEASE_VERSION}.tar.gz.asc
{code}
If the upload failed or was incorrect for some reason (e.g. a network transmission problem), you need to delete the uploaded release package of the same version (if it exists), rename the artifact to {{apache-flink-${RELEASE_VERSION}.post0.tar.gz}}, and then re-upload.

(!) Note: re-uploading to pypi.org must be avoided as much as possible because it can cause some irreparable problems. If that happens, users cannot install the apache-flink package by explicitly specifying the package version, i.e. the following command "pip install apache-flink==${RELEASE_VERSION}" will fail. Instead they have to run "pip install apache-flink" or "pip install apache-flink==${RELEASE_VERSION}.post0" to install the apache-flink package.

h3. Expectations
* Python artifacts released and indexed in the [PyPI|https://pypi.org/project/apache-flink/] repository
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35946) Finalize release 1.20.0
Weijie Guo created FLINK-35946: -- Summary: Finalize release 1.20.0 Key: FLINK-35946 URL: https://issues.apache.org/jira/browse/FLINK-35946 Project: Flink Issue Type: New Feature Reporter: Weijie Guo Assignee: lincoln lee Fix For: 1.19.0 Once the release candidate has been reviewed and approved by the community, the release should be finalized. This involves the final deployment of the release candidate to the release repositories, merging of the website changes, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35945) Consumer group information cannot be browsed in Kafka when using Flink with Kafka data sources without enabling checkpoints
任铭睿 created FLINK-35945: --- Summary: Consumer group information cannot be browsed in Kafka when using Flink with Kafka data sources without enabling checkpoints Key: FLINK-35945 URL: https://issues.apache.org/jira/browse/FLINK-35945 Project: Flink Issue Type: Improvement Components: API / DataStream Reporter: 任铭睿 When using Flink with Kafka data sources without enabling checkpoints, consumer group information cannot be browsed in Kafka. -- This message was sent by Atlassian Jira (v8.20.10#820010)
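For context (an editor's note, not from the ticket): the Kafka source commits offsets back to Kafka only on checkpoint completion, so with checkpointing disabled no offsets ever reach the consumer group. A workaround sketch using the source's Kafka property pass-through (topic and group names are made up):
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

// Sketch: with checkpointing disabled, let the Kafka client itself commit
// offsets periodically so the group becomes visible in Kafka tooling.
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setTopics("input-topic")                       // hypothetical topic name
        .setGroupId("my-consumer-group")                // hypothetical group id
        .setProperty("enable.auto.commit", "true")
        .setProperty("auto.commit.interval.ms", "1000")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
{code}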
[jira] [Created] (FLINK-35944) Add CompiledPlan annotations to BatchExecUnion
Jim Hughes created FLINK-35944: -- Summary: Add CompiledPlan annotations to BatchExecUnion Key: FLINK-35944 URL: https://issues.apache.org/jira/browse/FLINK-35944 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes In addition to the annotations, implement the BatchCompiledPlan test for this operator. For this operator, the BatchExecHashAggregate operator must be annotated as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35943) Add CompiledPlan annotations to BatchExecHashJoin and BatchExecNestedLoopJoin
Jim Hughes created FLINK-35943: -- Summary: Add CompiledPlan annotations to BatchExecHashJoin and BatchExecNestedLoopJoin Key: FLINK-35943 URL: https://issues.apache.org/jira/browse/FLINK-35943 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes In addition to the annotations, implement the BatchCompiledPlan test for these two operators. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35942) Add CompiledPlan annotations to BatchExecSort
Jim Hughes created FLINK-35942: -- Summary: Add CompiledPlan annotations to BatchExecSort Key: FLINK-35942 URL: https://issues.apache.org/jira/browse/FLINK-35942 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes In addition to the annotations, implement the BatchCompiledPlan test for this operator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35941) Add CompiledPlan annotations to BatchExecLimit
Jim Hughes created FLINK-35941: -- Summary: Add CompiledPlan annotations to BatchExecLimit Key: FLINK-35941 URL: https://issues.apache.org/jira/browse/FLINK-35941 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes In addition to the annotations, implement the BatchCompiledPlan test for this operator. Additionally, tests for the TableSource operator will be pulled into this work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35940) Upgrade log4j dependencies to 2.23.1
Daniel Burrell created FLINK-35940: -- Summary: Upgrade log4j dependencies to 2.23.1 Key: FLINK-35940 URL: https://issues.apache.org/jira/browse/FLINK-35940 Project: Flink Issue Type: Improvement Components: API / Core Affects Versions: 1.19.1 Environment: N/A Reporter: Daniel Burrell There is a need to upgrade the log4j dependencies, as the current version has a vulnerability. I propose upgrading to 2.23.1, which is the latest 2.x version. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35939) Do not set empty config values via ConfigUtils#encodeCollectionToConfig
Ferenc Csaky created FLINK-35939: Summary: Do not set empty config values via ConfigUtils#encodeCollectionToConfig Key: FLINK-35939 URL: https://issues.apache.org/jira/browse/FLINK-35939 Project: Flink Issue Type: Improvement Affects Versions: 1.19.1 Reporter: Ferenc Csaky Fix For: 2.0.0 The {{ConfigUtils#encodeCollectionToConfig}} function only skips setting a given {{ConfigOption}} value if that value is null. If the passed collection is empty, it will set that empty collection. I think this behavior is less logical and can cause undesired situations; we should only set a value if it is not empty AND not null. Furthermore, the method's [javadoc|https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-core/src/main/java/org/apache/flink/configuration/ConfigUtils.java#L73] describes the logic I just mentioned above, which is in conflict with the actual implementation and tests, which set an empty collection. -- This message was sent by Atlassian Jira (v8.20.10#820010)
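A minimal sketch of the proposed guard (simplified signature; the real method also takes a mapper function):
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.Configuration;

public final class ConfigEncodeSketch {

    // Write the option only when the collection is non-null AND non-empty,
    // matching the javadoc cited in the ticket.
    static <T> void encodeCollectionToConfig(
            Configuration config, ConfigOption<List<T>> option, Collection<T> values) {
        if (values != null && !values.isEmpty()) {
            config.set(option, new ArrayList<>(values));
        }
    }
}
{code}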
[jira] [Created] (FLINK-35938) Avoid committing the same datafile again in Paimon Sink
LvYanquan created FLINK-35938: - Summary: Avoid committing the same datafile again in Paimon Sink Key: FLINK-35938 URL: https://issues.apache.org/jira/browse/FLINK-35938 Project: Flink Issue Type: Technical Debt Components: Flink CDC Affects Versions: cdc-3.1.0 Reporter: LvYanquan Fix For: cdc-3.2.0 Attachments: image-2024-07-31-19-45-14-153.png [Flink will re-commit committables|https://github.com/apache/flink/blob/82b628d4730eef32b2f7a022e3b73cb18f950e6e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/connector/sink2/GlobalCommitterOperator.java#L148] when a job restarts from failure. This may cause the same datafile to be added twice in the current PaimonCommitter. !image-2024-07-31-19-45-14-153.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35937) Helm RBAC cleanup
Tim created FLINK-35937: --- Summary: Helm RBAC cleanup Key: FLINK-35937 URL: https://issues.apache.org/jira/browse/FLINK-35937 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Affects Versions: 1.19.0 Environment: Kubernetes 1.29.4 Reporter: Tim This is a follow-up ticket to [https://issues.apache.org/jira/projects/FLINK/issues/FLINK-35310]. In further research and testing with Kyverno, I figured out that some apiGroups seem to be invalid, and I removed them with this PR. It seems that the "extensions" apiGroup does not exist on our recent cluster (Kubernetes 1.29.4). I'm not sure, but it might be related to these deprecation notices: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#deployment-v116 The same holds for the "finalizers" resources. They do not seem to exist anymore and lead to problems with our deployment, so I also removed them. To complete the verb list, I also added "deleteCollections" where applicable. Please take a look and see if this makes sense to you. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35936) paimon cdc schema evolution failure when restarting job
MOBIN created FLINK-35936: - Summary: paimon cdc schema evolution failure when restarting job Key: FLINK-35936 URL: https://issues.apache.org/jira/browse/FLINK-35936 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.2.0 Environment: Flink 1.19 cdc master Reporter: MOBIN Paimon CDC schema evolution fails when the job is restarted. Minimal steps to reproduce:
# stop the flink-cdc-mysql-to-paimon pipeline job
# alter the MySQL table schema, e.g. add a column
# start the pipeline job
# the newly added column is not synchronized to the Paimon table
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35935) CREATE TABLE AS doesn't work with LIMIT
Xingcan Cui created FLINK-35935: --- Summary: CREATE TABLE AS doesn't work with LIMIT Key: FLINK-35935 URL: https://issues.apache.org/jira/browse/FLINK-35935 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.18.1 Reporter: Xingcan Cui
{code:java}
CREATE TABLE WITH (foo) AS (SELECT * FROM bar LIMIT 5)
{code}
The above statement throws a "Caused by: java.lang.AssertionError: not a query: " exception. A workaround is to wrap the query in a CTE:
{code:java}
CREATE TABLE WITH (foo) AS (WITH R AS (SELECT * FROM bar LIMIT 5) SELECT * FROM R)
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35934) Add CompiledPlan annotations to BatchExecValues
Jim Hughes created FLINK-35934: -- Summary: Add CompiledPlan annotations to BatchExecValues Key: FLINK-35934 URL: https://issues.apache.org/jira/browse/FLINK-35934 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35933) Skip distributing maxAllowedWatermark if there are no subtasks
Roman Khachatryan created FLINK-35933: - Summary: Skip distributing maxAllowedWatermark if there are no subtasks Key: FLINK-35933 URL: https://issues.apache.org/jira/browse/FLINK-35933 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Reporter: Roman Khachatryan Assignee: Roman Khachatryan Fix For: 1.20.0 On the JM, `SourceCoordinator.announceCombinedWatermark` runs unnecessarily if there are no subtasks to distribute maxAllowedWatermark to. This involves Heap and ConcurrentHashMap accesses and a lot of logging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
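A minimal sketch of the guard this implies (class and field names are hypothetical, not the actual {{SourceCoordinator}} internals):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed early return.
class WatermarkAnnouncer {
    private final Map<Integer, Long> subtaskWatermarks = new ConcurrentHashMap<>();

    void announceCombinedWatermark() {
        // Early return: with no subtasks there is nothing to aggregate or send,
        // so skip the heap/map accesses and logging entirely.
        if (subtaskWatermarks.isEmpty()) {
            return;
        }
        long min = subtaskWatermarks.values().stream().mapToLong(Long::longValue).min().getAsLong();
        // ... distribute 'min' as maxAllowedWatermark to each subtask ...
    }
}
{code}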
[jira] [Created] (FLINK-35932) Add REGEXP_COUNT function
Dylan He created FLINK-35932: Summary: Add REGEXP_COUNT function Key: FLINK-35932 URL: https://issues.apache.org/jira/browse/FLINK-35932 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add REGEXP_COUNT function.

Returns the number of times {{str}} matches the {{regexp}} pattern.

Example:
{code:sql}
> SELECT REGEXP_COUNT('Steven Jones and Stephen Smith are the best players', 'Ste(v|ph)en');
2
> SELECT REGEXP_COUNT('Mary had a little lamb', 'Ste(v|ph)en');
0
{code}

Syntax:
{code:sql}
REGEXP_COUNT(str, regexp)
{code}

Arguments:
* {{str}}: A STRING expression to be matched.
* {{regexp}}: A STRING expression with a matching pattern.

Returns:
An INTEGER. {{regexp}} must be a Java regular expression. When using literals, use `raw-literal` (`r` prefix) to avoid escape character pre-processing. If either argument is {{NULL}}, the result is {{NULL}}.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/regexp_count.html]
* [PostgreSQL|https://www.postgresql.org/docs/16/functions-string.html]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35931) Add REGEXP_EXTRACT_ALL function
Dylan He created FLINK-35931: Summary: Add REGEXP_EXTRACT_ALL function Key: FLINK-35931 URL: https://issues.apache.org/jira/browse/FLINK-35931 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add REGEXP_EXTRACT_ALL function.

Extracts all of the strings in {{str}} that match the {{regexp}} expression and correspond to the regex group {{idx}}.

Example:
{code:sql}
> SELECT REGEXP_EXTRACT_ALL('100-200, 300-400', '(\\d+)-(\\d+)', 1);
[100, 300]
> SELECT REGEXP_EXTRACT_ALL('100-200, 300-400', r'(\d+)-(\d+)', 1);
["100","300"]
{code}

Syntax:
{code:sql}
REGEXP_EXTRACT_ALL(str, regexp[, idx])
{code}

Arguments:
* {{str}}: A STRING expression to be matched.
* {{regexp}}: A STRING expression with a matching pattern.
* {{idx}}: An optional INTEGER expression greater than or equal to 0, with default 1.

Returns:
An ARRAY. {{regexp}} must be a Java regular expression. When using literals, use `raw-literal` (`r` prefix) to avoid escape character pre-processing. {{regexp}} may contain multiple groups. {{idx}} indicates which regex group to extract; an {{idx}} of 0 means matching the entire regular expression.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/regexp_extract_all.html]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35930) Add FORMAT_STRING function
Dylan He created FLINK-35930: Summary: Add FORMAT_STRING function Key: FLINK-35930 URL: https://issues.apache.org/jira/browse/FLINK-35930 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He Add FORMAT_STRING function, the same as in Spark. This function is a synonym for the PRINTF function.

Returns a formatted string from printf-style format strings.

Example:
{code:sql}
> SELECT FORMAT_STRING('Hello World %d %s', 100, 'days');
Hello World 100 days
{code}

Syntax:
{code:sql}
FORMAT_STRING(strfmt, obj...)
{code}

Arguments:
* {{strfmt}}: A STRING expression.
* {{obj}}: ANY expression.

Returns:
A STRING.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/format_string.html]
* [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions] printf
* [PostgreSQL|https://www.postgresql.org/docs/16/functions-string.html] format
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35929) In flink insert mode, it supports modifying the parallelism of jdbc sink when the parallelism of source and sink is the same.
Qiu created FLINK-35929: --- Summary: In flink insert mode, it supports modifying the parallelism of jdbc sink when the parallelism of source and sink is the same. Key: FLINK-35929 URL: https://issues.apache.org/jira/browse/FLINK-35929 Project: Flink Issue Type: Improvement Components: Connectors / JDBC Affects Versions: jdbc-3.1.2 Reporter: Qiu Attachments: image-2024-07-30-19-57-45-033.png, image-2024-07-30-19-57-50-868.png In insert mode, when the source and sink parallelism are consistent, reducing or increasing the JDBC sink parallelism makes SQL validation report an error. The error message is:
{code}
configured sink parallelism is: 8, while the input parallelism is: -1. Since configured parallelism is different from input parallelism and the changelog mode contains [INSERT,UPDATE_AFTER,DELETE], which is not INSERT_ONLY mode, primary key is required but no primary key is found!
{code}
{code:java}
// code placeholder (translated from the original Chinese comment)
// module: flink-connector-jdbc
// class: org.apache.flink.connector.jdbc.table.JdbcDynamicTableSink
public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {
    validatePrimaryKey(requestedMode);
    ChangelogMode.Builder changelogModeBuilder =
            ChangelogMode.newBuilder().addContainedKind(RowKind.INSERT);
    if (tableSchema.getPrimaryKey().isPresent()) {
        changelogModeBuilder
                .addContainedKind(RowKind.UPDATE_AFTER)
                .addContainedKind(RowKind.DELETE);
    }
    return changelogModeBuilder.build();
}
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35928) ForSt supports compiling with RocksDB
Hangxiang Yu created FLINK-35928: Summary: ForSt supports compiling with RocksDB Key: FLINK-35928 URL: https://issues.apache.org/jira/browse/FLINK-35928 Project: Flink Issue Type: Sub-task Reporter: Hangxiang Yu Assignee: Hangxiang Yu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35927) Support closing ForSt meta file when using object storage
Hangxiang Yu created FLINK-35927: Summary: Support closing ForSt meta file when using object storage Key: FLINK-35927 URL: https://issues.apache.org/jira/browse/FLINK-35927 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: Hangxiang Yu Assignee: Hangxiang Yu -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35926) During rescale, jobmanager has incorrect judgment logic for the max parallelism.
yuanfenghu created FLINK-35926: -- Summary: During rescale, jobmanager has incorrect judgment logic for the max parallelism. Key: FLINK-35926 URL: https://issues.apache.org/jira/browse/FLINK-35926 Project: Flink Issue Type: Bug Components: Runtime / Task Affects Versions: 1.19.1 Environment: flink-1.19.1 There is a high probability that 1.18 has the same problem Reporter: yuanfenghu Attachments: image-2024-07-30-14-56-48-931.png, image-2024-07-30-14-59-26-976.png, image-2024-07-30-15-00-28-491.png

When I was using the adaptive scheduler and modified the job's parallelism through the REST API, incorrect decision logic caused the job to fail.

h2. Reproduce:

When I start a simple job with a parallelism of 128, the max parallelism of the job will be set to 256 (through Flink's internal calculation logic). Then I take a savepoint of the job, modify the parallelism of the job to 1, and restore the job from the savepoint. At this time, the max parallelism of the job is still 256:
!image-2024-07-30-14-56-48-931.png!
This is as expected. At this time I call the REST API to increase the parallelism to 129 (which is obviously reasonable, since it is below the max parallelism of 256), but the job throws an exception after restarting:
!image-2024-07-30-14-59-26-976.png!
At this time, when viewing the detailed information of the job, it is found that the max parallelism has changed to 128:
!image-2024-07-30-15-00-28-491.png!
This can be reproduced stably locally.

h3. Causes:

In AdaptiveScheduler we recalculate the job's `VertexParallelismStore`, which results in the restarted job having the wrong max parallelism. This seems to be related to FLINK-21844 and FLINK-22084.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
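As background on where the 256 comes from (an editor's note, not from the ticket): Flink's default max-parallelism heuristic, as in {{KeyGroupRangeAssignment#computeDefaultMaxParallelism}}, rounds 1.5x the operator parallelism up to the next power of two and clamps it to [128, 32768]; treat the following self-contained check as illustrative:
{code:java}
import org.apache.flink.util.MathUtils;

public class MaxParallelismCheck {
    public static void main(String[] args) {
        int parallelism = 128;
        // 1.5 * parallelism, rounded up to the next power of two, clamped to [2^7, 2^15]
        int defaultMax =
                Math.min(
                        Math.max(MathUtils.roundUpToPowerOfTwo(parallelism + parallelism / 2), 1 << 7),
                        1 << 15);
        System.out.println(defaultMax); // prints 256 for parallelism 128, matching the report
    }
}
{code}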
[jira] [Created] (FLINK-35925) Remove hive connector from Flink main repo
Zhenqiu Huang created FLINK-35925: - Summary: Remove hive connector from Flink main repo Key: FLINK-35925 URL: https://issues.apache.org/jira/browse/FLINK-35925 Project: Flink Issue Type: Technical Debt Components: Connectors / Hive Affects Versions: 2.0.0 Reporter: Zhenqiu Huang Fix For: 2.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35924) Improve the SourceReaderBase to support the RecordsWithSplitIds share internal buffer from SplitReader.
Jiangjie Qin created FLINK-35924: Summary: Improve the SourceReaderBase to support the RecordsWithSplitIds share internal buffer from SplitReader. Key: FLINK-35924 URL: https://issues.apache.org/jira/browse/FLINK-35924 Project: Flink Issue Type: Improvement Components: Connectors / Common Affects Versions: 1.20.0 Reporter: Jiangjie Qin

Recently, we saw corrupted {{RecordsWithSplitIds}} in one of our Iceberg source implementations. The problem is the following sequence:
# The {{SourceReaderBase}} does not have anything in the element queue, and the current fetch has been drained.
# The main thread invokes {{SourceReaderBase.pollNext()}}, sees nothing to consume, and goes into the {{SourceReaderBase.finishedOrAvailableLater()}} call.
# A {{SplitFetcher}} thread just finished consuming from a split and put the last batch of records into the element queue. The SplitFetcher becomes idle after that, i.e. it has no assigned splits and no pending tasks.
# The main thread invokes {{SplitFetcherManager.maybeShutdownFinishedFetchers()}} to shut down idle fetchers. So the {{SplitFetcher}} in (3) is shut down. As a result the {{SplitReader}} is also closed.
# The main thread then sees a non-empty element queue, with the last batch put by the {{SplitFetcher}} which has just shut down.
# The main thread invokes {{SourceReaderBase.pollNext()}} again to process the last batch, but receives an exception, because the records contained in the batch have been released as a part of the {{SplitReader}} closure in (4).

The problem here is that the {{SourceReaderBase}} implementation assumes that once a batch of records (i.e. {{RecordsWithSplitIds}}) is generated, these records are available even after the {{SplitReader}} which generated them is closed. This assumption forces some of the connector implementations to copy the records fetched from the {{SplitReader}}, and hence introduces additional overhead. This patch aims to improve the {{SourceReaderBase}} to ensure that a {{SplitReader}} will not be closed until all the records it has emitted have been processed. There is also a somewhat orthogonal problem: the {{SplitFetchers}} are being closed too aggressively. We've seen that in most cases a dedicated SplitFetcher is created to handle a split and closed right away after the current split is finished, but before the next split arrives. I'll create a separate patch to introduce a short timeout before we declare a {{SplitFetcher}} thread idle.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
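One way to provide the "do not close until all emitted records are processed" guarantee is to reference-count the batches a reader has emitted and defer the actual close. A minimal sketch under that assumption (all names hypothetical; this is not the actual patch):

{code:java}
public final class RefCountedReaderHandle {
    private final AutoCloseable reader;
    private int outstandingBatches = 0;
    private boolean closeRequested = false;

    public RefCountedReaderHandle(AutoCloseable reader) {
        this.reader = reader;
    }

    /** Fetcher thread: called whenever a batch is put into the element queue. */
    public synchronized void batchEmitted() {
        outstandingBatches++;
    }

    /** Main thread: called once a batch has been fully consumed and recycled. */
    public synchronized void batchRecycled() throws Exception {
        if (--outstandingBatches == 0 && closeRequested) {
            reader.close();
        }
    }

    /** Instead of closing the reader directly, defer until nothing is outstanding. */
    public synchronized void requestClose() throws Exception {
        closeRequested = true;
        if (outstandingBatches == 0) {
            reader.close();
        }
    }
}
{code}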
[jira] [Created] (FLINK-35923) Add CompiledPlan annotations to BatchExecSort
Jim Hughes created FLINK-35923: -- Summary: Add CompiledPlan annotations to BatchExecSort Key: FLINK-35923 URL: https://issues.apache.org/jira/browse/FLINK-35923 Project: Flink Issue Type: Sub-task Reporter: Jim Hughes Assignee: Jim Hughes Fix For: 2.0.0 In addition to the annotations, implement the BatchCompiledPlan test for this operator. Since this is the first operator, the exchange operator must be annotated as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35922) Add configuration options related to hive
Qiu created FLINK-35922: --- Summary: Add configuration options related to hive Key: FLINK-35922 URL: https://issues.apache.org/jira/browse/FLINK-35922 Project: Flink Issue Type: Improvement Components: Connectors / Hive Affects Versions: 1.19.1 Reporter: Qiu

Currently, we submit Flink jobs to YARN with the run-application target and need to specify some Hive-related configuration, because we use a distributed filesystem similar to Alibaba OSS to store resources; in this case we pass special configuration options and set them on the Hive configuration. To solve such problems, we can support configuration options prefixed with "flink.hadoop." (such as -Dflink.hadoop.xxx=yyy) and carry them into the HiveConf. A simple implementation is as follows:

{code:java}
// code placeholder
// module: flink-connectors/flink-connector-hive
// class: org.apache.flink.table.catalog.hive.HiveCatalog
public static HiveConf createHiveConf(@Nullable String hiveConfDir, @Nullable String hadoopConfDir) {
    ...
    String flinkConfDir = System.getenv(ConfigConstants.ENV_FLINK_CONF_DIR);
    if (flinkConfDir != null) {
        org.apache.flink.configuration.Configuration flinkConfiguration =
                GlobalConfiguration.loadConfiguration(flinkConfDir);
        // add all configuration keys with prefix 'flink.hadoop.' in Flink conf to Hadoop conf
        for (String key : flinkConfiguration.keySet()) {
            for (String prefix : FLINK_CONFIG_PREFIXES) {
                if (key.startsWith(prefix)) {
                    String newKey = key.substring(prefix.length());
                    String value = flinkConfiguration.getString(key, null);
                    hadoopConf.set(newKey, value);
                    LOG.debug(
                            "Adding Flink config entry for {} as {}={} to Hadoop config",
                            key, newKey, value);
                }
            }
        }
    }
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35921) Flink SQL physical operator replacement support
Wang Qilong created FLINK-35921: --- Summary: Flink SQL physical operator replacement support Key: FLINK-35921 URL: https://issues.apache.org/jira/browse/FLINK-35921 Project: Flink Issue Type: Bug Reporter: Wang Qilong Does Flink SQL provide any SPI implementations that support custom physical operators, such as customizing a FileSourceScanExec at the exec physical plan layer? I have been researching the Flink SQL source code recently to understand its execution process, and came up with the above question. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35920) Add PRINTF function
Dylan He created FLINK-35920: Summary: Add PRINTF function Key: FLINK-35920 URL: https://issues.apache.org/jira/browse/FLINK-35920 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He

Add the PRINTF function, the same as in Spark & Hive. It returns a formatted string from printf-style format strings.

Example:
{code:sql}
> SELECT PRINTF('Hello World %d %s', 100, 'days');
 Hello World 100 days
{code}

Syntax:
{code:sql}
PRINTF(strfmt[, obj1, ...])
{code}

Arguments:
* {{strfmt}}: A STRING expression.
* {{objN}}: A STRING or NUMERIC expression.

Returns: A STRING.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/printf.html]
* [Hive|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-StringFunctions]

-- This message was sent by Atlassian Jira (v8.20.10#820010)
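For intuition, the behavior matches Java's printf-style formatter, which is presumably what an implementation would delegate to; a tiny sketch of the example above:

{code:java}
public class PrintfExample {
    public static void main(String[] args) {
        // Java equivalent of: SELECT PRINTF('Hello World %d %s', 100, 'days');
        System.out.println(String.format("Hello World %d %s", 100, "days"));
        // prints: Hello World 100 days
    }
}
{code}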
[jira] [Created] (FLINK-35919) The rowkind keyword option for MySQL connector metadata should be changed to the op_type keyword
Thorne created FLINK-35919: -- Summary: The rowkind keyword option for MySQL connector metadata should be changed to the op_type keyword Key: FLINK-35919 URL: https://issues.apache.org/jira/browse/FLINK-35919 Project: Flink Issue Type: Improvement Components: Flink CDC Affects Versions: cdc-3.1.1 Reporter: Thorne Fix For: cdc-3.3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35918) Migrate the Time to Duration for the flink-runtime module.
RocMarshal created FLINK-35918: -- Summary: Migrate the Time to Duration for the flink-runtime module. Key: FLINK-35918 URL: https://issues.apache.org/jira/browse/FLINK-35918 Project: Flink Issue Type: Technical Debt Affects Versions: 2.0.0 Reporter: RocMarshal -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35917) Failed to deserialize data of EventHeaderV4
wangkang created FLINK-35917: Summary: Failed to deserialize data of EventHeaderV4 Key: FLINK-35917 URL: https://issues.apache.org/jira/browse/FLINK-35917 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.1.1 Environment: * Flink version : 1.15 * Flink CDC version: 3.1.1 * Database and version: mysql 5.7 Reporter: wangkang

*Environment:*
* Flink version: 1.15
* Flink CDC version: 3.1.1
* Database and version: mysql 5.7

{code}
[2024-07-28 20:26:33.471] [INFO] [flink-akka.actor.default-dispatcher-140] [org.apache.flink.runtime.executiongraph.ExecutionGraph ] >>> Job MySQL-Paimon Table Sync: vipdw_rt.sales_inv_hold_test (7cdbe1f87b1729196791b7c2506df5e0) switched from state FAILING to FAILED.
org.apache.flink.runtime.JobException: Recovery is suppressed by FailureRateRestartBackoffTimeStrategy(FailureRateRestartBackoffTimeStrategy(failuresIntervalMS=180,backoffTimeMS=1,maxFailuresPerInterval=3)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:301) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:291) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:282) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:739) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:443) ~[flink-dist-1.15.3-vip.jar:1.15.3-vip]
    at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:304) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:302) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) ~[flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.actor.Actor.aroundReceive(Actor.scala:537) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.actor.Actor.aroundReceive$(Actor.scala:535) [flink-rpc-akka_e9ac714f-60da-4da5-93ce-bd11a5ce2af1.jar:1.15.3-vip]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.
{code}
[jira] [Created] (FLINK-35916) AbstractJdbcRowConverter: type conversion to string
Evgeniy created FLINK-35916: --- Summary: AbstractJdbcRowConverter: type conversion to string Key: FLINK-35916 URL: https://issues.apache.org/jira/browse/FLINK-35916 Project: Flink Issue Type: Improvement Components: Connectors / JDBC Affects Versions: jdbc-3.2.0 Reporter: Evgeniy Fix For: jdbc-3.1.2, jdbc-3.2.0

*Problem:* When converting a value to a string, the Java object is explicitly cast to String, which generates a ClassCastException for non-String types.

*Effects:* Most Java types can be converted to a String one way or another. However, not all such types extend String or a similar class that can be cast to String. As Java suggests, any object can be converted to a string via Object#toString, unless its implementation implies the opposite. Explicit casting to String, however, makes this impossible for 99% of custom Java objects.

*Goal:* Call the Object#toString method for all objects that are supposed to be converted to a string.

*Solution:* To begin with, pay attention to the class that deals with type conversion, AbstractJdbcRowConverter, and specifically to the createInternalConverter method and the lambda expression that converts a value to a String type (CHAR, VARCHAR).

{code:java}
switch (type.getTypeRoot()) {
    ...
    case CHAR:
    case VARCHAR:
        return val -> StringData.fromString((String) val);
    ...
}
{code}

You can observe the (String) val cast, which is problematic. My intended solution is the following change:

{code:java}
return val -> StringData.fromString(val.toString());
{code}

val.toString() can give a new life to most objects that Flink cannot process by default, which solves at least one problem.

*For further research:* For example, consider the type used in the Postgres driver to wrap PostgreSQL types other than primitive ones, PGobject.

{code:java}
public class PGobject implements Serializable, Cloneable {
    protected @Nullable String type;
    protected @Nullable String value;
    ...
    public String toString() {
        return this.getValue();
    }
    ...
}
{code}

As you may have noticed, the PostgreSQL driver itself takes care of converting its types to a string, which is an excellent argument for using val.toString(). Perhaps in the future PGobject type processing can be added, as was done for PGarray, but it seems to me that the solution described above is the fastest and most optimal. Thanks!

-- This message was sent by Atlassian Jira (v8.20.10#820010)
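A minimal sketch of the proposed converter as a standalone helper (null handling added here as an assumption; the actual change would wire this into createInternalConverter):

{code:java}
import org.apache.flink.table.data.StringData;

public final class StringConversionSketch {
    // Uses Object#toString() instead of an explicit (String) cast, so
    // driver-specific types such as PGobject convert without a ClassCastException.
    public static StringData toStringData(Object val) {
        return val == null ? null : StringData.fromString(val.toString());
    }
}
{code}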
[jira] [Created] (FLINK-35915) Add MASK function
Dylan He created FLINK-35915: Summary: Add MASK function Key: FLINK-35915 URL: https://issues.apache.org/jira/browse/FLINK-35915 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He

Add the MASK function, the same as in Spark. It returns a masked version of the input {{str}}.

Example:
{code:sql}
> SELECT MASK('AaBb123-&^ % 서울 Ä');
 XxXxnnn-&^ % 서울 X
> SELECT MASK('AaBb123-&^ % 서울 Ä', 'Z', 'z', '9', 'X');
 ZzZz999XZ
{code}

Syntax:
{code:sql}
MASK(str[, upperChar[, lowerChar[, digitChar[, otherChar]]]])
{code}

Arguments:
* {{str}}: A STRING expression.
* {{upperChar}}: A single character STRING literal used to substitute upper case characters. The default is 'X'. If upperChar is NULL, upper case characters remain unmasked.
* {{lowerChar}}: A single character STRING literal used to substitute lower case characters. The default is 'x'. If lowerChar is NULL, lower case characters remain unmasked.
* {{digitChar}}: A single character STRING literal used to substitute digits. The default is 'n'. If digitChar is NULL, digits remain unmasked.
* {{otherChar}}: A single character STRING literal used to substitute any other character. The default is NULL, which leaves these characters unmasked.

Returns: A STRING.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/mask.html]

-- This message was sent by Atlassian Jira (v8.20.10#820010)
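A sketch of the masking semantics described above (plain Java, not the Flink implementation; per the argument descriptions, a NULL substitution character means "leave unmasked"):

{code:java}
public class MaskSketch {
    public static String mask(
            String str, Character upper, Character lower, Character digit, Character other) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append(upper != null ? upper : c);
            } else if (Character.isLowerCase(c)) {
                sb.append(lower != null ? lower : c);
            } else if (Character.isDigit(c)) {
                sb.append(digit != null ? digit : c);
            } else {
                sb.append(other != null ? other : c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Defaults: 'X' for upper case, 'x' for lower case, 'n' for digits,
        // and other characters left unmasked.
        System.out.println(mask("AaBb123-&^", 'X', 'x', 'n', null)); // XxXxnnn-&^
    }
}
{code}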
[jira] [Created] (FLINK-35914) Flink 1.18 runtime not supporting java 17
sushil_karwasra created FLINK-35914: --- Summary: Flink 1.18 runtime not supporting java 17 Key: FLINK-35914 URL: https://issues.apache.org/jira/browse/FLINK-35914 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.18.1 Reporter: sushil_karwasra I have a Kinesis stream on AWS, and the runtime I'm using is Flink 1.18. When I compiled my application code with Java 17, deployed it, and started the Kinesis stream application, I encountered the following error: Caused by: java.lang.UnsupportedClassVersionError: KinesisToSqsStreamingJob has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0. (Class file version 61.0 corresponds to Java 17, while 55.0 corresponds to Java 11, so the cluster's JVM is running Java 11.) Does this mean that Flink does not yet support Java 17 compiled application code? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35913) The issue of using StreamExecutionEnvironment to execute multiple jobs in a program on Flink CDC
huxx created FLINK-35913: Summary: The issue of using StreamExecutionEnvironment to execute multiple jobs in a program on Flink CDC Key: FLINK-35913 URL: https://issues.apache.org/jira/browse/FLINK-35913 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: huxx Attachments: image-2024-07-29-12-23-55-766.png, image-2024-07-29-12-25-19-146.png

I integrated Flink CDC with Spring Boot and attempted to listen to multiple data sources in a single program, using one StreamExecutionEnvironment to execute multiple jobs built from different data sources. However, when I started the second job, I received multiple warnings like this:

WARN 21828 --- [rce-coordinator] io.debezium.metrics.Metrics : Unable to register metrics as an old set with the same name exists, retrying in PT5S (attempt 1 out of 12)

The final result was that my second job was not executed. Does this mean my StreamExecutionEnvironment registered duplicate parameters? I don't know how to solve this problem. Can't a single StreamExecutionEnvironment execute multiple jobs? Thank you for your reply. !image-2024-07-29-12-23-55-766.png! !image-2024-07-29-12-25-19-146.png!

-- This message was sent by Atlassian Jira (v8.20.10#820010)
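For reference, submitting several independent jobs from one program is generally done with executeAsync(), which submits the pipeline built so far and resets the environment's builder state; a minimal sketch (the pipelines are hypothetical stand-ins for the CDC sources):

{code:java}
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // First pipeline; executeAsync() submits it as its own job.
        env.fromElements(1, 2, 3).print();
        JobClient first = env.executeAsync("job-1");

        // Second, independent pipeline from the same environment.
        env.fromElements(4, 5, 6).print();
        JobClient second = env.executeAsync("job-2");
    }
}
{code}

The Debezium warning quoted above appears to come from duplicate metric names (two sources registering MBeans with the same name), which is a separate concern from how the jobs are submitted.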
[jira] [Created] (FLINK-35912) SqlServer CDC doesn't chunk UUID-typed columns correctly
yux created FLINK-35912: --- Summary: SqlServer CDC doesn't chunk UUID-typed columns correctly Key: FLINK-35912 URL: https://issues.apache.org/jira/browse/FLINK-35912 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: yux As reported by GitHub user @LiPL2017, SqlServer CDC doesn't chunk UUID-typed columns correctly since UUID comparison isn't implemented correctly[1]. [1] https://learn.microsoft.com/en-us/sql/connect/ado-net/sql/compare-guid-uniqueidentifier-values?view=sql-server-ver16 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35911) Flink kafka-source connector
Gang Yang created FLINK-35911: - Summary: Flink kafka-source connector Key: FLINK-35911 URL: https://issues.apache.org/jira/browse/FLINK-35911 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: kafka-3.0.1 Reporter: Gang Yang

Problem description: The business reported data loss. Investigation showed that data in some partitions of the upstream Kafka source was never consumed; the job was also restarted from state several times, but the problem persisted.

Workaround: To troubleshoot, we later set pipeline.operator-chaining = 'false' and restarted the job from state, after which the job returned to normal.

Scenario: Kafka writes to HDFS. The source is defined as follows:

{code:sql}
-- code placeholder
CREATE TABLE `play_log_source` (
  `appName` VARCHAR,
  `appInfo.channel` VARCHAR,
  `channel_name` AS `appInfo.channel`,
  `appInfo.packageName` VARCHAR,
  `package_name` AS `appInfo.packageName`,
  `deviceInfo.deviceId` VARCHAR,
  `device_id` AS `deviceInfo.deviceId`,
  `deviceInfo.deviceName` VARCHAR
) WITH (
  'nested-json.key.fields.deserialize-min.enabled' = 'true',
  'connector' = 'kafka',
  'format' = 'nested-json'
);
{code}

Flink version: 1.18.1

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35910) Add BTRIM function
Dylan He created FLINK-35910: Summary: Add BTRIM function Key: FLINK-35910 URL: https://issues.apache.org/jira/browse/FLINK-35910 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Dylan He

Add the BTRIM function, the same as in Spark, where 'B' stands for BOTH. It returns {{str}} with leading and trailing characters removed.

Syntax:
{code:sql}
BTRIM(str [, trimStr])
{code}

Arguments:
* {{str}}: A STRING expression.
* {{trimStr}}: An optional STRING expression with characters to be trimmed. The default is a space character.

Returns: A STRING.

See also:
* [Spark|https://spark.apache.org/docs/3.5.1/sql-ref-functions-builtin.html#string-functions]
* [Databricks|https://docs.databricks.com/en/sql/language-manual/functions/btrim.html]

-- This message was sent by Atlassian Jira (v8.20.10#820010)
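A sketch of the trimming semantics, assuming (as in Spark) that {{trimStr}} is treated as a set of characters rather than a substring:

{code:java}
public class BtrimSketch {
    public static String btrim(String str, String trimStr) {
        if (str == null) {
            return null;
        }
        String trim = (trimStr == null) ? " " : trimStr;
        int begin = 0;
        int end = str.length();
        // Strip leading characters contained in trimStr.
        while (begin < end && trim.indexOf(str.charAt(begin)) >= 0) {
            begin++;
        }
        // Strip trailing characters contained in trimStr.
        while (end > begin && trim.indexOf(str.charAt(end - 1)) >= 0) {
            end--;
        }
        return str.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(btrim("  hello  ", null)); // "hello"
        System.out.println(btrim("xyhelloyx", "xy")); // "hello"
    }
}
{code}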
[jira] [Created] (FLINK-35909) JdbcFilterPushdownPreparedStatementVisitorTest fails
Rui Fan created FLINK-35909: --- Summary: JdbcFilterPushdownPreparedStatementVisitorTest fails Key: FLINK-35909 URL: https://issues.apache.org/jira/browse/FLINK-35909 Project: Flink Issue Type: Bug Components: Connectors / JDBC Reporter: Rui Fan JdbcFilterPushdownPreparedStatementVisitorTest fails after FLINK-35363 was merged. Hey [~eskabetxe], would you mind taking a look at this issue in your free time? Thanks~ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35908) Flink log can not output method name and line number.
yiheng tang created FLINK-35908: --- Summary: Flink log can not output method name and line number. Key: FLINK-35908 URL: https://issues.apache.org/jira/browse/FLINK-35908 Project: Flink Issue Type: Bug Reporter: yiheng tang

log4j.properties:
{code}
appender.main.name = MainAppender
appender.main.type = RollingFile
appender.main.append = false
appender.main.fileName = ${sys:log.file}
appender.main.filePattern = ${sys:log.file}.%i
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c{1} (%M:%L) - %m%n
appender.main.policies.type = Policies
appender.main.policies.size.type = SizeBasedTriggeringPolicy
appender.main.policies.size.size = 100MB
appender.main.strategy.type = DefaultRolloverStrategy
appender.main.strategy.max = 5
{code}

But the Flink JobManager log output is as follows:

{code:java}
2024-07-28 22:00:12,489 [flink-akka.actor.default-dispatcher-3] INFO ExecutionGraph (:) - Source_Custom_Source -> Flat_Map(9/16) - execution #0 (afde77b9b3ac94a2ffab46e5f96dc14c) switched from RECOVERING to RUNNING.
2024-07-28 22:00:12,489 [flink-akka.actor.default-dispatcher-21] INFO ExecutionGraph (:) - Source_Custom_Source -> Flat_Map(13/16) - execution #0 (e0e7ce36f7d40372b71f956f7c19dbd3) switched from RECOVERING to RUNNING.
2024-07-28 22:00:12,524 [flink-akka.actor.default-dispatcher-21] INFO ExecutionGraph (:) - Window_TumblingProce -> Map -> (Sink_Print_to_Std_Ou, Sink_Unnamed)(15/16) - execution #0 (dc904d83c235247271caa2c8ef7be1ed) switched from RECOVERING to RUNNING.
{code}

The Flink logs do not output the method name and line number, even though the pattern contains (%M:%L).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35907) Inconsistent handling of bit and bit(n) types in Flink CDC PostgreSQL Connector
jiangcheng created FLINK-35907: -- Summary: Inconsistent handling of bit and bit(n) types in Flink CDC PostgreSQL Connector Key: FLINK-35907 URL: https://issues.apache.org/jira/browse/FLINK-35907 Project: Flink Issue Type: Bug Environment: I use the following Maven dependency:

{code:xml}
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-postgres-cdc</artifactId>
    <version>2.4-vvr-8.0-metrics-SNAPSHOT</version>
</dependency>
{code}

PostgreSQL Server version: I have tested on 14.0, 15.0, and 16.0. This error occurs regardless of the PG version. JDK: Java 8. Reporter: jiangcheng

I am encountering issues with the Flink CDC PostgreSQL Connector's handling of {{bit}} and {{bit(n)}} data types, which seem to deviate from expected behavior, leading to inconsistencies and errors during both the snapshot and incremental sync phases.

h1. Problems

h2. Problem 1: Misinterpretation of {{bit}} and {{bit(1)}}

During data retrieval, {{bit}} and {{bit(1)}} types are interpreted as {{org.apache.kafka.connect.data.Schema.BOOLEAN}}, mirroring the treatment of PostgreSQL's native {{bool}} or {{boolean}} types. This may lead to loss of precision when the actual intention is to preserve the bit pattern rather than a simple true/false value.

h2. Problem 2: Error with {{bit(n)}} during the snapshot phase

For {{bit(n)}} where {{n > 1}}, an error is encountered during the snapshot phase within {{PostgresScanFetchTask}}. The issue arises from attempting to read these values using {{PgResult#getBoolean}}, which is inappropriate for {{bit(n)}} types that do not represent a standard boolean state. Strangely, this error does not surface during the incremental phase, because {{PgResult#getBoolean}} is not invoked there.

h1. My Analysis

h2. BIT type is interpreted as BOOLEAN

Diving into the code reveals that in both scenarios the connector relies on {{PgResult#getObject}}, which internally identifies {{bit}} and {{bit(n)}} as {{java.sql.Type.BOOLEAN}}. This misclassification triggers the problematic usage of {{getBoolean}} for non-standard boolean representations like {{bit(n)}}.

h2. Inconsistency between snapshot and incremental phases

The discrepancy between the snapshot phase and the incremental phase is noteworthy. During the snapshot phase, errors manifest due to direct interaction with {{PgResult#getObject}} in {{PostgresScanFetchTask#createDataEventsForTable}}, where {{PgResult#getObject}} is further forwarded to {{PgResult#getBoolean}}. Conversely, in the incremental phase, {{bit(n)}} values are coerced into {{org.apache.kafka.connect.data.Schema.BYTES}}, resulting in a loss of the original {{n}} precision information. This forces consumers to assume an 8-bit byte array representation, obscuring the intended bit-width and potentially leading to incorrect interpretations (e.g., I insert a {{bit(10)}} value '000111' into the PostgreSQL server, and it is represented as a byte array of length 1 in SourceRecord whose first element is 127).

h1. My Opinion

From my perspective, the following approaches may solve this problem.

*Consistent handling:* The connector should uniformly and accurately handle all {{bit}} variants, respecting their distinct PostgreSQL definitions.

*Preserve precision:* For {{bit(n)}}, ensure that the precision {{n}} is maintained throughout processing, allowing consumers to correctly interpret the intended bit sequence without assuming a default byte size.

*Schema transparency:* Enhance metadata handling to communicate the original {{bit(n)}} schema accurately to downstream systems, enabling them to process the data with full knowledge of its intended format.

h1. Conclusion

Addressing these discrepancies will not only improve the reliability of the Flink CDC PostgreSQL Connector but also enhance its compatibility with a broader range of use cases that rely on precise {{bit}} data handling. I look forward to a resolution that ensures consistent and accurate processing across both snapshot and incremental modes. Thank you for considering this issue; I'm available to provide further details or assist in any way possible.

h1. Reproduce the Error

I use the following code to read records from the PostgreSQL connector by implementing the deserialize method in {{DebeziumDeserializationSchema}}:

{code:java}
public class PostgreSqlRecordSourceDeserializeSchema implements DebeziumDeserializationSchema {
    public void deserialize(SourceRecord sourceRecord, Collector out) throws Exception {
        // skipping irrelevant business logic ...
        Struct rowValue = ((Struct) sourceRecord.value()).getStruct(Envelope.FieldName.AFTER);
        for (Field field : rowValue.schema().fields()) {
            switch (field.schema().type()) {
                case BOOLEAN:
                    // handli
{code}
[jira] [Created] (FLINK-35906) Inconsistent timestamp precision handling in Flink CDC PostgreSQL Connector
jiangcheng created FLINK-35906: -- Summary: Inconsistent timestamp precision handling in Flink CDC PostgreSQL Connector Key: FLINK-35906 URL: https://issues.apache.org/jira/browse/FLINK-35906 Project: Flink Issue Type: Bug Components: Flink CDC Environment: I use the following Maven dependency:

{code:xml}
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-postgres-cdc</artifactId>
    <version>2.4-vvr-8.0-metrics-SNAPSHOT</version>
</dependency>
{code}

PostgreSQL Server version: I have tested on 14.0, 15.0, and 16.0. This error occurs regardless of the PG version. JDK: Java 8. Reporter: jiangcheng

I have encountered an inconsistency issue with the Flink CDC PostgreSQL Connector when it comes to handling different time types, specifically time, timetz, timestamp, and timestamptz. The problem revolves around the precision of these time-related values during both the snapshot and incremental phases.

h1. Issue Details

*time type:* During the snapshot phase, the precision is reduced to milliseconds (ms), whereas in the incremental phase the correct microsecond (micros) precision is maintained.

*timetz type:* This is where the biggest discrepancy arises. In the snapshot phase, the precision drops to seconds (s), and the time is interpreted according to the system's timezone, leading to potential misinterpretation of the actual stored value. Conversely, in the incremental phase, the precision increases to milliseconds (ms), but the timezone is fixed at 0 (UTC), causing further discrepancies. An illustrative example: inserting 10:13:02.264525+08 into PostgreSQL is retrieved as 10:13:02Z during the snapshot, while incrementally it appears as 02:13:02.264525Z. I use the code shown below to read the data.

*timestamp type:* In both snapshot and incremental modes, the precision is consistently at the microsecond level, which aligns with expectations.

*timestamptz type:* Contrary to expectations, both phases yield a precision reduced to milliseconds (ms), deviating from the native microsecond precision supported by PostgreSQL for this type.

h1. Expected Behavior

The Flink CDC PostgreSQL Connector should maintain the native precision provided by PostgreSQL for all time-related data types across both snapshot and incremental phases, ensuring that time and timestamptz types are accurately represented down to microseconds, and that timetz correctly handles timezone information alongside its precision.

h1. Reproduce the Error

I use the following code to read records from the PostgreSQL connector by implementing the deserialize method in {{DebeziumDeserializationSchema}}:

{code:java}
public class PostgreSqlRecordSourceDeserializeSchema implements DebeziumDeserializationSchema {
    public void deserialize(SourceRecord sourceRecord, Collector out) throws Exception {
        // skipping irrelevant business logic ...
        Struct rowValue = ((Struct) sourceRecord.value()).getStruct(Envelope.FieldName.AFTER);
        for (Field field : rowValue.schema().fields()) {
            switch (field.schema().type()) {
                case INT64:
                    if (StringUtils.equals(field.schema().name(), MicroTime.class.getName())) {
                        // handling time type
                        Long value = rowValue.getInt64(field.name());
                    } else if (StringUtils.equals(field.schema().name(), MicroTimestamp.class.getName())) {
                        // handling timestamp type
                        Long value = rowValue.getInt64(field.name());
                    } else {
                        // skipping irrelevant business logic ...
                    }
                    break;
                case STRING:
                    if (StringUtils.equals(field.schema().name(), ZonedTimestamp.class.getName())) {
                        // handling timestamptz type
                        String value = rowValue.getString(field.name());
                    } else if (StringUtils.equals(ZonedTime.class.getName(), field.schema().name())) {
                        // handling timetz type
                        String value = rowValue.getString(field.name());
                    } else {
                        // skipping irrelevant business logic ...
                    }
                    break;
                // skipping irrelevant business logic ...
            }
            // skipping irrelevant business logic ...
        }
    }
}
{code}

h1. Version

I use the Maven dependency and PostgreSQL/JDK versions listed in the Environment section above.

h1. Offer for Assistance

I am willing to provide additional test scenarios or results to help diagnose this issue further. Moreover, I am open to collaborating on reviewing potential fixes or providing any
[jira] [Created] (FLINK-35905) Flink physical operator replacement support
Wang Qilong created FLINK-35905: --- Summary: Flink physical operator replacement support Key: FLINK-35905 URL: https://issues.apache.org/jira/browse/FLINK-35905 Project: Flink Issue Type: Bug Components: API / Scala, Table SQL / API Affects Versions: 1.15.0 Environment: Flink 1.15 and so on Reporter: Wang Qilong I have been studying the Flink SQL source code recently and learning about its execution process, which has led to a question: does Flink SQL provide any SPI implementations that support custom physical operators, such as customizing a FileSourceScanExec at the exec physical plan layer? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35904) Test harness for async state processing operators
Zakelly Lan created FLINK-35904: --- Summary: Test harness for async state processing operators Key: FLINK-35904 URL: https://issues.apache.org/jira/browse/FLINK-35904 Project: Flink Issue Type: Sub-task Reporter: Zakelly Lan We need a test harness for async state processing operators, which could simulate the mailbox and the task thread. This is essential for further SQL operator development. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35903) Add CentralizedSlotAssigner support to adaptive scheduler
yuanfenghu created FLINK-35903: -- Summary: Add CentralizedSlotAssigner support to adaptive scheduler Key: FLINK-35903 URL: https://issues.apache.org/jira/browse/FLINK-35903 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Reporter: yuanfenghu

h2. *Question:*

When using the adaptive scheduler, reducing a job's parallelism via the REST API triggers the job to restart. If the numberOfTaskSlots of my TaskManagers is greater than 1, some TaskManager slots may be left idle, preventing the TaskManager resources from being released.

h3. *How can this issue be resolved:*

We need a new SlotAssigner strategy. When a job triggers the above process, slots should be requested in a centralized way, concentrated on as few TaskManagers as possible, to ensure we can maximize the release of unnecessary resources.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35902) WebUI Metrics RangeError on huge parallelism
Sergey Paryshev created FLINK-35902: --- Summary: WebUI Metrics RangeError on huge parallelism Key: FLINK-35902 URL: https://issues.apache.org/jira/browse/FLINK-35902 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.19.1, 1.18.1, 1.17.2, 1.20.0 Reporter: Sergey Paryshev Attachments: flink-high-parallelism-webui.png Displaying metrics for jobs with high parallelism results in a RangeError (the JavaScript analogue of a StackOverflow). To reproduce the error locally, it is enough to launch any job where one of the operators has a parallelism of 1600 or higher and open the metrics menu. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35901) Setting web.submit.enable to false prevents FlinkSessionJobs from working
Ralph Blaise created FLINK-35901: Summary: Setting web.submit.enable to false prevents FlinkSessionJobs from working Key: FLINK-35901 URL: https://issues.apache.org/jira/browse/FLINK-35901 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.19.1 Environment: Azure Kubernetes Service 1.28.5 Reporter: Ralph Blaise Setting web.submit.enable to false in a FlinkDeployment in Kubernetes prevents FlinkSessionJobs for it from working. It instead results in the error below:

{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [POST request not allowed]"}

I have to re-enable web.submit.enable in order for FlinkSessionJobs to work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35900) PyFlink: update version of Apache Beam dependency
Lydia L created FLINK-35900: --- Summary: PyFlink: update version of Apache Beam dependency Key: FLINK-35900 URL: https://issues.apache.org/jira/browse/FLINK-35900 Project: Flink Issue Type: Bug Affects Versions: 1.19.1 Reporter: Lydia L PyFlink relies on Apache Beam
[jira] [Created] (FLINK-35899) Accumulated TumblingProcessingTimeWindows in the state for a few days
Sergey Anokhovskiy created FLINK-35899: -- Summary: Accumulated TumblingProcessingTimeWindows in the state for a few days Key: FLINK-35899 URL: https://issues.apache.org/jira/browse/FLINK-35899 Project: Flink Issue Type: Bug Components: API / Core, Runtime / Checkpointing Affects Versions: 1.13.5 Reporter: Sergey Anokhovskiy

One of the 40 sub-tasks of the TumblingProcessingTimeWindows operator accumulated windows for over a day. The next restart of the job caused it to process the accumulated windows, which caused the checkpointing timeout. Once the sub-task had processed the old windows (which might take several hours), it worked normally again. *Could you please come up with ideas of what might cause the window operator sub-task to accumulate old windows for days?*

Here is more context: At Yelp we built a Flink-based connector to the database. We aimed to reduce the load on the database; that's why a time window with a reduce function was introduced, since only the latest version of a document matters to us. Here is the configuration of the window:

{code}
private def windowedStream(input: DataStream[FieldsToIndex]) = {
  input
    .keyBy(f => f.id)
    .window(TumblingProcessingTimeWindows.of(
      seconds(elasticPipeJobConfig.deduplicationWindowTimeInSec)))
    .reduce((e1, e2) => {
      if (e1.messageTimestamp > e2.messageTimestamp) e1 else e2
    })
}
{code}

It works as expected most of the time, but a few times per year one sub-task of the dedup_window operator gets stuck and causes checkpointing to fail. We took a look at the state data, added extra logging to the custom trigger, and here is what we found:
# It turned out that the state of the 17th (a different number every incident) sub-task is more than 100 times bigger than the others (see the txt file). It caused the job, and that particular sub-task, to initialize slowly (see the screenshot).
# Statistics of RocksDB tables: _timer_state/processing_window-timers was ~8MB, _timer_state/event_window-timers was 0, window-contents was ~20GB with ~960k entries and ~14k unique message ids. Counts by id were distributed (see ids_destribution.txt).
# Each window-contents value has an associated timer entry in _timer_state/processing_window-timers. The timers accumulated gradually; see timers.txt for counts bucketed by time (Pacific).
# The earliest entries are from 9:23am Pacific on June 26th, over a day before the incident. The Flink log showed that a TaskManager went away at 9:28, forcing a job restart (see taskmanager_exception.txt). The job came back up at ~9:41am.
# Debug logs in the custom trigger, in the onClear/onProcessingTime/onElement/onEventTime functions, confirmed that the job is busy processing the old windows.
# It seems that the sub-task was in a bad state after the restart. It is unclear if it was stuck not processing any window events, or if it was just not removing them from the state. The next time the job restarted it had to churn through a day's worth of writes, causing the delay.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35898) When running Flink on Yarn, if the log file path contains a space, the startup will fail.
yiheng tang created FLINK-35898: --- Summary: When running Flink on Yarn, if the log file path contains a space, the startup will fail. Key: FLINK-35898 URL: https://issues.apache.org/jira/browse/FLINK-35898 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.20.0 Reporter: yiheng tang We use Yarn to launch Flink jobs and tried to add custom parameters to configure the log output format in the YarnLogConfigUtil.getLog4jCommand method. However, when a parameter value contains spaces, startup fails. The start command in the Yarn container is as follows (omitting some irrelevant parameters):

```bash
/bin/bash -c "$JAVA_HOME/bin/java -Dlog.file='/data08/yarn/userlogs/application_1719804259110_16060/container_e43_1719804259110_16060_01_01/jobmanager.log' -Dlog4j.configuration=file.properties -Dlog4j.configurationFile=file.properties -Dlog4j2.isThreadContextMapInheritable=true -Dlog.layout.pattern='%d\{yyyy-MM-dd HH:mm} %-5p %-60c %-60t %x - %m%n %ex' -Dlog.level=INFO"
```

We discovered that when the value of log.layout.pattern contains spaces, the command is incorrectly truncated, causing the content after the space (`HH:mm} %-5p %-60c %-60t %x - %m%n %ex' -Dlog.level=INFO"`) to be unrecognized, so the Flink job cannot be launched in the Yarn container. Therefore, we suggest using single quotes (') instead of double quotes (") to wrap parameter values in the `getLogBackCommand` and `getLog4jCommand` methods of the `YarnLogConfigUtil` class. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35897) Some checkpoint files and localState files can't be cleanUp when checkpoint is aborted
Jinzhong Li created FLINK-35897: --- Summary: Some checkpoint files and localState files can't be cleanUp when checkpoint is aborted Key: FLINK-35897 URL: https://issues.apache.org/jira/browse/FLINK-35897 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing, Runtime / State Backends Reporter: Jinzhong Li

h2. Problem

When the job checkpoint is canceled ([AsyncSnapshotCallable.java#L129|https://github.com/apache/flink/blob/d4294c59e6f2ec8702f53916ea49cf23f6db8961/flink-runtime/src/main/java/org/apache/flink/runtime/state/AsyncSnapshotCallable.java#L129]), it is still possible for the asynchronous snapshot thread to continue executing and produce a completed checkpoint ([RocksIncrementalSnapshotStrategy.java#L324|https://github.com/apache/flink/blob/d4294c59e6f2ec8702f53916ea49cf23f6db8961/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java#L324]). In this case, no component is responsible for cleaning up the completed checkpoint: neither the async snapshot thread nor SubtaskCheckpointCoordinatorImpl.

h3. How to reproduce it

We can reproduce this issue by running the [DataGenWordCount example in my debug branch|https://github.com/ljz2051/flink/commit/33c0c55098a49a0b56c9404256a560da5069f26c], in which I've added some debug code.

h3. How to fix it

When the asynchronous snapshot thread completes a checkpoint, it needs to clean up the completed checkpoint if it finds that the checkpoint has been canceled.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
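A minimal sketch of that fix (all names hypothetical, and glossing over the finer synchronization of the real code): after materializing its result, the async thread re-checks the cancellation flag and discards the result instead of leaking its files.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

public final class CancellationAwareSnapshot<R extends CancellationAwareSnapshot.Discardable> {
    public interface Discardable {
        void discard() throws Exception;
    }

    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    public void cancel() {
        cancelled.set(true);
    }

    /** Runs on the async snapshot thread. */
    public R run(Callable<R> snapshot) throws Exception {
        R result = snapshot.call();
        if (cancelled.get()) {
            // The checkpoint was aborted while we were snapshotting; nobody will
            // register this result, so clean up its files here.
            result.discard();
            return null;
        }
        return result;
    }
}
{code}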
[jira] [Created] (FLINK-35896) eventHandler in RescaleApiScalingRealizer handle the event of scaling failure
yuanfenghu created FLINK-35896: -- Summary: eventHandler in RescaleApiScalingRealizer handle the event of scaling failure Key: FLINK-35896 URL: https://issues.apache.org/jira/browse/FLINK-35896 Project: Flink Issue Type: Improvement Components: Autoscaler Affects Versions: 2.0.0 Reporter: yuanfenghu When using flink-autoscaler-standalone, if a failure occurs during the scaling process, the failure event needs to be handled through the eventHandler. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35895) flink parquet vectorized reader should use readNextFilteredRowGroup() instead of readNextRowGroup()
Kai Chen created FLINK-35895: Summary: flink parquet vectorized reader should use readNextFilteredRowGroup() instead of readNextRowGroup() Key: FLINK-35895 URL: https://issues.apache.org/jira/browse/FLINK-35895 Project: Flink Issue Type: Bug Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.19.1, 1.18.1, 1.17.2, 1.16.3, 1.15.4 Reporter: Kai Chen reader.readNextRowGroup() should be changed to reader.readNextFilteredRowGroup(); https://github.com/apache/flink/blob/d4294c59e6f2ec8702f53916ea49cf23f6db8961/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L422 -- This message was sent by Atlassian Jira (v8.20.10#820010)
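For context, a sketch of the proposed one-line change (assuming {{reader}} is the {{ParquetFileReader}} used at the linked location); the filtered variant honors the filters pushed into the reader, such as column-index based filters:

{code:java}
import java.io.IOException;

import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;

public final class FilteredRowGroupSketch {
    static PageReadStore nextRowGroup(ParquetFileReader reader) throws IOException {
        // was: reader.readNextRowGroup(), which ignores the configured filters
        return reader.readNextFilteredRowGroup();
    }
}
{code}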
[jira] [Created] (FLINK-35894) Add Elasticsearch Sink Connector for Flink CDC Pipeline
wuzexian created FLINK-35894: Summary: Add Elasticsearch Sink Connector for Flink CDC Pipeline Key: FLINK-35894 URL: https://issues.apache.org/jira/browse/FLINK-35894 Project: Flink Issue Type: Improvement Reporter: wuzexian I propose adding a new Elasticsearch sink connector to the Flink CDC pipeline. This connector will enable integration with Elasticsearch, allowing real-time data synchronization and indexing from data sources managed by Flink CDC. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35893) Add state compatibility for Serializer of TableChangeInfo
LvYanquan created FLINK-35893: - Summary: Add state compatibility for Serializer of TableChangeInfo Key: FLINK-35893 URL: https://issues.apache.org/jira/browse/FLINK-35893 Project: Flink Issue Type: Technical Debt Components: Flink CDC Affects Versions: cdc-3.1.1 Reporter: LvYanquan Fix For: cdc-3.2.0 Version info of the serializer for TableChangeInfo was not included in [TransformSchemaOperator|https://github.com/apache/flink-cdc/blob/ea71b2302ddc5f9b7be65843dbf3f5bed4ca9d8e/flink-cdc-runtime/src/main/java/org/apache/flink/cdc/runtime/operators/transform/TransformSchemaOperator.java#L127], which may cause state incompatibility in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35892) Flink Kubernetes Operator docs lacks info on FlinkSessionJob submission
Sergey Kononov created FLINK-35892: -- Summary: Flink Kubernetes Operator docs lacks info on FlinkSessionJob submission Key: FLINK-35892 URL: https://issues.apache.org/jira/browse/FLINK-35892 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Affects Versions: kubernetes-operator-1.9.0 Reporter: Sergey Kononov Hello,

After the FLINK-26808 fix, job submission to a Flink session cluster via the Flink Kubernetes Operator (FKO) is still impossible without setting the config option web.submit.enable to true. If it is false, FKO produces an exception:

{{i.j.o.p.e.ReconciliationDispatcher [ERROR][fko/jobname] Error during event processing ExecutionScope\{ resource id: ResourceID{name='jobname', namespace='fko'}, version: 173807615} failed.}}
{{org.apache.flink.kubernetes.operator.exception.ReconciliationException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [POST request not allowed]}}

As I understand it, this is intended behavior: FKO uses the {{/jars/*}} REST endpoints to submit jobs to a session cluster, and they are not initialized if web.submit.enable is false. I would suggest mentioning in the FKO docs ([here|https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/custom-resource/overview.md#flinksessionjob-spec-overview]) that the option web.submit.enable should be set to true if one intends to use FKO with a session cluster. I lost quite a lot of time figuring this out.

Best regards, Sergey -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35891) Support writing dynamic bucket table of Paimon
LvYanquan created FLINK-35891: - Summary: Support writing dynamic bucket table of Paimon Key: FLINK-35891 URL: https://issues.apache.org/jira/browse/FLINK-35891 Project: Flink Issue Type: Technical Debt Components: Flink CDC Affects Versions: cdc-3.2.0 Reporter: LvYanquan Support writing tables in [dynamic bucket|https://paimon.apache.org/docs/master/primary-key-table/data-distribution/#dynamic-bucket] mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35890) Select only the increment but still enter the SNAPSHOT,lead
sanqingleo created FLINK-35890: -- Summary: Select only the increment but still enter the SNAPSHOT,lead Key: FLINK-35890 URL: https://issues.apache.org/jira/browse/FLINK-35890 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.1.1 Reporter: sanqingleo Attachments: image-2024-07-24-16-47-19-258.png

Hi, when I was using Flink CDC Postgres, I set 'debezium.snapshot.mode' = 'never' to skip the snapshot phase, but the code still entered the snapshot branch and printed this log: "Database snapshot phase can't perform checkpoint, acquired Checkpoint lock." Since our business creates the task first and then writes the data, the task cannot perform a checkpoint and its status is stuck in "in progress"; after the timeout, the task restarts. Normally, if you only choose incremental data synchronization, you should not enter this branch. The failing checkpoint is not necessarily the first one: several checkpoints may complete successfully before the job gets stuck, causing a timeout and a task restart. This problem always occurs. After it appeared in production, it also recurred many times in local testing. We then tested and found that if there is data flowing in, the task status stays normal. The SQL and conf look like this: !image-2024-07-24-16-47-19-258.png!

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35889) mongo cdc restore from expired resume token and job status still running but expect failed
Darren_Han created FLINK-35889: -- Summary: mongo cdc restore from expired resume token and job status still running but expect failed Key: FLINK-35889 URL: https://issues.apache.org/jira/browse/FLINK-35889 Project: Flink Issue Type: Bug Components: Flink CDC Affects Versions: cdc-3.1.0 Reporter: Darren_Han

Versions: mongodb 3.6, flink 1.14.6, mongo-cdc 2.4.2

When restarting a job from a savepoint/checkpoint that contains an expired resume token/point, the job status is always RUNNING but it does not capture change data, while printing logs continuously. Here are some example logs:

2024-07-23 11:11:04,214 INFO com.mongodb.kafka.connect.source.MongoSourceTask [] - An exception occurred when trying to get the next item from the Change Stream com.mongodb.MongoQueryException: Query failed with error code 280 and error message 'resume of change stream was not possible, as the resume token was not found. \{_data: BinData(0, "xx")}'

2024-07-23 17:53:27,330 INFO com.mongodb.kafka.connect.source.MongoSourceTask [] - An exception occurred when trying to get the next item from the Change Stream com.mongodb.MongoQueryException: Query failed with error code 280 and error message 'resume of change notification was not possible, as the resume point may no longer be in the oplog. ' on server

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35888) Add e2e test for paimon DataSink
LvYanquan created FLINK-35888: - Summary: Add e2e test for paimon DataSink Key: FLINK-35888 URL: https://issues.apache.org/jira/browse/FLINK-35888 Project: Flink Issue Type: Technical Debt Components: Flink CDC Affects Versions: cdc-3.2.0 Reporter: LvYanquan Fix For: cdc-3.2.0 The Paimon DataSink was already completed, but no e2e test was added. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35887) Null Pointer Exception in TypeExtractor.isRecord when trying to provide type info for interface
Jacob Jona Fahlenkamp created FLINK-35887: - Summary: Null Pointer Exception in TypeExtractor.isRecord when trying to provide type info for interface Key: FLINK-35887 URL: https://issues.apache.org/jira/browse/FLINK-35887 Project: Flink Issue Type: Bug Components: API / Type Serialization System Affects Versions: 1.19.1 Reporter: Jacob Jona Fahlenkamp The following code

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.types.PojoTestUtils;
import org.junit.jupiter.api.Test;

import java.lang.reflect.Type;
import java.util.Map;

public class DebugTest {
    @TypeInfo(FooFactory.class)
    public interface Foo {}

    public static class FooFactory extends TypeInfoFactory<Foo> {
        @Override
        public TypeInformation<Foo> createTypeInfo(
                Type type, Map<String, TypeInformation<?>> map) {
            return Types.POJO(Foo.class, Map.of());
        }
    }

    @Test
    void test() {
        PojoTestUtils.assertSerializedAsPojo(Foo.class);
    }
}
{code}

throws this exception:

{code:java}
java.lang.NullPointerException: Cannot invoke "java.lang.Class.getName()" because the return value of "java.lang.Class.getSuperclass()" is null
    at org.apache.flink.api.java.typeutils.TypeExtractor.isRecord(TypeExtractor.java:2227)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.<init>(PojoSerializer.java:125)
    at org.apache.flink.api.java.typeutils.PojoTypeInfo.createPojoSerializer(PojoTypeInfo.java:359)
    at org.apache.flink.api.java.typeutils.PojoTypeInfo.createSerializer(PojoTypeInfo.java:347)
    at org.apache.flink.types.PojoTestUtils.assertSerializedAsPojo(PojoTestUtils.java:48)
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35886) Incorrect watermark idleness timeout accounting when subtask is backpressured/blocked
Piotr Nowojski created FLINK-35886: -- Summary: Incorrect watermark idleness timeout accounting when subtask is backpressured/blocked Key: FLINK-35886 URL: https://issues.apache.org/jira/browse/FLINK-35886 Project: Flink Issue Type: Bug Components: API / DataStream, Runtime / Task Affects Versions: 1.19.1, 1.18.1, 1.20.0 Reporter: Piotr Nowojski

Currently, when using watermarks with idleness in Flink, idleness can be incorrectly detected when reading records from a source that is blocked by the runtime. For example, this can easily happen when the source is either backpressured or blocked by watermark alignment. In those cases, although there are more records to be read from the source (or the source's split), the runtime decides not to poll (or is unable to poll) those records. In such cases the idleness timeout can kick in, marking the source/source split as idle, which can lead to incorrect combined watermark calculations and the dropping of records incorrectly marked as late.

h4. Watermark alignment

Assume there are two source splits, A and B, and maxAllowedWatermarkDrift is set to 30s.
# Split A emitted watermark 1042s, while split B sits at watermark 1000s.
# {{1042s - 1000s > maxAllowedWatermarkDrift}}, so split A is blocked by the watermark alignment.
# For the duration of idleTimeout, split B is emitting some large batch of records that do not advance the watermark of that split by much. For example, the watermark for split B either stays at 1000s or is updated by a small amount, e.g. to 1005s.
# idleTimeout kicks in, marking split A as idle.
# Split B finishes emitting the large batch of those older records, and let's say there is now a gap in rowtimes: previously split B was emitting records with rowtime ~1000s, now it jumps to, for example, ~5000s.
# As split A is idle, the combined watermark jumps to ~5000s as well.
# Watermark alignment unblocks split A, and it continues emitting records with rowtime ~1042s. But now all of those records are dropped for being late.

h4. Backpressure

Assume there are two SourceOperators, A and B. Due to, for example, some data skew, it could happen that either only A gets backpressured, or A is backpressured sooner. Either way, while A is backpressured and B is not, B can bump the combined watermark high enough that when the backpressure recedes, fresh records from A are considered late, leading to incorrect results.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
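For context, a sketch of the configuration under which the reported problem shows up, combining an idleness timeout with watermark alignment (the event type and group name are hypothetical):

{code:java}
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public final class IdlenessWithAlignmentSketch {
    static final class MyEvent {}

    static WatermarkStrategy<MyEvent> strategy() {
        return WatermarkStrategy.<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // The timeout that can incorrectly elapse while the split is
                // merely blocked by alignment or backpressure, not truly idle.
                .withIdleness(Duration.ofSeconds(60))
                // maxAllowedWatermarkDrift of 30s, matching the example above.
                .withWatermarkAlignment("alignment-group", Duration.ofSeconds(30));
    }
}
{code}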
[jira] [Created] (FLINK-35885) SlicingWindowOperator should not process watermark with proctime
Baozhu Zhao created FLINK-35885: --- Summary: SlicingWindowOperator should not process watermark with proctime Key: FLINK-35885 URL: https://issues.apache.org/jira/browse/FLINK-35885 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.17.2, 1.13.6 Environment: flink 1.13.6 or flink 1.17.2 Reporter: Baozhu Zhao

We have discovered an unexpected case where abnormal records with a count of 0 are produced when performing a proctime window aggregation on data that carries a watermark. The SQL is as follows:

{code:sql}
CREATE TABLE s1 (
    id INT,
    event_time TIMESTAMP(3),
    name STRING,
    proc_time AS PROCTIME(),
    WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
) WITH (
    'connector' = 'my-source'
);

SELECT *
FROM (
    SELECT name, COUNT(id) AS total_count, window_start, window_end
    FROM TABLE(
        TUMBLE(TABLE s1, DESCRIPTOR(proc_time), INTERVAL '30' SECONDS)
    )
    GROUP BY window_start, window_end, name
)
WHERE total_count = 0;
{code}

For detailed test code, please refer to xxx
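For contrast, a sketch of the event-time variant over the same table, where reacting to watermarks is the expected behavior; the point of this report is that the proctime window above should not process watermarks at all:

{code:sql}
-- Event-time tumbling window: watermark-driven triggering is expected here,
-- unlike the proctime window above.
SELECT name, COUNT(id) AS total_count, window_start, window_end
FROM TABLE(
    TUMBLE(TABLE s1, DESCRIPTOR(event_time), INTERVAL '30' SECONDS)
)
GROUP BY window_start, window_end, name;
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)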
[jira] [Created] (FLINK-35884) Support
JunboWang created FLINK-35884: - Summary: Support Key: FLINK-35884 URL: https://issues.apache.org/jira/browse/FLINK-35884 Project: Flink Issue Type: Improvement Components: Flink CDC Reporter: JunboWang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35883) Wildcard projection inserts column at wrong place
yux created FLINK-35883: --- Summary: Wildcard projection inserts column at wrong place Key: FLINK-35883 URL: https://issues.apache.org/jira/browse/FLINK-35883 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: yux

Consider this case, where a wildcard projection is declared:

```yaml
transform:
  - projection: \*, 'extras' AS extras
```

For an upstream schema [a, b, c], the transform operator should send [a, b, c, extras] downstream. However, if another column d is later inserted at the end, the upstream schema becomes [a, b, c, d], and one might expect the transformed schema to be [a, b, c, d, extras]. Instead it is [a, b, c, extras, d], because `AddColumnEvent{d, position=LAST}` is applied to [a, b, c, extras] after the projection process.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35882) CLONE - [Release-1.20] Vote on the release candidate
Weijie Guo created FLINK-35882: -- Summary: CLONE - [Release-1.20] Vote on the release candidate Key: FLINK-35882 URL: https://issues.apache.org/jira/browse/FLINK-35882 Project: Flink Issue Type: Sub-task Affects Versions: 1.17.0 Reporter: Weijie Guo Assignee: Weijie Guo Fix For: 1.17.0

Once you have built and individually reviewed the release candidate, please share it for the community-wide review. Please review the foundation-wide [voting guidelines|http://www.apache.org/foundation/voting.html] for more information.

Start the review-and-vote thread on the dev@ mailing list. Here's an email template; please adjust as you see fit.

{quote}
From: Release Manager
To: dev@flink.apache.org
Subject: [VOTE] Release 1.2.3, release candidate #3

Hi everyone,

Please review and vote on the release candidate #3 for the version 1.2.3, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.2.3-rc3" [5],
* website pull request listing the new release and adding announcement blog post [6].

The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] link
[2] link
[3] [https://dist.apache.org/repos/dist/release/flink/KEYS]
[4] link
[5] link
[6] link
{quote}

*If there are any issues found in the release candidate, reply on the vote thread to cancel the vote.* There is no need to wait 72 hours. Proceed to the Fix Issues step below and address the problem. However, some issues do not require cancellation. For example, if an issue is found in the website pull request, just correct it on the spot and the vote can continue as-is.

To cancel a release, the release manager needs to send an email to the release candidate thread, stating that the release candidate is officially cancelled. Next, all artifacts created specifically for the RC in the previous steps need to be removed:
* Delete the staging repository in Nexus
* Remove the source / binary RC files from dist.apache.org
* Delete the source code tag in git

*If there are no issues, reply on the vote thread to close the voting.* Then, tally the votes in a separate email. Here's an email template; please adjust as you see fit.

{quote}
From: Release Manager
To: dev@flink.apache.org
Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3

I'm happy to announce that we have unanimously approved this release.

There are XXX approving votes, XXX of which are binding:
* approver 1
* approver 2
* approver 3
* approver 4

There are no disapproving votes.

Thanks everyone!
{quote}

h3. Expectations
* Community votes to release the proposed candidate, with at least three approving PMC votes

Any issues raised before the vote is over should be either resolved or moved into the next release (if applicable).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35880) CLONE - [Release-1.20] Stage source and binary releases on dist.apache.org
Weijie Guo created FLINK-35880: -- Summary: CLONE - [Release-1.20] Stage source and binary releases on dist.apache.org Key: FLINK-35880 URL: https://issues.apache.org/jira/browse/FLINK-35880 Project: Flink Issue Type: Sub-task Affects Versions: 1.20.0 Reporter: Weijie Guo Assignee: Weijie Guo Fix For: 1.20.0

Copy the source release to the dev repository of dist.apache.org:
# If you have not already, check out the Flink section of the dev repository on dist.apache.org via Subversion. In a fresh directory:
{code:bash}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
{code}
# Make a directory for the new release and copy all the artifacts (Flink source/binary distributions, hashes, GPG signatures and the python subdirectory) into that newly created directory:
{code:bash}
$ mkdir flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
$ mv /tools/releasing/release/* flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
{code}
# Add and commit all the files:
{code:bash}
$ cd flink
flink $ svn add flink-${RELEASE_VERSION}-rc${RC_NUM}
flink $ svn commit -m "Add flink-${RELEASE_VERSION}-rc${RC_NUM}"
{code}
# Verify that the files are present under [https://dist.apache.org/repos/dist/dev/flink].
# Push the release tag, if not done already (the following command assumes it is called from within the apache/flink checkout):
{code:bash}
$ git push refs/tags/release-${RELEASE_VERSION}-rc${RC_NUM}
{code}

h3. Expectations
* Maven artifacts deployed to the staging repository of [repository.apache.org|https://repository.apache.org/content/repositories/]
* Source distribution deployed to the dev repository of [dist.apache.org|https://dist.apache.org/repos/dist/dev/flink/]
* Check hashes (e.g. {{shasum -c *.sha512}})
* Check signatures (e.g. {{gpg --verify flink-1.2.3-source-release.tar.gz.asc flink-1.2.3-source-release.tar.gz}}); a sketch combining both checks follows this list
* {{grep}} for legal headers in each file.
* If time allows, check in advance the NOTICE files of the modules whose dependencies have been changed in this release, since license issues pop up during voting from time to time. See the "Checking License" section of [Verifying a Flink Release|https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Release].
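As a concrete illustration of the hash and signature checks above, a sketch of a verification pass over a staged RC directory; the directory name is an example only, not taken from this issue:

{code:bash}
# Sketch: verify all staged artifacts in one pass (example RC directory).
cd flink/flink-1.20.0-rc2
for f in *.sha512; do shasum -c "$f"; done               # hashes
for f in *.asc; do gpg --verify "$f" "${f%.asc}"; done   # signatures
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)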
[jira] [Created] (FLINK-35881) CLONE - [Release-1.20] Propose a pull request for website updates
Weijie Guo created FLINK-35881: -- Summary: CLONE - [Release-1.20] Propose a pull request for website updates Key: FLINK-35881 URL: https://issues.apache.org/jira/browse/FLINK-35881 Project: Flink Issue Type: Sub-task Affects Versions: 1.17.0 Reporter: Weijie Guo Assignee: Weijie Guo Fix For: 1.20.0

The final step of building the candidate is to propose a website pull request containing the following changes:
# Update [apache/flink-web:_config.yml|https://github.com/apache/flink-web/blob/asf-site/_config.yml] (a sketch of these edits appears at the end of this message):
## update {{FLINK_VERSION_STABLE}} and {{FLINK_VERSION_STABLE_SHORT}} as required
## update version references in quickstarts ({{q/}} directory) as required
## (major only) add a new entry to {{flink_releases}} for the release binaries and sources
## (minor only) update the entry for the previous release in the series in {{flink_releases}}
### Please pay attention to the ids assigned to the download entries. They should be unique and reflect their corresponding version number.
## add a new entry to {{release_archive.flink}}
# Add a blog post announcing the release in _posts.
# Add an organized release-notes page under docs/content/release-notes and docs/content.zh/release-notes (like [https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/]). The page is based on the non-empty release notes collected from the issues, and only issues that affect existing users should be included (as opposed to new functionality). It should be in a separate PR, since it will be merged to the flink project.

(!) Don't merge the PRs before finalizing the release.

h3. Expectations
* Website pull request proposed to list the [release|http://flink.apache.org/downloads.html]
* (major only) Check {{docs/config.toml}} to ensure that
** the version constants refer to the new version
** the {{baseurl}} does not point to {{flink-docs-master}} but to {{flink-docs-release-X.Y}} instead
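As an illustration of the {{_config.yml}} edits in step 1, a sketch of the two version constants after the update; the values shown are examples for a 1.20.0 release, and the surrounding keys are omitted:

{code:yaml}
# Hypothetical excerpt of apache/flink-web:_config.yml after the update.
FLINK_VERSION_STABLE: 1.20.0        # full version of the new stable release
FLINK_VERSION_STABLE_SHORT: "1.20"  # short version used in docs links
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)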
[jira] [Created] (FLINK-35879) CLONE - [Release-1.20] Build and stage Java and Python artifacts
Weijie Guo created FLINK-35879: -- Summary: CLONE - [Release-1.20] Build and stage Java and Python artifacts Key: FLINK-35879 URL: https://issues.apache.org/jira/browse/FLINK-35879 Project: Flink Issue Type: Sub-task Affects Versions: 1.20.0 Reporter: Weijie Guo Assignee: Weijie Guo Fix For: 1.20.0

# Create a local release branch ((!) this step cannot be skipped for minor releases):
{code:bash}
$ cd ./tools
tools/ $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$RELEASE_VERSION RELEASE_CANDIDATE=$RC_NUM releasing/create_release_branch.sh
{code}
# Tag the release commit:
{code:bash}
$ git tag -s ${TAG} -m "${TAG}"
{code}
# We now need to do several things:
## Create the source release archive
## Deploy jar artefacts to the [Apache Nexus Repository|https://repository.apache.org/], which is the staging area for deploying the jars to Maven Central
## Build the PyFlink wheel packages

You might want to create a directory on your local machine for collecting the various source and binary releases before uploading them. Creating the binary releases is a lengthy process, but you can do this on another machine (for example, in the "cloud"). When doing this, you can skip signing the release files on the remote machine, download them to your local machine, and sign them there.

# Build the source release:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_source_release.sh
{code}
# Stage the maven artifacts:
{code:bash}
tools $ releasing/deploy_staging_jars.sh
{code}
Review all staged artifacts ([https://repository.apache.org/]). They should contain all relevant parts for each module, including pom.xml, jar, test jar, source, test source, javadoc, etc. Carefully review any new artifacts.
# Close the staging repository on Apache Nexus. When prompted for a description, enter "Apache Flink, version X, release candidate Y".

Then, you need to build the PyFlink wheel packages (since 1.11):
# Set up an Azure pipeline in your own Azure account. You can refer to [Azure Pipelines|https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository] for more details on how to set up an Azure pipeline for a fork of the Flink repository. Note that a Google Cloud mirror in Europe is used for downloading maven artifacts, so it is recommended to set your [Azure organization region|https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/change-organization-location] to Europe to speed up the downloads.
# Push the release candidate branch to your forked personal Flink repository, e.g.
{code:bash}
tools $ git push refs/heads/release-${RELEASE_VERSION}-rc${RC_NUM}:release-${RELEASE_VERSION}-rc${RC_NUM}
{code}
# Trigger the Azure Pipelines manually to build the PyFlink wheel packages:
## Go to your Azure Pipelines Flink project → Pipelines
## Click the "New pipeline" button on the top right
## Select "GitHub" → your GitHub Flink repository → "Existing Azure Pipelines YAML file"
## Select your branch → set the path to "/azure-pipelines.yaml" → click "Continue" → click "Variables"
## Click the "New Variable" button, fill in the name "MODE" and the value "release". Click "OK" to set the variable and "Save" to save the variables, then back on the "Review your pipeline" screen click "Run" to trigger the build.
## You should now see a build where only the "CI build (release)" stage is running.
# Download the PyFlink wheel packages from the build result page after the "build_wheels mac" and "build_wheels linux" jobs have finished:
## Download the PyFlink wheel packages:
### Open the build result page of the pipeline
### Go to the {{Artifacts}} page (build_wheels linux -> 1 artifact)
### Click {{wheel_Darwin_build_wheels mac}} and {{wheel_Linux_build_wheels linux}} separately to download the zip files
## Unzip these two zip files:
{code:bash}
$ cd /path/to/downloaded_wheel_packages
$ unzip wheel_Linux_build_wheels\ linux.zip
$ unzip wheel_Darwin_build_wheels\ mac.zip
{code}
## Create the directory {{dist}} under the {{flink-python}} directory:
{code:bash}
$ cd
$ mkdir flink-python/dist
{code}
## Move the unzipped wheel packages to the {{flink-python/dist}} directory:
{code:bash}
$ mv /path/to/wheel_Darwin_build_wheels\ mac/* flink-python/dist/
$ mv /path/to/wheel_Linux_build_wheels\ linux/* flink-python/dist/
$ cd tools
{code}

Finally, we create the binary convenience release files:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_binary_release.sh
{code}
If you want to run this step in parallel on a remote
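Where signing was deferred from the remote build machine to the local one, as suggested above, a sketch of signing and hashing the downloaded release files; the file glob is an example and a configured GPG key is assumed:

{code:bash}
# Sketch: sign and hash release files downloaded from the remote build machine.
for f in flink-*.tgz; do
  gpg --armor --detach-sign "$f"    # produces $f.asc
  shasum -a 512 "$f" > "$f.sha512"  # produces $f.sha512
done
{code}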
[jira] [Created] (FLINK-35878) Build Release Candidate: 1.20.0-rc2
Weijie Guo created FLINK-35878: -- Summary: Build Release Candidate: 1.20.0-rc2 Key: FLINK-35878 URL: https://issues.apache.org/jira/browse/FLINK-35878 Project: Flink Issue Type: New Feature Affects Versions: 1.20.0 Reporter: Weijie Guo Assignee: Weijie Guo Fix For: 1.20.0

The core of the release process is the build-vote-fix cycle. Each cycle produces one release candidate. The Release Manager repeats this cycle until the community approves one release candidate, which is then finalized.

h4. Prerequisites

Set up a few environment variables to simplify the Maven commands that follow. These identify the release candidate being built. Start with {{RC_NUM}} equal to 1 and increment it for each candidate:
{code}
RC_NUM="1"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
{code}
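For completeness, a sketch of the full variable set referenced across these sub-tasks ({{RELEASE_VERSION}} and {{CURRENT_SNAPSHOT_VERSION}} drive the release scripts in the build step); the concrete values are examples for this candidate, not prescribed by the issue:

{code:bash}
# Example values for 1.20.0-rc2; adjust per release.
RELEASE_VERSION="1.20.0"
CURRENT_SNAPSHOT_VERSION="1.20-SNAPSHOT"
RC_NUM="2"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)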