Re: Frequent Flink JM restarts due to Kube API server errors.
This might be related to FLINK-28481, which is a bug in the fabric8io Kubernetes client [1].

[1] https://issues.apache.org/jira/browse/FLINK-28481

Best,
Yang

On Tue, Feb 6, 2024 at 12:30 PM Lavkesh Lahngir wrote:
> Hi, Matthias, I was wondering if there are any timeout or heartbeat
> configurations for KubeHA available.
>
> Thanks.
>
> On Mon, 5 Feb 2024 at 8:58 PM, Matthias Pohl .invalid> wrote:
> > That's stated in the Jira issue. I didn't have the time to investigate it
> > further.
> >
> > On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir wrote:
> > > Hi Matthias,
> > > Thanks for the suggestion. Do we know which part of the code caused this
> > > issue and how it was fixed?
> > >
> > > Thanks!
> > >
> > > On Mon, 5 Feb 2024 at 18:06, Matthias Pohl .invalid> wrote:
> > > > Hi Lavkesh,
> > > > FLINK-33998 [1] sounds quite similar to what you describe.
> > > >
> > > > The solution was to upgrade to Flink version 1.14.6. I didn't have the
> > > > capacity to look into the details considering that the mentioned Flink
> > > > version 1.14 is not officially supported by the community anymore and a
> > > > fix seems to have been provided with a newer version.
> > > >
> > > > Matthias
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-33998
> > > >
> > > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir wrote:
> > > > > Hii, a few more details:
> > > > > We are running GKE version 1.27.7-gke.1121002
> > > > > and using Flink version 1.14.3.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > We run a Flink operator on GKE, deploying one Flink job per job
> > > > > > manager. We utilize
> > > > > > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > > > > for high availability. The JobManager employs config maps for
> > > > > > checkpointing and leader election. If, at any point, the Kube API
> > > > > > server returns an error (5xx or 4xx), the JM pod is restarted. This
> > > > > > occurrence is sporadic, happening every 1-2 days for some jobs among
> > > > > > the 400 running in the same cluster, each with its own JobManager pod.
> > > > > >
> > > > > > What might be causing these errors from the Kube API server? One
> > > > > > possibility is that when the JM writes the config map and attempts to
> > > > > > retrieve it immediately after, it could result in a 404 error.
> > > > > > Are there any configurations to increase heartbeats or timeouts that
> > > > > > might be causing temporary disconnections from the Kube API server?
> > > > > >
> > > > > > Thank you!
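For the timeout/heartbeat question above: Flink's Kubernetes HA services do expose a few leader-election and retry settings. The option names and defaults below are taken from recent Flink documentation and should be double-checked against the 1.14 docs before use; this is a sketch of the relevant knobs, not a verified fix for the restarts:

```yaml
# flink-conf.yaml — Kubernetes HA leader-election tuning (defaults shown)
high-availability.kubernetes.leader-election.lease-duration: 15 s
high-availability.kubernetes.leader-election.renew-deadline: 15 s
high-availability.kubernetes.leader-election.retry-period: 5 s
# Retries for transactional ConfigMap operations against the API server
kubernetes.transactional-operation.max-retries: 5
```

Raising the retry count may help ride out transient API-server errors, but it does not address the client bug tracked in FLINK-28481.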
[jira] [Created] (FLINK-34384) Release Testing: Verify FLINK-33735 Improve the exponential-delay restart-strategy
lincoln lee created FLINK-34384:
--
Summary: Release Testing: Verify FLINK-33735 Improve the exponential-delay restart-strategy
Key: FLINK-34384
URL: https://issues.apache.org/jira/browse/FLINK-34384
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Rui Fan
Fix For: 1.19.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34383) Modify the comment with incorrect syntax
li you created FLINK-34383:
--
Summary: Modify the comment with incorrect syntax
Key: FLINK-34383
URL: https://issues.apache.org/jira/browse/FLINK-34383
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: li you

There is a syntax error in the comment for the class PermanentBlobCache.
[jira] [Created] (FLINK-34382) Release Testing: Verify FLINK-33625 Support System out and err to be redirected to LOG or discarded
lincoln lee created FLINK-34382:
--
Summary: Release Testing: Verify FLINK-33625 Support System out and err to be redirected to LOG or discarded
Key: FLINK-34382
URL: https://issues.apache.org/jira/browse/FLINK-34382
Project: Flink
Issue Type: Sub-task
Components: Runtime / Configuration
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Rui Fan
Fix For: 1.19.0
[jira] [Created] (FLINK-34381) `RelDataType#getFullTypeString` should be used to print in `RelTreeWriterImpl` if `withRowType` is true instead of `Object#toString`
xuyang created FLINK-34381:
--
Summary: `RelDataType#getFullTypeString` should be used to print in `RelTreeWriterImpl` if `withRowType` is true instead of `Object#toString`
Key: FLINK-34381
URL: https://issues.apache.org/jira/browse/FLINK-34381
Project: Flink
Issue Type: Technical Debt
Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.19.0
Reporter: xuyang

Currently `RelTreeWriterImpl` uses `rel.getRowType.toString` to print the row type.

{code:java}
if (withRowType) {
  s.append(", rowType=[").append(rel.getRowType.toString).append("]")
}
{code}

However, looking deeper into the code, we should use `rel.getRowType.getFullTypeString` to print the row type, because `getFullTypeString` prints richer type information such as nullability. Take `StructuredRelDataType` as an example; the diff is below:

{code:java}
// source
util.addTableSource[(Long, Int, String)]("MyTable", 'a, 'b, 'c)

// sql
SELECT a, c FROM MyTable

// rel.getRowType.toString
RecordType(BIGINT a, VARCHAR(2147483647) c)

// rel.getRowType.getFullTypeString
RecordType(BIGINT a, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" c) NOT NULL
{code}
Re: Unsubscribe
Hi,

Please send an email with any content to user-zh-unsubscr...@flink.apache.org and dev-unsubscr...@flink.apache.org to unsubscribe from the user...@flink.apache.org and dev@flink.apache.org mailing lists. You can refer to [1][2] to manage your mailing list subscriptions.

Best,
Hang

[1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

12260035 <12260...@qq.com.invalid> wrote on Tue, Feb 6, 2024 at 14:17:
> Unsubscribe
>
> -- Original message --
> From: "dev" <qr7...@163.com>
> Sent: Friday, Jan 19, 2024, 3:36 PM
> To: "dev"
> Subject: Unsubscribe
>
> Unsubscribe
[jira] [Created] (FLINK-34380) Strange RowKind and records about intermediate output when using minibatch join
xuyang created FLINK-34380:
--
Summary: Strange RowKind and records about intermediate output when using minibatch join
Key: FLINK-34380
URL: https://issues.apache.org/jira/browse/FLINK-34380
Project: Flink
Issue Type: Bug
Components: Table SQL / Runtime
Affects Versions: 1.19.0
Reporter: xuyang
Fix For: 1.19.0

{code:java}
// Add it in CalcItCase
@Test
def test(): Unit = {
  env.setParallelism(1)
  val rows = Seq(
    changelogRow("+I", java.lang.Integer.valueOf(1), "1"),
    changelogRow("-U", java.lang.Integer.valueOf(1), "1"),
    changelogRow("+U", java.lang.Integer.valueOf(1), "99"),
    changelogRow("-D", java.lang.Integer.valueOf(1), "99")
  )
  val dataId = TestValuesTableFactory.registerData(rows)
  val ddl =
    s"""
       |CREATE TABLE t1 (
       |  a int,
       |  b string
       |) WITH (
       |  'connector' = 'values',
       |  'data-id' = '$dataId',
       |  'bounded' = 'false'
       |)
       """.stripMargin
  tEnv.executeSql(ddl)
  val ddl2 =
    s"""
       |CREATE TABLE t2 (
       |  a int,
       |  b string
       |) WITH (
       |  'connector' = 'values',
       |  'data-id' = '$dataId',
       |  'bounded' = 'false'
       |)
       """.stripMargin
  tEnv.executeSql(ddl2)
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED, Boolean.box(true))
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ALLOW_LATENCY, Duration.ofSeconds(5))
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_SIZE, Long.box(3L))
  println(tEnv.sqlQuery("SELECT * from t1 join t2 on t1.a = t2.a").explain())
  tEnv.executeSql("SELECT * from t1 join t2 on t1.a = t2.a").print()
}
{code}

Output:

{code:java}
+----+---+----+----+----+
| op | a | b  | a0 | b0 |
+----+---+----+----+----+
| +U | 1 | 1  | 1  | 99 |
| +U | 1 | 99 | 1  | 99 |
| -U | 1 | 1  | 1  | 99 |
| -D | 1 | 99 | 1  | 99 |
+----+---+----+----+----+
{code}
[jira] [Created] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
zhu created FLINK-34379:
--
Summary: table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError
Key: FLINK-34379
URL: https://issues.apache.org/jira/browse/FLINK-34379
Project: Flink
Issue Type: Bug
Affects Versions: 1.18.1, 1.17.2
Environment: 1.17.1
Reporter: zhu

In batch mode, I UNION ALL about 50 tables and then join the result with another table. When compiling the execution plan, an OutOfMemoryError: Java heap space is thrown. This was not a problem in 1.15.2, but both 1.17.2 and 1.18.1 throw the same error, which causes the JobManager to restart. It has been found that this is caused by table.optimizer.dynamic-filtering.enabled, which defaults to true; when I set table.optimizer.dynamic-filtering.enabled to false, the plan can be compiled and executed normally.

code

TableEnvironment.create(EnvironmentSettings.newInstance()
    .withConfiguration(configuration)
    .inBatchMode().build())

sql =
select att,filename,'table0' as mo_name from table0 UNION ALL
select att,filename,'table1' as mo_name from table1 UNION ALL
select att,filename,'table2' as mo_name from table2 UNION ALL
select att,filename,'table3' as mo_name from table3 UNION ALL
select att,filename,'table4' as mo_name from table4 UNION ALL
select att,filename,'table5' as mo_name from table5 UNION ALL
select att,filename,'table6' as mo_name from table6 UNION ALL
select att,filename,'table7' as mo_name from table7 UNION ALL
select att,filename,'table8' as mo_name from table8 UNION ALL
select att,filename,'table9' as mo_name from table9 UNION ALL
select att,filename,'table10' as mo_name from table10 UNION ALL
select att,filename,'table11' as mo_name from table11 UNION ALL
select att,filename,'table12' as mo_name from table12 UNION ALL
select att,filename,'table13' as mo_name from table13 UNION ALL
select att,filename,'table14' as mo_name from table14 UNION ALL
select att,filename,'table15' as mo_name from table15 UNION ALL
select att,filename,'table16' as mo_name from table16 UNION ALL
select att,filename,'table17' as mo_name from table17 UNION ALL
select att,filename,'table18' as mo_name from table18 UNION ALL
select att,filename,'table19' as mo_name from table19 UNION ALL
select att,filename,'table20' as mo_name from table20 UNION ALL
select att,filename,'table21' as mo_name from table21 UNION ALL
select att,filename,'table22' as mo_name from table22 UNION ALL
select att,filename,'table23' as mo_name from table23 UNION ALL
select att,filename,'table24' as mo_name from table24 UNION ALL
select att,filename,'table25' as mo_name from table25 UNION ALL
select att,filename,'table26' as mo_name from table26 UNION ALL
select att,filename,'table27' as mo_name from table27 UNION ALL
select att,filename,'table28' as mo_name from table28 UNION ALL
select att,filename,'table29' as mo_name from table29 UNION ALL
select att,filename,'table30' as mo_name from table30 UNION ALL
select att,filename,'table31' as mo_name from table31 UNION ALL
select att,filename,'table32' as mo_name from table32 UNION ALL
select att,filename,'table33' as mo_name from table33 UNION ALL
select att,filename,'table34' as mo_name from table34 UNION ALL
select att,filename,'table35' as mo_name from table35 UNION ALL
select att,filename,'table36' as mo_name from table36 UNION ALL
select att,filename,'table37' as mo_name from table37 UNION ALL
select att,filename,'table38' as mo_name from table38 UNION ALL
select att,filename,'table39' as mo_name from table39 UNION ALL
select att,filename,'table40' as mo_name from table40 UNION ALL
select att,filename,'table41' as mo_name from table41 UNION ALL
select att,filename,'table42' as mo_name from table42 UNION ALL
select att,filename,'table43' as mo_name from table43 UNION ALL
select att,filename,'table44' as mo_name from table44 UNION ALL
select att,filename,'table45' as mo_name from table45 UNION ALL
select att,filename,'table46' as mo_name from table46 UNION ALL
select att,filename,'table47' as mo_name from table47 UNION ALL
select att,filename,'table48' as mo_name from table48 UNION ALL
select att,filename,'table49' as mo_name from table49 UNION ALL
select att,filename,'table50' as mo_name from table50 UNION ALL
select att,filename,'table51' as mo_name from table51 UNION ALL
select att,filename,'table52' as mo_name from table52 UNION ALL
select att,filename,'table53' as mo_name from table53;

Table allUnionTable = tEnv.sqlQuery(sql);
Table res = allUnionTable.join(
    allUnionTable
        .groupBy(col("att"))
        .select(col("att"), col("att").count().as(COUNT_NAME))
        .filter(col(COUNT_NAME).isGreater(1))
        .select(col(key).as("l_key")),
    col(key).isEqual(col("l_key")));
res.printExplain(ExplainDetail.JSON_EXECUTION_PLAN);

Exception trace

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at
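As the reporter notes, disabling the option is a workaround (not a fix for the planner regression). A minimal sketch, using the config key named in the report above; the same key can also be set on the Configuration passed to EnvironmentSettings:

```yaml
# Workaround from the report: disable dynamic filtering for this
# batch job (the option defaults to true since Flink 1.16/1.17).
table.optimizer.dynamic-filtering.enabled: false
```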
Re: Frequent Flink JM restarts due to Kube API server errors.
Hi, Matthias, I was wondering if there are any timeout or heartbeat
configurations for KubeHA available.

Thanks.

On Mon, 5 Feb 2024 at 8:58 PM, Matthias Pohl wrote:
> That's stated in the Jira issue. I didn't have the time to investigate it
> further.
>
> On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir wrote:
> > Hi Matthias,
> > Thanks for the suggestion. Do we know which part of the code caused this
> > issue and how it was fixed?
> >
> > Thanks!
> >
> > On Mon, 5 Feb 2024 at 18:06, Matthias Pohl .invalid> wrote:
> > > Hi Lavkesh,
> > > FLINK-33998 [1] sounds quite similar to what you describe.
> > >
> > > The solution was to upgrade to Flink version 1.14.6. I didn't have the
> > > capacity to look into the details considering that the mentioned Flink
> > > version 1.14 is not officially supported by the community anymore and a
> > > fix seems to have been provided with a newer version.
> > >
> > > Matthias
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-33998
> > >
> > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir wrote:
> > > > Hii, a few more details:
> > > > We are running GKE version 1.27.7-gke.1121002
> > > > and using Flink version 1.14.3.
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir wrote:
> > > > > Hi All,
> > > > >
> > > > > We run a Flink operator on GKE, deploying one Flink job per job
> > > > > manager. We utilize
> > > > > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > > > for high availability. The JobManager employs config maps for
> > > > > checkpointing and leader election. If, at any point, the Kube API
> > > > > server returns an error (5xx or 4xx), the JM pod is restarted. This
> > > > > occurrence is sporadic, happening every 1-2 days for some jobs among
> > > > > the 400 running in the same cluster, each with its own JobManager pod.
> > > > >
> > > > > What might be causing these errors from the Kube API server? One
> > > > > possibility is that when the JM writes the config map and attempts to
> > > > > retrieve it immediately after, it could result in a 404 error.
> > > > > Are there any configurations to increase heartbeats or timeouts that
> > > > > might be causing temporary disconnections from the Kube API server?
> > > > >
> > > > > Thank you!
[jira] [Created] (FLINK-34378) Minibatch join disrupted the original order of input records
xuyang created FLINK-34378:
--
Summary: Minibatch join disrupted the original order of input records
Key: FLINK-34378
URL: https://issues.apache.org/jira/browse/FLINK-34378
Project: Flink
Issue Type: Technical Debt
Reporter: xuyang

I'm not sure if it's a bug; the following case can reproduce it.

{code:java}
// add it in CalcITCase
@Test
def test(): Unit = {
  env.setParallelism(1)
  val rows = Seq(
    row(1, "1"),
    row(2, "2"),
    row(3, "3"),
    row(4, "4"),
    row(5, "5"),
    row(6, "6"),
    row(7, "7"),
    row(8, "8"))
  val dataId = TestValuesTableFactory.registerData(rows)
  val ddl =
    s"""
       |CREATE TABLE t1 (
       |  a int,
       |  b string
       |) WITH (
       |  'connector' = 'values',
       |  'data-id' = '$dataId',
       |  'bounded' = 'false'
       |)
       """.stripMargin
  tEnv.executeSql(ddl)
  val ddl2 =
    s"""
       |CREATE TABLE t2 (
       |  a int,
       |  b string
       |) WITH (
       |  'connector' = 'values',
       |  'data-id' = '$dataId',
       |  'bounded' = 'false'
       |)
       """.stripMargin
  tEnv.executeSql(ddl2)
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED, Boolean.box(true))
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ALLOW_LATENCY, Duration.ofSeconds(5))
  tEnv.getConfig.getConfiguration
    .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_SIZE, Long.box(20L))
  println(tEnv.sqlQuery("SELECT * from t1 join t2 on t1.a = t2.a").explain())
  tEnv.executeSql("SELECT * from t1 join t2 on t1.a = t2.a").print()
}
{code}
[jira] [Created] (FLINK-34377) Release Testing : Verify FLINK-33297 Support standard YAML for FLINK configuration
Junrui Li created FLINK-34377:
--
Summary: Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration
Key: FLINK-34377
URL: https://issues.apache.org/jira/browse/FLINK-34377
Project: Flink
Issue Type: Sub-task
Components: Runtime / Configuration
Affects Versions: 1.19.0
Reporter: Junrui Li
Assignee: Junrui Li
Fix For: 1.19.0
[jira] [Created] (FLINK-34376) FLINK SQL SUM() causes a precision error
Fangliang Liu created FLINK-34376:
--
Summary: FLINK SQL SUM() causes a precision error
Key: FLINK-34376
URL: https://issues.apache.org/jira/browse/FLINK-34376
Project: Flink
Issue Type: Bug
Components: Table SQL / Runtime
Affects Versions: 1.18.1, 1.14.3
Reporter: Fangliang Liu
[jira] [Created] (FLINK-34375) Complete work for syntax `DESCRIBE EXTENDED tableName`
xuyang created FLINK-34375:
--
Summary: Complete work for syntax `DESCRIBE EXTENDED tableName`
Key: FLINK-34375
URL: https://issues.apache.org/jira/browse/FLINK-34375
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang
[jira] [Created] (FLINK-34374) Complete work for syntax `DESCRIBE EXTENDED DATABASE databaseName`
xuyang created FLINK-34374:
--
Summary: Complete work for syntax `DESCRIBE EXTENDED DATABASE databaseName`
Key: FLINK-34374
URL: https://issues.apache.org/jira/browse/FLINK-34374
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang
[jira] [Created] (FLINK-34373) Complete work for syntax `DESCRIBE DATABASE databaseName`
xuyang created FLINK-34373:
--
Summary: Complete work for syntax `DESCRIBE DATABASE databaseName`
Key: FLINK-34373
URL: https://issues.apache.org/jira/browse/FLINK-34373
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang
[jira] [Created] (FLINK-34372) Complete work for syntax `DESCRIBE CATALOG catalogName`
xuyang created FLINK-34372:
--
Summary: Complete work for syntax `DESCRIBE CATALOG catalogName`
Key: FLINK-34372
URL: https://issues.apache.org/jira/browse/FLINK-34372
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang
[jira] [Created] (FLINK-34371) FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
Yunfeng Zhou created FLINK-34371:
--
Summary: FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
Key: FLINK-34371
URL: https://issues.apache.org/jira/browse/FLINK-34371
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing, Runtime / Task
Reporter: Yunfeng Zhou

This is an umbrella ticket for FLIP-331.
[DISCUSS] Flink aggregate push down is not user-friendly.
Hi Devs,

While developing a new connector that supports aggregate push down, I found that Flink's aggregate pushdown is not user-friendly. The `AggregateExpression` passed to the connector by `SupportsAggregatePushDown#applyAggregates` doesn't provide access to its subclasses. This makes it impossible for me to directly determine the type of the agg operator unless I import the planner module, but that is discouraged and considered a heavyweight action.

Because I can't access the subclasses of the `AggregateExpression`'s `FunctionDefinition`, like `CountAggFunction`, and am unable to import the planner module, I'm forced to match the agg operators in a hacky way, by comparing fully qualified class names like the following code:

    FunctionDefinition functionDefinition =
        aggregateExpressions.get(0).getFunctionDefinition();
    if (!(functionDefinition
                .getClass()
                .getCanonicalName()
                .equals(
                    "org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction")
            || functionDefinition
                .getClass()
                .getCanonicalName()
                .equals(
                    "org.apache.flink.table.planner.functions.aggfunctions.Count1AggFunction"))) {
        return false;
    }

I think the problem may also exist with other SupportsXxxPushDown interfaces. Should we consider which planner classes can be exposed to developers to facilitate their use?

Yours,
Swuferhong (Yunhong Zheng)
[jira] [Created] (FLINK-34370) [Umbrella] Complete work about enhanced Flink SQL DDL
xuyang created FLINK-34370:
--
Summary: [Umbrella] Complete work about enhanced Flink SQL DDL
Key: FLINK-34370
URL: https://issues.apache.org/jira/browse/FLINK-34370
Project: Flink
Issue Type: Improvement
Components: Table SQL / API
Affects Versions: 1.10.0
Reporter: xuyang

This is an umbrella Jira for completing the work for FLIP-69 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-69%3A+Flink+SQL+DDL+Enhancement) about enhanced Flink SQL DDL. With FLINK-34254 (https://issues.apache.org/jira/browse/FLINK-34254), it seems that this FLIP is not finished yet.
jira permission request
Hi,

I want to contribute to Apache Flink. Would you please give me the contributor permission? My Jira id is lxliyou.

Thanks.
Re: [DISCUSS] FLIP-417: Expose JobManagerOperatorMetrics via REST API
Hi all,

Alex (cc'ed) and I discussed offline the endpoint path and how to make it easier to use. Focusing on the main clients of the Flink Metrics REST API, the Flink UI and the Flink Kubernetes Operator, we determined that exposing the operator id is unnecessary. There are a few considerations:

1. Integrate the Flink UI to show source coordinator metrics

The Flink UI currently doesn't expose per-operator metrics, only task metrics. For operator metrics, metric reporters provide the extensibility to expose operator-granularity metrics. So, the operator id is unnecessary for this case.

2. Integrate the Flink Kubernetes Operator to read autoscaling metrics from the source enumerator

The K8s operator currently reads vertex metrics from the Flink Metrics REST API to perform autoscaling. In this situation, the operator id is unnecessary as well; in fact, a vertex can only contain one source. Therefore, we don't need a parameter for the operator id.

Due to these requirements, I recommend using the path `/jobs//vertices//coordinator-metrics`. `coordinator-metrics` makes it obvious that the metrics are from the OperatorCoordinator, not to be confused with something like `operator-metrics` (which operator is it?).

In addition, I also propose to fix the metric scope [1]:

- metrics.scope.operator - Default: .taskmanager - Applied to all metrics that were scoped to an operator.

The default should not contain the subtask index, since the coordinator does not correspond to a subtask index. In addition, this configuration could be renamed to `metrics.scope.coordinator` since `operator` is vague. While we will point to the new config in the docs, backward compatibility will be provided for the old config key.

I have updated the FLIP doc with these details.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-scope

Best,
Mason

On Tue, Jan 23, 2024 at 12:13 PM Mason Chen wrote:
> Hi all,
>
> I hope that everyone has had sufficient time to review the discussion and
> the updates on the FLIP doc. If there are no objections, I'll start a
> voting thread in 2 days.
>
> Best,
> Mason
>
> On Thu, Jan 18, 2024 at 2:39 PM Mason Chen wrote:
>
>> Hi Lijie,
>>
>> That's also a possibility, but I would prefer to keep it consistent with
>> how the existing metric APIs are used.
>>
>> For example, in the current metric APIs [1], there is no way to figure
>> out the vertex id and subtask index without getting the job graph from
>> `/jobs//plan`, and correspondingly there are no APIs to return a map
>> of all metrics for every vertex or a map of all metrics for every
>> subtask. Essentially, the plan API is already required to use the
>> finer-grained metric APIs.
>>
>> In addition, keeping the design similar lends itself better to the
>> implementation. The metric handler utilities [2] assume a
>> ComponentMetricStore is returned rather than a Map.
>>
>> I've updated the FLIP doc (
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API)
>> with our discussion so far.
>>
>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#rest-api-integration
>> [2] https://github.com/apache/flink/blob/a41229b24d82e8c561350c42d8a98dfb865c3f69/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/metrics/AbstractMetricsHandler.java#L109
>>
>> Best,
>> Mason
>>
>> On Wed, Jan 17, 2024 at 8:28 AM Lijie Wang wrote:
>>
>>> Hi Mason,
>>>
>>> Thanks for driving the discussion. +1 for the proposal.
>>>
>>> How about we return all operator metrics in a vertex? (The response is a
>>> map of operatorId/operatorName -> operator-metrics.) Correspondingly, the
>>> url may be changed to /jobs//vertices//operator-metrics
>>>
>>> In this way, users can skip the step of obtaining the operator id.
>>>
>>> Best,
>>> Lijie
>>>
>>> Hang Ruan wrote on Wed, Jan 17, 2024 at 10:31:
>>>
>>> > Hi, Mason.
>>> >
>>> > The field `operatorName` in JobManagerOperatorQueryScopeInfo refers to the
>>> > fields in OperatorQueryScopeInfo and chooses the operatorName instead of
>>> > OperatorID.
>>> > It is fine by my side to change from operatorName to operatorID in this
>>> > FLIP.
>>> >
>>> > Best,
>>> > Hang
>>> >
>>> > Mason Chen wrote on Wed, Jan 17, 2024 at 09:39:
>>> >
>>> > > Hi Xuyang and Hang,
>>> > >
>>> > > Thanks for your support and feedback! See my responses below:
>>> > >
>>> > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing. The
>>> > > > operator ID can be converted from the vertex ID[1]. Therefore, it is
>>> > > > somewhat strange to have both vertex ID and operator ID in a single URL.
>>> > >
>>> > > I think Hang explained it perfectly. Essentially, a vertex may contain
>>> > > one or more operators, so the operator ID is required to distinguish
>>> > > this case.
>>> > >
>>> > > 2. If I misunderstood the semantics of operator IDs here, then what is
[jira] [Created] (FLINK-34369) Elasticsearch connector supports SSL provider
Mingliang Liu created FLINK-34369:
--
Summary: Elasticsearch connector supports SSL provider
Key: FLINK-34369
URL: https://issues.apache.org/jira/browse/FLINK-34369
Project: Flink
Issue Type: Improvement
Components: Connectors / ElasticSearch
Affects Versions: 1.17.1
Reporter: Mingliang Liu

The current Flink Elasticsearch connector does not support an SSL option, causing issues when connecting to secure ES clusters.
[jira] [Created] (FLINK-34368) Update GCS filesystems to latest available version v3.0
Martijn Visser created FLINK-34368:
--
Summary: Update GCS filesystems to latest available version v3.0
Key: FLINK-34368
URL: https://issues.apache.org/jira/browse/FLINK-34368
Project: Flink
Issue Type: Technical Debt
Components: FileSystems
Reporter: Martijn Visser
Assignee: Martijn Visser

Update to https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v3.0.0
[DISCUSS] Kubernetes Operator 1.8.0 release planning
Hi all!

I would like to kick off the release planning for the operator 1.8.0 release. The last operator release was November 22 last year. Since then we have added a number of fixes and improvements to both the operator and the autoscaler logic.

There are a few outstanding PRs currently, including some larger features for the autoscaler (JDBC event handler, heap tuning); we have to make a decision regarding those as well, whether to include them in the release or not. @Maximilian Michels, @Rui Fan <1996fan...@gmail.com>, what's your take regarding those PRs? I generally like to be a bit more conservative with large new features to avoid introducing last-minute instabilities.

My proposal would be to aim for the end of this week as the freeze date (Feb 9), and then we can prepare RC1 on Monday. I am happy to volunteer as release manager, but I am of course open to working together with someone on this.

What do you think?

Cheers,
Gyula
[jira] [Created] (FLINK-34367) Release Testing Instructions: Verify FLINK-34027 AsyncScalarFunction for asynchronous scalar function support
lincoln lee created FLINK-34367:
--
Summary: Release Testing Instructions: Verify FLINK-34027 AsyncScalarFunction for asynchronous scalar function support
Key: FLINK-34367
URL: https://issues.apache.org/jira/browse/FLINK-34367
Project: Flink
Issue Type: Sub-task
Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Yu Chen
Fix For: 1.19.0
[jira] [Created] (FLINK-34366) Add support to group rows by column ordinals
Martijn Visser created FLINK-34366:
--
Summary: Add support to group rows by column ordinals
Key: FLINK-34366
URL: https://issues.apache.org/jira/browse/FLINK-34366
Project: Flink
Issue Type: New Feature
Components: Table SQL / API
Reporter: Martijn Visser

Reference: BigQuery https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_col_ordinals

The GROUP BY clause can refer to expression names in the SELECT list. The GROUP BY clause also allows ordinal references to expressions in the SELECT list, using integer values. 1 refers to the first value in the SELECT list, 2 the second, and so forth. The value list can combine ordinals and value names. The following queries are equivalent:

{code:sql}
WITH PlayerStats AS (
  SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 'Jie', 0 UNION ALL
  SELECT 'Coolidge', 'Kiran', 1 UNION ALL
  SELECT 'Adams', 'Noam', 4 UNION ALL
  SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;

/*--------------+----------+-----------+
 | total_points | LastName | FirstName |
 +--------------+----------+-----------+
 | 7            | Adams    | Noam      |
 | 13           | Buchanan | Jie       |
 | 1            | Coolidge | Kiran     |
 +--------------+----------+-----------*/
{code}

{code:sql}
WITH PlayerStats AS (
  SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 'Jie', 0 UNION ALL
  SELECT 'Coolidge', 'Kiran', 1 UNION ALL
  SELECT 'Adams', 'Noam', 4 UNION ALL
  SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY 2, 3;

/*--------------+----------+-----------+
 | total_points | LastName | FirstName |
 +--------------+----------+-----------+
 | 7            | Adams    | Noam      |
 | 13           | Buchanan | Jie       |
 | 1            | Coolidge | Kiran     |
 +--------------+----------+-----------*/
{code}
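For reference, the requested semantics can be tried out locally with SQLite (which, like BigQuery and MySQL, accepts integer ordinals in GROUP BY); Flink SQL does not support this yet, which is what the ticket asks for. A small sketch using the same data as the ticket:

```python
import sqlite3

# Reproduce the ticket's example with SQLite's GROUP BY ordinals:
# 2 = LastName, 3 = FirstName in the SELECT list.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH PlayerStats(LastName, FirstName, PointsScored) AS (
        VALUES ('Adams', 'Noam', 3), ('Buchanan', 'Jie', 0),
               ('Coolidge', 'Kiran', 1), ('Adams', 'Noam', 4),
               ('Buchanan', 'Jie', 13)
    )
    SELECT SUM(PointsScored) AS total_points, LastName, FirstName
    FROM PlayerStats
    GROUP BY 2, 3
    ORDER BY 2
""").fetchall()
print(rows)  # [(7, 'Adams', 'Noam'), (13, 'Buchanan', 'Jie'), (1, 'Coolidge', 'Kiran')]
```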
[jira] [Created] (FLINK-34365) [docs] Delete repeated pages in Chinese Flink website and correct the Paimon url
Waterking created FLINK-34365:
-
Summary: [docs] Delete repeated pages in Chinese Flink website and correct the Paimon url
Key: FLINK-34365
URL: https://issues.apache.org/jira/browse/FLINK-34365
Project: Flink
Issue Type: Improvement
Components: Documentation
Reporter: Waterking
Attachments: 微信截图_20240205214854.png

The "教程" (Tutorials) column on the [Flink 中文网|https://flink.apache.org/zh/how-to-contribute/contribute-documentation/] currently has two "With Paimon(incubating) (formerly Flink Table Store)" entries, so I deleted one for brevity. Also, the current link is wrong, so I corrected it to [With Paimon(incubating) (formerly Flink Table Store)|https://paimon.apache.org/docs/master/engines/flink].
Re: Frequent Flink JM restarts due to Kube API server errors.
That's stated in the Jira issue. I didn't have the time to investigate it further. On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir wrote: > Hi Matthias, > Thanks for the suggestion. Do we know which part of code caused this issue > and how it was fixed? > > Thanks! > > On Mon, 5 Feb 2024 at 18:06, Matthias Pohl .invalid> > wrote: > > > Hi Lavkesh, > > FLINK-33998 [1] sounds quite similar to what you describe. > > > > The solution was to upgrade to Flink version 1.14.6. I didn't have the > > capacity to look into the details considering that the mentioned Flink > > version 1.14 is not officially supported by the community anymore and a > fix > > seems to have been provided with a newer version. > > > > Matthias > > > > [1] https://issues.apache.org/jira/browse/FLINK-33998 > > > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir > wrote: > > > > > Hii, Few more details: > > > We are running GKE version 1.27.7-gke.1121002. > > > and using flink version 1.14.3. > > > > > > Thanks! > > > > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir > wrote: > > > > > > > Hii All, > > > > > > > > We run a Flink operator on GKE, deploying one Flink job per job > > manager. > > > > We utilize > > > > > > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory > > > > for high availability. The JobManager employs config maps for > > > checkpointing > > > > and leader election. If, at any point, the Kube API server returns an > > > error > > > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic, > > > > happening every 1-2 days for some jobs among the 400 running in the > > same > > > > cluster, each with its JobManager pod. > > > > > > > > What might be causing these errors from the Kube? One possibility is > > that > > > > when the JM writes the config map and attempts to retrieve it > > immediately > > > > after, it could result in a 404 error. 
> > > > Are there any configurations to increase heartbeat or timeouts that > > might > > > > be causing temporary disconnections from the Kube API server? > > > > > > > > Thank you! > > > > > > > > > >
Re: Frequent Flink JM restarts due to Kube API server errors.
Hi Matthias, Thanks for the suggestion. Do we know which part of code caused this issue and how it was fixed? Thanks! On Mon, 5 Feb 2024 at 18:06, Matthias Pohl wrote: > Hi Lavkesh, > FLINK-33998 [1] sounds quite similar to what you describe. > > The solution was to upgrade to Flink version 1.14.6. I didn't have the > capacity to look into the details considering that the mentioned Flink > version 1.14 is not officially supported by the community anymore and a fix > seems to have been provided with a newer version. > > Matthias > > [1] https://issues.apache.org/jira/browse/FLINK-33998 > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir wrote: > > > Hii, Few more details: > > We are running GKE version 1.27.7-gke.1121002. > > and using flink version 1.14.3. > > > > Thanks! > > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir wrote: > > > > > Hii All, > > > > > > We run a Flink operator on GKE, deploying one Flink job per job > manager. > > > We utilize > > > > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory > > > for high availability. The JobManager employs config maps for > > checkpointing > > > and leader election. If, at any point, the Kube API server returns an > > error > > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic, > > > happening every 1-2 days for some jobs among the 400 running in the > same > > > cluster, each with its JobManager pod. > > > > > > What might be causing these errors from the Kube? One possibility is > that > > > when the JM writes the config map and attempts to retrieve it > immediately > > > after, it could result in a 404 error. > > > Are there any configurations to increase heartbeat or timeouts that > might > > > be causing temporary disconnections from the Kube API server? > > > > > > Thank you! > > > > > >
[RESULT][VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
Hi devs, I'm happy to announce that FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment [1] has been accepted with 6 approving votes (4 binding) [2]: - Xintong Song (binding) - Rui Fan (binding) - Weijie Guo (binding) - Dong Lin (binding) - Hang Ruan (non-binding) - Yuxin Tan (non-binding) There are no disapproving votes. Thanks again to everyone who participated in the discussion and voting. Best regards, Xuannan [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment [2] https://lists.apache.org/thread/oy9mdmh6gk8pc0wjdk5kg8dz3jllz9ow
Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
Hi all, Thank you all! Closing the vote. The result will be announced in a separate email. Best regards, Xuannan On Mon, Feb 5, 2024 at 12:11 PM Yuxin Tan wrote: > > +1 (non-binding) > > Best, > Yuxin > > > Hang Ruan wrote on Mon, Feb 5, 2024 at 11:22: > > > +1 (non-binding) > > > > Best, > > Hang > > > > Dong Lin wrote on Mon, Feb 5, 2024 at 11:08: > > > > > Thanks for the FLIP. > > > > > > +1 (binding) > > > > > > Best, > > > Dong > > > > > > On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su > > wrote: > > > > Hi everyone, > > > > > > > > Thanks for all the feedback about the FLIP-331: Support > > > > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute > > > > to optimize task deployment [1] [2]. > > > > > > > > I'd like to start a vote for it. The vote will be open for at least 72 > > > > hours (excluding weekends, until Feb 5, 12:00 AM GMT) unless there is an > > > > objection or an insufficient number of votes. > > > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment > > > > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5 > > > > > > > > > > > > Best, > > > > Xuannan > > > > > > > > >
Re: [VOTE] Release flink-connector-parent, release candidate #1
Hi, I just got back from vacation. I'll close the vote thread and proceed to the release later this week. Here is the ticket: https://issues.apache.org/jira/browse/FLINK-34364 Best, Etienne On 04/02/2024 at 05:06, Qingsheng Ren wrote: +1 (binding) - Verified checksum and signature - Verified pom content - Built flink-connector-kafka from source with the parent pom in staging Best, Qingsheng On Thu, Feb 1, 2024 at 11:19 PM Chesnay Schepler wrote: - checked source/maven pom contents Please file a ticket to exclude tools/release from the source release. +1 (binding) On 29/01/2024 15:59, Maximilian Michels wrote: - Inspected the source for licenses and corresponding headers - Checksums and signature OK +1 (binding) On Tue, Jan 23, 2024 at 4:08 PM Etienne Chauchot wrote: Hi everyone, Please review and vote on the release candidate #1 for the version 1.1.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], * the official Apache source release to be deployed to dist.apache.org [2], which are signed with the key with fingerprint D1A76BA19D6294DD0033F6843A019F0B8DD163EA [3], * all artifacts to be deployed to the Maven Central Repository [4], * source code tag v1.1.0-rc1 [5], * website pull request listing the new release [6] * confluence wiki: connector parent upgrade to version 1.1.0 that will be validated after the artifact is released (there is no PR mechanism on the wiki) [7] The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. 
Thanks, Etienne [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353442 [2] https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc1 [3] https://dist.apache.org/repos/dist/release/flink/KEYS [4] https://repository.apache.org/content/repositories/orgapacheflink-1698/ [5] https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc1 [6] https://github.com/apache/flink-web/pull/717 [7] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
[jira] [Created] (FLINK-34364) stage_source_release.sh should exclude tools/release directory from the source release
Etienne Chauchot created FLINK-34364:
Summary: stage_source_release.sh should exclude tools/release directory from the source release
Key: FLINK-34364
URL: https://issues.apache.org/jira/browse/FLINK-34364
Project: Flink
Issue Type: Improvement
Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot

This directory is the mount point of the release utils repository and should be excluded from the source release.
[jira] [Created] (FLINK-34363) Connectors release utils should allow to not specify flink version in stage_jars.sh
Etienne Chauchot created FLINK-34363:
Summary: Connectors release utils should allow to not specify flink version in stage_jars.sh
Key: FLINK-34363
URL: https://issues.apache.org/jira/browse/FLINK-34363
Project: Flink
Issue Type: Improvement
Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot

For connectors-parent release, the Flink version is not needed. The stage_jars.sh script should allow specifying only ${project_version} and not ${project_version}-${flink_minor_version}.
Re: Frequent Flink JM restarts due to Kube API server errors.
Hi Lavkesh, FLINK-33998 [1] sounds quite similar to what you describe. The solution was to upgrade to Flink version 1.14.6. I didn't have the capacity to look into the details considering that the mentioned Flink version 1.14 is not officially supported by the community anymore and a fix seems to have been provided with a newer version. Matthias [1] https://issues.apache.org/jira/browse/FLINK-33998 On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir wrote: > Hii, Few more details: > We are running GKE version 1.27.7-gke.1121002. > and using flink version 1.14.3. > > Thanks! > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir wrote: > > > Hii All, > > > > We run a Flink operator on GKE, deploying one Flink job per job manager. > > We utilize > > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory > > for high availability. The JobManager employs config maps for > checkpointing > > and leader election. If, at any point, the Kube API server returns an > error > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic, > > happening every 1-2 days for some jobs among the 400 running in the same > > cluster, each with its JobManager pod. > > > > What might be causing these errors from the Kube? One possibility is that > > when the JM writes the config map and attempts to retrieve it immediately > > after, it could result in a 404 error. > > Are there any configurations to increase heartbeat or timeouts that might > > be causing temporary disconnections from the Kube API server? > > > > Thank you! > > >
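To the recurring question in this thread about timeout settings for KubeHA: Flink's Kubernetes HA services do expose a few leader-election and retry knobs in flink-conf.yaml. The option keys below exist in the configuration reference of recent Flink releases; the values shown are purely illustrative (not recommendations, and defaults may differ per version), and tuning them masks rather than fixes client bugs such as FLINK-33998/FLINK-28481:

```yaml
# flink-conf.yaml -- Kubernetes HA tuning knobs (values illustrative)

# How long an acquired leader lease stays valid before other contenders
# may take leadership over.
high-availability.kubernetes.leader-election.lease-duration: 30 s
# How long the current leader keeps trying to renew its lease before
# it revokes its own leadership.
high-availability.kubernetes.leader-election.renew-deadline: 30 s
# Pause between consecutive leader-election retries.
high-availability.kubernetes.leader-election.retry-period: 5 s
# Retries of ConfigMap read-modify-write operations against the API
# server before the operation is treated as failed.
kubernetes.transactional-operation.max-retries: 15
```

Raising the lease duration and retry counts gives the JobManager more headroom to ride out transient 5xx responses from the API server, at the cost of slower failover when a leader really is gone.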
Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024
Thanks to Xintong for clarifying this! @Rui Per the feature freeze rules ("Only bug fixes and documentation changes are allowed." [1]), your merge request was discussed among the 1.19 RMs, and we also agreed not to merge PRs that are purely cleanup work with no benefit for users. Thank you for agreeing. Also, an update on the progress: *- Cutting release branch* We're still working on two blockers [2][3]; we'll decide when to cut the 1.19 release branch after the next release sync (Feb 6th) discussion. [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management [2] https://issues.apache.org/jira/browse/FLINK-34337 [3] https://issues.apache.org/jira/browse/FLINK-34007 Best, Yun, Jing, Martijn and Lincoln Rui Fan <1996fan...@gmail.com> wrote on Mon, Feb 5, 2024 at 13:42: > > My opinion would be to follow the process by default, and to make > exceptions only if there're good reasons. > > Sounds make sense, I will merge it after 1.19 branch cutting. > > Thanks Xintong for the explanation! And sorry for bothering. > > Best, > Rui > > On Mon, Feb 5, 2024 at 1:20 PM Xintong Song wrote: > > > Thanks for the info. > > > > My opinion would be to follow the process by default, and to make > > exceptions only if there're good reasons. From your description, it > sounds > > like merging the PR in or after 1.19 doesn't really make a difference. In > > that case, I'd suggest to merge it for the next release (i.e. merge it > into > > master after the 1.19 branch cutting). > > > > Best, > > > > Xintong > > > > > > > > On Mon, Feb 5, 2024 at 12:52 PM Rui Fan <1996fan...@gmail.com> wrote: > > > > > Thanks Xintong for the reply. > > > > > > They are Flink internal classes, and they are not used anymore. > > > So I think they don't affect users, the benefit of removing them > > > is to simplify Flink's code and reduce maintenance costs. > > > > > > If we just merge some user-related PRs recently, I could merge > > > it after 1.19. 
Thank you again~ > > > > > > Best, > > > Rui > > > > > > On Mon, Feb 5, 2024 at 12:21 PM Xintong Song > > > wrote: > > > > > > > Hi Rui, > > > > > > > > Quick question, would there be any downside if this PR doesn't go > into > > > > 1.19? Or any user benefit from getting it into this release? > > > > > > > > Best, > > > > > > > > Xintong > > > > > > > > > > > > > > > > On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com> > wrote: > > > > > > > > > Hi release managers, > > > > > > > > > > > The feature freeze of 1.19 has started now. That means that no > new > > > > > features > > > > > > or improvements should now be merged into the master branch > unless > > > you > > > > > ask > > > > > > the release managers first, which has already been done for PRs, > or > > > > > pending > > > > > > on CI to pass. Bug fixes and documentation PRs can still be > merged. > > > > > > > > > > I'm curious whether the code cleanup could be merged? > > > > > FLINK-31449[1] removed DeclarativeSlotManager related logic. > > > > > Some other classes are not used anymore after FLINK-31449. > > > > > FLINK-34345[2][3] will remove them. > > > > > > > > > > I checked these classes are not used in the master branch. > > > > > And the PR[3] is reviewed for now, could I merge it now or > > > > > after flink-1.19? > > > > > > > > > > Looking forward to your feedback, thanks~ > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31449 > > > > > [2] https://issues.apache.org/jira/browse/FLINK-34345 > > > > > [3] https://github.com/apache/flink/pull/24257 > > > > > > > > > > Best, > > > > > Rui > > > > > > > > > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee < > lincoln.8...@gmail.com> > > > > > wrote: > > > > > > > > > >> Hi Matthias, > > > > >> > > > > >> Thanks for letting us know! After discussed with 1.19 release > > > managers, > > > > we > > > > >> agreed to merge these pr. > > > > >> > > > > >> Thank you for the work on GHA workflows! 
> > > > >> > > > > >> Best, > > > > >> Yun, Jing, Martijn and Lincoln > > > > >> > > > > >> > > > > >> Matthias Pohl wrote on Tue, Jan 30, 2024 at 22:20: > > > > >> > > > > >> > Thanks for the update, Lincoln. > > > > >> > > > > > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we > > > > agreed > > > > >> in > > > > >> > today's meeting that this change is still ok to go in. > > > > >> > > > > > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2]) > > are > > > > also > > > > >> > finalized (see related PRs for basic CI [3], nightly master [4] > > and > > > > >> nightly > > > > >> > scheduling [5]). I'd like to merge the changes before creating > the > > > > >> > release-1.19 branch. That would enable us to see whether we miss > > > > >> anything > > > > >> > in the GHA workflows setup when creating a new release branch. > > > > >> > > > > > >> > The changes are limited to a few CI scripts that are also used > for > > > > Azure > > > > >> > Pipelines (see [3]). The majority of the changes are > GHA-specific > > > and > > > > >> >
[jira] [Created] (FLINK-34362) Add argument to reuse connector docs cache in setup_docs.sh to improve build times
Jane Chan created FLINK-34362:
-
Summary: Add argument to reuse connector docs cache in setup_docs.sh to improve build times
Key: FLINK-34362
URL: https://issues.apache.org/jira/browse/FLINK-34362
Project: Flink
Issue Type: Improvement
Components: Documentation
Affects Versions: 1.19.0
Reporter: Jane Chan

Problem: The current build process of Flink's documentation involves the `setup_docs.sh` script, which re-clones connector repositories every time the documentation is built. This operation is time-consuming, particularly for developers in regions with slower internet connections or facing network restrictions (like the Great Firewall in China). This results in a build process that can take an excessive amount of time, hindering developer productivity.

Proposal: We could add a command-line argument (e.g., --use-doc-cache) to the `setup_docs.sh` script, which, when set, skips the cloning step if the connector repositories have already been cloned previously. As a result, developers can opt to use the cache when they do not require the latest versions of the connectors' documentation. This change will reduce build times significantly and improve the developer experience for those working on the documentation.
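The proposed flag could be sketched roughly as follows. This is a hypothetical illustration of the skip-clone-if-cached pattern, not the actual `setup_docs.sh` code; the function and variable names (`maybe_clone`, `USE_DOC_CACHE`) are invented for the example:

```shell
#!/usr/bin/env bash
# Sketch: honor a --use-doc-cache style switch via USE_DOC_CACHE=true.
USE_DOC_CACHE="${USE_DOC_CACHE:-false}"

maybe_clone() {
  local repo_url="$1" target_dir="$2"
  # If caching is requested and a previous clone exists, reuse it.
  if [ "$USE_DOC_CACHE" = "true" ] && [ -d "$target_dir" ]; then
    echo "Reusing cached clone in $target_dir"
    return 0
  fi
  # Otherwise start from a fresh shallow clone.
  rm -rf "$target_dir"
  git clone --depth 1 --single-branch "$repo_url" "$target_dir"
}
```

A shallow, single-branch clone already helps on slow links; the cache flag then removes the network round-trip entirely on rebuilds.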
[jira] [Created] (FLINK-34361) PyFlink end-to-end test fails in GHA
Matthias Pohl created FLINK-34361:
-
Summary: PyFlink end-to-end test fails in GHA
Key: FLINK-34361
URL: https://issues.apache.org/jira/browse/FLINK-34361
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.19.0
Reporter: Matthias Pohl

"PyFlink end-to-end test" fails: https://github.com/apache/flink/actions/runs/7778642859/job/21208811659#step:14:7420

The only error I could identify is:
{code}
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
conda 23.5.2 requires ruamel-yaml<0.18,>=0.11.14, but you have ruamel-yaml 0.18.5 which is incompatible.
Feb 05 03:31:54 Successfully installed apache-beam-2.48.0 avro-python3-1.10.2 cloudpickle-2.2.1 crcmod-1.7 cython-3.0.8 dill-0.3.1.1 dnspython-2.5.0 docopt-0.6.2 exceptiongroup-1.2.0 fastavro-1.9.3 fasteners-0.19 find-libpython-0.3.1 grpcio-1.50.0 grpcio-tools-1.50.0 hdfs-2.7.3 httplib2-0.22.0 iniconfig-2.0.0 numpy-1.24.4 objsize-0.6.1 orjson-3.9.13 pandas-2.2.0 pemja-0.4.1 proto-plus-1.23.0 protobuf-4.23.4 py4j-0.10.9.7 pyarrow-11.0.0 pydot-1.4.2 pymongo-4.6.1 pyparsing-3.1.1 pytest-7.4.4 python-dateutil-2.8.2 pytz-2024.1 regex-2023.12.25 ruamel.yaml-0.18.5 ruamel.yaml.clib-0.2.8 tomli-2.0.1 typing-extensions-4.9.0 tzdata-2023.4
/home/runner/work/flink/flink/flink-python/dev/.conda/lib/python3.10/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases!
File: /home/runner/work/flink/flink/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
{code}
Not sure whether that's the actual cause.
[jira] [Created] (FLINK-34360) GHA e2e test failure due to no space left on device error
Matthias Pohl created FLINK-34360:
-
Summary: GHA e2e test failure due to no space left on device error
Key: FLINK-34360
URL: https://issues.apache.org/jira/browse/FLINK-34360
Project: Flink
Issue Type: Bug
Components: Tests
Reporter: Matthias Pohl

https://github.com/apache/flink/actions/runs/7763815214
{code}
AdaptiveScheduler / E2E (group 2) Process completed with exit code 1.
AdaptiveScheduler / E2E (group 2) You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 35 MB
{code}
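One defensive pattern for this class of failure is to check free disk space before the e2e group starts and fail fast with a clear message, rather than letting the runner die mid-test. A hedged sketch (not part of Flink's CI scripts; `require_free_mb` is an invented helper, and `df -Pm` assumes a Linux runner):

```shell
#!/usr/bin/env bash
# Fail early if a mount point has less than the required free space (in MB).
require_free_mb() {
  local need_mb="$1" mount="${2:-/}"
  local free_mb
  # df -P gives one POSIX-format line per filesystem; column 4 is "Available".
  # -m reports in 1 MB blocks (GNU/BSD extension, present on typical CI images).
  free_mb=$(df -Pm "$mount" | awk 'NR==2 {print $4}')
  if [ "$free_mb" -lt "$need_mb" ]; then
    echo "Only ${free_mb} MB free on ${mount}, need ${need_mb} MB" >&2
    return 1
  fi
}
```

Calling e.g. `require_free_mb 10240 /` at the top of an e2e group would turn the runner's silent death into an explicit, diagnosable step failure.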