Re: Frequent Flink JM restarts due to Kube API server errors.

2024-02-05 Thread Yang Wang
This might be related to FLINK-28481, which is a bug in the fabric8io k8s
client.

[1]. https://issues.apache.org/jira/browse/FLINK-28481

Best,
Yang

On Tue, Feb 6, 2024 at 12:30 PM Lavkesh Lahngir  wrote:

> Hi, Matthias, I was wondering if there are any timeout or heartbeat
> configurations for KubeHA available.
>
> Thanks.
>
> On Mon, 5 Feb 2024 at 8:58 PM, Matthias Pohl  .invalid>
> wrote:
>
> > That's stated in the Jira issue. I didn't have the time to investigate it
> > further.
> >
> > On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir 
> wrote:
> >
> > > Hi Matthias,
> > > Thanks for the suggestion. Do we know which part of code caused this
> > issue
> > > and how it was fixed?
> > >
> > > Thanks!
> > >
> > > On Mon, 5 Feb 2024 at 18:06, Matthias Pohl  > > .invalid>
> > > wrote:
> > >
> > > > Hi Lavkesh,
> > > > FLINK-33998 [1] sounds quite similar to what you describe.
> > > >
> > > > The solution was to upgrade to Flink version 1.14.6. I didn't have
> the
> > > > capacity to look into the details considering that the mentioned
> Flink
> > > > version 1.14 is not officially supported by the community anymore
> and a
> > > fix
> > > > seems to have been provided with a newer version.
> > > >
> > > > Matthias
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-33998
> > > >
> > > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir 
> > > wrote:
> > > >
> > > > > Hi, a few more details:
> > > > > We are running GKE version 1.27.7-gke.1121002
> > > > > and Flink version 1.14.3.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir 
> > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > We run a Flink operator on GKE, deploying one Flink job per job
> > > > manager.
> > > > > > We utilize
> > > > > >
> > > >
> > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > > > > for high availability. The JobManager employs config maps for
> > > > > checkpointing
> > > > > > and leader election. If, at any point, the Kube API server
> returns
> > an
> > > > > error
> > > > > > (5xx or 4xx), the JM pod is restarted. This occurrence is
> sporadic,
> > > > > > happening every 1-2 days for some jobs among the 400 running in
> the
> > > > same
> > > > > > cluster, each with its JobManager pod.
> > > > > >
> > > > > > What might be causing these errors from the Kube? One possibility
> > is
> > > > that
> > > > > > when the JM writes the config map and attempts to retrieve it
> > > > immediately
> > > > > > after, it could result in a 404 error.
> > > > > > Are there any configurations to increase heartbeat or timeouts
> that
> > > > might
> > > > > > be causing temporary disconnections from the Kube API server?
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > >
> > > >
> > >
> >
>
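On the transient-error question in the thread above: one common client-side pattern (purely illustrative, independent of any actual Flink or fabric8 configuration) is to retry only on status codes that are plausibly transient, while treating other 4xx responses — such as a 404 from a read-after-write race — as something to handle at a higher level rather than retry blindly:

```java
/** Toy classifier for HTTP status codes returned by an API server (illustrative only). */
public class RetryClassifierSketch {
    static boolean isRetryable(int status) {
        // 429 (too many requests) and all 5xx are plausibly transient;
        // other 4xx usually indicate a request-level problem that a
        // blind retry will not fix.
        return status == 429 || (status >= 500 && status < 600);
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(503)); // true
        System.out.println(isRetryable(404)); // false
    }
}
```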


[jira] [Created] (FLINK-34384) Release Testing: Verify FLINK-33735 Improve the exponential-delay restart-strategy

2024-02-05 Thread lincoln lee (Jira)
lincoln lee created FLINK-34384:
---

 Summary: Release Testing: Verify FLINK-33735 Improve the 
exponential-delay restart-strategy 
 Key: FLINK-34384
 URL: https://issues.apache.org/jira/browse/FLINK-34384
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Rui Fan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
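The ticket above carries no description; for context, an exponential-delay restart strategy multiplies the delay between successive restarts up to a cap. A self-contained sketch of such a schedule (parameter names here are illustrative assumptions, not Flink configuration keys):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Toy model of an exponential-delay restart schedule (not Flink's implementation). */
public class ExponentialDelaySketch {
    static List<Duration> schedule(Duration initial, Duration max, double multiplier, int failures) {
        List<Duration> delays = new ArrayList<>();
        double current = initial.toMillis();
        for (int i = 0; i < failures; i++) {
            // Cap each delay at the configured maximum backoff.
            delays.add(Duration.ofMillis((long) Math.min(current, max.toMillis())));
            current *= multiplier; // grow the backoff after each failure
        }
        return delays;
    }

    public static void main(String[] args) {
        // 1s initial backoff, 1min cap, multiplier 2.0, five consecutive failures
        System.out.println(schedule(Duration.ofSeconds(1), Duration.ofMinutes(1), 2.0, 5));
        // [PT1S, PT2S, PT4S, PT8S, PT16S]
    }
}
```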


[jira] [Created] (FLINK-34383) Modify the comment with incorrect syntax

2024-02-05 Thread li you (Jira)
li you created FLINK-34383:
--

 Summary: Modify the comment with incorrect syntax
 Key: FLINK-34383
 URL: https://issues.apache.org/jira/browse/FLINK-34383
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: li you


There is an error in the syntax of the comment for the class PermanentBlobCache



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34382) Release Testing: Verify FLINK-33625 Support System out and err to be redirected to LOG or discarded

2024-02-05 Thread lincoln lee (Jira)
lincoln lee created FLINK-34382:
---

 Summary: Release Testing: Verify FLINK-33625 Support System out 
and err to be redirected to LOG or discarded
 Key: FLINK-34382
 URL: https://issues.apache.org/jira/browse/FLINK-34382
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Rui Fan
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34381) `RelDataType#getFullTypeString` should be used to print in `RelTreeWriterImpl` if `withRowType` is true instead of `Object#toString`

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34381:
--

 Summary: `RelDataType#getFullTypeString` should be used to print 
in `RelTreeWriterImpl` if `withRowType` is true instead of `Object#toString`
 Key: FLINK-34381
 URL: https://issues.apache.org/jira/browse/FLINK-34381
 Project: Flink
  Issue Type: Technical Debt
  Components: Table SQL / Planner
Affects Versions: 1.9.0, 1.19.0
Reporter: xuyang


Currently `RelTreeWriterImpl` uses `rel.getRowType.toString` to print the row type.
{code:java}
if (withRowType) {
  s.append(", rowType=[").append(rel.getRowType.toString).append("]")
} {code}
However, looking deeper into the code, we should use 
`rel.getRowType.getFullTypeString` to print the row type, because 
`getFullTypeString` prints richer type information such as nullability. Taking 
`StructuredRelDataType` as an example, the diff is below:
{code:java}
// source
util.addTableSource[(Long, Int, String)]("MyTable", 'a, 'b, 'c)

// sql
SELECT a, c FROM MyTable

// rel.getRowType.toString
RecordType(BIGINT a, VARCHAR(2147483647) c)

// rel.getRowType.getFullTypeString
RecordType(BIGINT a, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" c) NOT 
NULL{code}
   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: 退订 (Unsubscribe)

2024-02-05 Thread Hang Ruan
Hi,

请分别发送任意内容的邮件到 user-zh-unsubscr...@flink.apache.org 和
dev-unsubscr...@flink.apache.org 地址来取消订阅来自 user...@flink.apache.org
 和 dev@flink.apache.org 邮件组的邮件,你可以参考[1][2] 管理你的邮件订阅。
Please send email to user-zh-unsubscr...@flink.apache.org and
dev-unsubscr...@flink.apache.org if you want to unsubscribe the mail from
user...@flink.apache.org  and dev@flink.apache.org,
and you can refer [1][2] for more details.

Best,
Hang

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

12260035 <12260...@qq.com.invalid> wrote on Tuesday, February 6, 2024 at 14:17:

> Unsubscribe
>
>
>
>
> ------ Original message ------
> From:
>   "dev"
> <
> qr7...@163.com;
> Sent: Friday, January 19, 2024, 3:36 PM
> To: "dev"
> Subject: Unsubscribe
>
>
>
> Unsubscribe


[jira] [Created] (FLINK-34380) Strange RowKind and records about intermediate output when using minibatch join

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34380:
--

 Summary: Strange RowKind and records about intermediate output 
when using minibatch join
 Key: FLINK-34380
 URL: https://issues.apache.org/jira/browse/FLINK-34380
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.19.0
Reporter: xuyang
 Fix For: 1.19.0


{code:java}
// Add it in CalcITCase

@Test
  def test(): Unit = {
env.setParallelism(1)
val rows = Seq(
  changelogRow("+I", java.lang.Integer.valueOf(1), "1"),
  changelogRow("-U", java.lang.Integer.valueOf(1), "1"),
  changelogRow("+U", java.lang.Integer.valueOf(1), "99"),
  changelogRow("-D", java.lang.Integer.valueOf(1), "99")
)
val dataId = TestValuesTableFactory.registerData(rows)

val ddl =
  s"""
 |CREATE TABLE t1 (
 |  a int,
 |  b string
 |) WITH (
 |  'connector' = 'values',
 |  'data-id' = '$dataId',
 |  'bounded' = 'false'
 |)
   """.stripMargin
tEnv.executeSql(ddl)

val ddl2 =
  s"""
 |CREATE TABLE t2 (
 |  a int,
 |  b string
 |) WITH (
 |  'connector' = 'values',
 |  'data-id' = '$dataId',
 |  'bounded' = 'false'
 |)
   """.stripMargin
tEnv.executeSql(ddl2)

tEnv.getConfig.getConfiguration
  .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED, 
Boolean.box(true))
tEnv.getConfig.getConfiguration
  .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ALLOW_LATENCY, 
Duration.ofSeconds(5))
tEnv.getConfig.getConfiguration
  .set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_SIZE, Long.box(3L))

println(tEnv.sqlQuery("SELECT * from t1 join t2 on t1.a = t2.a").explain())

tEnv.executeSql("SELECT * from t1 join t2 on t1.a = t2.a").print()
  } {code}
Output:
{code:java}
+----+-------------+-----------------+-------------+---------+
| op |           a |               b |          a0 |      b0 |
+----+-------------+-----------------+-------------+---------+
| +U |           1 |               1 |           1 |      99 |
| +U |           1 |              99 |           1 |      99 |
| -U |           1 |               1 |           1 |      99 |
| -D |           1 |              99 |           1 |      99 |
+----+-------------+-----------------+-------------+---------+{code}
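As a way to reason about the output above: a minibatch join buffers input records and folds them per key before emission, so the emitted changelog need not mirror arrival order. A toy sketch of such folding (a deliberately simplified model with assumed semantics, not Flink's actual join operator):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: minibatch folding of changelog rows per key (simplified; not Flink's operator). */
public class MiniBatchFoldSketch {
    record Row(String kind, int key, String value) {}

    // Fold a batch: only the last record per key survives; intermediate
    // -U/+U pairs within the batch are collapsed away.
    static List<Row> fold(List<Row> batch) {
        Map<Integer, Row> latest = new LinkedHashMap<>();
        for (Row r : batch) {
            latest.put(r.key(), r); // later rows overwrite earlier ones
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Row> batch = List.of(
                new Row("+I", 1, "1"),
                new Row("-U", 1, "1"),
                new Row("+U", 1, "99"));
        // The initial +I never reaches downstream in its original form,
        // which is how folded minibatch output can look "strange".
        System.out.println(fold(batch)); // [Row[kind=+U, key=1, value=99]]
    }
}
```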



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34379) table.optimizer.dynamic-filtering.enabled lead to OutOfMemoryError

2024-02-05 Thread zhu (Jira)
zhu created FLINK-34379:
---

 Summary: table.optimizer.dynamic-filtering.enabled lead to 
OutOfMemoryError
 Key: FLINK-34379
 URL: https://issues.apache.org/jira/browse/FLINK-34379
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.1, 1.17.2
 Environment: 1.17.1
Reporter: zhu


In batch mode, I UNION ALL about 50 tables and then join the result with another 
table. When compiling the execution plan, an OutOfMemoryError: Java heap space 
is thrown, which was not a problem in 1.15.2. However, both 1.17.2 and 1.18.1 
throw the same error, and this causes the JobManager to restart. So far I have 
found that it is caused by table.optimizer.dynamic-filtering.enabled, which 
defaults to true. When I set table.optimizer.dynamic-filtering.enabled to false, 
the plan compiles and executes normally.

code

TableEnvironment.create(EnvironmentSettings.newInstance()
.withConfiguration(configuration)
.inBatchMode().build())

sql=select att,filename,'table0' as mo_name from table0 UNION All select 
att,filename,'table1' as mo_name from table1 UNION All select 
att,filename,'table2' as mo_name from table2 UNION All select 
att,filename,'table3' as mo_name from table3 UNION All select 
att,filename,'table4' as mo_name from table4 UNION All select 
att,filename,'table5' as mo_name from table5 UNION All select 
att,filename,'table6' as mo_name from table6 UNION All select 
att,filename,'table7' as mo_name from table7 UNION All select 
att,filename,'table8' as mo_name from table8 UNION All select 
att,filename,'table9' as mo_name from table9 UNION All select 
att,filename,'table10' as mo_name from table10 UNION All select 
att,filename,'table11' as mo_name from table11 UNION All select 
att,filename,'table12' as mo_name from table12 UNION All select 
att,filename,'table13' as mo_name from table13 UNION All select 
att,filename,'table14' as mo_name from table14 UNION All select 
att,filename,'table15' as mo_name from table15 UNION All select 
att,filename,'table16' as mo_name from table16 UNION All select 
att,filename,'table17' as mo_name from table17 UNION All select 
att,filename,'table18' as mo_name from table18 UNION All select 
att,filename,'table19' as mo_name from table19 UNION All select 
att,filename,'table20' as mo_name from table20 UNION All select 
att,filename,'table21' as mo_name from table21 UNION All select 
att,filename,'table22' as mo_name from table22 UNION All select 
att,filename,'table23' as mo_name from table23 UNION All select 
att,filename,'table24' as mo_name from table24 UNION All select 
att,filename,'table25' as mo_name from table25 UNION All select 
att,filename,'table26' as mo_name from table26 UNION All select 
att,filename,'table27' as mo_name from table27 UNION All select 
att,filename,'table28' as mo_name from table28 UNION All select 
att,filename,'table29' as mo_name from table29 UNION All select 
att,filename,'table30' as mo_name from table30 UNION All select 
att,filename,'table31' as mo_name from table31 UNION All select 
att,filename,'table32' as mo_name from table32 UNION All select 
att,filename,'table33' as mo_name from table33 UNION All select 
att,filename,'table34' as mo_name from table34 UNION All select 
att,filename,'table35' as mo_name from table35 UNION All select 
att,filename,'table36' as mo_name from table36 UNION All select 
att,filename,'table37' as mo_name from table37 UNION All select 
att,filename,'table38' as mo_name from table38 UNION All select 
att,filename,'table39' as mo_name from table39 UNION All select 
att,filename,'table40' as mo_name from table40 UNION All select 
att,filename,'table41' as mo_name from table41 UNION All select 
att,filename,'table42' as mo_name from table42 UNION All select 
att,filename,'table43' as mo_name from table43 UNION All select 
att,filename,'table44' as mo_name from table44 UNION All select 
att,filename,'table45' as mo_name from table45 UNION All select 
att,filename,'table46' as mo_name from table46 UNION All select 
att,filename,'table47' as mo_name from table47 UNION All select 
att,filename,'table48' as mo_name from table48 UNION All select 
att,filename,'table49' as mo_name from table49 UNION All select 
att,filename,'table50' as mo_name from table50 UNION All select 
att,filename,'table51' as mo_name from table51 UNION All select 
att,filename,'table52' as mo_name from table52 UNION All select 
att,filename,'table53' as mo_name from table53;

Table allUnionTable = tEnv.sqlQuery(sql);
Table res =
allUnionTable.join(
allUnionTable
.groupBy(col("att"))
.select(col("att"), col("att").count().as(COUNT_NAME))
.filter(col(COUNT_NAME).isGreater(1))
.select(col(key).as("l_key")),
col(key).isEqual(col("l_key"))
);

res.printExplain(ExplainDetail.JSON_EXECUTION_PLAN);

 

 

Exception trace

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at 

Re: Frequent Flink JM restarts due to Kube API server errors.

2024-02-05 Thread Lavkesh Lahngir
Hi, Matthias, I was wondering if there are any timeout or heartbeat
configurations for KubeHA available.

Thanks.

On Mon, 5 Feb 2024 at 8:58 PM, Matthias Pohl 
wrote:

> That's stated in the Jira issue. I didn't have the time to investigate it
> further.
>
> On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir  wrote:
>
> > Hi Matthias,
> > Thanks for the suggestion. Do we know which part of code caused this
> issue
> > and how it was fixed?
> >
> > Thanks!
> >
> > On Mon, 5 Feb 2024 at 18:06, Matthias Pohl  > .invalid>
> > wrote:
> >
> > > Hi Lavkesh,
> > > FLINK-33998 [1] sounds quite similar to what you describe.
> > >
> > > The solution was to upgrade to Flink version 1.14.6. I didn't have the
> > > capacity to look into the details considering that the mentioned Flink
> > > version 1.14 is not officially supported by the community anymore and a
> > fix
> > > seems to have been provided with a newer version.
> > >
> > > Matthias
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-33998
> > >
> > > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir 
> > wrote:
> > >
> > > > Hi, a few more details:
> > > > We are running GKE version 1.27.7-gke.1121002
> > > > and Flink version 1.14.3.
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir 
> > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > We run a Flink operator on GKE, deploying one Flink job per job
> > > manager.
> > > > > We utilize
> > > > >
> > >
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > > > for high availability. The JobManager employs config maps for
> > > > checkpointing
> > > > > and leader election. If, at any point, the Kube API server returns
> an
> > > > error
> > > > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic,
> > > > > happening every 1-2 days for some jobs among the 400 running in the
> > > same
> > > > > cluster, each with its JobManager pod.
> > > > >
> > > > > What might be causing these errors from the Kube? One possibility
> is
> > > that
> > > > > when the JM writes the config map and attempts to retrieve it
> > > immediately
> > > > > after, it could result in a 404 error.
> > > > > Are there any configurations to increase heartbeat or timeouts that
> > > might
> > > > > be causing temporary disconnections from the Kube API server?
> > > > >
> > > > > Thank you!
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-34378) Minibatch join disrupted the original order of input records

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34378:
--

 Summary: Minibatch join disrupted the original order of input 
records
 Key: FLINK-34378
 URL: https://issues.apache.org/jira/browse/FLINK-34378
 Project: Flink
  Issue Type: Technical Debt
Reporter: xuyang


I'm not sure whether this is a bug; the following case reproduces the behavior.
{code:java}
// add it in CalcITCase
@Test
def test(): Unit = {
  env.setParallelism(1)
  val rows = Seq(
row(1, "1"),
row(2, "2"),
row(3, "3"),
row(4, "4"),
row(5, "5"),
row(6, "6"),
row(7, "7"),
row(8, "8"))
  val dataId = TestValuesTableFactory.registerData(rows)

  val ddl =
s"""
   |CREATE TABLE t1 (
   |  a int,
   |  b string
   |) WITH (
   |  'connector' = 'values',
   |  'data-id' = '$dataId',
   |  'bounded' = 'false'
   |)
 """.stripMargin
  tEnv.executeSql(ddl)

  val ddl2 =
s"""
   |CREATE TABLE t2 (
   |  a int,
   |  b string
   |) WITH (
   |  'connector' = 'values',
   |  'data-id' = '$dataId',
   |  'bounded' = 'false'
   |)
 """.stripMargin
  tEnv.executeSql(ddl2)

  tEnv.getConfig.getConfiguration
.set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ENABLED, Boolean.box(true))
  tEnv.getConfig.getConfiguration
.set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_ALLOW_LATENCY, 
Duration.ofSeconds(5))
  tEnv.getConfig.getConfiguration
.set(ExecutionConfigOptions.TABLE_EXEC_MINIBATCH_SIZE, Long.box(20L))

  println(tEnv.sqlQuery("SELECT * from t1 join t2 on t1.a = t2.a").explain())

  tEnv.executeSql("SELECT * from t1 join t2 on t1.a = t2.a").print()
}{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34377) Release Testing : Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-05 Thread Junrui Li (Jira)
Junrui Li created FLINK-34377:
-

 Summary: Release Testing : Verify FLINK-33297 Support standard 
YAML for FLINK configuration
 Key: FLINK-34377
 URL: https://issues.apache.org/jira/browse/FLINK-34377
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Configuration
Affects Versions: 1.19.0
Reporter: Junrui Li
Assignee: Junrui Li
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34376) FLINK SQL SUM() causes a precision error

2024-02-05 Thread Fangliang Liu (Jira)
Fangliang Liu created FLINK-34376:
-

 Summary: FLINK SQL SUM() causes a precision error
 Key: FLINK-34376
 URL: https://issues.apache.org/jira/browse/FLINK-34376
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.18.1, 1.14.3
Reporter: Fangliang Liu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
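The ticket description is empty. As general background only (an assumption about the class of issue, not confirmed by the reporter), summing DOUBLE values accumulates binary floating-point error that exact DECIMAL arithmetic avoids:

```java
import java.math.BigDecimal;

/** Illustration of the classic floating-point precision issue behind many SUM() reports. */
public class SumPrecisionSketch {
    public static void main(String[] args) {
        double doubleSum = 0.0;
        for (int i = 0; i < 10; i++) {
            doubleSum += 0.1; // 0.1 has no exact binary representation
        }
        System.out.println(doubleSum == 1.0); // false

        BigDecimal decimalSum = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) {
            decimalSum = decimalSum.add(new BigDecimal("0.1")); // exact decimal arithmetic
        }
        System.out.println(decimalSum); // 1.0
    }
}
```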


[jira] [Created] (FLINK-34375) Complete work for syntax `DESCRIBE EXTENDED tableName`

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34375:
--

 Summary: Complete work for syntax `DESCRIBE EXTENDED tableName`
 Key: FLINK-34375
 URL: https://issues.apache.org/jira/browse/FLINK-34375
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34374) Complete work for syntax `DESCRIBE EXTENDED DATABASE databaseName`

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34374:
--

 Summary: Complete work for syntax `DESCRIBE EXTENDED DATABASE 
databaseName`
 Key: FLINK-34374
 URL: https://issues.apache.org/jira/browse/FLINK-34374
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34373) Complete work for syntax `DESCRIBE DATABASE databaseName`

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34373:
--

 Summary: Complete work for syntax `DESCRIBE DATABASE databaseName`
 Key: FLINK-34373
 URL: https://issues.apache.org/jira/browse/FLINK-34373
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34372) Complete work for syntax `DESCRIBE CATALOG catalogName`

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34372:
--

 Summary: Complete work for syntax `DESCRIBE CATALOG catalogName`
 Key: FLINK-34372
 URL: https://issues.apache.org/jira/browse/FLINK-34372
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Affects Versions: 1.10.0, 1.19.0
Reporter: xuyang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34371) FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-02-05 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-34371:


 Summary: FLIP-331: Support EndOfStreamTrigger and 
isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment
 Key: FLINK-34371
 URL: https://issues.apache.org/jira/browse/FLINK-34371
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / Task
Reporter: Yunfeng Zhou


This is an umbrella ticket for FLIP-331.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Flink aggregate push down is not user-friendly.

2024-02-05 Thread yh z
Hi Devs,
While trying to develop a new connector that supports aggregate pushdown, I
found that Flink's aggregate pushdown is not user-friendly. The
`AggregateExpression` passed to the connector by
`SupportsAggregatePushDown#applyAggregates` doesn't provide access to its
subclasses. This makes it impossible to directly determine the type of an
aggregate operator unless I import the planner module, which is discouraged
and considered a heavyweight action.
Because I can't access the subclasses of
`AggregateExpression#FunctionDefinition`, like `CountAggFunction`, and am
unable to import the planner module, I'm forced to match the aggregate
operators in a hacky way, by comparing fully qualified class names as in the
following code:

FunctionDefinition functionDefinition =
        aggregateExpressions.get(0).getFunctionDefinition();
String className = functionDefinition.getClass().getCanonicalName();
if (!className.equals(
                "org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction")
        && !className.equals(
                "org.apache.flink.table.planner.functions.aggfunctions.Count1AggFunction")) {
    return false;
}
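Until such planner classes are exposed, the chained class-name comparison can at least be collapsed into a set lookup — a minimal self-contained sketch (class names copied verbatim from the snippet above; this tidies the check but does not address the underlying API concern):

```java
import java.util.Set;

/** Tidier form of the canonical-name matching from the message above. */
public class AggFunctionNameMatch {
    private static final Set<String> COUNT_AGG_CLASS_NAMES = Set.of(
            "org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction",
            "org.apache.flink.table.planner.functions.aggfunctions.Count1AggFunction");

    /** Returns true if the canonical class name is one of the planner's COUNT aggregates. */
    static boolean isCountAggFunction(String canonicalClassName) {
        return COUNT_AGG_CLASS_NAMES.contains(canonicalClassName);
    }

    public static void main(String[] args) {
        System.out.println(isCountAggFunction(
                "org.apache.flink.table.planner.functions.aggfunctions.CountAggFunction")); // true
        System.out.println(isCountAggFunction("com.example.SomeOtherFunction")); // false
    }
}
```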


I think the same problem may also exist with the other SupportsXxxPushDown
interfaces. Should we consider which planner classes can be exposed to
developers to facilitate their use?

Yours,
Swuferhong (Yunhong Zheng).


[jira] [Created] (FLINK-34370) [Umbrella] Complete work about enhanced Flink SQL DDL

2024-02-05 Thread xuyang (Jira)
xuyang created FLINK-34370:
--

 Summary: [Umbrella] Complete work about enhanced Flink SQL DDL 
 Key: FLINK-34370
 URL: https://issues.apache.org/jira/browse/FLINK-34370
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.10.0
Reporter: xuyang


This is an umbrella Jira for completing the work from 
[FLIP-69](https://cwiki.apache.org/confluence/display/FLINK/FLIP-69%3A+Flink+SQL+DDL+Enhancement)
 about enhanced Flink SQL DDL.

Given [FLINK-34254](https://issues.apache.org/jira/browse/FLINK-34254), it seems 
that this FLIP is not finished yet.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


jira permission request

2024-02-05 Thread 李游
Hi,
I want to contribute to Apache Flink.
Would you please give me contributor permission?
My Jira ID is lxliyou.
Thanks.

Re: [DISCUSS] FLIP-417: Expose JobManagerOperatorMetrics via REST API

2024-02-05 Thread Mason Chen
Hi all,

Alex (cc'ed) and I discussed the endpoint path offline and how to make it
easier to use. Focusing on the main clients of the Flink Metrics REST API,
the Flink UI and the Flink Kubernetes Operator, we determined that exposing the
operator id is unnecessary. There are a few considerations:

1. Integrate Flink UI to show source coordinator metrics

The Flink UI currently doesn't expose per operator metrics, only task
metrics. For operator metrics, metric reporters provide that extensibility
to expose operator granularity metrics. So, the operator id is unnecessary
for this case.

2. Integrate Flink Kubernetes Operator to read autoscaling metrics from
source enumerator

The K8s operator currently reads vertex metrics from the Flink Metric REST
API to perform autoscaling. In this situation, the operator id is
unnecessary as well and in fact, a vertex can only contain 1 source.
Therefore, we don't need a parameter for the operator id.

Due to these requirements, I recommend to use the path:

`/jobs//vertices//coordinator-metrics`.
`coordinator-metrics` makes it obvious that the metrics are from the
OperatorCoordinator, not to be confused with something like
`operator-metrics` (which operator is it?).

In addition, I also propose to fix the metric scope [1]:

   - metrics.scope.operator
  - Default:
  .taskmanager
  - Applied to all metrics that were scoped to an operator.

The default should not contain the subtask index, since the coordinator
does not correspond to a subtask index. In addition, this configuration
could be renamed to `metrics.scope.coordinator` since `operator` is vague.
While we will point to the new config in the docs, backward compatibility
will be provided for the old config key.

I have updated the FLIP doc with these details.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-scope

Best,
Mason

On Tue, Jan 23, 2024 at 12:13 PM Mason Chen  wrote:

> Hi all,
>
> I hope that everyone has had sufficient time to review the discussion and
> the updates on the FLIP doc. If there are no objections, I'll start a
> voting thread in 2 days.
>
> Best,
> Mason
>
> On Thu, Jan 18, 2024 at 2:39 PM Mason Chen  wrote:
>
>> Hi Lijie,
>>
>> That's also a possibility but I would prefer to keep it consistent with
>> how the existing metric APIs are used.
>>
>> For example, in the current metric APIs [1], there is no way to figure
>> out the vertexid and subtaskindex without getting the job graph from
>> `/jobs//plan` and correspondingly there are no APIs to return a map
>> of all metrics for every vertex and to return a map of all metrics for
>> every subtask. Essentially, the plan API is already required to use the
>> finer-grained metric apis.
>>
>> In addition, keeping the design similar lends itself better for the
>> implementation. The metric handler utilities [2] assume a
>> ComponentMetricStore is returned rather than a Map.
>>
>> I've updated the FLIP doc (
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API)
>> with our discussion so far.
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#rest-api-integration
>> [2]
>> https://github.com/apache/flink/blob/a41229b24d82e8c561350c42d8a98dfb865c3f69/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/metrics/AbstractMetricsHandler.java#L109
>>
>> Best,
>> Mason
>>
>> On Wed, Jan 17, 2024 at 8:28 AM Lijie Wang 
>> wrote:
>>
>>> Hi Mason,
>>>
>>> Thanks for driving the discussion. +1 for the proposal.
>>>
>>> How about we return all operator metrics in a vertex? (the response is a
>>> map of operatorId/operatorName -> operator-metrics). Correspondingly, the
>>> url may be changed to /jobs//vertices//operator-metrics
>>>
>>> In this way, users can skip the step of obtaining the operator id.
>>>
>>> Best,
>>> Lijie
>>>
>>> Hang Ruan  于2024年1月17日周三 10:31写道:
>>>
>>> > Hi, Mason.
>>> >
>>> > The field `operatorName` in JobManagerOperatorQueryScopeInfo refers to
>>> the
>>> > fields in OperatorQueryScopeInfo and chooses the operatorName instead
>>> of
>>> > OperatorID.
>>> > It is fine by my side to change from operatorName to operatorID in this
>>> > FLIP.
>>> >
>>> > Best,
>>> > Hang
>>> >
>>> > Mason Chen  于2024年1月17日周三 09:39写道:
>>> >
>>> > > Hi Xuyang and Hang,
>>> > >
>>> > > Thanks for your support and feedback! See my responses below:
>>> > >
>>> > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing.
>>> The
>>> > > > operator ID can
>>> > > > be converted from the vertex ID[1]. Therefore, it is somewhat
>>> strange
>>> > to
>>> > > > have both vertex
>>> > > > ID and operator ID in a single URL.
>>> > > >
>>> > > I think Hang explained it perfectly. Essentially, a vertex may
>>> contain
>>> > one
>>> > > or more operators so the operator ID is required to distinguish this
>>> > case.
>>> > >
>>> > > 2. If I misunderstood the semantics of operator IDs here, then what
>>> is
>>> > 

[jira] [Created] (FLINK-34369) Elasticsearch connector supports SSL provider

2024-02-05 Thread Mingliang Liu (Jira)
Mingliang Liu created FLINK-34369:
-

 Summary: Elasticsearch connector supports SSL provider
 Key: FLINK-34369
 URL: https://issues.apache.org/jira/browse/FLINK-34369
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ElasticSearch
Affects Versions: 1.17.1
Reporter: Mingliang Liu


The current Flink Elasticsearch connector does not support an SSL option, causing 
issues when connecting to secure ES clusters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34368) Update GCS filesystems to latest available version v3.0

2024-02-05 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-34368:
--

 Summary: Update GCS filesystems to latest available version v3.0
 Key: FLINK-34368
 URL: https://issues.apache.org/jira/browse/FLINK-34368
 Project: Flink
  Issue Type: Technical Debt
  Components: FileSystems
Reporter: Martijn Visser
Assignee: Martijn Visser


Update to 
https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v3.0.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Kubernetes Operator 1.8.0 release planning

2024-02-05 Thread Gyula Fóra
Hi all!

I would like to kick off the release planning for the operator 1.8.0
release. The last operator release was November 22 last year. Since then we
have added a number of fixes and improvements to both the operator and the
autoscaler logic.

There are a few outstanding PRs currently, including some larger features
for the Autoscaler (JDBC event handler, heap tuning); we have to make a
decision on whether to include those in the release or not. @Maximilian
Michels, @Rui Fan <1996fan...@gmail.com>, what's your take regarding those
PRs? I generally like to be a bit more conservative with large new features
to avoid introducing last-minute instabilities.

My proposal would be to aim for the end of this week as the freeze date
(Feb 9) and then we can prepare RC1 on Monday.

I am happy to volunteer as a release manager but I am of course open to
working together with someone on this.

What do you think?

Cheers,
Gyula


[jira] [Created] (FLINK-34367) Release Testing Instructions: Verify FLINK-34027 AsyncScalarFunction for asynchronous scalar function support

2024-02-05 Thread lincoln lee (Jira)
lincoln lee created FLINK-34367:
---

 Summary: Release Testing Instructions: Verify FLINK-34027 
AsyncScalarFunction for asynchronous scalar function support
 Key: FLINK-34367
 URL: https://issues.apache.org/jira/browse/FLINK-34367
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: lincoln lee
Assignee: Yu Chen
 Fix For: 1.19.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34366) Add support to group rows by column ordinals

2024-02-05 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-34366:
--

 Summary: Add support to group rows by column ordinals
 Key: FLINK-34366
 URL: https://issues.apache.org/jira/browse/FLINK-34366
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Martijn Visser


Reference: BigQuery 
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_col_ordinals

The GROUP BY clause can refer to expression names in the SELECT list. The GROUP 
BY clause also allows ordinal references to expressions in the SELECT list, 
using integer values. 1 refers to the first value in the SELECT list, 2 the 
second, and so forth. The value list can combine ordinals and value names. The 
following queries are equivalent:

{code:sql}
WITH PlayerStats AS (
  SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 'Jie', 0 UNION ALL
  SELECT 'Coolidge', 'Kiran', 1 UNION ALL
  SELECT 'Adams', 'Noam', 4 UNION ALL
  SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;

/*--------------+----------+-----------+
 | total_points | LastName | FirstName |
 +--------------+----------+-----------+
 | 7            | Adams    | Noam      |
 | 13           | Buchanan | Jie       |
 | 1            | Coolidge | Kiran     |
 +--------------+----------+-----------*/
{code}

{code:sql}
WITH PlayerStats AS (
  SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 'Jie', 0 UNION ALL
  SELECT 'Coolidge', 'Kiran', 1 UNION ALL
  SELECT 'Adams', 'Noam', 4 UNION ALL
  SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY 2, 3;

/*--------------+----------+-----------+
 | total_points | LastName | FirstName |
 +--------------+----------+-----------+
 | 7            | Adams    | Noam      |
 | 13           | Buchanan | Jie       |
 | 1            | Coolidge | Kiran |
 +--------------+----------+-----------*/
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34365) [docs] Delete repeated pages in Chinese Flink website and correct the Paimon url

2024-02-05 Thread Waterking (Jira)
Waterking created FLINK-34365:
-

 Summary: [docs] Delete repeated pages in Chinese Flink website and 
correct the Paimon url
 Key: FLINK-34365
 URL: https://issues.apache.org/jira/browse/FLINK-34365
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Waterking
 Attachments: 微信截图_20240205214854.png

The "教程" (Tutorials) column on the [Chinese Flink 
website|https://flink.apache.org/zh/how-to-contribute/contribute-documentation/] 
currently lists "[With Paimon(incubating) (formerly Flink Table 
Store)|https://paimon.apache.org/docs/master/engines/flink/%22]" twice.

Therefore, I deleted one of them for brevity.

Also, the current link is broken, so I corrected it to "[With 
Paimon(incubating) (formerly Flink Table 
Store)|https://paimon.apache.org/docs/master/engines/flink]"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Frequent Flink JM restarts due to Kube API server errors.

2024-02-05 Thread Matthias Pohl
That's stated in the Jira issue. I didn't have the time to investigate it
further.

On Mon, Feb 5, 2024 at 1:55 PM Lavkesh Lahngir  wrote:

> Hi Matthias,
> Thanks for the suggestion. Do we know which part of code caused this issue
> and how it was fixed?
>
> Thanks!
>
> On Mon, 5 Feb 2024 at 18:06, Matthias Pohl  .invalid>
> wrote:
>
> > Hi Lavkesh,
> > FLINK-33998 [1] sounds quite similar to what you describe.
> >
> > The solution was to upgrade to Flink version 1.14.6. I didn't have the
> > capacity to look into the details considering that the mentioned Flink
> > version 1.14 is not officially supported by the community anymore and a
> fix
> > seems to have been provided with a newer version.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33998
> >
> > On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir 
> wrote:
> >
> > > Hii, Few more details:
> > > We are running GKE version 1.27.7-gke.1121002.
> > > and using flink version 1.14.3.
> > >
> > > Thanks!
> > >
> > > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir 
> wrote:
> > >
> > > > Hii All,
> > > >
> > > > We run a Flink operator on GKE, deploying one Flink job per job
> > manager.
> > > > We utilize
> > > >
> > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > > for high availability. The JobManager employs config maps for
> > > checkpointing
> > > > and leader election. If, at any point, the Kube API server returns an
> > > error
> > > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic,
> > > > happening every 1-2 days for some jobs among the 400 running in the
> > same
> > > > cluster, each with its JobManager pod.
> > > >
> > > > What might be causing these errors from the Kube? One possibility is
> > that
> > > > when the JM writes the config map and attempts to retrieve it
> > immediately
> > > > after, it could result in a 404 error.
> > > > Are there any configurations to increase heartbeat or timeouts that
> > might
> > > > be causing temporary disconnections from the Kube API server?
> > > >
> > > > Thank you!
> > > >
> > >
> >
>


Re: Frequent Flink JM restarts due to Kube API server errors.

2024-02-05 Thread Lavkesh Lahngir
Hi Matthias,
Thanks for the suggestion. Do we know which part of code caused this issue
and how it was fixed?

Thanks!

On Mon, 5 Feb 2024 at 18:06, Matthias Pohl 
wrote:

> Hi Lavkesh,
> FLINK-33998 [1] sounds quite similar to what you describe.
>
> The solution was to upgrade to Flink version 1.14.6. I didn't have the
> capacity to look into the details considering that the mentioned Flink
> version 1.14 is not officially supported by the community anymore and a fix
> seems to have been provided with a newer version.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-33998
>
> On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir  wrote:
>
> > Hii, Few more details:
> > We are running GKE version 1.27.7-gke.1121002.
> > and using flink version 1.14.3.
> >
> > Thanks!
> >
> > On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir  wrote:
> >
> > > Hii All,
> > >
> > > We run a Flink operator on GKE, deploying one Flink job per job
> manager.
> > > We utilize
> > >
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > > for high availability. The JobManager employs config maps for
> > checkpointing
> > > and leader election. If, at any point, the Kube API server returns an
> > error
> > > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic,
> > > happening every 1-2 days for some jobs among the 400 running in the
> same
> > > cluster, each with its JobManager pod.
> > >
> > > What might be causing these errors from the Kube? One possibility is
> that
> > > when the JM writes the config map and attempts to retrieve it
> immediately
> > > after, it could result in a 404 error.
> > > Are there any configurations to increase heartbeat or timeouts that
> might
> > > be causing temporary disconnections from the Kube API server?
> > >
> > > Thank you!
> > >
> >
>


[RESULT][VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-02-05 Thread Xuannan Su
Hi devs,

I'm happy to announce that FLIP-331: Support EndOfStreamTrigger and
isOutputOnlyAfterEndOfStream operator attribute to optimize task
deployment [1] has been accepted with 6 approving votes (4 binding)
[2]:

- Xintong Song (binding)
- Rui Fan (binding)
- Weijie Guo (binding)
- Dong Lin (binding)
- Hang Ruan (non-binding)
- Yuxin Tan (non-binding)

There are no disapproving votes.

Thanks again to everyone who participated in the discussion and voting.

Best regards,
Xuannan

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment
[2] https://lists.apache.org/thread/oy9mdmh6gk8pc0wjdk5kg8dz3jllz9ow


Re: [VOTE] FLIP-331: Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute to optimize task deployment

2024-02-05 Thread Xuannan Su
Hi all,

Thank you all! Closing the vote. The result will be announced in a
separate email.

Best regards,
Xuannan

On Mon, Feb 5, 2024 at 12:11 PM Yuxin Tan  wrote:
>
> +1 (non-binding)
>
> Best,
> Yuxin
>
>
> Hang Ruan  wrote on Mon, Feb 5, 2024 at 11:22:
>
> >  +1 (non-binding)
> >
> > Best,
> > Hang
> >
> > Dong Lin  wrote on Mon, Feb 5, 2024 at 11:08:
> >
> > > Thanks for the FLIP.
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Dong
> > >
> > > On Wed, Jan 31, 2024 at 11:41 AM Xuannan Su 
> > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback about the FLIP-331: Support
> > > > EndOfStreamTrigger and isOutputOnlyAfterEndOfStream operator attribute
> > > > to optimize task deployment [1] [2].
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least 72
> > > > hours (excluding weekends, until Feb 5, 12:00 AM GMT) unless there is an
> > > > objection or an insufficient number of votes.
> > > >
> > > > [1]
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-331%3A+Support+EndOfStreamTrigger+and+isOutputOnlyAfterEndOfStream+operator+attribute+to+optimize+task+deployment
> > > > [2] https://lists.apache.org/thread/qq39rmg3f23ysx5m094s4c4cq0m4tdj5
> > > >
> > > >
> > > > Best,
> > > > Xuannan
> > > >
> > >
> >


Re: [VOTE] Release flink-connector-parent, release candidate #1

2024-02-05 Thread Etienne Chauchot

Hi,

I just got back from vacations. I'll close the vote thread and proceed 
to the release later this week.


Here is the ticket: https://issues.apache.org/jira/browse/FLINK-34364

Best

Etienne

On 04/02/2024 at 05:06, Qingsheng Ren wrote:

+1 (binding)

- Verified checksum and signature
- Verified pom content
- Built flink-connector-kafka from source with the parent pom in staging

Best,
Qingsheng

On Thu, Feb 1, 2024 at 11:19 PM Chesnay Schepler  wrote:


- checked source/maven pom contents

Please file a ticket to exclude tools/release from the source release.

+1 (binding)

On 29/01/2024 15:59, Maximilian Michels wrote:

- Inspected the source for licenses and corresponding headers
- Checksums and signature OK

+1 (binding)

On Tue, Jan 23, 2024 at 4:08 PM Etienne Chauchot

wrote:

Hi everyone,

Please review and vote on the release candidate #1 for the version
1.1.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org
[2], which are signed with the key with fingerprint
D1A76BA19D6294DD0033F6843A019F0B8DD163EA [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v1.1.0-rc1 [5],
* website pull request listing the new release [6]

* confluence wiki: connector parent upgrade to version 1.1.0 that will
be validated after the artifact is released (there is no PR mechanism on
the wiki) [7]

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,

Etienne

[1]


https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353442

[2]


https://dist.apache.org/repos/dist/dev/flink/flink-connector-parent-1.1.0-rc1

[3]https://dist.apache.org/repos/dist/release/flink/KEYS
[4]

https://repository.apache.org/content/repositories/orgapacheflink-1698/

[5]


https://github.com/apache/flink-connector-shared-utils/releases/tag/v1.1.0-rc1

[6]https://github.com/apache/flink-web/pull/717

[7]


https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development



[jira] [Created] (FLINK-34364) stage_source_release.sh should exclude tools/release directory from the source release

2024-02-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34364:


 Summary: stage_source_release.sh should exclude tools/release 
directory from the source release
 Key: FLINK-34364
 URL: https://issues.apache.org/jira/browse/FLINK-34364
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot


This directory is the mount point of the release utils repository and should be 
excluded from the source release.
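A minimal sketch of the requested exclusion, assuming GNU tar and illustrative file names (the real stage_source_release.sh layout and archive naming may differ):

```shell
set -eu
RELEASE_DIR="flink-connector-parent-1.1.0"

# Mimic a staged source tree where tools/release is the mounted utils repo.
mkdir -p "${RELEASE_DIR}/tools/release" "${RELEASE_DIR}/src"
touch "${RELEASE_DIR}/src/pom.xml" "${RELEASE_DIR}/tools/release/release_utils.sh"

# Build the source archive, excluding the mount point and its contents.
tar czf "${RELEASE_DIR}-src.tgz" \
  --exclude="${RELEASE_DIR}/tools/release" \
  "${RELEASE_DIR}"
```

With GNU tar, `--exclude` matches the directory name, so the directory and everything beneath it stay out of the archive.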



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34363) Connectors release utils should allow to not specify flink version in stage_jars.sh

2024-02-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-34363:


 Summary: Connectors release utils should allow to not specify 
flink version in stage_jars.sh
 Key: FLINK-34363
 URL: https://issues.apache.org/jira/browse/FLINK-34363
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Parent, Release System
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


For the connectors-parent release, a Flink version is not needed. The stage_jars.sh 
script should allow specifying only ${project_version} instead of 
${project_version}-${flink_minor_version}
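The requested behavior can be sketched as a small helper — a hypothetical sketch; the actual stage_jars.sh variable names may differ:

```shell
# staged_version <project_version> [flink_minor_version]
# Appends the Flink minor version only when one is supplied.
staged_version() {
  local project_version="$1"
  local flink_minor_version="${2:-}"
  if [ -n "${flink_minor_version}" ]; then
    echo "${project_version}-${flink_minor_version}"
  else
    echo "${project_version}"
  fi
}

staged_version "3.1.0" "1.18"   # connector release built against a Flink minor version
staged_version "1.1.0"          # connectors-parent: no Flink version needed
```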



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Frequent Flink JM restarts due to Kube API server errors.

2024-02-05 Thread Matthias Pohl
Hi Lavkesh,
FLINK-33998 [1] sounds quite similar to what you describe.

The solution was to upgrade to Flink version 1.14.6. I didn't have the
capacity to look into the details considering that the mentioned Flink
version 1.14 is not officially supported by the community anymore and a fix
seems to have been provided with a newer version.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-33998

On Mon, Feb 5, 2024 at 6:18 AM Lavkesh Lahngir  wrote:

> Hii, Few more details:
> We are running GKE version 1.27.7-gke.1121002.
> and using flink version 1.14.3.
>
> Thanks!
>
> On Mon, 5 Feb 2024 at 12:05, Lavkesh Lahngir  wrote:
>
> > Hii All,
> >
> > We run a Flink operator on GKE, deploying one Flink job per job manager.
> > We utilize
> > org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> > for high availability. The JobManager employs config maps for
> checkpointing
> > and leader election. If, at any point, the Kube API server returns an
> error
> > (5xx or 4xx), the JM pod is restarted. This occurrence is sporadic,
> > happening every 1-2 days for some jobs among the 400 running in the same
> > cluster, each with its JobManager pod.
> >
> > What might be causing these errors from the Kube? One possibility is that
> > when the JM writes the config map and attempts to retrieve it immediately
> > after, it could result in a 404 error.
> > Are there any configurations to increase heartbeat or timeouts that might
> > be causing temporary disconnections from the Kube API server?
> >
> > Thank you!
> >
>
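For the timeout/heartbeat question raised in this thread: Flink's Kubernetes HA does expose leader-election tuning options. A hedged flink-conf.yaml sketch — option names are from the Flink Kubernetes HA documentation, the values are illustrative (verify defaults against your Flink version), and the storage path is a placeholder:

```yaml
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: gs://my-bucket/flink/ha   # placeholder path

# Relax leader election to tolerate slow/flaky API server responses.
high-availability.kubernetes.leader-election.lease-duration: 30 s   # default 15 s
high-availability.kubernetes.leader-election.renew-deadline: 30 s   # default 15 s
high-availability.kubernetes.leader-election.retry-period: 10 s     # default 5 s

# Retry config-map mutations a few more times before giving up.
kubernetes.transactional-operation.max-retries: 10                  # default 5
```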


Re: [ANNOUNCE] Flink 1.19 feature freeze & sync summary on 01/30/2024

2024-02-05 Thread Lincoln Lee
Thanks to Xintong for clarifying this!  @Rui Due to the feature-freeze rule
that "Only bug fixes and documentation changes are allowed."[1], your merge
request was discussed among the 1.19 RMs, and we also agreed not to merge
PRs that are purely cleanup work and bring no further benefit to users.
Thank you for agreeing.

Also update the progress:

*- Cutting release branch*
We're still working on two blockers[2][3],  we'll decide when to cut 1.19
release branch after next release sync(Feb 6th) discussion

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
[2] https://issues.apache.org/jira/browse/FLINK-34337
[3] https://issues.apache.org/jira/browse/FLINK-34007

Best,
Yun, Jing, Martijn and Lincoln

Rui Fan <1996fan...@gmail.com> wrote on Mon, Feb 5, 2024 at 13:42:

> > My opinion would be to follow the process by default, and to make
> exceptions only if there're good reasons.
>
> Sounds make sense, I will merge it after 1.19 branch cutting.
>
> Thanks Xintong for the explanation! And sorry for bothering.
>
> Best,
> Rui
>
> On Mon, Feb 5, 2024 at 1:20 PM Xintong Song  wrote:
>
> > Thanks for the info.
> >
> > My opinion would be to follow the process by default, and to make
> > exceptions only if there're good reasons. From your description, it
> sounds
> > like merging the PR in or after 1.19 doesn't really make a difference. In
> > that case, I'd suggest to merge it for the next release (i.e. merge it
> into
> > master after the 1.19 branch cutting).
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, Feb 5, 2024 at 12:52 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks Xintong for the reply.
> > >
> > > They are Flink internal classes, and they are not used anymore.
> > > So I think they don't affect users, the benefit of removing them
> > > is to simplify Flink's code and reduce maintenance costs.
> > >
> > > If we just merge some user-related PRs recently, I could merge
> > > it after 1.19. Thank you again~
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Feb 5, 2024 at 12:21 PM Xintong Song 
> > > wrote:
> > >
> > > > Hi Rui,
> > > >
> > > > Quick question, would there be any downside if this PR doesn't go
> into
> > > > 1.19? Or any user benefit from getting it into this release?
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > >
> > > > On Sun, Feb 4, 2024 at 10:16 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > >
> > > > > Hi release managers,
> > > > >
> > > > > > The feature freeze of 1.19 has started now. That means that no
> new
> > > > > features
> > > > > > or improvements should now be merged into the master branch
> unless
> > > you
> > > > > ask
> > > > > > the release managers first, which has already been done for PRs,
> or
> > > > > pending
> > > > > > on CI to pass. Bug fixes and documentation PRs can still be
> merged.
> > > > >
> > > > > I'm curious whether the code cleanup could be merged?
> > > > > FLINK-31449[1] removed DeclarativeSlotManager related logic.
> > > > > Some other classes are not used anymore after FLINK-31449.
> > > > > FLINK-34345[2][3] will remove them.
> > > > >
> > > > > I checked these classes are not used in the master branch.
> > > > > And the PR[3] is reviewed for now, could I merge it now or
> > > > > after flink-1.19?
> > > > >
> > > > > Looking forward to your feedback, thanks~
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-31449
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-34345
> > > > > [3] https://github.com/apache/flink/pull/24257
> > > > >
> > > > > Best,
> > > > > Rui
> > > > >
> > > > > On Wed, Jan 31, 2024 at 5:20 PM Lincoln Lee <
> lincoln.8...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Matthias,
> > > > >>
> > > > >> Thanks for letting us know! After discussed with 1.19 release
> > > managers,
> > > > we
> > > > >> agreed to merge these pr.
> > > > >>
> > > > >> Thank you for the work on GHA workflows!
> > > > >>
> > > > >> Best,
> > > > >> Yun, Jing, Martijn and Lincoln
> > > > >>
> > > > >>
> > > > >> Matthias Pohl  wrote on Tue, Jan 30, 2024 at 22:20:
> > > > >>
> > > > >> > Thanks for the update, Lincoln.
> > > > >> >
> > > > >> > fyi: I merged FLINK-32684 (deprecating AkkaOptions) [1] since we
> > > > agreed
> > > > >> in
> > > > >> > today's meeting that this change is still ok to go in.
> > > > >> >
> > > > >> > The beta version of the GitHub Actions workflows (FLIP-396 [2])
> > are
> > > > also
> > > > >> > finalized (see related PRs for basic CI [3], nightly master [4]
> > and
> > > > >> nightly
> > > > >> > scheduling [5]). I'd like to merge the changes before creating
> the
> > > > >> > release-1.19 branch. That would enable us to see whether we miss
> > > > >> anything
> > > > >> > in the GHA workflows setup when creating a new release branch.
> > > > >> >
> > > > >> > The changes are limited to a few CI scripts that are also used
> for
> > > > Azure
> > > > >> > Pipelines (see [3]). The majority of the changes are
> GHA-specific
> > > and
> > > > >> > 

[jira] [Created] (FLINK-34362) Add argument to reuse connector docs cache in setup_docs.sh to improve build times

2024-02-05 Thread Jane Chan (Jira)
Jane Chan created FLINK-34362:
-

 Summary: Add argument to reuse connector docs cache in 
setup_docs.sh to improve build times
 Key: FLINK-34362
 URL: https://issues.apache.org/jira/browse/FLINK-34362
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.19.0
Reporter: Jane Chan


Problem:
The current build process of Flink's documentation involves the `setup_docs.sh` 
script, which re-clones connector repositories every time the documentation is 
built. This operation is time-consuming, particularly for developers in regions 
with slower internet connections or facing network restrictions (like the Great 
Firewall in China). This results in a build process that can take an excessive 
amount of time, hindering developer productivity.

 

Proposal:

We could add a command-line argument (e.g., --use-doc-cache) to the 
`setup_docs.sh` script, which, when set, skips the cloning step if the 
connector repositories have already been cloned previously. As a result, 
developers can opt to use the cache when they do not require the latest 
versions of the connectors' documentation. This change will reduce build times 
significantly and improve the developer experience for those working on the 
documentation.
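A hedged sketch of how such a flag could look in setup_docs.sh — the flag and function names here are hypothetical, not the actual script's:

```shell
# Parse an optional --use-doc-cache flag.
USE_DOC_CACHE=false
for arg in "$@"; do
  [ "$arg" = "--use-doc-cache" ] && USE_DOC_CACHE=true
done

# Clone a connector docs repo, or reuse an existing checkout when caching
# is enabled and the target directory is already present.
clone_connector_docs() {
  local repo_url="$1" target_dir="$2"
  if [ "$USE_DOC_CACHE" = true ] && [ -d "$target_dir" ]; then
    echo "Reusing cached docs in $target_dir"
  else
    rm -rf "$target_dir"
    git clone --depth 1 "$repo_url" "$target_dir"
  fi
}
```

Invoked as `./setup_docs.sh --use-doc-cache`, the script would then skip every clone whose target directory already exists, avoiding repeated network round-trips.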



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34361) PyFlink end-to-end test fails in GHA

2024-02-05 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34361:
-

 Summary: PyFlink end-to-end test fails in GHA
 Key: FLINK-34361
 URL: https://issues.apache.org/jira/browse/FLINK-34361
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.19.0
Reporter: Matthias Pohl


"PyFlink end-to-end test" fails:
https://github.com/apache/flink/actions/runs/7778642859/job/21208811659#step:14:7420

The only error I could identify is:
{code}
ERROR: pip's dependency resolver does not currently take into account all the 
packages that are installed. This behaviour is the source of the following 
dependency conflicts.
conda 23.5.2 requires ruamel-yaml<0.18,>=0.11.14, but you have ruamel-yaml 
0.18.5 which is incompatible.
Feb 05 03:31:54 Successfully installed apache-beam-2.48.0 avro-python3-1.10.2 
cloudpickle-2.2.1 crcmod-1.7 cython-3.0.8 dill-0.3.1.1 dnspython-2.5.0 
docopt-0.6.2 exceptiongroup-1.2.0 fastavro-1.9.3 fasteners-0.19 
find-libpython-0.3.1 grpcio-1.50.0 grpcio-tools-1.50.0 hdfs-2.7.3 
httplib2-0.22.0 iniconfig-2.0.0 numpy-1.24.4 objsize-0.6.1 orjson-3.9.13 
pandas-2.2.0 pemja-0.4.1 proto-plus-1.23.0 protobuf-4.23.4 py4j-0.10.9.7 
pyarrow-11.0.0 pydot-1.4.2 pymongo-4.6.1 pyparsing-3.1.1 pytest-7.4.4 
python-dateutil-2.8.2 pytz-2024.1 regex-2023.12.25 ruamel.yaml-0.18.5 
ruamel.yaml.clib-0.2.8 tomli-2.0.1 typing-extensions-4.9.0 tzdata-2023.4
/home/runner/work/flink/flink/flink-python/dev/.conda/lib/python3.10/site-packages/Cython/Compiler/Main.py:381:
 FutureWarning: Cython directive 'language_level' not set, using '3str' for now 
(Py3). This has changed from earlier releases! File: 
/home/runner/work/flink/flink/flink-python/pyflink/fn_execution/table/window_aggregate_fast.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
{code}
Not sure whether that's the actual cause.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34360) GHA e2e test failure due to no space left on device error

2024-02-05 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34360:
-

 Summary: GHA e2e test failure due to no space left on device error
 Key: FLINK-34360
 URL: https://issues.apache.org/jira/browse/FLINK-34360
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias Pohl


https://github.com/apache/flink/actions/runs/7763815214

{code}
AdaptiveScheduler / E2E (group 2)
Process completed with exit code 1.
AdaptiveScheduler / E2E (group 2)
You are running out of disk space. The runner will stop working when the 
machine runs out of disk space. Free space left: 35 MB
{code}
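One common mitigation — an assumption about a possible fix, not necessarily what Flink's CI adopted — is a workflow step that frees pre-installed toolchains on the hosted Ubuntu runner before the e2e stage:

```yaml
# Illustrative GitHub Actions step to reclaim runner disk space.
- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
    sudo docker system prune --all --force
    df -h /
```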



--
This message was sent by Atlassian Jira
(v8.20.10#820010)