[jira] [Closed] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35731.
---
Fix Version/s: 1.20.0
   1.19.2
   Resolution: Fixed

1.20: 2b3a47d98bf06ffde3d6d7c850414cc07a47d3f2
1.19: 6caf576d582bf3b2fcb9c6ef71f46115c58ea59c

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.18.1, 1.20.0, 1.19.1
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.2
>
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.
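The effect described above can be sketched as a toy model of the scheduler's decision (all names here are illustrative, not Flink's actual classes): a vertex flagged as parallelism-configured keeps its preset parallelism, so mistakenly flagging the Sink V2 operator blocks inference entirely.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    name: str
    parallelism_configured: bool  # True when parallelism was set explicitly
    preset_parallelism: int = 1

def decide_parallelism(vertex: Vertex, inferred: int) -> int:
    """Hypothetical model of the adaptive batch scheduler's decision:
    vertices flagged as parallelism-configured are never inferred."""
    if vertex.parallelism_configured:
        return vertex.preset_parallelism
    return inferred

# A Sink V2 vertex mistakenly flagged as configured ignores the inferred value.
buggy_sink = Vertex("sink-v2", parallelism_configured=True)
fixed_sink = Vertex("sink-v2", parallelism_configured=False)
print(decide_parallelism(buggy_sink, inferred=8))  # 1 (stuck at the preset)
print(decide_parallelism(fixed_sink, inferred=8))  # 8 (inference applies)
```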



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-05 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35731:
---

Assignee: Junrui Li

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.





[jira] [Updated] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-05 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35731:

Affects Version/s: 1.19.1
   1.18.1
   1.17.2
   1.20.0

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.18.1, 1.20.0, 1.19.1
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.





[jira] [Commented] (FLINK-35731) Sink V2 operator is mistakenly assumed always to be parallelism configured

2024-07-04 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863109#comment-17863109
 ] 

Zhu Zhu commented on FLINK-35731:
-

[~JunRuiLi] could you open PRs to backport this fix to 1.19 & 1.20?

> Sink V2 operator is mistakenly assumed always to be parallelism configured
> --
>
> Key: FLINK-35731
> URL: https://issues.apache.org/jira/browse/FLINK-35731
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the Sink V2 operator is always marked as parallelism configured, 
> which prevents parallelism from being inferred. This can cause confusion for 
> users utilizing the Adaptive Batch scheduler.





[jira] [Commented] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860303#comment-17860303
 ] 

Zhu Zhu commented on FLINK-35635:
-

The ticket is assigned to you. Thanks for volunteering! [~JunRuiLi]

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  
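The mode switch exercised in Test 2 can be modeled roughly as follows (a hedged sketch: the option values `static`/`dynamic`/`none` are assumed from the linked documentation, and the helper itself is hypothetical):

```python
from enum import Enum

class InferMode(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"
    NONE = "none"    # disables parallelism inference

def resolve_infer_mode(conf: dict) -> InferMode:
    """Hypothetical resolution of `table.exec.hive.infer-source-parallelism.mode`;
    the 1.20 default is assumed to be dynamic inference per the docs."""
    raw = conf.get("table.exec.hive.infer-source-parallelism.mode", "dynamic")
    return InferMode(raw.lower())

print(resolve_infer_mode({}))  # InferMode.DYNAMIC
print(resolve_infer_mode(
    {"table.exec.hive.infer-source-parallelism.mode": "none"}))  # InferMode.NONE
```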





[jira] [Assigned] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35635:
---

Assignee: Junrui Li

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  





[jira] [Commented] (FLINK-35669) Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860302#comment-17860302
 ] 

Zhu Zhu commented on FLINK-35669:
-

Assigned. Thanks for volunteering! [~xiasun]

> Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster 
> Failures for Batch Jobs
> -
>
> Key: FLINK-35669
> URL: https://issues.apache.org/jira/browse/FLINK-35669
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to 
> recover as much progress as possible after a JobMaster failover, avoiding the 
> need to rerun tasks that have already been finished.
> More information about this feature and how to enable it can be found at: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]
> We may need the following tests:
>  # Start a batch job with High Availability (HA) enabled, and after it has 
> progressed to a certain point, kill the JobManager (jm), then observe whether 
> the job recovers its progress normally.
>  # Use a custom source and ensure that its SplitEnumerator implements the 
> SupportsBatchSnapshot interface, submit the job, and after it has progressed 
> to a certain point, kill the JobManager (jm), then observe whether the job 
> recovers its progress normally.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892
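The second test's requirement can be illustrated with a toy enumerator (purely illustrative names; Flink's real hook is the SupportsBatchSnapshot interface on the SplitEnumerator): snapshotting which splits were already handed out lets a restarted JobMaster resume instead of re-enumerating everything.

```python
class SnapshotableSplitEnumerator:
    """Toy analog of a SplitEnumerator supporting batch snapshots."""

    def __init__(self, splits):
        self.pending = list(splits)
        self.assigned = []

    def assign_next(self):
        split = self.pending.pop(0)
        self.assigned.append(split)
        return split

    def snapshot_state(self):
        # State persisted so progress survives a JobMaster failover.
        return {"pending": list(self.pending), "assigned": list(self.assigned)}

    @classmethod
    def restore(cls, state):
        enum = cls(state["pending"])
        enum.assigned = list(state["assigned"])
        return enum

e = SnapshotableSplitEnumerator(["s1", "s2", "s3"])
e.assign_next()                      # "s1" is processed before the failover
state = e.snapshot_state()           # taken before the simulated JM failover
recovered = SnapshotableSplitEnumerator.restore(state)
print(recovered.pending)             # ['s2', 's3'] -- finished work not redone
```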





[jira] [Assigned] (FLINK-35669) Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35669:
---

Assignee: xingbe

> Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster 
> Failures for Batch Jobs
> -
>
> Key: FLINK-35669
> URL: https://issues.apache.org/jira/browse/FLINK-35669
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to 
> recover as much progress as possible after a JobMaster failover, avoiding the 
> need to rerun tasks that have already been finished.
> More information about this feature and how to enable it can be found at: 
> [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]
> We may need the following tests:
>  # Start a batch job with High Availability (HA) enabled, and after it has 
> progressed to a certain point, kill the JobManager (jm), then observe whether 
> the job recovers its progress normally.
>  # Use a custom source and ensure that its SplitEnumerator implements the 
> SupportsBatchSnapshot interface, submit the job, and after it has progressed 
> to a certain point, kill the JobManager (jm), then observe whether the job 
> recovers its progress normally.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892





[jira] [Closed] (FLINK-35656) Hive Source has issues setting max parallelism in dynamic inference mode

2024-06-24 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35656.
---
Resolution: Fixed

master: 01c3fd67ac46898bd520477ae861cd29cceaa636
release-1.20: d93f7421e0d520d6b2899cbff5844867374b96ab

> Hive Source has issues setting max parallelism in dynamic inference mode
> 
>
> Key: FLINK-35656
> URL: https://issues.apache.org/jira/browse/FLINK-35656
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> In the dynamic parallelism inference mode of Hive Source, when 
> `table.exec.hive.infer-source-parallelism.max` is not configured, it does not 
> use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` as 
> the upper bound for parallelism inference, which is inconsistent with the 
> behavior described in the documentation.
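The documented fallback can be sketched as follows (a hypothetical helper; only the two config key names come from the description above, and the default value is illustrative):

```python
def resolve_max_inferred_parallelism(conf: dict) -> int:
    """Model of the documented behavior: if
    `table.exec.hive.infer-source-parallelism.max` is unset, fall back to
    `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
    as the inference upper bound. The reported bug was skipping this fallback."""
    explicit = conf.get("table.exec.hive.infer-source-parallelism.max")
    if explicit is not None:
        return explicit
    return conf.get(
        "execution.batch.adaptive.auto-parallelism.default-source-parallelism",
        1,  # illustrative last-resort default
    )

print(resolve_max_inferred_parallelism(
    {"table.exec.hive.infer-source-parallelism.max": 4}))   # 4
print(resolve_max_inferred_parallelism(
    {"execution.batch.adaptive.auto-parallelism.default-source-parallelism": 16}))  # 16
```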





[jira] [Assigned] (FLINK-35656) Hive Source has issues setting max parallelism in dynamic inference mode

2024-06-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35656:
---

Assignee: xingbe

> Hive Source has issues setting max parallelism in dynamic inference mode
> 
>
> Key: FLINK-35656
> URL: https://issues.apache.org/jira/browse/FLINK-35656
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> In the dynamic parallelism inference mode of Hive Source, when 
> `table.exec.hive.infer-source-parallelism.max` is not configured, it does not 
> use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` as 
> the upper bound for parallelism inference, which is inconsistent with the 
> behavior described in the documentation.





[jira] [Assigned] (FLINK-35635) Release Testing: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-18 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35635:
---

Assignee: (was: xingbe)

> Release Testing: Verify FLIP-445: Support dynamic parallelism inference for 
> HiveSource
> --
>
> Key: FLINK-35635
> URL: https://issues.apache.org/jira/browse/FLINK-35635
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.20.0
>
>
> This issue aims to verify FLIP-445.
> The Hive Source now supports dynamic parallelism inference, and the default 
> parallelism inference mode has also changed from static to dynamic inference. 
> For detailed information, please refer to the 
> [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference].
> We may need to cover the following types of test cases:
> Test 1: Confirm that the dynamic parallelism inference feature is effective.
> Test 2: Confirm whether it is possible to switch the parallelism inference 
> mode or disable parallelism inference through the config option 
> `table.exec.hive.infer-source-parallelism.mode`.
> Test 3: Construct scenarios where DynamicPartitionPruning is effective, and 
> confirm whether the dynamically inferred source parallelism has been 
> optimized.
>  
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35293
>  





[jira] [Closed] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35522.
---
Fix Version/s: 1.18.2
   1.19.2
   Resolution: Fixed

1.18: 8ae7986e1dca80a686b678feb4ea3bdfff4a19bb
1.19: 7afa6eaae3218f273c7971434ea88a2cfd966ced

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.2
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.
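The intended behavior can be modeled with toy code (not Flink's actual SourceCoordinatorContext API): the coordinator remembers that the enumerator signaled no more splits and replays that event to any task that restarts afterwards, so a failed-over source task cannot wait forever.

```python
class SourceCoordinatorModel:
    """Toy model of the fix: the NoMoreSplits fact is sticky and is
    re-sent to restarted subtasks (names hypothetical)."""

    def __init__(self):
        self.no_more_splits = False
        self.events = []  # (subtask, event) pairs delivered to tasks

    def signal_no_more_splits(self, subtask):
        self.no_more_splits = True
        self.events.append((subtask, "NoMoreSplits"))

    def on_task_restarted(self, subtask):
        # Without this replay, the restarted task waits for splits forever.
        if self.no_more_splits:
            self.events.append((subtask, "NoMoreSplits"))

coord = SourceCoordinatorModel()
coord.signal_no_more_splits(0)
coord.on_task_restarted(0)   # failover: the event must be re-sent
print(coord.events)          # [(0, 'NoMoreSplits'), (0, 'NoMoreSplits')]
```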





[jira] [Commented] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-05 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852611#comment-17852611
 ] 

Zhu Zhu commented on FLINK-35522:
-

[~xiasun] would you open 2 PRs to backport this change to 1.18 and 1.19?

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.





[jira] [Commented] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-04 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852241#comment-17852241
 ] 

Zhu Zhu commented on FLINK-35522:
-

Thanks for reporting this problem and volunteering to fix it! [~xiasun]
The ticket is assigned to you.

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.





[jira] [Assigned] (FLINK-35522) The source task may get stuck after a failover occurs in batch jobs

2024-06-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35522:
---

Assignee: xingbe

> The source task may get stuck after a failover occurs in batch jobs
> ---
>
> Key: FLINK-35522
> URL: https://issues.apache.org/jira/browse/FLINK-35522
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> If the source task does not get assigned a split because the SplitEnumerator 
> has no more splits, and a failover occurs during the closing process, the 
> SourceCoordinatorContext will not resend the NoMoreSplit event to the newly 
> started source task, causing the source vertex to remain stuck indefinitely. 
> This case may only occur in batch jobs where speculative execution has been 
> enabled.





[jira] [Closed] (FLINK-35483) BatchJobRecoveryTest related to JM failover produced no output for 900 second

2024-05-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35483.
---
Fix Version/s: 1.20.0
   Resolution: Fixed

f3a3f926c6c6c931bb7ccc52e823d70cfd8aadf5

> BatchJobRecoveryTest related to JM failover produced no output for 900 second
> -
>
> Key: FLINK-35483
> URL: https://issues.apache.org/jira/browse/FLINK-35483
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> testRecoverFromJMFailover
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59919=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=9476





[jira] [Commented] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850739#comment-17850739
 ] 

Zhu Zhu commented on FLINK-35399:
-

56cd9607713d0da874dcc54c4cf6d5b3b52b1050 refined the doc a bit.

> Add documents for batch job master failure recovery
> ---
>
> Key: FLINK-35399
> URL: https://issues.apache.org/jira/browse/FLINK-35399
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Commented] (FLINK-32384) Remove deprecated configuration keys which violate YAML spec

2024-05-30 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850734#comment-17850734
 ] 

Zhu Zhu commented on FLINK-32384:
-

Thanks for volunteering to contribute to Flink. [~kartikeypant]
However, this is a breaking change, so we cannot make it until Flink 1.20 is 
released and the release cycle of Flink 2.0 has started.
You are welcome to take this task at that time.
Until then, feel free to take a look at other tickets.

> Remove deprecated configuration keys which violate YAML spec
> 
>
> Key: FLINK-32384
> URL: https://issues.apache.org/jira/browse/FLINK-32384
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Zhu Zhu
>Priority: Major
>  Labels: 2.0-related
> Fix For: 2.0.0
>
>
> In FLINK-29372, keys that violate the YAML spec were renamed to a valid form 
> and the old names were deprecated.
> In Flink 2.0 we should remove these deprecated keys. This will prevent users 
> from (unintentionally) creating an invalid, non-YAML flink-conf.yaml.
> Then, with the work of FLINK-23620, we can remove the non-standard YAML 
> parsing logic and enforce standard YAML validation in CI.
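The kind of YAML violation being removed can be illustrated with a small checker (illustrative only; the concrete deprecated keys are listed in FLINK-29372): when flat dotted keys are mapped to nested YAML, a key that is also a prefix of another key would have to be both a scalar and a mapping, which standard YAML forbids.

```python
def yaml_key_conflicts(flat_keys):
    """Return flat config keys that cannot coexist with another key in
    nested YAML because they would need to be both a scalar and a mapping."""
    keys = set(flat_keys)
    return sorted(
        k for k in keys
        if any(other.startswith(k + ".") for other in keys if other != k)
    )

# "a.b" would have to be both a value and the parent mapping of "a.b.c".
print(yaml_key_conflicts(["a.b", "a.b.c", "x.y"]))  # ['a.b']
```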





[jira] [Closed] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-05-30 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33892.
---
Fix Version/s: 1.20.0
   Resolution: Done

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]





[jira] [Commented] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-05-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850582#comment-17850582
 ] 

Zhu Zhu commented on FLINK-33892:
-

The feature development is done except for some follow-up tasks.
Would you add some release notes to this ticket and close it? [~JunRuiLi]

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]





[jira] [Closed] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35399.
---
Fix Version/s: 1.20.0
   Resolution: Done

aabfd6e2eaa37d554632129c959a309f85d528c5

> Add documents for batch job master failure recovery
> ---
>
> Key: FLINK-35399
> URL: https://issues.apache.org/jira/browse/FLINK-35399
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Zhu Zhu
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Closed] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35465.
---
Resolution: Done

master:
b8173eb662ee5823de40de356869d0064de2c22a
3206659db5b7c4ce645072f11f091e0e9e92b0ce
e964af392476e011147be73ae4dab8ff89512994

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35465:

Fix Version/s: 1.20.0

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Assigned] (FLINK-35465) Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster failures.

2024-05-27 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35465:
---

Assignee: Junrui Li

> Introduce BatchJobRecoveryHandler for recovery of batch jobs from JobMaster 
> failures.
> -
>
> Key: FLINK-35465
> URL: https://issues.apache.org/jira/browse/FLINK-35465
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>






[jira] [Closed] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35426.
---
Resolution: Done

master: 4b342da6d149113dde821b370a136beef3430fff

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism of a vertex 
> that is both a Source and a downstream consumer, it considers factors from 
> both sides, which can lead to an overestimation of parallelism because the 
> DynamicFilteringDataCollector is an upstream of the Source. We aim to change 
> the distribution method of the DynamicFilteringDataCollector to broadcast to 
> prevent this dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.
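The overestimation, and why excluding broadcast edges prevents it, can be sketched as follows (hypothetical sizing logic; the real scheduler's formula differs):

```python
def infer_source_parallelism(inputs, bytes_per_subtask=1024):
    """Infer parallelism for a vertex that is both a Source and a consumer.
    Broadcast inputs are excluded, so a small helper input (here standing in
    for the DynamicFilteringDataCollector edge) cannot inflate the result."""
    relevant = [size for size, dist in inputs if dist != "broadcast"]
    total = sum(relevant)
    return max(1, -(-total // bytes_per_subtask))  # ceiling division

# A rebalance edge from the filtering collector counts toward inference...
print(infer_source_parallelism([(8192, "rebalance")]))  # 8
# ...while a broadcast edge is ignored, leaving the Source unaffected.
print(infer_source_parallelism([(8192, "broadcast")]))  # 1
```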





[jira] [Commented] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848801#comment-17848801
 ] 

Zhu Zhu commented on FLINK-35426:
-

Good point! [~xiasun]
The task is assigned to you. Feel free to open a pr for it.

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism of a vertex 
> that is both a Source and a downstream consumer, it considers factors from 
> both sides, which can lead to an overestimation of parallelism because the 
> DynamicFilteringDataCollector is an upstream of the Source. We aim to change 
> the distribution method of the DynamicFilteringDataCollector to broadcast to 
> prevent this dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.





[jira] [Assigned] (FLINK-35426) Change the distribution of DynamicFilteringDataCollector to Broadcast

2024-05-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35426:
---

Assignee: xingbe

> Change the distribution of DynamicFilteringDataCollector to Broadcast
> -
>
> Key: FLINK-35426
> URL: https://issues.apache.org/jira/browse/FLINK-35426
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
> Fix For: 1.20.0
>
>
> Currently, the DynamicFilteringDataCollector is utilized in the dynamic 
> partition pruning feature of batch jobs to collect the partition information 
> dynamically filtered by the source. Its current data distribution method is 
> rebalance, and it also acts as an upstream vertex to the probe side Source.
> Presently, when the Scheduler dynamically infers the parallelism for vertices 
> that are both downstream and Source, it considers factors from both sides, 
> which can lead to an overestimation of parallelism due to 
> DynamicFilteringDataCollector being an upstream of the Source. We aim to 
> change the distribution method of the DynamicFilteringDataCollector to 
> broadcast to prevent the dynamic overestimation of Source parallelism.
> Furthermore, given that the DynamicFilteringDataCollector transmits data 
> through the OperatorCoordinator rather than through normal data distribution, 
> this change will not affect the DPP (Dynamic Partition Pruning) functionality.





[jira] [Commented] (FLINK-35384) Expose TaskIOMetricGroup to custom Partitioner via init Context

2024-05-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848126#comment-17848126
 ] 

Zhu Zhu commented on FLINK-35384:
-

Enabling metrics for partitioners makes sense and the proposed approach of 
introducing a context sounds good.

How about introducing a sub-metric group specifically for partitioner metrics? 
A single task might contain multiple partitioners for which the metrics should 
not get mixed. It also avoids exposing the internal TaskIOMetricGroup class to 
users.

> Expose TaskIOMetricGroup to custom Partitioner via init Context
> ---
>
> Key: FLINK-35384
> URL: https://issues.apache.org/jira/browse/FLINK-35384
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.9.4
>Reporter: Steven Zhen Wu
>Priority: Major
>
> I am trying to implement a custom range partitioner in the Flink Iceberg 
> sink. Want to publish some counter metrics for certain scenarios. This is 
> like the network metrics exposed in `TaskIOMetricGroup`.
> We can implement the range partitioner using the public interface from 
> `DataStream`. 
> {code}
> public <K> DataStream<T> partitionCustom(
> Partitioner<K> partitioner, KeySelector<T, K> keySelector)
> {code}
> We can pass the `TaskIOMetricGroup` to the `StreamPartitioner` that 
> `CustomPartitionerWrapper` extends from. `CustomPartitionerWrapper` wraps the 
> public `Partitioner` interface, where we can implement the custom range 
> partitioner.
> `Partitioner` interface is a functional interface today. we can add a new 
> default `setup` method without breaking the backward compatibility.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner<K> extends java.io.Serializable, Function {
> *default void setup(TaskIOMetricGroup metrics) {}*
> int partition(K key, int numPartitions);
> }
> {code}
> I know public interface requires a FLIP process. will do that if the 
> community agree with this feature request.
> Personally, `numPartitions` should be passed in the `setup` method too. But 
> it is a breaking change that is NOT worth the benefit right now.
> {code}
> @Public
> @FunctionalInterface
> public interface Partitioner<K> extends java.io.Serializable, Function {
> public void setup(int numPartitions, TaskIOMetricGroup metrics) {}
> int partition(K key);
> }
> {code}
> That would be similar to `StreamPartitioner#setup()` method that we would 
> need to modify for passing the metrics group.
> {code}
> @Internal
> public abstract class StreamPartitioner<T>
> implements ChannelSelector<SerializationDelegate<StreamRecord<T>>>,
> Serializable {
> @Override
> public void setup(int numberOfChannels, TaskIOMetricGroup metrics) {
> this.numberOfChannels = numberOfChannels;
> }
> {code}
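The backward-compatibility claim in the quoted proposal rests on default methods: adding one to a @FunctionalInterface leaves existing lambda implementations compiling and running unchanged. A minimal self-contained sketch (the MetricGroup type and all names are illustrative stand-ins, not Flink's real API):

```java
import java.io.Serializable;

// Demonstrates that a new default setup() hook does not break a functional
// interface: old lambda implementations simply inherit the no-op.
public class DefaultMethodDemo {
    interface MetricGroup {
        void inc(String name);
    }

    @FunctionalInterface
    interface Partitioner<K> extends Serializable {
        // Newly added default hook; pre-existing implementations inherit it.
        default void setup(MetricGroup metrics) {}

        int partition(K key, int numPartitions);
    }

    public static int demoPartition(String key, int numPartitions) {
        // A lambda written before setup() existed still works unchanged.
        Partitioner<String> byHash = (k, n) -> Math.floorMod(k.hashCode(), n);
        byHash.setup(name -> {}); // inherited no-op; old call sites may skip it
        return byHash.partition(key, numPartitions);
    }

    public static void main(String[] args) {
        int p = demoPartition("flink", 4);
        if (p < 0 || p >= 4) throw new AssertionError("partition out of range");
        System.out.println("ok");
    }
}
```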





[jira] [Created] (FLINK-35399) Add documents for batch job master failure recovery

2024-05-19 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-35399:
---

 Summary: Add documents for batch job master failure recovery
 Key: FLINK-35399
 URL: https://issues.apache.org/jira/browse/FLINK-35399
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Zhu Zhu
Assignee: Junrui Li








[jira] [Closed] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-05-19 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33983.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: c7c1d78752836b96591e31422c65b85eca38bd50

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Updated] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33986:

Component/s: Runtime / Coordination

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshots in batch scenarios.
>  # Add methods snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  
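The two extensions in the quoted description can be sketched with a self-contained toy. The method names supportsBatchSnapshot / snapshotState / restoreState follow the ticket; everything else (the interface shape, the Map-based state) is hypothetical and not Flink's real ShuffleMaster API.

```java
import java.util.HashMap;
import java.util.Map;

// Capability probe plus snapshot/restore of the master's partition bookkeeping,
// so a fresh master after a JM failover can recover the old state.
public class SnapshotDemo {
    interface BatchSnapshotable {
        boolean supportsBatchSnapshot();                 // capability probe
        Map<String, String> snapshotState();             // export bookkeeping
        void restoreState(Map<String, String> snapshot); // re-import after failover
    }

    static class InMemoryShuffleMaster implements BatchSnapshotable {
        private final Map<String, String> partitionLocations = new HashMap<>();

        void registerPartition(String partitionId, String workerId) {
            partitionLocations.put(partitionId, workerId);
        }

        @Override public boolean supportsBatchSnapshot() { return true; }

        @Override public Map<String, String> snapshotState() {
            return new HashMap<>(partitionLocations);
        }

        @Override public void restoreState(Map<String, String> snapshot) {
            partitionLocations.clear();
            partitionLocations.putAll(snapshot);
        }
    }

    public static boolean roundTrips() {
        InMemoryShuffleMaster master = new InMemoryShuffleMaster();
        master.registerPartition("rp-1", "worker-a");
        Map<String, String> snapshot = master.snapshotState();

        // A fresh master (simulating recovery after a JM crash) restores it.
        InMemoryShuffleMaster recovered = new InMemoryShuffleMaster();
        recovered.restoreState(snapshot);
        return "worker-a".equals(recovered.snapshotState().get("rp-1"));
    }

    public static void main(String[] args) {
        if (!roundTrips()) throw new AssertionError();
        System.out.println("ok");
    }
}
```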





[jira] [Closed] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-05-14 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33986.
---
Fix Version/s: 1.20.0
   Resolution: Done

65d31e26534836909f6b8139c6bd6cd45b91bba4

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Extend shuffleMaster to support batch snapshot as follows:
>  # Add method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshots in batch scenarios.
>  # Add methods snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  





[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846177#comment-17846177
 ] 

Zhu Zhu commented on FLINK-35293:
-

The change is merged. Could you add release notes for it and close the ticket? 
[~xiasun].

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.
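The core idea behind dynamic (as opposed to static) inference described above: at runtime the scheduler knows how much data the source will actually read, so it can size the source vertex from the data volume. The formula and parameter names below are an illustrative sketch only, not the actual Flink implementation.

```java
// Toy dynamic parallelism inference: ceil(totalBytes / bytesPerTask),
// clamped to [1, upperBound].
public class InferDemo {
    public static int inferParallelism(long totalDataBytes, long bytesPerTask, int upperBound) {
        long tasks = (totalDataBytes + bytesPerTask - 1) / bytesPerTask; // ceiling division
        return (int) Math.max(1, Math.min(tasks, upperBound));           // clamp
    }

    public static void main(String[] args) {
        // 10 GiB of splits at 1 GiB per task -> 10 subtasks.
        if (inferParallelism(10L << 30, 1L << 30, 128) != 10) throw new AssertionError();
        // A tiny table never drops below parallelism 1 ...
        if (inferParallelism(100, 1L << 30, 128) != 1) throw new AssertionError();
        // ... and a huge one is capped by the configured upper bound.
        if (inferParallelism(1L << 40, 1L << 30, 128) != 128) throw new AssertionError();
        System.out.println("ok");
    }
}
```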





[jira] [Commented] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846137#comment-17846137
 ] 

Zhu Zhu commented on FLINK-35293:
-

master: ddb5a5355f9aca3d223f1fff6581d83dd317c2de

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.





[jira] [Closed] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-05-13 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34661.
---
Fix Version/s: 1.20.0
   Resolution: Done

4e6b42046adbe2f337460d2e50f1fee12cff21a5

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>






[jira] [Closed] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-35270.
---
Resolution: Fixed

547e4b53ebe36c39066adcf3a98123a1f7890c15

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs help a lot with debugging in production environments.
> Therefore, it'll be better to show more information in logs.





[jira] [Assigned] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35270:
---

Assignee: Haifei Chen

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Assignee: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs help a lot with debugging in production environments.
> Therefore, it'll be better to show more information in logs.





[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Component/s: API / Core

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Core
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs help a lot with debugging in production environments.
> Therefore, it'll be better to show more information in logs.





[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Labels: pull-request-available starter  (was: pull-request-available)

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.20.0
>
>
> Good logs help a lot with debugging in production environments.
> Therefore, it'll be better to show more information in logs.





[jira] [Updated] (FLINK-35270) Enrich information in logs, making it easier for debugging

2024-05-08 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-35270:

Fix Version/s: 1.20.0

> Enrich information in logs, making it easier for debugging
> --
>
> Key: FLINK-35270
> URL: https://issues.apache.org/jira/browse/FLINK-35270
> Project: Flink
>  Issue Type: Improvement
>Reporter: Haifei Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Good logs help a lot with debugging in production environments.
> Therefore, it'll be better to show more information in logs.





[jira] [Assigned] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-35293:
---

Assignee: xingbe

> FLIP-445: Support dynamic parallelism inference for HiveSource
> --
>
> Key: FLINK-35293
> URL: https://issues.apache.org/jira/browse/FLINK-35293
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Affects Versions: 1.20.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  introduces dynamic source parallelism inference, which, compared to static 
> inference, utilizes runtime information to more accurately determine the 
> source parallelism. The FileSource already possesses the capability for 
> dynamic parallelism inference. As a follow-up task to FLIP-379, this FLIP 
> plans to implement the dynamic parallelism inference interface for 
> HiveSource, and also switches the default static parallelism inference to 
> dynamic parallelism inference.





[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835577#comment-17835577
 ] 

Zhu Zhu commented on FLINK-35009:
-

[~martijnvisser] Thanks for monitoring the kafka connector builds and reporting 
this problem!
Feel free to loop me in to review the pr.

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167





[jira] [Commented] (FLINK-35009) Change on getTransitivePredecessors breaks connectors

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834799#comment-17834799
 ] 

Zhu Zhu commented on FLINK-35009:
-

Thanks for looking into the issue. [~Weijie Guo]
So it is just a test class, one that uses Flink `@Internal` classes, that is
broken. And that test class is not even used.
I think it's better to just remove `MockTransformation` from the Kafka connector.
WDYT? [~martijnvisser]

> Change on getTransitivePredecessors breaks connectors
> -
>
> Key: FLINK-35009
> URL: https://issues.apache.org/jira/browse/FLINK-35009
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Connectors / Kafka
>Affects Versions: 1.18.2, 1.20.0, 1.19.1
>Reporter: Martijn Visser
>Priority: Blocker
>
> {code:java}
> Error:  Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:testCompile 
> (default-testCompile) on project flink-connector-kafka: Compilation failure: 
> Compilation failure: 
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[214,24]
>  
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  is not abstract and does not override abstract method 
> getTransitivePredecessorsInternal() in org.apache.flink.api.dag.Transformation
> Error:  
> /home/runner/work/flink-connector-kafka/flink-connector-kafka/flink-connector-kafka/src/test/java/org/apache/flink/streaming/connectors/kafka/testutils/DataGenerators.java:[220,44]
>  getTransitivePredecessors() in 
> org.apache.flink.streaming.connectors.kafka.testutils.DataGenerators.InfiniteStringsGenerator.MockTransformation
>  cannot override getTransitivePredecessors() in 
> org.apache.flink.api.dag.Transformation
> Error:overridden method is final
> {code}
> Example: 
> https://github.com/apache/flink-connector-kafka/actions/runs/8494349338/job/23269406762#step:15:167





[jira] [Comment Edited] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834798#comment-17834798
 ] 

Zhu Zhu edited comment on FLINK-32513 at 4/8/24 6:27 AM:
-

{{Transformation}} is not a public interface; it is an @Internal class.
Ideally, Kafka connectors should not directly manipulate {{Transformation}}.
Yet we may try to find a workaround to avoid breaking existing Kafka connectors.


was (Author: zhuzh):
{{Transformation}} is not a public interface, it is an @Internal class. 
Ideally, kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look whether there is a workaround to avoid 
break existing kafka connectors.

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>     

[jira] [Commented] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-04-08 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834798#comment-17834798
 ] 

Zhu Zhu commented on FLINK-32513:
-

{{Transformation}} is not a public interface; it is an @Internal class.
Ideally, Kafka connectors should not directly manipulate {{Transformation}}.
Production-wise, we may take a look at whether there is a workaround to avoid
breaking existing Kafka connectors.
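The blow-up in the thread dump quoted below comes from naive recursion over a DAG with shared predecessors: every diamond doubles the number of recursive calls. A self-contained toy (hypothetical, not Flink's Transformation code) shows the shape of the problem and of a visited-set fix:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Naive transitive-predecessor traversal is exponential on diamond-shaped
// graphs; tracking visited nodes makes it linear in the graph size.
public class TransitiveDemo {
    static final class Node {
        final List<Node> inputs = new ArrayList<>();
    }

    // Naive variant: every call re-expands all inputs, duplicates included.
    static long naiveVisitCount(Node node) {
        long calls = 1;
        for (Node in : node.inputs) {
            calls += naiveVisitCount(in);
        }
        return calls;
    }

    // Fixed variant: each node is expanded at most once.
    static void collect(Node node, Set<Node> visited) {
        for (Node in : node.inputs) {
            if (visited.add(in)) {
                collect(in, visited);
            }
        }
    }

    // Chain of diamonds: both middle nodes of each layer share one
    // predecessor -- the shape behind the repeated frames in the dump.
    static Node diamondChain(int layers) {
        Node cur = new Node();
        for (int i = 0; i < layers; i++) {
            Node left = new Node(), right = new Node(), top = new Node();
            left.inputs.add(cur);
            right.inputs.add(cur);
            top.inputs.add(left);
            top.inputs.add(right);
            cur = top;
        }
        return cur;
    }

    public static boolean memoizedIsSmaller(int layers) {
        Node root = diamondChain(layers);
        long naiveCalls = naiveVisitCount(root);        // grows like 2^layers
        Set<Node> visited = new HashSet<>();
        collect(root, visited);                         // grows like 3 * layers
        return naiveCalls > visited.size();
    }

    public static void main(String[] args) {
        if (!memoizedIsSmaller(15)) throw new AssertionError();
        System.out.println("ok");
    }
}
```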

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Assigned] (FLINK-34661) TaskExecutor supports retain partitions after JM crashed.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34661:
---

Assignee: Junrui Li

> TaskExecutor supports retain partitions after JM crashed.
> -
>
> Key: FLINK-34661
> URL: https://issues.apache.org/jira/browse/FLINK-34661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Assigned] (FLINK-33983) Introduce JobEvent and JobEventStore for Batch Job Recovery

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33983:
---

Assignee: Junrui Li

> Introduce JobEvent and JobEventStore for Batch Job Recovery
> ---
>
> Key: FLINK-33983
> URL: https://issues.apache.org/jira/browse/FLINK-33983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33986) Extend shuffleMaster to support batch snapshot.

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33986:
---

Assignee: Junrui Li

> Extend shuffleMaster to support batch snapshot.
> ---
>
> Key: FLINK-33986
> URL: https://issues.apache.org/jira/browse/FLINK-33986
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>
> Extend the ShuffleMaster to support batch snapshots as follows:
>  # Add a method supportsBatchSnapshot to identify whether the shuffle master 
> supports taking snapshots in batch scenarios.
>  # Add methods snapshotState and restoreState to snapshot and restore the 
> shuffle master's state.
>  
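The two proposed hooks can be sketched as a tiny interface plus an in-memory stand-in. Only the method names supportsBatchSnapshot/snapshotState/restoreState come from the ticket; everything else (BatchSnapshotable, InMemoryShuffleMasterSketch, the Map-based state) is purely illustrative and is not Flink's actual ShuffleMaster API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the proposed snapshot hooks (illustrative only).
interface BatchSnapshotable {
    boolean supportsBatchSnapshot();              // whether snapshots are supported
    Map<String, String> snapshotState();          // capture current state
    void restoreState(Map<String, String> snap);  // restore from a snapshot
}

class InMemoryShuffleMasterSketch implements BatchSnapshotable {
    // Toy stand-in for the shuffle master's partition bookkeeping.
    private final Map<String, String> partitions = new HashMap<>();

    void registerPartition(String id, String location) {
        partitions.put(id, location);
    }

    String locationOf(String id) {
        return partitions.get(id);
    }

    @Override
    public boolean supportsBatchSnapshot() {
        return true;
    }

    @Override
    public Map<String, String> snapshotState() {
        return new HashMap<>(partitions); // defensive copy of current state
    }

    @Override
    public void restoreState(Map<String, String> snap) {
        partitions.clear();
        partitions.putAll(snap);
    }
}
```

A snapshot taken from one instance can be restored into a fresh instance after a JobMaster restart, which is the recovery pattern the ticket is after.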



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33984.
---
Fix Version/s: 1.20.0
   Resolution: Done

master:
38255652406becbfbcb7cbec557aa5ba9a1ebbb3
558ca75da2fcec875d1e04a8d75a24fd0ad42ccc

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34945) Support recover shuffle descriptor and partition metrics from tiered storage

2024-04-07 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34945:
---

Assignee: Junrui Li

> Support recover shuffle descriptor and partition metrics from tiered storage
> 
>
> Key: FLINK-34945
> URL: https://issues.apache.org/jira/browse/FLINK-34945
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33985.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: a44709662956b306fe686623d00358a6b076f637

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Support obtaining all partitions existing in the cluster through the ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33985:

Component/s: Runtime / Coordination

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtaining all partitions existing in the cluster through the ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-04-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33982.
---
Fix Version/s: 1.20.0
   Resolution: Done

master: ec1311c8eb805f91b3b8d7d7cbe192e8cad05a76

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34565) Enhance flink kubernetes configMap to accommodate additional configuration files

2024-03-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831336#comment-17831336
 ] 

Zhu Zhu commented on FLINK-34565:
-

IIUC, the requirement is to ship more user files, which may be needed by user 
code, to the pod. Supporting configuration files is just a special case of it. 
Shipping them via ConfigMap sounds a bit tricky to me.
cc [~wangyang0918]

> Enhance flink kubernetes configMap to accommodate additional configuration 
> files
> 
>
> Key: FLINK-34565
> URL: https://issues.apache.org/jira/browse/FLINK-34565
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Reporter: Surendra Singh Lilhore
>Priority: Major
>  Labels: pull-request-available
>
> Flink kubernetes client currently supports a fixed number of files 
> (flink-conf.yaml, logback-console.xml, log4j-console.properties) in the JM 
> and TM Pod ConfigMap. In certain scenarios, particularly in app mode, 
> additional configuration files are required for jobs to run successfully. 
> Presently, users must resort to workarounds to include dynamic configuration 
> files in the JM and TM. This proposed improvement allows users to easily add 
> extra files by configuring the 
> '{*}kubernetes.flink.configmap.additional.resources{*}' property. Users can 
> provide a semicolon-separated list of local files in the client Flink config 
> directory that should be included in the Flink ConfigMap.
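Under the proposal, usage might look like the following config fragment. The property name comes from the ticket itself; the file names are hypothetical examples:

```yaml
# Hypothetical example: extra local files from the client's Flink conf
# directory to bundle into the JM/TM Pod ConfigMap (semicolon-separated).
kubernetes.flink.configmap.additional.resources: core-site.xml;hdfs-site.xml
```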



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33984) Introduce SupportsBatchSnapshot for operator coordinator

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33984:
---

Assignee: Junrui Li

> Introduce SupportsBatchSnapshot for operator coordinator
> 
>
> Key: FLINK-33984
> URL: https://issues.apache.org/jira/browse/FLINK-33984
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33985) Support obtain all partitions existing in cluster through ShuffleMaster.

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33985:
---

Assignee: Junrui Li

> Support obtain all partitions existing in cluster through ShuffleMaster.
> 
>
> Key: FLINK-33985
> URL: https://issues.apache.org/jira/browse/FLINK-33985
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>
> Support obtaining all partitions existing in the cluster through the ShuffleMaster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-33982) Introduce new config options for Batch Job Recovery

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-33982:
---

Assignee: Junrui Li

> Introduce new config options for Batch Job Recovery
> ---
>
> Key: FLINK-33982
> URL: https://issues.apache.org/jira/browse/FLINK-33982
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33892) FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-03-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33892:

Summary: FLIP-383: Support Job Recovery from JobMaster Failures for Batch 
Jobs  (was: FLIP-383: Support Job Recovery for Batch Jobs)

> FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs
> -
>
> Key: FLINK-33892
> URL: https://issues.apache.org/jira/browse/FLINK-33892
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: Lijie Wang
>Assignee: Junrui Li
>Priority: Major
>
> This is the umbrella ticket for 
> [FLIP-383|https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+for+Batch+Jobs]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-22 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-32513.
---
Fix Version/s: 1.18.2
   1.20.0
   1.19.1
   Resolution: Fixed

master: 8dcb0ae9063b66af1d674b7b0b3be76b6d752692
release-1.19: 5ec4bf2f18168001b5cbb9012f331d3405228516
release-1.18: 940b3bbda5b10abe3a41d60467d33fd424c7dae6

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 
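The frames above show getTransitivePredecessors re-entering the same shared upstream transformations over and over. A toy graph (the Node class below is a hypothetical stand-in, not Flink's Transformation) illustrates the blow-up and one way to bound it — deduplicating visited nodes so each is traversed once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy DAG node; "inputs" plays the role of a transformation's upstream inputs.
class Node {
    final List<Node> inputs = new ArrayList<>();

    Node(Node... ins) {
        inputs.addAll(Arrays.asList(ins));
    }

    // Naive traversal: shared predecessors are revisited on every path,
    // so diamond-shaped graphs make the result (and work) grow exponentially.
    List<Node> naivePredecessors() {
        List<Node> result = new ArrayList<>();
        result.add(this);
        for (Node in : inputs) {
            result.addAll(in.naivePredecessors());
        }
        return result;
    }

    // Deduplicated traversal: each node is visited at most once,
    // so the work is linear in the size of the graph.
    Set<Node> dedupPredecessors() {
        Set<Node> visited = new LinkedHashSet<>();
        collect(visited);
        return visited;
    }

    private void collect(Set<Node> visited) {
        if (!visited.add(this)) {
            return; // already seen, stop recursing down this branch
        }
        for (Node in : inputs) {
            in.collect(visited);
        }
    }
}
```

On a single diamond (one source feeding two branches that rejoin), the naive walk already counts the source twice; with 30+ chained two-input transformations the duplication compounds, matching the freeze described above.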

[jira] [Closed] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34731.
---
Resolution: Done

master: cf0d75c4bb324825a057dc72243bb6a2046f8479

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32513) Job in BATCH mode with a significant number of transformations freezes on method StreamGraphGenerator.existsUnboundedSource()

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-32513:
---

Assignee: Jeyhun Karimov

> Job in BATCH mode with a significant number of transformations freezes on 
> method StreamGraphGenerator.existsUnboundedSource()
> -
>
> Key: FLINK-32513
> URL: https://issues.apache.org/jira/browse/FLINK-32513
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.15.3, 1.16.1, 1.17.1
> Environment: All modes (local, k8s session, k8s application, ...)
> Flink 1.15.3
> Flink 1.16.1
> Flink 1.17.1
>Reporter: Vladislav Keda
>Assignee: Jeyhun Karimov
>Priority: Critical
>  Labels: pull-request-available
> Attachments: image-2023-07-10-17-26-46-544.png
>
>
> A Flink job executed in BATCH mode with a significant number of transformations 
> (more than 30 in my case) takes a very long time to start due to the method 
> StreamGraphGenerator.existsUnboundedSource(). Also, during the execution of 
> the method, a lot of memory is consumed, which causes the GC to fire 
> frequently.
> Thread Dump:
> {code:java}
> "main@1" prio=5 tid=0x1 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at java.util.ArrayList.addAll(ArrayList.java:702)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:224)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.PartitionTransformation.getTransitivePredecessors(PartitionTransformation.java:95)
>       at 
> org.apache.flink.streaming.api.transformations.TwoInputTransformation.getTransitivePredecessors(TwoInputTransformation.java:223)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> org.apache.flink.streaming.api.transformations.OneInputTransformation.getTransitivePredecessors(OneInputTransformation.java:174)
>       at 
> 

[jira] [Closed] (FLINK-34725) Dockerfiles for release publishing has incorrect config.yaml path

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34725.
---
Resolution: Fixed

master: 3f4a80989fe7243983926f09fac2283f6fa63693
release-1.19: f53c5628e43777b4b924ec81224acc3df938800a

> Dockerfiles for release publishing has incorrect config.yaml path
> -
>
> Key: FLINK-34725
> URL: https://issues.apache.org/jira/browse/FLINK-34725
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0, 1.19.1
>
>
> An issue was found during docker image publishing, with an unexpected error 
> message:
> {code:java}
> sed: can't read /config.yaml: No such file or directory{code}
>  
> It was also found in the flink-docker/master daily Publish SNAPSHOTs action:
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150514#step:8:588]
> [https://github.com/apache/flink-docker/actions/runs/8210534289/job/22458150322#step:8:549]
>  
> This is related to changes made in https://issues.apache.org/jira/browse/FLINK-34205



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34731) Remove SpeculativeScheduler and incorporate its features into AdaptiveBatchScheduler

2024-03-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34731:
---

Assignee: Junrui Li

> Remove SpeculativeScheduler and incorporate its features into 
> AdaptiveBatchScheduler
> 
>
> Key: FLINK-34731
> URL: https://issues.apache.org/jira/browse/FLINK-34731
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
> Fix For: 1.20.0
>
>
> Presently, speculative execution is exposed to users as a feature of the 
> AdaptiveBatchScheduler.
> To streamline our codebase and reduce maintenance overhead, this ticket will 
> consolidate the SpeculativeScheduler into the AdaptiveBatchScheduler, 
> eliminating the need for a separate SpeculativeScheduler class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-03-06 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34105.
---
Fix Version/s: 1.19.0
 Assignee: dizhou cao  (was: Yangze Guo)
   Resolution: Fixed

1.19: 837f8e584850bdcbc586a54f58e3fe953a816be8

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: dizhou cao
>Priority: Critical
> Fix For: 1.19.0
>
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed Akka timeouts happening in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820553#comment-17820553
 ] 

Zhu Zhu commented on FLINK-34377:
-

Thanks for volunteering! [~xiasun]
I have assigned you the ticket.

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink website: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, use a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Verify that the job runs just as it did before the migration.
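As a hedged illustration of the Test 2 migration, a flat legacy flink-conf.yaml entry and a nested standard-YAML equivalent in config.yaml might look like the fragment below. The keys shown are common Flink options; the exact set to migrate depends on the job:

```yaml
# Legacy flink-conf.yaml (flat "key: value" pairs):
#   taskmanager.numberOfTaskSlots: 4
#   parallelism.default: 2
# Standard-YAML config.yaml (nested mapping, same settings):
taskmanager:
  numberOfTaskSlots: 4
parallelism:
  default: 2
```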



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34377) Release Testing: Verify FLINK-33297 Support standard YAML for FLINK configuration

2024-02-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34377:
---

Assignee: xingbe

> Release Testing: Verify FLINK-33297 Support standard YAML for FLINK 
> configuration
> -
>
> Key: FLINK-34377
> URL: https://issues.apache.org/jira/browse/FLINK-34377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: xingbe
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-366.
> Starting with version 1.19, Flink has officially introduced full support for 
> the standard YAML 1.2 syntax. For detailed information, please refer to the 
> Flink website: 
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file
> We may need to cover the following two types of test cases:
> Test 1: For newly created jobs, use a config.yaml file to set up the 
> Flink cluster. We need to verify that the job runs as expected with this new 
> configuration.
> Test 2: For existing jobs, migrate the legacy flink-conf.yaml to the new 
> config.yaml. Verify that the job runs just as it did before the migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34383) Modify the comment with incorrect syntax

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34383.
---
Resolution: Won't Fix

> Modify the comment with incorrect syntax
> 
>
> Key: FLINK-34383
> URL: https://issues.apache.org/jira/browse/FLINK-34383
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: li you
>Priority: Major
>  Labels: pull-request-available
>
> There is an error in the syntax of the comment for the class 
> PermanentBlobCache



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34356:
---

Assignee: Junrui Li

> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New sources can implement the DynamicParallelismInference interface to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34356) Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs

2024-02-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819142#comment-17819142
 ] 

Zhu Zhu commented on FLINK-34356:
-

Assigned. Thanks for volunteering! [~JunRuiLi]


> Release Testing: Verify FLINK-33768  Support dynamic source parallelism 
> inference for batch jobs 
> -
>
> Key: FLINK-34356
> URL: https://issues.apache.org/jira/browse/FLINK-34356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>
> This issue aims to verify FLIP-379.
> New sources can implement the DynamicParallelismInference interface to enable 
> dynamic parallelism inference. For detailed information, please refer to the 
> documentation.
> We may need to cover the following two types of test cases:
> Test 1: FileSource has implemented dynamic source parallelism inference. 
> Test the automatic parallelism inference of FileSource.
> Test 2: Test the dynamic source parallelism inference of a custom Source.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-33241) Align config option generation documentation for Flink's config documentation

2024-02-20 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33241.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via 8fe005cf1325b7477d7e1808a46bd80798165029

> Align config option generation documentation for Flink's config documentation
> -
>
> Key: FLINK-33241
> URL: https://issues.apache.org/jira/browse/FLINK-33241
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 1.19.0
>
>
> The configuration parameter docs generation is documented in two places in 
> different ways:
> [docs/README.md:62|https://github.com/apache/flink/blob/5c1e9f3b1449cb77276d578b344d9a69c7cf9a3c/docs/README.md#L62]
>  and 
> [flink-docs/README.md:44|https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-docs/README.md#L44].
> We should remove the corresponding command from {{docs/README.md}} and refer 
> to {{flink-docs/README.md}} for the documentation. That way, we only have to 
> maintain a single file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration

2024-02-04 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34247.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
5b61baadd02ccdfa702834e2e63aeb8d1d9e1250
04dd91f2b6c830b9ac0e445f72938e3d6f479edd
e9bea09510e18c6143e6e14ca17a894abfaf92bf

> Document FLIP-366: Support standard YAML for FLINK configuration
> 
>
> Key: FLINK-34247
> URL: https://issues.apache.org/jira/browse/FLINK-34247
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reopened FLINK-33768:
-

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is in fact only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details can be found in 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].
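The volume-based inference described above can be illustrated with a toy model. This is a hedged sketch only: the method name, the per-task-volume formula, and the cap handling are illustrative assumptions, not the actual FLIP-379 `DynamicParallelismInference` API.

```java
// Toy sketch of volume-based source parallelism inference, in the spirit of
// FLIP-379. Names and the exact formula are illustrative assumptions, not
// Flink's real implementation.
public class SourceParallelismSketch {

    /**
     * Infers a source parallelism from the estimated input size, capped by
     * the configured upper bound and floored at 1.
     */
    static int inferParallelism(long inputBytes, long bytesPerTask, int upperBound) {
        int inferred = (int) Math.ceil((double) inputBytes / bytesPerTask);
        return Math.max(1, Math.min(inferred, upperBound));
    }

    public static void main(String[] args) {
        // 10 GB of input with 1 GB per task wants 10 subtasks,
        // but the configured upper bound of 8 caps the result.
        System.out.println(inferParallelism(10L << 30, 1L << 30, 8));
    }
}
```

The point of the cap is the FLIP-379 behavior change discussed in this thread: the configured default-source-parallelism acts as an upper bound on the inferred value rather than as a fixed parallelism.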



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34145:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34144:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34143) Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism`

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34143:

Parent: FLINK-33768
Issue Type: Sub-task  (was: Improvement)

> Modify the effective strategy of 
> `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
> ---
>
> Key: FLINK-34143
> URL: https://issues.apache.org/jira/browse/FLINK-34143
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, if users do not set the 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> configuration option, the AdaptiveBatchScheduler defaults to a parallelism 
> of 1 for source vertices. In 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource],
>  the value of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}} 
> will act as the upper bound for inferring dynamic source parallelism, so 
> continuing with the current policy is no longer appropriate.
> We plan to change the effective strategy of 
> {{execution.batch.adaptive.auto-parallelism.default-source-parallelism}}: 
> when the user does not set this config option, we will use the value of 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} as the 
> upper bound for source parallelism inference. If 
> {{execution.batch.adaptive.auto-parallelism.max-parallelism}} is also not 
> configured, the value of {{parallelism.default}} will be used as a 
> fallback.
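The proposed fallback chain can be sketched as plain resolution logic. This is a hedged, simplified sketch: the method and class names are invented for illustration and do not correspond to Flink's actual configuration code.

```java
import java.util.OptionalInt;

// Illustrative sketch of the proposed fallback order for the upper bound of
// dynamic source parallelism inference (names are hypothetical, not Flink's).
public class SourceParallelismFallback {

    /**
     * Resolution order described in FLINK-34143:
     * default-source-parallelism, else max-parallelism, else parallelism.default.
     */
    static int resolveUpperBound(OptionalInt defaultSourceParallelism,
                                 OptionalInt maxParallelism,
                                 int parallelismDefault) {
        if (defaultSourceParallelism.isPresent()) {
            return defaultSourceParallelism.getAsInt();
        }
        if (maxParallelism.isPresent()) {
            return maxParallelism.getAsInt();
        }
        return parallelismDefault; // final fallback
    }

    public static void main(String[] args) {
        // Explicit default-source-parallelism wins over everything else.
        System.out.println(resolveUpperBound(OptionalInt.of(8), OptionalInt.of(128), 4));
        // Unset: fall back to max-parallelism.
        System.out.println(resolveUpperBound(OptionalInt.empty(), OptionalInt.of(128), 4));
        // Both unset: fall back to parallelism.default.
        System.out.println(resolveUpperBound(OptionalInt.empty(), OptionalInt.empty(), 4));
    }
}
```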



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) FLIP-379: Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: FLIP-379: Support dynamic source parallelism inference for batch 
jobs  (was: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs)

> FLIP-379: Support dynamic source parallelism inference for batch jobs
> -
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is in fact only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details can be found in 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33768) [FLIP-379] Support dynamic source parallelism inference for batch jobs

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-33768:

Summary: [FLIP-379] Support dynamic source parallelism inference for batch 
jobs  (was: Support dynamic source parallelism inference for batch jobs)

> [FLIP-379] Support dynamic source parallelism inference for batch jobs
> --
>
> Key: FLINK-33768
> URL: https://issues.apache.org/jira/browse/FLINK-33768
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Currently, for JobVertices without parallelism configured, the 
> AdaptiveBatchScheduler dynamically infers the vertex parallelism based on the 
> volume of input data. Specifically, for Source vertices, it uses the value of 
> `{*}execution.batch.adaptive.auto-parallelism.default-source-parallelism{*}` 
> as the fixed parallelism. If this is not set by the user, the default value 
> of {{1}} is used as the source parallelism, which is in fact only a temporary 
> implementation.
> We aim to support dynamic source parallelism inference for batch jobs. More 
> details can be found in 
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813095#comment-17813095
 ] 

Zhu Zhu commented on FLINK-34132:
-

We should also migrate the existing batch examples from DataSet to DataStream 
so that they can directly work with the AdaptiveBatchScheduler. 
This work needs to be done before the DataSet API is removed in Flink 2.0.
cc [~Wencong Liu]

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> 

[jira] [Updated] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34132:

Component/s: Documentation

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>   at 
> 

[jira] [Closed] (FLINK-34132) Batch WordCount job fails when run with AdaptiveBatch scheduler

2024-02-01 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34132.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

The documentation is updated via dd3e60a4b1e473c167837b7c3bc4fb90c0a1f51a

> Batch WordCount job fails when run with AdaptiveBatch scheduler
> ---
>
> Key: FLINK-34132
> URL: https://issues.apache.org/jira/browse/FLINK-34132
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.1, 1.18.1
>Reporter: Prabhu Joseph
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Batch WordCount job fails when run with AdaptiveBatch scheduler.
> *Repro Steps*
> {code:java}
> flink-yarn-session -Djobmanager.scheduler=adaptive -d
>  flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input 
> s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
> {code}
> *Error logs*
> {code:java}
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The main method 
> caused an error: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>   at 
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:105)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:851)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:245)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1095)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
> Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1067)
>   at 
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:144)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:73)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>   ... 12 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1062)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.runtime.client.JobInitializationException: Could not start 
> the JobMaster.
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
>   at 

[jira] [Closed] (FLINK-34126) Correct the description of jobmanager.scheduler

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34126.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via 2be1ea801cf616d0d0a82729829245c205caaad8

> Correct the description of jobmanager.scheduler
> ---
>
> Key: FLINK-34126
> URL: https://issues.apache.org/jira/browse/FLINK-34126
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> Now the config option jobmanager.scheduler has description: 
> _Determines which scheduler implementation is used to schedule tasks. 
> Accepted values are:_
>  * _'Default': Default scheduler_
>  * _'Adaptive': Adaptive scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-scheduler]._
>  * _'AdaptiveBatch': Adaptive batch scheduler. More details can be found 
> [here|https://nightlies.apache.org/flink/flink-docs-master/zh/docs/deployment/elastic_scaling#adaptive-batch-scheduler]._
> _Possible values:_
>  * _"Default"_
>  * _"Adaptive"_
>  * _"AdaptiveBatch"_
>  
> However, after FLIP-283 we changed the default scheduler for batch job to 
> AdaptiveBatchScheduler. This config option description will mislead users 
> that the 'DefaultScheduler' is the universal fallback for both batch and 
> streaming jobs.
> We should update this description.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-31 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34206.
---
Fix Version/s: 1.19.0
   Resolution: Fixed

Fixed via b737b71859672e8020881ce2abf998735ee98abb

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> 

[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812123#comment-17812123
 ] 

Zhu Zhu commented on FLINK-34105:
-

Thanks for the updates! [~guoyangze]

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed akka timeout happens in 10TB TPC-DS benchmarks in 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we find the problem was introduced in FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34257.
---
Resolution: Fixed

Fixed via 081051a2cacaddf6dfe613da061f15f28a015a41

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to update these dependencies to ones that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml
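The practical difference between the two specifications is mostly in scalar resolution: a YAML 1.1 parser treats many more plain scalars as booleans than the YAML 1.2 core schema does. A minimal sketch of the two resolution rules (the patterns follow the published specs; this is illustrative, not Flink's or either library's code):

```python
import re

# YAML 1.1 boolean scalars, per the YAML 1.1 type repository:
# y/n, yes/no, true/false, on/off in lower/Title/UPPER case.
YAML_11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

# YAML 1.2 core schema booleans: only true/True/TRUE and false/False/FALSE.
YAML_12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")

def is_bool(scalar: str, spec: str) -> bool:
    """Return True if the plain scalar would be resolved as a boolean."""
    pattern = YAML_11_BOOL if spec == "1.1" else YAML_12_BOOL
    return pattern.match(scalar) is not None

# A config value like `some.flag: on` silently becomes the boolean True
# under a YAML 1.1 parser, but stays the string "on" under YAML 1.2 --
# one reason to move to 1.2-compliant parsers.
assert is_bool("on", "1.1") and not is_bool("on", "1.2")
assert is_bool("true", "1.1") and is_bool("true", "1.2")
```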



--


[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Affects Version/s: 1.19.0

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification rather than the YAML 1.2 specification, which is the version 
> referenced by [FLINK official 
> website|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#configuration].
>  Therefore, we need to update these dependencies to ones that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--


[jira] [Commented] (FLINK-34105) Akka timeout happens in TPC-DS benchmarks

2024-01-29 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811867#comment-17811867
 ] 

Zhu Zhu commented on FLINK-34105:
-

Hi [~lsdy], what's the status of the fix?

> Akka timeout happens in TPC-DS benchmarks
> -
>
> Key: FLINK-34105
> URL: https://issues.apache.org/jira/browse/FLINK-34105
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Zhu Zhu
>Assignee: Yangze Guo
>Priority: Critical
> Attachments: image-2024-01-16-13-59-45-556.png
>
>
> We noticed Akka timeouts happening in 10TB TPC-DS benchmarks on 1.19. The 
> problem did not happen in 1.18.0.
> After bisecting, we found that the problem was introduced by FLINK-33532.
>  !image-2024-01-16-13-59-45-556.png|width=800! 



--


[jira] [Assigned] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34257:
---

Assignee: Junrui Li

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to update 
> these dependencies to ones that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--


[jira] [Updated] (FLINK-34257) Update Flink YAML Parser to Support YAML 1.2 Specification

2024-01-29 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-34257:

Component/s: Runtime / Configuration

>  Update Flink YAML Parser to Support YAML 1.2 Specification
> ---
>
> Key: FLINK-34257
> URL: https://issues.apache.org/jira/browse/FLINK-34257
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> FLINK-33364 and FLINK-33577 added snakeyaml and pyyaml dependencies to 
> support a standard YAML parser. However, these parsers support the YAML 1.1 
> specification, not the YAML 1.2 specification. Therefore, we need to update 
> these dependencies to ones that support YAML 1.2.
> The updated dependencies are as follows:
> 1. For Java: change from snakeyaml to snakeyaml-engine
> 2. For Python: change from pyyaml to ruamel.yaml



--


[jira] [Closed] (FLINK-34245) CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to InaccessibleObjectException

2024-01-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34245.
---
Fix Version/s: 1.19.0
 Assignee: Junrui Li
   Resolution: Fixed

Fixed via ddbf87f2a7aeeeb20a8590578c6d037b239d5593

> CassandraSinkTest.test_cassandra_sink fails under JDK17 and JDK21 due to 
> InaccessibleObjectException
> 
>
> Key: FLINK-34245
> URL: https://issues.apache.org/jira/browse/FLINK-34245
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Connectors / Cassandra
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942&view=logs&j=b53e1644-5cb4-5a3b-5d48-f523f39bcf06&t=b68c9f5c-04c9-5c75-3862-a3a27aabbce3]
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56942&view=logs&j=60960eae-6f09-579e-371e-29814bdd1adc&t=7a70c083-6a74-5348-5106-30a76c29d8fa&l=63680]
> {code:java}
> Jan 26 01:29:27 E   py4j.protocol.Py4JJavaError: An error 
> occurred while calling 
> z:org.apache.flink.python.util.PythonConfigUtil.configPythonOperator.
> Jan 26 01:29:27 E   : 
> java.lang.reflect.InaccessibleObjectException: Unable to make field final 
> java.util.Map java.util.Collections$UnmodifiableMap.m accessible: module 
> java.base does not "opens java.util" to unnamed module @17695df3
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:183)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Field.setAccessible(Field.java:177)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.registerPythonBroadcastTransformationTranslator(PythonConfigUtil.java:357)
> Jan 26 01:29:27 E at 
> org.apache.flink.python.util.PythonConfigUtil.configPythonOperator(PythonConfigUtil.java:101)
> Jan 26 01:29:27 E at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> Jan 26 01:29:27 E at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> Jan 26 01:29:27 E at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> Jan 26 01:29:27 E at 
> java.base/java.lang.Thread.run(Thread.java:1583) {code}



--


[jira] [Commented] (FLINK-34200) AutoRescalingITCase#testCheckpointRescalingInKeyedState fails

2024-01-27 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811633#comment-17811633
 ] 

Zhu Zhu commented on FLINK-34200:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57024&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba

> AutoRescalingITCase#testCheckpointRescalingInKeyedState fails
> -
>
> Key: FLINK-34200
> URL: https://issues.apache.org/jira/browse/FLINK-34200
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
> Attachments: FLINK-34200.failure.log.gz
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56601&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8200]
> {code:java}
> Jan 19 02:31:53 02:31:53.954 [ERROR] Tests run: 32, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 1050 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.AutoRescalingITCase
> Jan 19 02:31:53 02:31:53.954 [ERROR] 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState[backend
>  = rocksdb, buffersPerChannel = 2] -- Time elapsed: 59.10 s <<< FAILURE!
> Jan 19 02:31:53 java.lang.AssertionError: expected:<[(0,8000), (0,32000), 
> (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), (0,1), 
> (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), (1,16000), 
> (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), (0,52000), 
> (0,6), (0,68000), (0,76000), (1,18000), (1,26000), (1,34000), (1,42000), 
> (1,58000), (0,6000), (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), 
> (0,7), (1,4000), (1,2), (1,36000), (1,44000)]> but was:<[(0,8000), 
> (0,32000), (0,48000), (0,72000), (1,78000), (1,3), (1,54000), (0,2000), 
> (0,1), (0,5), (0,66000), (0,74000), (0,82000), (1,8), (1,0), 
> (1,16000), (1,24000), (1,4), (1,56000), (1,64000), (0,12000), (0,28000), 
> (0,52000), (0,6), (0,68000), (0,76000), (0,1000), (0,25000), (0,33000), 
> (0,41000), (1,18000), (1,26000), (1,34000), (1,42000), (1,58000), (0,6000), 
> (0,14000), (0,22000), (0,38000), (0,46000), (0,62000), (0,7), (1,4000), 
> (1,2), (1,36000), (1,44000)]>
> Jan 19 02:31:53   at org.junit.Assert.fail(Assert.java:89)
> Jan 19 02:31:53   at org.junit.Assert.failNotEquals(Assert.java:835)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:120)
> Jan 19 02:31:53   at org.junit.Assert.assertEquals(Assert.java:146)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingKeyedState(AutoRescalingITCase.java:296)
> Jan 19 02:31:53   at 
> org.apache.flink.test.checkpointing.AutoRescalingITCase.testCheckpointRescalingInKeyedState(AutoRescalingITCase.java:196)
> Jan 19 02:31:53   at java.lang.reflect.Method.invoke(Method.java:498)
> Jan 19 02:31:53   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
> {code}



--


[jira] [Closed] (FLINK-34145) File source connector support dynamic source parallelism inference in batch jobs

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34145.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
11631cb59568df60d40933fb13c8433062ed9290

> File source connector support dynamic source parallelism inference in batch 
> jobs
> 
>
> Key: FLINK-34145
> URL: https://issues.apache.org/jira/browse/FLINK-34145
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
>  has introduced support for dynamic source parallelism inference in batch 
> jobs, and we plan to give priority to enabling this feature for the file 
> source connector.



--


[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811231#comment-17811231
 ] 

Zhu Zhu commented on FLINK-34206:
-

Disabled {{CacheITCase.testRetryOnCorruptedClusterDataset}} temporarily via 
05ee359ebd564af3dd8ab31975cd479e92ba1785

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> 

[jira] [Assigned] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu reassigned FLINK-34206:
---

Assignee: xingbe

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Assignee: xingbe
>Priority: Blocker
>  Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 

[jira] [Closed] (FLINK-34223) Introduce a migration tool to transfer legacy config file to new config file

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34223.
---
  Assignee: Junrui Li
Resolution: Done

master/release-1.19:
8fceb101e7e45f3fdc9357019757230bd8c16aa7

> Introduce a migration tool to transfer legacy config file to new config file
> 
>
> Key: FLINK-34223
> URL: https://issues.apache.org/jira/browse/FLINK-34223
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Scripts
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> As we transition to new configuration files that adhere to the standard YAML 
> format, users are expected to manually migrate their existing config files. 
> However, this process can be error-prone and time-consuming.
> To simplify the migration, we're introducing an automated script. This script 
> leverages BashJavaUtils to efficiently convert old flink-conf.yaml files into 
> the new config file config.yaml, thereby reducing the effort required for 
> migration.
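The shape of such a migration can be sketched in a few lines: legacy flink-conf.yaml is a flat list of dotted keys, while the new config.yaml nests them as standard YAML mappings. This is an illustrative sketch only -- the real tool is built on BashJavaUtils in the Flink distribution:

```python
def migrate_legacy_conf(legacy_text: str) -> dict:
    """Illustrative sketch: turn flat `dotted.key: value` lines from a
    legacy flink-conf.yaml into the nested mapping structure that the
    new standard-YAML config.yaml uses."""
    nested: dict = {}
    for line in legacy_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        parts = key.strip().split(".")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating maps
        node[parts[-1]] = value.strip()
    return nested

legacy = "jobmanager.rpc.address: localhost\ntaskmanager.numberOfTaskSlots: 4"
assert migrate_legacy_conf(legacy) == {
    "jobmanager": {"rpc": {"address": "localhost"}},
    "taskmanager": {"numberOfTaskSlots": "4"},
}
```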



--


[jira] [Closed] (FLINK-34232) Config file unexpectedly lacks support for env.java.home

2024-01-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34232.
---
  Assignee: Junrui Li
Resolution: Fixed

Fixed in master/release-1.19:
e623c07f4e56fdef1bd8514ccd02df347af5b122

> Config file unexpectedly lacks support for env.java.home
> 
>
> Key: FLINK-34232
> URL: https://issues.apache.org/jira/browse/FLINK-34232
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> We removed the option to set JAVA_HOME in the config file with PR 
> [24091|https://github.com/apache/flink/pull/24091] to improve how we handle 
> standard YAML with BashJavaUtils. But since setting JAVA_HOME is a publicly 
> documented feature, we need to keep it available for users. 
>  



--


[jira] [Commented] (FLINK-34206) CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811084#comment-17811084
 ] 

Zhu Zhu commented on FLINK-34206:
-

We will take a look.

> CacheITCase.testRetryOnCorruptedClusterDataset(Path) failed
> ---
>
> Key: FLINK-34206
> URL: https://issues.apache.org/jira/browse/FLINK-34206
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56728&view=logs&j=a657ddbf-d986-5381-9649-342d9c92e7fb&t=dc085d4a-05c8-580e-06ab-21f5624dab16&l=8763
> {code}
> Jan 23 01:39:48 01:39:48.152 [ERROR] Tests run: 6, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.24 s <<< FAILURE! -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> Jan 23 01:39:48 01:39:48.152 [ERROR] 
> org.apache.flink.test.streaming.runtime.CacheITCase.testRetryOnCorruptedClusterDataset(Path)
>  -- Time elapsed: 4.755 s <<< ERROR!
> Jan 23 01:39:48 org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> Jan 23 01:39:48   at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
> Jan 23 01:39:48   at 
> org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
> Jan 23 01:39:48   at 
> org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
> Jan 23 01:39:48   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
> Jan 23 01:39:48   at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> Jan 23 01:39:48   at 

[jira] [Closed] (FLINK-33577) Make "conf.yaml" as the default Flink configuration file

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-33577.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
9721ce835f5a7f28f2ad187346e009633307097b

> Make "conf.yaml" as the default Flink configuration file
> 
>
> Key: FLINK-33577
> URL: https://issues.apache.org/jira/browse/FLINK-33577
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> This update ensures that the flink-dist package in FLINK will include the new 
> configuration file "conf.yaml" by default.



--


[jira] [Closed] (FLINK-34144) Update the documentation and configuration description about dynamic source parallelism inference

2024-01-25 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34144.
---
Fix Version/s: 1.19.0
   Resolution: Done

master/release-1.19:
78e31f0dcc4da65d88e18560f3374a47cc0a7c9b

> Update the documentation and configuration description about dynamic source 
> parallelism inference
> -
>
> Key: FLINK-34144
> URL: https://issues.apache.org/jira/browse/FLINK-34144
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.19.0
>Reporter: xingbe
>Assignee: xingbe
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> [FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs#FLIP379:Dynamicsourceparallelisminferenceforbatchjobs-IntroduceDynamicParallelismInferenceinterfaceforSource]
>  introduces the new feature of dynamic source parallelism inference, and we 
> plan to update the documentation and configuration items accordingly.





[jira] [Commented] (FLINK-34229) Duplicate entry in InnerClasses attribute in class file FusionStreamOperator

2024-01-25 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810856#comment-17810856
 ] 

Zhu Zhu commented on FLINK-34229:
-

[~FrankZou] The query plan can differ between q35 of `flink-tpcds-test` and 
q35 of a 10TB TPC-DS benchmark. For example, DPP or runtime filters may not 
be created in `flink-tpcds-test` because the data size is very small.

> Duplicate entry in InnerClasses attribute in class file FusionStreamOperator
> 
>
> Key: FLINK-34229
> URL: https://issues.apache.org/jira/browse/FLINK-34229
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
> Attachments: image-2024-01-24-17-05-47-883.png
>
>
> I noticed a runtime error in 10TB TPC-DS (q35.sql) benchmarks on 1.19; the 
> problem did not happen in 1.18.0, so this issue may have been introduced 
> recently. !image-2024-01-24-17-05-47-883.png|width=589,height=279!





[jira] [Closed] (FLINK-34205) Update flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for Flink configuration management

2024-01-23 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-34205.
---
Fix Version/s: 1.19.0
   Resolution: Done

dev-master:
44f058287cc956a620b12b6f8ed214e44dc3db77

> Update flink-docker's Dockerfile and docker-entrypoint.sh to use 
> BashJavaUtils for Flink configuration management
> -
>
> Key: FLINK-34205
> URL: https://issues.apache.org/jira/browse/FLINK-34205
> Project: Flink
>  Issue Type: Sub-task
>  Components: flink-docker
>Reporter: Junrui Li
>Assignee: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The flink-docker's Dockerfile and docker-entrypoint.sh currently use shell 
> scripting techniques with grep and sed for configuration reading and 
> modification. This method is not suitable for the standard YAML configuration 
> format.
> Following the changes introduced in FLINK-33721, we should update 
> flink-docker's Dockerfile and docker-entrypoint.sh to use BashJavaUtils for 
> Flink configuration reading and writing.
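The limitation described above can be sketched in a few lines of shell. File names and keys here are illustrative, not the actual flink-docker scripts: line-oriented grep/sed works for the legacy flat format but finds nothing once the same setting is written as nested YAML.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: why line-based grep/sed config handling breaks on
# standard (nested) YAML. Not the real docker-entrypoint.sh logic.

# Legacy flat format: one "key: value" per line -- grep/sed can read it.
printf 'jobmanager.rpc.address: localhost\n' > flat-conf.yaml
get_flat() { grep "^$1:" flat-conf.yaml | sed "s/^$1: *//"; }
echo "flat:   $(get_flat jobmanager.rpc.address)"

# Standard nested YAML: the same setting spans several lines, so a simple
# line-based grep for the full dotted key matches nothing.
printf 'jobmanager:\n  rpc:\n    address: localhost\n' > nested-conf.yaml
echo "nested: $(grep -c '^jobmanager.rpc.address:' nested-conf.yaml) matching lines"
```

Hence the move to BashJavaUtils, which can parse the configuration with a proper YAML parser instead of text matching.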




