[ https://issues.apache.org/jira/browse/FLINK-35603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weijie Guo updated FLINK-35603:
-------------------------------
    Description:
Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533

In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to recover as much progress as possible after a JobMaster failover, avoiding the need to rerun tasks that have already been finished.

More information about this feature and how to enable it could be found in:
[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]

We may need the following tests:
 # Start a batch job with High Availability (HA) enabled, and after it has progressed to a certain point, kill the JobManager (jm), then observe whether the job recovers its progress normally.
 # Use a custom source and ensure that its SplitEnumerator implements the SupportsBatchSnapshot interface, submit the job, and after it has progressed to a certain point, kill the JobManager (jm), then observe whether the job recovers its progress normally.

Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892

  was:Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533

> Release Testing Instructions: Verify FLINK-35533(FLIP-459): Support Flink
> hybrid shuffle integration with Apache Celeborn
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35603
>                 URL: https://issues.apache.org/jira/browse/FLINK-35603
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Rui Fan
>            Assignee: Yuxin Tan
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.20.0
>
>
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533
> In 1.20, we introduced a batch job recovery mechanism to enable batch jobs to
> recover as much progress as possible after a JobMaster failover, avoiding the
> need to rerun tasks that have already been finished.
> More information about this feature and how to enable it could be found in:
> [https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/]
> We may need the following tests:
> # Start a batch job with High Availability (HA) enabled, and after it has
> progressed to a certain point, kill the JobManager (jm), then observe whether
> the job recovers its progress normally.
> # Use a custom source and ensure that its SplitEnumerator implements the
> SupportsBatchSnapshot interface, submit the job, and after it has progressed
> to a certain point, kill the JobManager (jm), then observe whether the job
> recovers its progress normally.
>
> Follow up the test for https://issues.apache.org/jira/browse/FLINK-33892

--
This message was sent by Atlassian Jira
(v8.20.10#820010)