Zakelly Lan created FLINK-35624:
-----------------------------------

             Summary: Release Testing: Verify FLIP-306 Unified File Merging Mechanism for Checkpoints
                 Key: FLINK-35624
                 URL: https://issues.apache.org/jira/browse/FLINK-35624
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Checkpointing
            Reporter: Zakelly Lan
             Fix For: 1.20.0

This is the follow-up testing task for https://issues.apache.org/jira/browse/FLINK-32070

1.20 is the MVP version for FLIP-306. It is a little bit complex and should be tested carefully. The main idea of FLIP-306 is to merge checkpoint files on the TM side and provide new {{StateHandle}}s to the JM. There will be a TM-managed directory under the 'shared' checkpoint directory for each subtask, and a TM-managed directory under the 'taskowned' checkpoint directory for each Task Manager. Under these newly introduced directories, the checkpoint files will be merged into a smaller set of files.

The following scenarios need to be tested, including but not limited to:
# With file merging enabled, periodic checkpoints complete properly, and failover, restore and rescale also work well.
# When file merging is switched on and off across jobs, checkpoints and recovery also work properly.
# No TM-managed directory is left over, especially when no checkpoint completes before the job is cancelled.
# File merging has no effect on (native) savepoints.

Besides the behaviors above, it is also worth validating the space amplification control and the metrics. All the config options can be found under 'execution.checkpointing.file-merging'.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)