Zakelly Lan created FLINK-35624:
-----------------------------------

             Summary: Release Testing: Verify FLIP-306 Unified File Merging 
Mechanism for Checkpoints
                 Key: FLINK-35624
                 URL: https://issues.apache.org/jira/browse/FLINK-35624
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Checkpointing
            Reporter: Zakelly Lan
             Fix For: 1.20.0


This is the release-testing follow-up for https://issues.apache.org/jira/browse/FLINK-32070

 

1.20 is the MVP release of FLIP-306. The feature is fairly complex and should be
tested carefully. The main idea of FLIP-306 is to merge checkpoint files on the
TM side and to provide new {{StateHandle}}s to the JM. There is one
TM-managed directory under the 'shared' checkpoint directory for each subtask,
and one TM-managed directory under the 'taskowned' checkpoint directory for each
Task Manager. Within these newly introduced directories, the checkpoint files
are merged into a smaller set of files. The following scenarios need to be
tested, including but not limited to:
 # With file merging enabled, periodic checkpoints complete properly, and
failover, restore and rescaling also work.
 # When file merging is switched on and off across jobs, checkpoints and
recovery still work properly.
 # No TM-managed directory is left over, especially when no checkpoint has
completed before the job is cancelled.
 # File merging has no effect on (native) savepoints.
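
Scenario 3 can be spot-checked with a small script. The following is only a
minimal sketch and not part of Flink: the helper name find_tm_managed_dirs is
hypothetical, and it assumes that the checkpoint storage is a locally mounted
file system and that, without file merging, 'shared' and 'taskowned' hold only
plain files, so any sub-directory found there is treated as TM-managed.

```python
from pathlib import Path


def find_tm_managed_dirs(job_checkpoint_dir):
    """Return the sub-directories found directly under 'shared' and 'taskowned'.

    Assumption: any directory at this level was created by the file-merging
    mechanism; plain files at this level are ignored.
    """
    root = Path(job_checkpoint_dir)
    found = []
    for parent_name in ("shared", "taskowned"):
        parent = root / parent_name
        if not parent.is_dir():
            # The parent directory may not exist if no checkpoint was started.
            continue
        found.extend(sorted(p for p in parent.iterdir() if p.is_dir()))
    return found
```

For a job that was cancelled before any checkpoint completed, calling the
helper on the job's checkpoint directory (the one containing 'shared' and
'taskowned') would be expected to return an empty list.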

In addition to the behaviors above, it would be good to validate space
amplification control and the related metrics. All config options can be found
under the prefix 'execution.checkpointing.file-merging'.
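
When switching file merging on and off between runs, it may help to generate
the option overrides programmatically. A minimal sketch follows. The helper
names are hypothetical, and the two option suffixes used ('enabled' and
'max-space-amplification') are assumed from the 1.20 configuration docs and
should be verified there. Passing them as -Dkey=value arguments is likewise an
assumption about how the test job is submitted.

```python
PREFIX = "execution.checkpointing.file-merging"


def file_merging_overrides(enabled, max_space_amplification=None):
    """Build config overrides for one test run.

    Assumption: the option suffixes below match the Flink 1.20 docs.
    """
    opts = {f"{PREFIX}.enabled": str(bool(enabled)).lower()}
    if max_space_amplification is not None:
        opts[f"{PREFIX}.max-space-amplification"] = str(max_space_amplification)
    return opts


def as_cli_args(opts):
    """Render the overrides as -Dkey=value arguments."""
    return [f"-D{key}={value}" for key, value in opts.items()]
```

For example, as_cli_args(file_merging_overrides(True, 1.5)) yields one
argument enabling file merging and one setting the amplification limit, while
file_merging_overrides(False) gives the overrides for the "off" run.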



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
