[ https://issues.apache.org/jira/browse/FLINK-37375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937775#comment-17937775 ]
Zakelly Lan commented on FLINK-37375:
-------------------------------------

[~hejufang001] But it is still required to finish before the checkpoint is marked complete, right? Otherwise it won't affect Flink in any way, so there is no need to introduce such a method.

> Checkpoint supports the Operator to customize asynchronous snapshot state
> -------------------------------------------------------------------------
>
>                 Key: FLINK-37375
>                 URL: https://issues.apache.org/jira/browse/FLINK-37375
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.20.1
>            Reporter: Jufang He
>            Priority: Major
>              Labels: pull-request-available
>
> In some Flink task operators, slow operations such as file uploads or data
> flushing may be performed during the synchronous phase of Checkpoint. Due to
> performance issues with external storage components, the synchronous phase
> may take too long to execute, significantly impacting the job's throughput.
> For example, during our internal use of Paimon, we observed that uploading
> files to HDFS during the Checkpoint synchronous phase could encounter random
> HDFS slow node issues, leading to a substantial negative impact on task
> throughput.
> To address this issue, I propose supporting a generic operator custom
> asynchronous snapshot feature, allowing users to move time-consuming logic to
> the asynchronous phase of Checkpoint, thereby minimizing the blocking of the
> main thread and improving task throughput. For instance, the Paimon writer
> operator could write data locally during the Checkpoint synchronous phase and
> upload files to remote storage during the asynchronous phase. Beyond the
> Paimon data upload scenario, other operator logic may also experience slow
> execution during the synchronous phase. This approach aims to uniformly
> optimize such issues.
> I drafted a FLIP for this issue:
> [https://docs.google.com/document/d/1lwxLEQjD6jVhZUBMRGhzQNWKSvdbPbYNQsV265gR4kw]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)