[
https://issues.apache.org/jira/browse/FLINK-32070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087875#comment-18087875
]
Rui Fan commented on FLINK-32070:
---------------------------------
Hey [~zakelly] [~masteryhx] [~lijinzhong] cc [~ym]
While reading the discussion of "[DISCUSS] FLIP-XXX: Independent Checkpoint
Based On Pipeline Region"
[https://lists.apache.org/thread/qpztk0jdpcmhomszjx63l53xv26xnmwf], I started
wondering whether the Unified File Merging Mechanism is stable, and whether
execution.checkpointing.unaligned.max-subtasks-per-channel-state-file (FLINK-26803)
could be deprecated or removed.
While reading the code, I noticed a {{TODO}} in the
[code|https://github.com/apache/flink/blob/f03c904426853ad3a62883d196b4f6b07c7ef365/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/SubtaskFileMergingManagerRestoreOperation.java#L92]
that does not seem to have any JIRA tracking it, and I'd like to confirm
whether it is a known gap. In
{{SubtaskFileMergingManagerRestoreOperation#restore()}} there is:
{code:java}
// TODO support channel state restore for unaligned checkpoint.
{code}
This TODO was introduced by FLINK-32080. Meanwhile FLINK-32084 ("Migrate
current file merging of channel state into the file merging framework") is
already Closed/Resolved, but this restore-side gap is still left open in the
code with no follow-up JIRA. Is there a ticket tracking it that I missed?
The following is a Claude analysis of what happens when unaligned checkpoints
run with {{file-merging.enabled = true}}:
{noformat}
- Channel state goes through file merging on the write path
  (ChannelStateCheckpointWriter → SegmentFileStateHandle), so its segments can
  share the same EXCLUSIVE physical file as keyed/operator state.
- On restore, SubtaskFileMergingManagerRestoreOperation#restore() registers
  only keyed/operator handles and filters out channel state (the TODO), so the
  physical file's reference count is under-counted.
- Reading still works initially, but once a later checkpoint discards the
  keyed/operator handles in that file, the ref count can drop to zero and the
  file gets deleted while channel state still references it, which breaks a
  subsequent restore.
{noformat}
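To make the suspected failure mode concrete, here is a toy reference-counting model of it. It is only a sketch of the behavior hypothesized in the analysis above, not Flink code: {{PhysicalFile}}, {{retain}}, {{release}} and {{fileDeletedAfterDiscard}} are hypothetical names, and the segment counts are made up for illustration.

```java
/** Toy model of reference counting on a shared physical file; NOT Flink code. */
class PhysicalFileRefCountSketch {

    /** Hypothetical stand-in for one merged physical file holding several segments. */
    static final class PhysicalFile {
        private int refCount;
        private boolean deleted;

        void retain() {
            refCount++;
        }

        void release() {
            // Once the last *known* reference is released, the file is deleted.
            if (--refCount == 0) {
                deleted = true;
            }
        }

        boolean isDeleted() {
            return deleted;
        }
    }

    /**
     * Simulates a restore followed by a later checkpoint that discards the
     * keyed/operator segments. Returns whether the shared file was deleted
     * even though a channel state segment still lives in it.
     */
    static boolean fileDeletedAfterDiscard(boolean registerChannelState) {
        PhysicalFile file = new PhysicalFile();
        int keyedOperatorSegments = 2; // assumed count, for illustration only
        int channelStateSegments = 1;  // assumed count, for illustration only

        // Restore: register the handles that the restore operation sees.
        for (int i = 0; i < keyedOperatorSegments; i++) {
            file.retain();
        }
        if (registerChannelState) {
            for (int i = 0; i < channelStateSegments; i++) {
                file.retain();
            }
        }

        // Later: the keyed/operator handles are discarded, the channel state is not.
        for (int i = 0; i < keyedOperatorSegments; i++) {
            file.release();
        }
        return file.isDeleted();
    }
}
```

In this model, {{fileDeletedAfterDiscard(false)}} (channel state filtered out on restore) reports the file as deleted, while {{fileDeletedAfterDiscard(true)}} keeps it alive. Whether the real restore path behaves this way is exactly what I'd like confirmed.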
My questions:
# Is this a real bug? If yes, is there a JIRA tracking it, or should we open
one?
# If it's not a bug and channel state restore is actually stable, can
FLINK-26803
({{execution.checkpointing.unaligned.max-subtasks-per-channel-state-file}})
be deprecated/removed?
# If there's no risk, should {{file-merging.enabled}} be turned on by default
in a future release, given that it was introduced a couple of years ago?
Please correct me directly if the analysis is wrong. Thanks!
> FLIP-306 Unified File Merging Mechanism for Checkpoints
> -------------------------------------------------------
>
> Key: FLINK-32070
> URL: https://issues.apache.org/jira/browse/FLINK-32070
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / State Backends
> Reporter: Zakelly Lan
> Assignee: Zakelly Lan
> Priority: Major
> Fix For: 2.4.0
>
>
> The FLIP:
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints]
>
> The creation of multiple checkpoint files can lead to a 'file flood' problem,
> in which a large number of files are written to the checkpoint storage in a
> short amount of time. This can cause issues in large clusters with high
> workloads, such as the creation and deletion of many files increasing the
> amount of file metadata modification on the DFS, leading to single-machine
> hotspot issues for metadata maintainers (e.g. NameNode in HDFS). Additionally, the
> performance of object storage (e.g. Amazon S3 and Alibaba OSS) can
> significantly decrease when listing objects, which is necessary for object
> name de-duplication before creating an object, further affecting the
> performance of directory manipulation from the file system's point of view
> (see [hadoop-aws module
> documentation|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#:~:text=an%20intermediate%20state.-,Warning%20%232%3A%20Directories%20are%20mimicked,-The%20S3A%20clients],
> section 'Warning #2: Directories are mimicked').
> While many solutions have been proposed for individual types of state files
> (e.g. FLINK-11937 for keyed state (RocksDB) and FLINK-26803 for channel
> state), the file flood problems from each type of checkpoint file are similar
> and lack a systematic view and solution. Therefore, the goal of this FLIP is to
> establish a unified file merging mechanism to address the file flood problem
> during checkpoint creation for all types of state files, including keyed,
> non-keyed, channel, and changelog state. This will significantly improve the
> system stability and availability of fault tolerance in Flink.