Zakelly commented on code in PR #24766:
URL: https://github.com/apache/flink/pull/24766#discussion_r1601510494


##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 
+The unified fie merging mechanism can be enabled by setting the property 
`state.checkpoints.file-merging.enabled` to `true`.
+**Note** that enabling this mechanism may lead to space amplification, that 
is, the actual occupation on the file system
+will be larger than actual state size. 
`state.checkpoints.file-merging.max-space-amplification` 
+can be used to limit the upper bound of space amplification.
+
+This mechanism is applicable to keyed state, operator state and channel state 
in Flink. Subtask level granular merging is 
+provided for shared scope state; TaskManager-level granular merging is 
provided for private scope state. The maximum number of subtasks
+allowed to be written to a single file can be configured through the 
`state.checkpoints.file-merging.max-subtasks-per-file` option.
+
+The unified fie merging mechanism also supports file merging across 
checkpoints, which can be enabled by setting
+`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
+
+This mechanism introduces a file pool to handle concurrent writing scenarios. 
The blocking mode can be

Review Comment:
   ```suggestion
   This mechanism introduces a file pool to handle concurrent writing scenarios. There are two modes....... The blocking mode...... while the non-blocking mode...... . This can be configured via ``.
   ```
   Could you add a description of each mode here, instead of only talking about how to enable the option?



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 

Review Comment:
   ```suggestion
   and file deletions, which alleviates the pressure on file system metadata management caused by the file flooding problem during checkpoints.
   ```



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 
+The unified fie merging mechanism can be enabled by setting the property 
`state.checkpoints.file-merging.enabled` to `true`.

Review Comment:
   ```suggestion
   The mechanism can be enabled by setting 
`state.checkpoints.file-merging.enabled` to `true`.
   ```



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints

Review Comment:
   How about adding `(Experimental)` to the title?



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 

Review Comment:
   ```suggestion
   which allows scattered small checkpoint files to be written into larger 
files, reducing the number of file creations 
   ```



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 
+The unified fie merging mechanism can be enabled by setting the property 
`state.checkpoints.file-merging.enabled` to `true`.
+**Note** that enabling this mechanism may lead to space amplification, that 
is, the actual occupation on the file system
+will be larger than actual state size. 
`state.checkpoints.file-merging.max-space-amplification` 
+can be used to limit the upper bound of space amplification.
+
+This mechanism is applicable to keyed state, operator state and channel state 
in Flink. Subtask level granular merging is 

Review Comment:
   
   ```suggestion
   This mechanism is applicable to keyed state, operator state and channel 
state in Flink. Merging at subtask level is 
   ```



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 
+The unified fie merging mechanism can be enabled by setting the property 
`state.checkpoints.file-merging.enabled` to `true`.
+**Note** that enabling this mechanism may lead to space amplification, that 
is, the actual occupation on the file system

Review Comment:
   ```suggestion
   **Note** that as a trade-off, enabling this mechanism may lead to space 
amplification, that is, the actual occupation on the file system
   ```



##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after 
all operators have rea
 without waiting for periodic triggering, but the job will need to wait for 
this final checkpoint 
 to be completed.
 
+## Unify file merging mechanism for checkpoints
+
+The unified file merging mechanism for checkpointing is introduced to Flink 
1.20 as an MVP ("minimum viable product") feature, 
+which allows scattered small checkpoint files to be written into a single 
file, reducing the number of file creations 
+and file deletions, helping to alleviate the pressure of file system metadata 
management and file flooding problem. 
+The unified fie merging mechanism can be enabled by setting the property 
`state.checkpoints.file-merging.enabled` to `true`.
+**Note** that enabling this mechanism may lead to space amplification, that 
is, the actual occupation on the file system
+will be larger than actual state size. 
`state.checkpoints.file-merging.max-space-amplification` 
+can be used to limit the upper bound of space amplification.
+
+This mechanism is applicable to keyed state, operator state and channel state 
in Flink. Subtask level granular merging is 
+provided for shared scope state; TaskManager-level granular merging is 
provided for private scope state. The maximum number of subtasks
+allowed to be written to a single file can be configured through the 
`state.checkpoints.file-merging.max-subtasks-per-file` option.
+
+The unified fie merging mechanism also supports file merging across 
checkpoints, which can be enabled by setting
+`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.

Review Comment:
   ```suggestion
   This feature also supports merging files across checkpoints. To enable this, set
   `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
   ```
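   For reference, the options mentioned in this section could be set together in the Flink configuration file roughly as sketched below. The keys are copied verbatim from the diff under review and may be renamed before release; the values are illustrative assumptions, not documented defaults or recommendations.

   ```yaml
   # Sketch only: keys taken from the PR text, values are assumed for illustration.
   # Turn on the unified file merging mechanism.
   state.checkpoints.file-merging.enabled: true
   # Upper bound on space amplification (assumed example value).
   state.checkpoints.file-merging.max-space-amplification: 2.0
   # Maximum number of subtasks allowed to write to one file (assumed example value).
   state.checkpoints.file-merging.max-subtasks-per-file: 4
   # Allow files to be merged across checkpoint boundaries.
   state.checkpoints.file-merging.across-checkpoint-boundary: true
   ```

   A short example like this in the docs might also make the space-amplification trade-off easier to see next to the switch that causes it.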



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

