[PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-10-11 Thread via GitHub


Zakelly opened a new pull request, #23509:
URL: https://github.com/apache/flink/pull/23509

   ## What is the purpose of the change
   
   By default, Flink creates subdirectories named by UUID (job id) under 
checkpoint directory for each job. It's a good means to avoid collision. 
However, it also bring in some effort to remember/find the right directory when 
recovering from previous checkpoint, as well as cleaning the old 
subdirectories. According to previous discussion ([Yun 
Tang's](https://issues.apache.org/jira/browse/FLINK-11789?focusedCommentId=16782314&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16782314)
 and [Stephan 
Ewen's](https://issues.apache.org/jira/browse/FLINK-9043?focusedCommentId=16409254&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16409254)
 ), I think it would be useful to add an option to disable creating the UUID 
subdirectories under the checkpoint directory. For compatibility 
considerations, we create the subdirectories by default.
   
   
   ## Brief change log
   
- Provide a new configuration named 'state.checkpoints.create-subdir' to 
control the behavior of creating job-id subdirectories under the checkpoint 
directory or not. This is an advanced option, which is well documented and 
warned.
   
   
   ## Verifying this change
   
   This change is already covered by newly added tests, a parameterized 
AbstractFileCheckpointStorageAccessTestBase#testCreateCheckpointSubDirs .
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), **Checkpointing**, Kubernetes/Yarn, ZooKeeper: (**yes** / no / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (**yes** / no)
 - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-12-13 Thread via GitHub


Zakelly commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1425456803


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsStateBackend.java:
##
@@ -141,6 +141,12 @@ public class FsStateBackend extends 
AbstractFileStateBackend implements Configur
  */
 private final int writeBufferSize;
 
+/**
+ * Switch to create checkpoint sub-directory with name of jobId. A value 
of 'undefined' means
+ * not yet configured, in which case the default will be used.
+ */
+private TernaryBoolean createCheckpointSubDirs = TernaryBoolean.UNDEFINED;

Review Comment:
   Good point, I removed the TernaryBoolean here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-12-13 Thread via GitHub


Zakelly commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1425459440


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/memory/MemoryBackendCheckpointStorageAccess.java:
##
@@ -72,6 +73,7 @@ public MemoryBackendCheckpointStorageAccess(
 JobID jobId,
 @Nullable Path checkpointsBaseDirectory,
 @Nullable Path defaultSavepointLocation,
+boolean createCheckpointSubDirs,

Review Comment:
   I'd rather keep a single constructor and not providing such constructor, 
since it is more friendly for further developing. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-12-13 Thread via GitHub


Zakelly commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1425468192


##
flink-core/src/main/java/org/apache/flink/configuration/CheckpointingOptions.java:
##
@@ -228,6 +228,32 @@ public class CheckpointingOptions {
 + "in a Flink supported filesystem. The 
storage path must be accessible from all participating processes/nodes"
 + "(i.e. all TaskManagers and 
JobManagers).");
 
+/**
+ * Whether to create sub-directories named by job id to store the data 
files and meta data of
+ * checkpoints. The default value is true to enable user could run several 
jobs with the same
+ * checkpoint directory at the same time. If this value is set to false, 
pay attention not to
+ * run several jobs with the same directory simultaneously.
+ */
+@Documentation.Section(Documentation.Sections.EXPERT_STATE_BACKENDS)
+public static final ConfigOption CREATE_CHECKPOINT_SUB_DIS =
+ConfigOptions.key("state.checkpoints.create-subdir")
+.booleanType()
+.defaultValue(true)
+.withDescription(
+Description.builder()
+.text(
+"Whether to create sub-directories 
named by job id under the '%s' to store the data files and meta data "
++ "of checkpoints. The 
default value is true to enable user could run several jobs with the same "
++ "checkpoint directory at 
the same time. If this value is set to false, pay attention not to "
++ "run several jobs with 
the same directory simultaneously. ",
+
TextElement.code(CHECKPOINTS_DIRECTORY.key()))
+.linebreak()
+.text(
+"WARNING: This is an advanced 
configuration. If set to false, users must ensure that no multiple jobs are run 
"

Review Comment:
   I'm afraid there are some misunderstanding here. The text here is trying to 
tell user to keep this directory for only one job use, no matter what the 
recovery mode is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-12-13 Thread via GitHub


Zakelly commented on PR #23509:
URL: https://github.com/apache/flink/pull/23509#issuecomment-1854062776

   @masteryhx  @Myasuka Thanks for your review! I have addressed your comments, 
would you please take another look? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2024-01-14 Thread via GitHub


Myasuka commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1451744345


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/memory/MemoryBackendCheckpointStorageAccess.java:
##
@@ -72,6 +73,7 @@ public MemoryBackendCheckpointStorageAccess(
 JobID jobId,
 @Nullable Path checkpointsBaseDirectory,
 @Nullable Path defaultSavepointLocation,
+boolean createCheckpointSubDirs,

Review Comment:
   I have no strong against here. We can keep the current design.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-10-31 Thread via GitHub


Zakelly commented on PR #23509:
URL: https://github.com/apache/flink/pull/23509#issuecomment-1786800877

   @masteryhx   @Myasuka   What do you think about adding this option?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-11-06 Thread via GitHub


masteryhx commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1382969491


##
flink-core/src/main/java/org/apache/flink/configuration/CheckpointingOptions.java:
##
@@ -228,6 +228,32 @@ public class CheckpointingOptions {
 + "in a Flink supported filesystem. The 
storage path must be accessible from all participating processes/nodes"
 + "(i.e. all TaskManagers and 
JobManagers).");
 
+/**
+ * Whether to create sub-directories named by job id to store the data 
files and meta data of
+ * checkpoints. The default value is true to enable user could run several 
jobs with the same
+ * checkpoint directory at the same time. If this value is set to false, 
pay attention not to
+ * run several jobs with the same directory simultaneously.
+ */
+@Documentation.Section(Documentation.Sections.EXPERT_STATE_BACKENDS)
+public static final ConfigOption CREATE_CHECKPOINT_SUB_DIS =

Review Comment:
   ```suggestion
   public static final ConfigOption CREATE_CHECKPOINT_SUB_DIR =
   ```



##
flink-core/src/main/java/org/apache/flink/configuration/CheckpointingOptions.java:
##
@@ -228,6 +228,32 @@ public class CheckpointingOptions {
 + "in a Flink supported filesystem. The 
storage path must be accessible from all participating processes/nodes"
 + "(i.e. all TaskManagers and 
JobManagers).");
 
+/**
+ * Whether to create sub-directories named by job id to store the data 
files and meta data of
+ * checkpoints. The default value is true to enable user could run several 
jobs with the same
+ * checkpoint directory at the same time. If this value is set to false, 
pay attention not to
+ * run several jobs with the same directory simultaneously.
+ */
+@Documentation.Section(Documentation.Sections.EXPERT_STATE_BACKENDS)
+public static final ConfigOption CREATE_CHECKPOINT_SUB_DIS =
+ConfigOptions.key("state.checkpoints.create-subdir")
+.booleanType()
+.defaultValue(true)
+.withDescription(
+Description.builder()
+.text(
+"Whether to create sub-directories 
named by job id under the '%s' to store the data files and meta data "
++ "of checkpoints. The 
default value is true to enable user could run several jobs with the same "
++ "checkpoint directory at 
the same time. If this value is set to false, pay attention not to "
++ "run several jobs with 
the same directory simultaneously. ",
+
TextElement.code(CHECKPOINTS_DIRECTORY.key()))
+.linebreak()
+.text(
+"WARNING: This is an advanced 
configuration. If set to false, users must ensure that no multiple jobs are run 
"

Review Comment:
   I'd suggest to also describe this together with restore mode which will make 
it clearer.
   If in claim mode, it's fine to go with this option.
   If in no-claim mode, What will happen ? Maybe some files are left forever 
together with running checkpoint which seems confusing and unclear.
   If starting without checkpoint,  the job may fail due to same chk file so it 
should be handled by users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-12-11 Thread via GitHub


Myasuka commented on code in PR #23509:
URL: https://github.com/apache/flink/pull/23509#discussion_r1423537299


##
flink-runtime/src/main/java/org/apache/flink/runtime/state/memory/MemoryBackendCheckpointStorageAccess.java:
##
@@ -72,6 +73,7 @@ public MemoryBackendCheckpointStorageAccess(
 JobID jobId,
 @Nullable Path checkpointsBaseDirectory,
 @Nullable Path defaultSavepointLocation,
+boolean createCheckpointSubDirs,

Review Comment:
   How about introducing another constructor to make `createCheckpointSubDirs` 
as default `true`? If so, we can avoid touching many test code in this PR.



##
flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsStateBackend.java:
##
@@ -141,6 +141,12 @@ public class FsStateBackend extends 
AbstractFileStateBackend implements Configur
  */
 private final int writeBufferSize;
 
+/**
+ * Switch to create checkpoint sub-directory with name of jobId. A value 
of 'undefined' means
+ * not yet configured, in which case the default will be used.
+ */
+private TernaryBoolean createCheckpointSubDirs = TernaryBoolean.UNDEFINED;

Review Comment:
   I wonder do we really need to introduce an undefined Boolean here in such 
way.
   Currently, we only have one way to set this value within a private 
constructor:
   ~~~java
   this.createCheckpointSubDirs =
   original.createCheckpointSubDirs.resolveUndefined(
   configuration.get(CheckpointingOptions.CREATE_CHECKPOINT_SUB_DIS));
   ~~~
   
   However, there is no public constructor or setter to set the 
`createCheckpointSubDirs` in the `original` one, that is to say we will always 
get a `TernaryBoolean.UNDEFINED` `createCheckpointSubDirs` in the original.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33057] Add options to disable creating job-id subdirectories under the checkpoint directory [flink]

2023-10-11 Thread via GitHub


flinkbot commented on PR #23509:
URL: https://github.com/apache/flink/pull/23509#issuecomment-1757349070

   
   ## CI report:
   
   * f6a18abafcb7272bc78559d52c6525fcdc526915 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org