[GitHub] [hadoop] steveloughran commented on a diff in pull request #6006: HADOOP-18797: Support Concurrent Writes With S3A Magic Committer

2023-09-19 Thread via GitHub


steveloughran commented on code in PR #6006:
URL: https://github.com/apache/hadoop/pull/6006#discussion_r1329969310


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitUtilsWithMR.java:
##
@@ -49,10 +50,11 @@ private CommitUtilsWithMR() {
   /**
* Get the location of magic job attempts.
* @param out the base output directory.
+   * @param jobUUID unique Job ID.
* @return the location of magic job attempts.
*/
-  public static Path getMagicJobAttemptsPath(Path out) {
-return new Path(out, MAGIC);
+  public static Path getMagicJobAttemptsPath(Path out, String jobUUID) {
+return new Path(out, MAGIC_PATH_PREFIX + jobUUID);

Review Comment:
   let's add a Precondition to check that the uuid isn't null/empty. I know 
it's unlikely, but if something goes very wrong then duplicate paths get in
   ```java
   Preconditions.checkArgument(jobUUID != null && !(jobUUID.isEmpty()),
"Invalid job ID: %s", jobUUID);
   ```
   



##
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md:
##
@@ -1694,7 +1694,7 @@ must be used, which means: the V2 classes.
 
 **Magic Committer: Name of directory**

Review Comment:
   replace with
   ```
   **Magic Committer: Directory Naming**
   ```
   to reflect changes in text



##
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md:
##
@@ -1694,7 +1694,7 @@ must be used, which means: the V2 classes.
 
 **Magic Committer: Name of directory**
 
-The design proposes the name `__magic` for the directory. HDFS and
+The design proposes the name ``"MAGIC PATH"`` for the directory. HDFS and

Review Comment:
   change this to 
   ```
   proposes the prefix  "__magic_job-" as the prefix for the magic paths of 
different jobs. 
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a diff in pull request #6006: HADOOP-18797: Support Concurrent Writes With S3A Magic Committer

2023-09-14 Thread via GitHub


steveloughran commented on code in PR #6006:
URL: https://github.com/apache/hadoop/pull/6006#discussion_r1326267233


##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/magic/ITestMagicCommitProtocol.java:
##
@@ -94,11 +95,11 @@ public MagicS3GuardCommitter createFailingCommitter(
   }
 
   protected void validateTaskAttemptPathDuringWrite(Path p,
-  final long expectedLength) throws IOException {
+  final long expectedLength, String jobId) throws IOException {
 String pathStr = p.toString();
 Assertions.assertThat(pathStr)
 .describedAs("Magic path")
-.contains(MAGIC);
+.contains(MAGIC_PATH_PREFIX + jobId);

Review Comment:
   we should really look for / at both ends of this string, for strictness, 
shouldn't we?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on a diff in pull request #6006: HADOOP-18797: Support Concurrent Writes With S3A Magic Committer

2023-09-12 Thread via GitHub


steveloughran commented on code in PR #6006:
URL: https://github.com/apache/hadoop/pull/6006#discussion_r1323354035


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/MagicCommitPaths.java:
##
@@ -96,7 +101,18 @@ public static boolean containsBasePath(List 
elements) {
* @throws IllegalArgumentException if there is no magic element
*/
   public static int magicElementIndex(List elements) {
-int index = elements.indexOf(MAGIC);
+int index = 0;

Review Comment:
   actually, you could coalesce this with a L114-115 some method to get the 
index without raising an exception; isMagicPath(elements) calls this and 
returns true if the index > 0. or you do very fancy java8 Optional stuff...



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/MagicCommitPaths.java:
##
@@ -208,8 +215,8 @@ public static List finalDestination(List 
elements) {
 if (isMagicPath(elements)) {
   List destDir = magicPathParents(elements);
   List children = magicPathChildren(elements);
-  checkArgument(!children.isEmpty(), "No path found under " +
-  MAGIC);
+  checkArgument(!children.isEmpty(), "No path found under the prefix" +

Review Comment:
   needs a space at the end



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/integration/ITestS3ACommitterMRJob.java:
##
@@ -586,7 +586,7 @@ private MagicCommitterTestBinding() {
 protected void validateResult(final Path destPath,
 final SuccessData successData)
 throws Exception {
-  Path magicDir = new Path(destPath, MAGIC);
+  Path magicDir = new Path(destPath, MAGIC_PATH_PREFIX);

Review Comment:
   is a job id getting down? it should be, for completeness



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/MagicCommitPaths.java:
##
@@ -76,7 +76,12 @@ public static List splitPathToElements(Path path) {
* @return true if a path is considered magic
*/
   public static boolean isMagicPath(List elements) {
-return elements.contains(MAGIC);
+for (String element : elements) {

Review Comment:
   or use some java8 stream thing?



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/ITestCommitOperationCost.java:
##
@@ -151,10 +151,10 @@ public void testMagicMkdir() throws Throwable {
*/
   @Test
   public void testMagicMkdirs() throws Throwable {
-describe("Mkdirs __magic/subdir always skips dir marker deletion");
+describe("Mkdirs __magic_job-/subdir always skips dir marker deletion");
 S3AFileSystem fs = getFileSystem();
 Path baseDir = methodPath();
-Path magicDir = new Path(baseDir, MAGIC);
+Path magicDir = new Path(baseDir, MAGIC_PATH_PREFIX);

Review Comment:
   for this test use the base prefix so we can see that it is still recognised



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java:
##
@@ -131,16 +134,19 @@ protected ActiveCommit listPendingUploadsToCommit(
* Delete the magic directory.
*/
   public void cleanupStagingDirs() {
-final Path out = getOutputPath();
-Path path = magicSubdir(out);
-try(DurationInfo ignored = new DurationInfo(LOG, true,
-"Deleting magic directory %s", path)) {
-  Invoker.ignoreIOExceptions(LOG, "cleanup magic directory", 
path.toString(),
-  () -> deleteWithWarning(getDestFS(), path, true));
-  // and the job temp directory with manifests
-  Invoker.ignoreIOExceptions(LOG, "cleanup job directory", path.toString(),
-  () -> deleteWithWarning(getDestFS(),
-  new Path(out, TEMP_DATA), true));
+if (cleanupMagicDirectory) {
+  final Path out = getOutputPath();
+  Path path = getMagicJobPath(getUUID(), out);
+  try (DurationInfo ignored = new DurationInfo(LOG, true,
+  "Deleting magic directory %s", path)) {
+Invoker.ignoreIOExceptions(LOG, "cleanup magic directory", 
path.toString(),
+() -> deleteWithWarning(getDestFS(), path, true));
+// and the job temp directory with manifests
+Invoker.ignoreIOExceptions(LOG, "cleanup job directory", 
path.toString(),

Review Comment:
   job dir cleanup MUST always happen; make the root path cleanup the optional 
one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org