Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-2020947224

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 1 new + 13 unchanged - 0 fixed 
= 14 total (was 13)  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 51s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 49s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 144m 22s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6468 |
   | JIRA Issue | HADOOP-19047 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 6a4507cb7348 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8d739beced49d1469593671cd7412b4662ca2f04 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/testReport/ |
   | Max. process+thread count | 527 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/8/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


steveloughran merged PR #6468:
URL: https://github.com/apache/hadoop/pull/6468


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-2020548007

   @steveloughran  - I have addressed your comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-2020230242

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 15s | 
[/patch-mvninstall-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-mvninstall-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch failed.  |
   | -1 :x: |  compile  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-aws in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javac  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-aws in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  compile  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  javac  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-compile-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 15s | 
[/buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/buildtool-patch-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  The patch fails to run checkstyle in hadoop-aws  |
   | -1 :x: |  mvnsite  |   0m 17s | 
[/patch-mvnsite-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-mvnsite-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | -1 :x: |  javadoc  |   0m 17s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/7/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  spotbugs  |   0m 16s | 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1539051935


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitOperations.java:
##
@@ -584,7 +584,7 @@ public SinglePendingCommit uploadFileToPendingCommit(File 
localFile,
   destKey,
   uploadId,
   partNumber,
-  size).build();
+  size).build();x

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1539027620


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitOperations.java:
##
@@ -584,7 +584,7 @@ public SinglePendingCommit uploadFileToPendingCommit(File 
localFile,
   destKey,
   uploadId,
   partNumber,
-  size).build();
+  size).build();x

Review Comment:
   typo



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-2019942477

   @steveloughran  - Thanks a lot for the detailed review. I have addressed 
your comments. 
   > note, you will need to a followup in the docs -but we can get this in and 
tested while you do that...
   What docs are we mentioning here? I have added the details in the 
committer.md file though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-26 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1538875809


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -113,15 +122,26 @@ public boolean aboutToComplete(String uploadId,
 return false;
   }
 
-  public static Map> 
getTaskAttemptIdToMpuMetdadataMap() {
-return taskAttemptIdToMpuMetdadataMap;
+  @Override
+  public String toString() {
+final StringBuilder sb = new StringBuilder(
+"InMemoryMagicCommitTracker{");
+sb.append(", Number of 
taskAttempts=").append(TASK_ATTEMPT_ID_TO_MPU_METDADATA.size());
+sb.append(", Number of files=").append(PATH_TO_BYTES_WRITTEN.size());
+sb.append('}');
+return sb.toString();
+  }
+
+
+  public static Map> 
getTaskAttemptIdToMpuMetdadata() {

Review Comment:
   ack.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-25 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1537910244


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -113,15 +122,26 @@ public boolean aboutToComplete(String uploadId,
 return false;
   }
 
-  public static Map> 
getTaskAttemptIdToMpuMetdadataMap() {
-return taskAttemptIdToMpuMetdadataMap;
+  @Override
+  public String toString() {
+final StringBuilder sb = new StringBuilder(
+"InMemoryMagicCommitTracker{");
+sb.append(", Number of 
taskAttempts=").append(TASK_ATTEMPT_ID_TO_MPU_METDADATA.size());
+sb.append(", Number of files=").append(PATH_TO_BYTES_WRITTEN.size());
+sb.append('}');
+return sb.toString();
+  }
+
+
+  public static Map> 
getTaskAttemptIdToMpuMetdadata() {

Review Comment:
   nit: typo in method name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-03-21 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-2011393345

   @steveloughran  - Gentle reminder for review
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-20 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1954651307

   @steveloughran  - I am glad to assist with the testing. Is there any release 
candidate branch for the same? Could you please share the wiki on what tests 
needs to be done?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-20 Thread via GitHub


steveloughran commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1954463412

   @shameersss1 I'm working on assisting getting the 3.4.0 release out right 
now. Anything you can do to assist testing would be wonderful, as I'm only 
worrying about release blockers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-18 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1951725768

   @steveloughran - Gentle reminder to review the changes.
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-12 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1938307370

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m 17s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 48s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 138m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6468 |
   | JIRA Issue | HADOOP-19047 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 29f0f609a7d0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d8db729e5568df5dc920604eff8167a575a5894c |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/testReport/ |
   | Max. process+thread count | 526 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1938151034

   @steveloughran - Thanks a lot for review comments, I have addressed the 
comments with the new commit `d8db729e5568df5dc920604eff8167a575a5894c`
   
   1. Added test in ITestTerasortOnS3A
   2. Fixed import ordering
   3. Addressed other minor comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485758682


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+
+import static 
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTrackerUtils.extractTaskAttemptIdFromPath;
+
+/**
+ * InMemoryMagicCommitTracker stores the commit data in memory.
+ * The commit data and related data stores are flushed out from
+ * the memory when the task is committed or aborted.
+ */
+public class InMemoryMagicCommitTracker extends MagicCommitTracker {
+
+  // stores taskAttemptId to commit data mapping
+  private static Map>

Review Comment:
   ack for final.
   
   >  I do think they should use weak/soft references,
   
   Is this required ? Given that we proactively remove the entries from HashMap 
when the task commits or aborts. Since it is not referenced any where, when gc 
happens it reclaims the memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485746226


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTrackerUtils.java:
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.hadoop.conf.Configuration;

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485745785


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java:
##
@@ -118,76 +103,21 @@ public boolean outputImmediatelyVisible() {
 
   /**
* Complete operation: generate the final commit data, put it.
-   * @param uploadId Upload ID
-   * @param parts list of parts
+   *
+   * @param uploadId Upload ID
+   * @param partslist of parts
* @param bytesWritten bytes written
* @param iostatistics nullable IO statistics
* @return false, indicating that the commit must fail.
-   * @throws IOException any IO problem.
+   * @throws IOException  any IO problem.

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485744738


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+
+import static 
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTrackerUtils.extractTaskAttemptIdFromPath;
+
+/**
+ * InMemoryMagicCommitTracker stores the commit data in memory.
+ * The commit data and related data stores are flushed out from
+ * the memory when the task is committed or aborted.
+ */
+public class InMemoryMagicCommitTracker extends MagicCommitTracker {
+
+  // stores taskAttemptId to commit data mapping

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485743070


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+
+import static 
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTrackerUtils.extractTaskAttemptIdFromPath;
+
+/**
+ * InMemoryMagicCommitTracker stores the commit data in memory.
+ * The commit data and related data stores are flushed out from
+ * the memory when the task is committed or aborted.
+ */
+public class InMemoryMagicCommitTracker extends MagicCommitTracker {

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485740943


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/MagicCommitIntegration.java:
##
@@ -20,17 +20,19 @@
 
 import java.util.List;
 
+import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485740785


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java:
##
@@ -242,6 +242,13 @@ private CommitConstants() {
*/
   public static final int DEFAULT_COMMITTER_THREADS = 32;
 
+
+  public static final String 
FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED =

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-11 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1485740055


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
   @Retries.RetryTranslated
   public FileStatus getFileStatus(final Path f) throws IOException {
 Path path = qualify(f);
+if (isTrackMagicCommitsInMemoryEnabled(getConf()) && 
isMagicCommitPath(path)) {
+  // Some downstream apps might call getFileStatus for a magic path to get 
the file size.
+  // when commit data is stored in memory construct the dummy 
S3AFileStatus with correct
+  // file size fetched from the memory.
+  if 
(InMemoryMagicCommitTracker.getTaskAttemptIdToBytesWritten().containsKey(path)) 
{
+long len = 
InMemoryMagicCommitTracker.getTaskAttemptIdToBytesWritten().get(path);
+return new S3AFileStatus(len,

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-09 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1484762085


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java:
##
@@ -242,6 +242,13 @@ private CommitConstants() {
*/
   public static final int DEFAULT_COMMITTER_THREADS = 32;
 
+
+  public static final String 
FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED =

Review Comment:
   javadocs here and below, use {@value} for value insertion



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
   @Retries.RetryTranslated
   public FileStatus getFileStatus(final Path f) throws IOException {
 Path path = qualify(f);
+if (isTrackMagicCommitsInMemoryEnabled(getConf()) && 
isMagicCommitPath(path)) {
+  // Some downstream apps might call getFileStatus for a magic path to get 
the file size.
+  // when commit data is stored in memory construct the dummy 
S3AFileStatus with correct
+  // file size fetched from the memory.
+  if 
(InMemoryMagicCommitTracker.getTaskAttemptIdToBytesWritten().containsKey(path)) 
{
+long len = 
InMemoryMagicCommitTracker.getTaskAttemptIdToBytesWritten().get(path);
+return new S3AFileStatus(len,

Review Comment:
   how about we add a special etag here, like "pending", declared in 
CommitterConstants. That way toString() on the status hints that it is pending 
so there's more diagnostics in list operations etc. And it could be used in 
tests. 



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+
+import static 
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTrackerUtils.extractTaskAttemptIdFromPath;
+
+/**
+ * InMemoryMagicCommitTracker stores the commit data in memory.
+ * The commit data and related data stores are flushed out from
+ * the memory when the task is committed or aborted.
+ */
+public class InMemoryMagicCommitTracker extends MagicCommitTracker {
+
+  // stores taskAttemptId to commit data mapping

Review Comment:
   make javadocs



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-09 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1484756242


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java:
##
@@ -264,9 +326,14 @@ public void abortTask(TaskAttemptContext context) throws 
IOException {
 try (DurationInfo d = new DurationInfo(LOG,
 "Abort task %s", context.getTaskAttemptID());
 CommitContext commitContext = initiateTaskOperation(context)) {
-  getCommitOperations().abortAllSinglePendingCommits(attemptPath,
-  commitContext,
-  true);
+  if (isTrackMagicCommitsInMemoryEnabled(context.getConfiguration())) {
+List pendingCommits = 
loadPendingCommitsFromMemory(context);
+for (SinglePendingCommit singleCommit : pendingCommits) {
+  commitContext.abortSingleCommit(singleCommit);
+}
+  } else {
+getCommitOperations().abortAllSinglePendingCommits(attemptPath, 
commitContext, true);

Review Comment:
   looked at the hadoop stuff and yes, it does it in the same process too
   ```
   this is called from a task's process to clean 
   up a single task's output that can not yet been committed. This may be
   called multiple times for the same task, but for different task attempts.
   ```
   so nothing to worry about
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-08 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1934214757

   @steveloughran  - Gentle reminder for review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926764689

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 22s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 1 new + 5 unchanged - 0 fixed 
= 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 24s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 55s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 141m 44s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6468 |
   | JIRA Issue | HADOOP-19047 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux f10aab5720a0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7d213000e14bcd1ee7aa6c1369f996a0815347e8 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/testReport/ |
   | Max. process+thread count | 528 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926512939

   @steveloughran  - Thanks a lot a detailed review as well as amazing follow 
up question. I have addressed your comments, Please let me know your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477845326


##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/magic/ITestMagicCommitProtocol.java:
##
@@ -71,6 +79,26 @@ public void setup() throws Exception {
 CommitUtils.verifyIsMagicCommitFS(getFileSystem());
   }
 
+  @Parameterized.Parameters(name = "track-commit-in-memory-{0}")
+  public static Collection params() {
+return Arrays.asList(new Object[][]{
+{false},
+{true}
+});
+  }
+
+  public ITestMagicCommitProtocol(boolean trackCommitsInMemory) {
+this.trackCommitsInMemory = trackCommitsInMemory;
+  }
+
+  @Override
+  protected Configuration createConfiguration() {
+Configuration conf = super.createConfiguration();
+conf.setBoolean(FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED, 
trackCommitsInMemory);

Review Comment:
   ack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477808239


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -52,6 +52,7 @@
 import java.util.concurrent.atomic.AtomicBoolean;
 import javax.annotation.Nullable;
 
+import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;

Review Comment:
   Ack.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477804569


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/S3MagicCommitTracker.java:
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.commons.lang3.StringUtils;

Review Comment:
   Ack. I will import `Code formatter xml is present here: 
https://github.com/apache/hadoop/tree/trunk/dev-support/code-formatter . 
IntelliJ users can directly import hadoop_idea_formatter.xml`



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/InMemoryMagicCommitTracker.java:
##
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.thirdparty.com.google.common.base.Preconditions;
+import software.amazon.awssdk.services.s3.model.CompletedPart;

Review Comment:
   Ack. I will import `Code formatter xml is present here: 
https://github.com/apache/hadoop/tree/trunk/dev-support/code-formatter . 
IntelliJ users can directly import hadoop_idea_formatter.xml`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-05 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477795947


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java:
##
@@ -248,6 +236,80 @@ private PendingSet innerCommitTask(
 return pendingSet;
   }
 
+  /**
+   * Loads pending commits from either memory or from the remote store (S3) 
based on the config.
+   * @param context TaskAttemptContext
+   * @return All pending commit data for the given TaskAttemptContext
+   * @throws IOException
+   *   if there is an error trying to read the commit data
+   */
+  protected PendingSet loadPendingCommits(TaskAttemptContext context) throws 
IOException {
+PendingSet pendingSet = new PendingSet();
+if (isTrackMagicCommitsInMemoryEnabled(context.getConfiguration())) {
+  // load from memory
+  List pendingCommits = 
loadPendingCommitsFromMemory(context);
+
+  for (SinglePendingCommit singleCommit : pendingCommits) {
+// aggregate stats
+pendingSet.getIOStatistics()
+.aggregate(singleCommit.getIOStatistics());
+// then clear so they aren't marshalled again.
+singleCommit.getIOStatistics().clear();
+  }
+  pendingSet.setCommits(pendingCommits);
+} else {
+  // Load from remote store
+  CommitOperations actions = getCommitOperations();
+  Path taskAttemptPath = getTaskAttemptPath(context);
+  try (CommitContext commitContext = initiateTaskOperation(context)) {
+Pair>> loaded =
+actions.loadSinglePendingCommits(taskAttemptPath, true, 
commitContext);
+pendingSet = loaded.getKey();
+List> failures = 
loaded.getValue();
+if (!failures.isEmpty()) {
+  // At least one file failed to load
+  // revert all which did; report failure with first exception
+  LOG.error("At least one commit file could not be read: failing");
+  abortPendingUploads(commitContext, pendingSet.getCommits(), true);
+  throw failures.get(0).getValue();
+}
+  }
+}
+return pendingSet;
+  }
+
+  private List 
loadPendingCommitsFromMemory(TaskAttemptContext context)

Review Comment:
   ack.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-04 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477749474


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java:
##
@@ -264,9 +326,14 @@ public void abortTask(TaskAttemptContext context) throws 
IOException {
 try (DurationInfo d = new DurationInfo(LOG,
 "Abort task %s", context.getTaskAttemptID());
 CommitContext commitContext = initiateTaskOperation(context)) {
-  getCommitOperations().abortAllSinglePendingCommits(attemptPath,
-  commitContext,
-  true);
+  if (isTrackMagicCommitsInMemoryEnabled(context.getConfiguration())) {
+List pendingCommits = 
loadPendingCommitsFromMemory(context);
+for (SinglePendingCommit singleCommit : pendingCommits) {
+  commitContext.abortSingleCommit(singleCommit);
+}
+  } else {
+getCommitOperations().abortAllSinglePendingCommits(attemptPath, 
commitContext, true);

Review Comment:
   AFIK, Spark calls abortTask from the same process (executor), When the job 
fails, The abortJob operation is called which basically lists all the pending 
uploads and aborts it as mentioned in the comment 
[here](https://github.com/apache/hadoop/pull/6468#issuecomment-1926348440)
   
   I am not sure why would a different process call abortTask, The driver 
process should ideally call abortJob if a job fails.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-04 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1477747332


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
   @Retries.RetryTranslated
   public FileStatus getFileStatus(final Path f) throws IOException {
 Path path = qualify(f);
+if (isTrackMagicCommitsInMemoryEnabled(getConf()) && 
isMagicCommitPath(path)) {

Review Comment:
   I agree this an ugly hack. I couldn't find any better alternative. This will 
be used by downstream application like Spark which wants to get the file size 
of the file written by the task. This is supposed to be used in the same 
process which writes the file/initiated and upload the MPU.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-04 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926348440

   >1. static map of path to metadata. This will grow without constraint on a 
long live process.
   
   The entries to the Map are removed during commitTask or abortTask operation 
to keep memory under control.
   
   ---
   
   > 2. Two jobs writing to same path will it corrupt the Map ?
   
   No, The path (complete) is guaranteed to be unique The paths stored here as 
part of `private static Map> taskAttemptIdToPath = new 
ConcurrentHashMap<>();` is the magic path, Eventhough the file name might be 
same, The magic path for two different jobs will be different since the jobId 
is included in the path.
   
   -
   
   >3. the static map would be a weak ref to something held strongly by the 
actual committer (see WeakReferenceMap). Once the actual task attempt is gc'd,
   
   Since the entries from the HashMap are removed during commitTask or 
abortTask operation is WeakHashMap still required?
   
   
   
   >4.  static structures should be per fs instances, so when an fs is cleaned 
up
   
   I am not sure why it should be scoped under fs object. For a simiar 
behaviour with storing in s3, Shouldn't the static structure be available to 
the whole JVM ? I mean shouldn't we able to access static structure 
irrespective of the fs object.
   
   
   
   >5. 'm also worried about how a job could abort a task attempt on a 
different process which has failed. Before worrying about that too much, why 
don't you look in spark to see how it calls abort. I'm not worried about 
MapReduce except for testing -so how do itself calls the committee isn't so 
important. For example: we don't care about recovery from a failed attempt as 
spark itself cannot do this.
   
   I have covered this as part of the comment 
[here](https://github.com/apache/hadoop/pull/6468#issuecomment-1926304528).
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-04 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1926304528

   @steveloughran  - Thanks a lot a for a detailed review and some amazing 
question, The following are my thoughts on the different asks.
   
   > 1. Marker files at the end of each path so that spark status reporting on 
different processes can get an update on an active job.
   
   As far i know (Please correct me if i am wrong)
   1. 0-size marker files was specially added for Spark's use case.
   2. After writing files, Spark tries to get the size of the files written for 
the statistic purpose (like showing the output bytes written) in the Spark 
History server UI.
   3. This operation is being done as part of the 
[BasicWriteStatsTracker](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala#L86)
 class in Spark.
   4. As i could see in my experiment, BasicWriteStatsTracker#getFileSize is 
called in the executor process itself.
   
   That being said, Since the same process is calling 
BasicWriteStatsTracker#getFileSize is it still required to have 0 marker file? 
I have solved this by adding a check in FileStatus method by returing the file 
size corresponding to the magic path/file.
   
   --
   
   > 2. A way to abort all uploads of a failed task attempt -even from a 
different process. Probably also a way to abort the entire job.
   
   Thinking from Spark's perspective,
   1. When a taskAttempt fails (gracefully), 
[abortTask](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L176)
 operation is called. This is operation is called within the same process and 
hence we can fetch the MPU metadata from the memory itself.
   
   2. If a taskAttempt fails (ungracefully and all retries) are exhausted, When 
[abortJob](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L69)
 operation is called which will internally invoke 
[cleanup](https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L965)
 which lists all the pending multi part upload and aborts them.
   
   That being said, I am not sure if there is any such use case of abortingTask 
from another process. In such cases, The abortJob will handle it i guess.
   
   
   
   >3. Confidence that the inner memory store of pending uploads Will not grow 
it definitely.
   
   1. The static map entry is removed is during taskCommit or abortTask 
operations and hence it guaranteed that there is no memory leak (unless there 
is some unexplored corner case).
   
   2. The only case when it grows large, is when there are large number of 
concurrent jobs reusing the same executor JVM, Since we don't enable the 
"inmemory" by default we should be good. That being said, maybe we should call 
this out in the documentation.
   
   
   
   > 4. Two jobs writing to same path will it corrupt the Map ?
   
   The paths stored here as part of `private static Map> 
taskAttemptIdToPath = new ConcurrentHashMap<>();` is the magic path, Eventhough 
the file name might be same, The magic path for two different jobs will be 
different since the jobId is included in the path.
   
   
   Doe it make sense? Or am i missing anything?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-03 Thread via GitHub


steveloughran commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1925340023

   I've thought about this some more. Here are some things which I believe we 
need
   
   1. Marker files at the end of each path so that spark status reporting on 
different processes can get an update on an active job.
   1. A way to abort all uploads of a failed task attempt -even from a 
different process. Probably also a way to abort the entire job.
   1. Confidence that the inner memory store of pending uploads Will not grow 
it definitely.
   
   Ignoring item number #3 for now, remember that we have #1 solved by adding a 
0 byte marker with a header of "final length"; spark has some special handling 
zero byte files to use getXattr() and fall back to the probe for this -at the 
expense of a second HEAD request. Generating a modified FileStatus response 
from a single HEAD/getObjectMetadata() call Wood actually eliminate the need 
for that I wish I'd thought of it myself. Yes, we do break that guarantee that 
files listed are the same size as the files opened… but magic paths are, well, 
magic. We break a lot of guarantees there already.
   
   The existing design should be retained even in memory; the calculation of 
final length something which can be done for all.
   
   But: we do not need to save the .pending files just for task abort. All we 
need to do is be able to enumerate the upload IDs of all the files from that 
task attempt and cancel them. We can do that just by adding another header to 
the marker file. Task committee uses the memory data; task abort will need a 
deep scan of the task attempt, and all zero bite files with the proposed new 
header used to initiate water operations. This is only for task board an 
outlier case. For normal task commit there is no need to Scan the directory 
pause the pending files then generate a new pending set file for later pause 
commit. It is probably the Jason on the marshalling which is as much a 
performance killer here as the listing operation.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-02-02 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1476480590


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -52,6 +52,7 @@
 import java.util.concurrent.atomic.AtomicBoolean;
 import javax.annotation.Nullable;
 
+import org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker;

Review Comment:
   move to same group as rest of apache imports



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/S3MagicCommitTracker.java:
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.commons.lang3.StringUtils;

Review Comment:
   import ordering and grouping. best to set the ide for the code style -but 
don't rearrange imports on existing classes as it ruins cherrypicking



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java:
##
@@ -248,6 +236,80 @@ private PendingSet innerCommitTask(
 return pendingSet;
   }
 
+  /**
+   * Loads pending commits from either memory or from the remote store (S3) 
based on the config.
+   * @param context TaskAttemptContext
+   * @return All pending commit data for the given TaskAttemptContext
+   * @throws IOException
+   *   if there is an error trying to read the commit data
+   */
+  protected PendingSet loadPendingCommits(TaskAttemptContext context) throws 
IOException {
+PendingSet pendingSet = new PendingSet();
+if (isTrackMagicCommitsInMemoryEnabled(context.getConfiguration())) {
+  // load from memory
+  List pendingCommits = 
loadPendingCommitsFromMemory(context);
+
+  for (SinglePendingCommit singleCommit : pendingCommits) {
+// aggregate stats
+pendingSet.getIOStatistics()
+.aggregate(singleCommit.getIOStatistics());
+// then clear so they aren't marshalled again.
+singleCommit.getIOStatistics().clear();
+  }
+  pendingSet.setCommits(pendingCommits);
+} else {
+  // Load from remote store
+  CommitOperations actions = getCommitOperations();
+  Path taskAttemptPath = getTaskAttemptPath(context);
+  try (CommitContext commitContext = initiateTaskOperation(context)) {
+Pair>> loaded =
+actions.loadSinglePendingCommits(taskAttemptPath, true, 
commitContext);
+pendingSet = loaded.getKey();
+List> failures = 
loaded.getValue();
+if (!failures.isEmpty()) {
+  // At least one file failed to load
+  // revert all which did; report failure with first exception
+  LOG.error("At least one commit file could not be read: failing");
+  abortPendingUploads(commitContext, pendingSet.getCommits(), true);
+  throw failures.get(0).getValue();
+}
+  }
+}
+return pendingSet;
+  }
+
+  private List 
loadPendingCommitsFromMemory(TaskAttemptContext context)

Review Comment:
   nit, javadocs



##
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/magic/ITestMagicCommitProtocol.java:
##
@@ -71,6 +79,26 @@ public void setup() throws Exception {
 CommitUtils.verifyIsMagicCommitFS(getFileSystem());
   }
 
+  @Parameterized.Parameters(name = "track-commit-in-memory-{0}")
+  public static Collection params() {
+return Arrays.asList(new Object[][]{
+{false},
+{true}
+});
+  }
+
+  public ITestMagicCommitProtocol(boolean trackCommitsInMemory) {
+this.trackCommitsInMemory = trackCommitsInMemory;
+  }
+
+  @Override
+  protected Configuration createConfiguration() {
+Configuration conf = super.createConfiguration();
+conf.setBoolean(FS_S3A_COMMITTER_MAGIC_TRACK_COMMITS_IN_MEMORY_ENABLED, 
trackCommitsInMemory);

Review Comment:
   look for uses of `removeBaseAndBucketOverrides()` to see how to avoid 
per-bucket settings breaking your test.



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##
@@ -3906,6 +3908,21 @@ public void access(final Path f, final FsAction mode)
   @Retries.RetryTranslated
 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-31 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1919397323

   @steveloughran - Could you please review the changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-27 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1913253726

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m 55s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 53s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 138m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6468 |
   | JIRA Issue | HADOOP-19047 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 629db3c4380a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 02fc9f187779a14e1645d7866c0f00423474c97c |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/testReport/ |
   | Max. process+thread count | 527 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-26 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1911683899

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 58s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  javadoc  |   0m 24s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08 with 
JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 generated 1 new + 0 unchanged 
- 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  spotbugs  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   3m  4s |  |  hadoop-aws in the patch passed. 
 |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 140m  2s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6468 |
   | JIRA Issue | HADOOP-19047 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 3dde19b6ae38 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a5b27c54f2bc3d75947fefcc5d040345231f91ce |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/testReport/ |
   | Max. process+thread count | 616 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/3/console |
   | versions | 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-25 Thread via GitHub


hadoop-yetus commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1911508281

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 33s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | 
[/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt)
 |  hadoop-tools/hadoop-aws: The patch generated 12 new + 5 unchanged - 0 fixed 
= 17 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  javadoc  |   0m 24s | 
[/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/artifact/out/patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-aws in the patch failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -1 :x: |  spotbugs  |   1m 10s | 
[/new-spotbugs-hadoop-tools_hadoop-aws.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/artifact/out/new-spotbugs-hadoop-tools_hadoop-aws.html)
 |  hadoop-tools/hadoop-aws generated 3 new + 0 unchanged - 0 fixed = 3 total 
(was 0)  |
   | +1 :green_heart: |  shadedclient  |  37m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   2m 54s | 
[/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6468/2/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt)
 |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 140m 50s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-aws |
   |  |  
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToBytesWritten
 isn't final but should be  At InMemoryMagicCommitTracker.java:be  At 
InMemoryMagicCommitTracker.java:[line 52] |
   |  |  
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToMpuMetdadataMap
 isn't final but should be  At InMemoryMagicCommitTracker.java:be  At 
InMemoryMagicCommitTracker.java:[line 49] |
   |  |  
org.apache.hadoop.fs.s3a.commit.magic.InMemoryMagicCommitTracker.taskAttemptIdToPath
 isn't final but should be  At InMemoryMagicCommitTracker.java:be  At 
InMemoryMagicCommitTracker.java:[line 55] |
   | Failed junit tests | hadoop.fs.s3a.commit.TestMagicCommitTrackerUtils |
   
   
   | 

Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-25 Thread via GitHub


shameersss1 commented on PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#issuecomment-1911341093

   @steveloughran  - I have converted draft PR to final one. Could you please 
review the changes.
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-25 Thread via GitHub


shameersss1 commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1467185040


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/S3MagicCommitTracker.java:
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3ADataBlocks;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.impl.PutObjectOptions;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+import software.amazon.awssdk.services.s3.model.PutObjectRequest;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.fs.s3a.Statistic.COMMITTER_MAGIC_MARKER_PUT;
+import static 
org.apache.hadoop.fs.s3a.commit.CommitConstants.X_HEADER_MAGIC_MARKER;
+import static 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation;
+
+public class S3MagicCommitTracker extends MagicCommitTracker {
+
+  public S3MagicCommitTracker(Path path,
+  String bucket,
+  String originalDestKey,
+  String destKey,
+  String pendingsetKey,
+  WriteOperationHelper writer,
+  PutTrackerStatistics trackerStatistics) {
+super(path, bucket, originalDestKey, destKey, pendingsetKey, writer, 
trackerStatistics);
+  }
+
+  @Override
+  public boolean aboutToComplete(String uploadId,
+  List parts,
+  long bytesWritten,
+  final IOStatistics iostatistics)
+  throws IOException {
+Preconditions.checkArgument(StringUtils.isNotEmpty(uploadId),
+"empty/null upload ID: "+ uploadId);
+Preconditions.checkArgument(parts != null,
+"No uploaded parts list");
+Preconditions.checkArgument(!parts.isEmpty(),
+"No uploaded parts to save");
+
+// put a 0-byte file with the name of the original under-magic path
+// Add the final file length as a header
+// this is done before the task commit, so its duration can be
+// included in the statistics
+Map headers = new HashMap<>();
+headers.put(X_HEADER_MAGIC_MARKER, Long.toString(bytesWritten));
+PutObjectRequest originalDestPut = writer.createPutObjectRequest(
+originalDestKey,
+0,
+new PutObjectOptions(true, null, headers), false);
+upload(originalDestPut, new ByteArrayInputStream(EMPTY));
+
+// build the commit summary
+SinglePendingCommit commitData = new SinglePendingCommit();
+commitData.touch(System.currentTimeMillis());
+commitData.setDestinationKey(getDestKey());
+commitData.setBucket(bucket);
+commitData.setUri(path.toUri().toString());
+commitData.setUploadId(uploadId);
+commitData.setText("");
+commitData.setLength(bytesWritten);
+commitData.bindCommitData(parts);
+commitData.setIOStatistics(
+new IOStatisticsSnapshot(iostatistics));
+
+byte[] bytes = commitData.toBytes(SinglePendingCommit.serializer());

Review Comment:
   Thanks for bringing up this. I intend to do this separately in a different 
Jira (clubbing with option to transfer commit data over network to driver - 
This might be useful for Spark where taskcommitMessage is propagated) and 
keeping the scope of this change only to in-memory commit data.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] HADOOP-19047: Support InMemory Tracking Of S3A Magic Commits [hadoop]

2024-01-21 Thread via GitHub


steveloughran commented on code in PR #6468:
URL: https://github.com/apache/hadoop/pull/6468#discussion_r1461054737


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/S3MagicCommitTracker.java:
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.commit.magic;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.Retries;
+import org.apache.hadoop.fs.s3a.S3ADataBlocks;
+import org.apache.hadoop.fs.s3a.WriteOperationHelper;
+import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
+import org.apache.hadoop.fs.s3a.impl.PutObjectOptions;
+import org.apache.hadoop.fs.s3a.statistics.PutTrackerStatistics;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
+import org.apache.hadoop.util.Preconditions;
+import software.amazon.awssdk.services.s3.model.CompletedPart;
+import software.amazon.awssdk.services.s3.model.PutObjectRequest;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.fs.s3a.Statistic.COMMITTER_MAGIC_MARKER_PUT;
+import static 
org.apache.hadoop.fs.s3a.commit.CommitConstants.X_HEADER_MAGIC_MARKER;
+import static 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation;
+
+public class S3MagicCommitTracker extends MagicCommitTracker {
+
+  public S3MagicCommitTracker(Path path,
+  String bucket,
+  String originalDestKey,
+  String destKey,
+  String pendingsetKey,
+  WriteOperationHelper writer,
+  PutTrackerStatistics trackerStatistics) {
+super(path, bucket, originalDestKey, destKey, pendingsetKey, writer, 
trackerStatistics);
+  }
+
+  @Override
+  public boolean aboutToComplete(String uploadId,
+  List parts,
+  long bytesWritten,
+  final IOStatistics iostatistics)
+  throws IOException {
+Preconditions.checkArgument(StringUtils.isNotEmpty(uploadId),
+"empty/null upload ID: "+ uploadId);
+Preconditions.checkArgument(parts != null,
+"No uploaded parts list");
+Preconditions.checkArgument(!parts.isEmpty(),
+"No uploaded parts to save");
+
+// put a 0-byte file with the name of the original under-magic path
+// Add the final file length as a header
+// this is done before the task commit, so its duration can be
+// included in the statistics
+Map headers = new HashMap<>();
+headers.put(X_HEADER_MAGIC_MARKER, Long.toString(bytesWritten));
+PutObjectRequest originalDestPut = writer.createPutObjectRequest(
+originalDestKey,
+0,
+new PutObjectOptions(true, null, headers), false);
+upload(originalDestPut, new ByteArrayInputStream(EMPTY));
+
+// build the commit summary
+SinglePendingCommit commitData = new SinglePendingCommit();
+commitData.touch(System.currentTimeMillis());
+commitData.setDestinationKey(getDestKey());
+commitData.setBucket(bucket);
+commitData.setUri(path.toUri().toString());
+commitData.setUploadId(uploadId);
+commitData.setText("");
+commitData.setLength(bytesWritten);
+commitData.bindCommitData(parts);
+commitData.setIOStatistics(
+new IOStatisticsSnapshot(iostatistics));
+
+byte[] bytes = commitData.toBytes(SinglePendingCommit.serializer());

Review Comment:
   you know, the other thing to consider here is moving from json 
serialization; IOStatisticsSnapshot already implements Serializable; adding 
Hadoop Writable to it would make for faster ser/deser and marshalling than 
through jackson



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: