Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-05-13 Thread via GitHub


steveloughran merged PR #6716:
URL: https://github.com/apache/hadoop/pull/6716


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-05-09 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2102258570

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 05s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 05s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 05s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  shellcheck  |   0m 05s |  |  Shellcheck was not available.  |
   | +0 :ok: |  shelldocs  |   0m 05s |  |  Shelldocs was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 11 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 35s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  87m 02s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  38m 10s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 01s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  16m 14s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |  14m 05s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 165m 47s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 13s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  11m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  36m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  36m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   5m 43s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  16m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |  13m 47s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 174m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   6m 07s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 532m 21s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense codespell detsecrets shellcheck 
shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs 
checkstyle markdownlint xmllint |
   | uname | MINGW64_NT-10.0-17763 54377fa890b8 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 68dff782a05bdaaf1fcc1e4999e6fd404f07796b |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6716/4/testReport/
 |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6716/4/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-05-07 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2098947031

   @saxenapranav what do you think of the patch now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-05-05 Thread via GitHub


saxenapranav commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1590568348


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+return saveManifest(() -> manifestData, tempPath, finalPath, 
OP_SAVE_TASK_MANIFEST);
+  }
+
+  /**
+   * Generate and save a task manifest or summary file.
+   * This is be done by writing to a temp path and then renaming.
+   * 
+   * If the destination path exists: Delete it before the rename.
+   * 
+   * This will retry so that a rename failure from abfs load or IO errors
+   * such as delete or save failure will not fail the task.
+   * 
+   * The {@code manifestSource} supplier is invoked to get the manifest data
+   * on every attempt.
+   * This permits statistics to be updated, including those of failures.
+   * @param manifestSource supplier the manifest/success file
+   * @param tempPath temp path for the initial save
+   * @param finalPath final path for rename.
+   * @param statistic statistic to use for timing
+   * @return the manifest saved.
+   * @throws IOException failure to save/delete/rename after retries.
+   */
+  @SuppressWarnings("unchecked")
+  protected final  T saveManifest(
+  final Supplier manifestSource,
+  final Path tempPath,
+  final Path finalPath,
+  String statistic) throws IOException {
+
+AtomicInteger retryCount = new AtomicInteger(0);
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+
+// loop until returning a value or raising an exception
+while (true) {
+  try {
+T manifestData = requireNonNull(manifestSource.get());
+trackDurationOfInvocation(getIOStatistics(), statistic, () -> {
+  LOG.info("{}: save manifest to {} then rename as {}'); retry 
count={}",
+  getName(), tempPath, finalPath, retryCount);
+
+  // delete temp path.
+  // even though this is written with overwrite=true, this extra 
recursive
+  // delete also handles a directory being there.
+  deleteRecursive(tempPath, OP_DELETE);
+
+  // save the temp file, overwriting any which remains from an earlier 
attempt
+  operations.save(manifestData, tempPath, true);
+
+  // delete the destination in case it exists either from a failed 
previous
+  // attempt or from a concurrent task commit.
+  delete(finalPath, true, OP_DELETE);
+
+  // rename temp to final
+  renameFile(tempPath, finalPath);

Review Comment:
   This shall be better, as if the recovery also fail, we would not do 
additional HEAD calls for `escalateRenameFailure`. This looks good!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-30 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2086367792

   I've now moved to commitFile() to rename the task manifest, after doing a 
getFileStatus() call first...which means its iO cost is the same as a rename 
with recovery enabled. it does let us see what happened, which we log at WARN. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-30 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1584877464


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/manifest_committer.md:
##
@@ -523,7 +509,7 @@ And optional settings for debugging/performance analysis
 
 ```
 spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfs 
org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory
-spark.hadoop.fs.azure.io.rate.limit 1
+spark.hadoop.fs.azure.io.rate.limit 1000

Review Comment:
   good q. The default allocation for the entire cluster is 10K; this ensures 
that a single commit only uses at most 10% of it during renames. in contrast, 
s3 throttles by shard so only things down the same directory tree were 
impacted, and if they were only reading data, not throttled at all.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-30 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1584841337


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+return saveManifest(() -> manifestData, tempPath, finalPath, 
OP_SAVE_TASK_MANIFEST);
+  }
+
+  /**
+   * Generate and save a task manifest or summary file.
+   * This is be done by writing to a temp path and then renaming.
+   * 
+   * If the destination path exists: Delete it before the rename.
+   * 
+   * This will retry so that a rename failure from abfs load or IO errors
+   * such as delete or save failure will not fail the task.
+   * 
+   * The {@code manifestSource} supplier is invoked to get the manifest data
+   * on every attempt.
+   * This permits statistics to be updated, including those of failures.
+   * @param manifestSource supplier the manifest/success file
+   * @param tempPath temp path for the initial save
+   * @param finalPath final path for rename.
+   * @param statistic statistic to use for timing
+   * @return the manifest saved.
+   * @throws IOException failure to save/delete/rename after retries.
+   */
+  @SuppressWarnings("unchecked")
+  protected final  T saveManifest(
+  final Supplier manifestSource,
+  final Path tempPath,
+  final Path finalPath,
+  String statistic) throws IOException {
+
+AtomicInteger retryCount = new AtomicInteger(0);
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+
+// loop until returning a value or raising an exception
+while (true) {
+  try {
+T manifestData = requireNonNull(manifestSource.get());
+trackDurationOfInvocation(getIOStatistics(), statistic, () -> {
+  LOG.info("{}: save manifest to {} then rename as {}'); retry 
count={}",
+  getName(), tempPath, finalPath, retryCount);
+
+  // delete temp path.
+  // even though this is written with overwrite=true, this extra 
recursive
+  // delete also handles a directory being there.
+  deleteRecursive(tempPath, OP_DELETE);
+
+  // save the temp file, overwriting any which remains from an earlier 
attempt
+  operations.save(manifestData, tempPath, true);
+
+  // delete the destination in case it exists either from a failed 
previous
+  // attempt or from a concurrent task commit.
+  delete(finalPath, true, OP_DELETE);
+
+  // rename temp to final
+  renameFile(tempPath, finalPath);

Review Comment:
   you mean use commitFile() after creating a file entry, so pushing more of 
the recovery down? we could do that. we won't have the etag of the create file 
though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-24 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2074254789

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 05s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 05s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 05s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  shellcheck  |   0m 05s |  |  Shellcheck was not available.  |
   | +0 :ok: |  shelldocs  |   0m 05s |  |  Shelldocs was not available.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 13s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  89m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  39m 10s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 15s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  16m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |  14m 00s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 170m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 13s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  11m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  37m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  37m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   5m 47s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  16m 05s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |  13m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 178m 33s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 29s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 543m 45s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense codespell detsecrets shellcheck 
shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs 
checkstyle markdownlint xmllint |
   | uname | MINGW64_NT-10.0-17763 b033f3aa81e5 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b38434c46b494f7689acc125a52935d6cb17870 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6716/1/testReport/
 |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6716/1/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


saxenapranav commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1577303182


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,104 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
+final long directoryCount = args.directoryCount;
+if (directoryCount > 0) {
+  // log the expected directory count, which drives duration in GCS
+  // and may cause timeouts on azure if the count is too high for a
+  // timely permissions tree scan.
+  LOG.info("{}: Expected directory count: {}", getName(), directoryCount);
+}
 
+progress();
+// check and maybe execute parallel delete of task attempt dirs.
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {
+// success: record this as the outcome,
+outcome = Outcome.DELETED;
+// and will skip the parallel delete
+baseDirDeleted = true;
+  } else {
+// failure: log and continue
+LOG.warn("{}: Exception on initial attempt at deleting base dir {}"
++ " with directory count {}. Falling back to parallel 
delete",
+getName(), baseDir, directoryCount, exception);
+  }
+}
+  }
+  if (!baseDirDeleted) {
+// no base delete attempted or it failed.
+// Attempt to do a parallel delete of task attempt dirs;
+// don't overreact if a delete fails, but stop trying
+// to delete the others, and fall back to deleting the
+// job dir.
+Path taskSubDir
+= getStageConfig().getJobAttemptTaskSubDir();
+try (DurationInfo info = new DurationInfo(LOG, true,
+"parallel deletion of task attempts in %s",
+taskSubDir)) {
+  RemoteIterator dirs =
+  RemoteIterators.filteringRemoteIterator(
+  listStatusIterator(taskSubDir),
+  FileStatus::isDirectory);
+  TaskPool.foreach(dirs)
+  .executeWith(getIOProcessors())
+  .stopOnFailure()
+  .suppressExceptions(false)
+  .run(this::rmTaskAttemptDir);
+  getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
+
+  if (getLastDeleteException() != null) {
+// one of the task attempts failed.
+throw getLastDeleteException();

Review Comment:
   This will also register an exception raised in `deleteOneDir` in line 183. 
So, although there was no task failure, but would get recorded as task-failure 
in logs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


saxenapranav commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1577262719


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+return saveManifest(() -> manifestData, tempPath, finalPath, 
OP_SAVE_TASK_MANIFEST);
+  }
+
+  /**
+   * Generate and save a task manifest or summary file.
+   * This is be done by writing to a temp path and then renaming.
+   * 
+   * If the destination path exists: Delete it before the rename.
+   * 
+   * This will retry so that a rename failure from abfs load or IO errors
+   * such as delete or save failure will not fail the task.
+   * 
+   * The {@code manifestSource} supplier is invoked to get the manifest data
+   * on every attempt.
+   * This permits statistics to be updated, including those of failures.
+   * @param manifestSource supplier the manifest/success file
+   * @param tempPath temp path for the initial save
+   * @param finalPath final path for rename.
+   * @param statistic statistic to use for timing
+   * @return the manifest saved.
+   * @throws IOException failure to save/delete/rename after retries.
+   */
+  @SuppressWarnings("unchecked")
+  protected final  T saveManifest(
+  final Supplier manifestSource,
+  final Path tempPath,
+  final Path finalPath,
+  String statistic) throws IOException {
+
+AtomicInteger retryCount = new AtomicInteger(0);
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+
+// loop until returning a value or raising an exception
+while (true) {
+  try {
+T manifestData = requireNonNull(manifestSource.get());
+trackDurationOfInvocation(getIOStatistics(), statistic, () -> {
+  LOG.info("{}: save manifest to {} then rename as {}'); retry 
count={}",
+  getName(), tempPath, finalPath, retryCount);
+
+  // delete temp path.
+  // even though this is written with overwrite=true, this extra 
recursive
+  // delete also handles a directory being there.
+  deleteRecursive(tempPath, OP_DELETE);
+
+  // save the temp file, overwriting any which remains from an earlier 
attempt
+  operations.save(manifestData, tempPath, true);
+
+  // delete the destination in case it exists either from a failed 
previous
+  // attempt or from a concurrent task commit.
+  delete(finalPath, true, OP_DELETE);
+
+  // rename temp to final
+  renameFile(tempPath, finalPath);

Review Comment:
   Got your point for the directory case. 
   
   For the first point, I now understand that `executeRenamingOperation` would 
call `escalateRenameFailure` on fs.rename() failure which would raise 
PathIOException. I was thinking if instead of calling `renameFile` if we can do 
`operation.renameFile()` directly and raise exception from there. Reason being, 
`escalateRenameFailure` does a getFileStatus on both src and dst for logging. 
We can save 2 filesystem calls if we know the renameFile for the saveManifest 
has failed. Would like to know your view. But, I am good with this comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@h

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2073739524

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m 17s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  shelldocs  |   0m  1s |  |  Shelldocs was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 14s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 56s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m  4s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  15m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  15m 57s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m 24s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   3m  6s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  1s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   7m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 33s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   8m 27s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  | 162m  7s |  |  hadoop-mapreduce-project in the 
patch passed.  |
   | +1 :green_heart: |  unit  |   2m 56s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 18s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 422m 41s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense codespell detsecrets shellcheck 
shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs 
checkstyle markdownlint xmllint |
   | uname | Linux 8700e5c7281c 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b38434c46b494f7689acc125a52935d6cb17870 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/11/testReport/ |
   |

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2072467711

   One thing I'm considering here, make that "initial attempt at base dir 
delete" a numeric threshold.
   
   good: agile
   bad: harder to test, less consistent; harder to replicate
   
   for now, leaving a simple switch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1576225937


##
hadoop-tools/hadoop-azure/pom.xml:
##
@@ -45,6 +45,8 @@
 7200
 
10
 
1000
+
+100

Review Comment:
   this is for maven test runs only. s3a fs now defaults to 500, though we have 
to worry about TTL, stale connections etc. as abfs moves to httpclient, these 
things will surface there too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-23 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1576221197


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+return saveManifest(() -> manifestData, tempPath, finalPath, 
OP_SAVE_TASK_MANIFEST);
+  }
+
+  /**
+   * Generate and save a task manifest or summary file.
+   * This is be done by writing to a temp path and then renaming.
+   * 
+   * If the destination path exists: Delete it before the rename.
+   * 
+   * This will retry so that a rename failure from abfs load or IO errors
+   * such as delete or save failure will not fail the task.
+   * 
+   * The {@code manifestSource} supplier is invoked to get the manifest data
+   * on every attempt.
+   * This permits statistics to be updated, including those of failures.
+   * @param manifestSource supplier the manifest/success file
+   * @param tempPath temp path for the initial save
+   * @param finalPath final path for rename.
+   * @param statistic statistic to use for timing
+   * @return the manifest saved.
+   * @throws IOException failure to save/delete/rename after retries.
+   */
+  @SuppressWarnings("unchecked")
+  protected final  T saveManifest(
+  final Supplier manifestSource,
+  final Path tempPath,
+  final Path finalPath,
+  String statistic) throws IOException {
+
+AtomicInteger retryCount = new AtomicInteger(0);
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+
+// loop until returning a value or raising an exception
+while (true) {
+  try {
+T manifestData = requireNonNull(manifestSource.get());
+trackDurationOfInvocation(getIOStatistics(), statistic, () -> {
+  LOG.info("{}: save manifest to {} then rename as {}'); retry 
count={}",
+  getName(), tempPath, finalPath, retryCount);
+
+  // delete temp path.
+  // even though this is written with overwrite=true, this extra 
recursive
+  // delete also handles a directory being there.
+  deleteRecursive(tempPath, OP_DELETE);
+
+  // save the temp file, overwriting any which remains from an earlier 
attempt
+  operations.save(manifestData, tempPath, true);
+
+  // delete the destination in case it exists either from a failed 
previous
+  // attempt or from a concurrent task commit.
+  delete(finalPath, true, OP_DELETE);
+
+  // rename temp to final
+  renameFile(tempPath, finalPath);

Review Comment:
   1. renameFile javadocs `throws PathIOException – if the rename() call 
returned false.`. so no need to check the result here
   2. directory deletion, maybe: but what is going to create a directory here? 
nothing in the committer will, and if some other process is doing stuff in the 
job attempt dir you are doomed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-21 Thread via GitHub


saxenapranav commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1574200780


##
hadoop-tools/hadoop-azure/pom.xml:
##
@@ -45,6 +45,8 @@
 7200
 
10
 
1000
+
+100

Review Comment:
   Hope its for only test runs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-21 Thread via GitHub


saxenapranav commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1572484303


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+return saveManifest(() -> manifestData, tempPath, finalPath, 
OP_SAVE_TASK_MANIFEST);
+  }
+
+  /**
+   * Generate and save a task manifest or summary file.
+   * This is be done by writing to a temp path and then renaming.
+   * 
+   * If the destination path exists: Delete it before the rename.
+   * 
+   * This will retry so that a rename failure from abfs load or IO errors
+   * such as delete or save failure will not fail the task.
+   * 
+   * The {@code manifestSource} supplier is invoked to get the manifest data
+   * on every attempt.
+   * This permits statistics to be updated, including those of failures.
+   * @param manifestSource supplier the manifest/success file
+   * @param tempPath temp path for the initial save
+   * @param finalPath final path for rename.
+   * @param statistic statistic to use for timing
+   * @return the manifest saved.
+   * @throws IOException failure to save/delete/rename after retries.
+   */
+  @SuppressWarnings("unchecked")
+  protected final  T saveManifest(
+  final Supplier manifestSource,
+  final Path tempPath,
+  final Path finalPath,
+  String statistic) throws IOException {
+
+AtomicInteger retryCount = new AtomicInteger(0);
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+
+// loop until returning a value or raising an exception
+while (true) {
+  try {
+T manifestData = requireNonNull(manifestSource.get());
+trackDurationOfInvocation(getIOStatistics(), statistic, () -> {
+  LOG.info("{}: save manifest to {} then rename as {}'); retry 
count={}",
+  getName(), tempPath, finalPath, retryCount);
+
+  // delete temp path.
+  // even though this is written with overwrite=true, this extra 
recursive
+  // delete also handles a directory being there.
+  deleteRecursive(tempPath, OP_DELETE);
+
+  // save the temp file, overwriting any which remains from an earlier 
attempt
+  operations.save(manifestData, tempPath, true);
+
+  // delete the destination in case it exists either from a failed 
previous
+  // attempt or from a concurrent task commit.
+  delete(finalPath, true, OP_DELETE);
+
+  // rename temp to final
+  renameFile(tempPath, finalPath);

Review Comment:
   There can be a parallel process which might create a directory in between 
line 680 and 683, should we check post line 683, if finalPath is a file or not?



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +611,111 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @return the manifest saved.
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
-  protected final  void save(T manifestData,
+  protected final  T save(
+  final 

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-19 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2067365776

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  7s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 58s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 14s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 17s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/10/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 2 new + 22 unchanged - 0 fixed = 24 total (was 
22)  |
   | +1 :green_heart: |  mvnsite  |   3m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m 23s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   3m  7s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  1s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   7m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 54s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  | 161m 19s |  |  hadoop-mapreduce-project in the 
patch passed.  |
   | +1 :green_heart: |  unit  |   2m 44s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 17s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 408m 51s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense codespell detsecrets shellcheck 
shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs 
checkstyle markdownlint xmllint |
   | uname | Linux 1cd1cb0cbe32 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cd40e7ff0138412044f977f5d4d079494c6d7dd0 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-18 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2065738026

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 57s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  1s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   1m 58s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 15s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/9/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 17s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/9/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 10 new + 22 unchanged - 0 fixed = 32 total (was 
22)  |
   | +1 :green_heart: |  mvnsite  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m 23s |  |  No new issues.  |
   | +1 :green_heart: |  javadoc  |   3m  9s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   3m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   7m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 59s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  | 160m 38s |  |  hadoop-mapreduce-project in the 
patch passed.  |
   | +1 :green_heart: |  unit  |   2m 43s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 410m 46s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense codespell detsecrets shellcheck 
shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs 
checkstyle markdownlint xmllint |
   | uname | Linux 2d173658385e 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / abed2feab2c31e845bf95bc4868c2cfd0

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-18 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2064541193

   @snvijaya we actually know the total number of subdirs for the deletion!
   
   it is propagated via the manifests: each TA manifest includes the #of dirs 
as an IOStatistic, the aggregate summary adds these all up.
   
   the number of paths under the job dir is that number (counter 
committer_task_directory_count ) + any of failed task attempts.
   
   which means we could actually have a threshold of how many subdirectories 
will trigger an automatic switch to parallel delete.
   
   I'm just going to pass this down and log immediately before the cleanup 
kicks off, so if there are problems we will get the diagnostics adjacent to the 
error.
   
   Note that your details on retry timings imply that on a mapreduce job 
(rather than spark one) the progress() callback will not take place -so there's 
a risk that the job will actually timeout. I don't think that's an issue in MR 
job actions, the way it is is in task-side actions where a heartbeat back to 
the MapRed AM is required.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2062530498

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 55s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 22s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/8/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 2 new + 22 unchanged - 0 fixed = 24 total (was 
22)  |
   | +1 :green_heart: |  mvnsite  |   1m 54s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 26s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 50s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 228m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
xmllint |
   | uname | Linux 2eeac626505b 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3e5e1e61d99b9a784f1f324f5875792da9b49406 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/8/testReport/ |
   | Max. process+thread count | 1368 (vs. ulimit of 5500) |
   | modules | C

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569270966


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {
+// success: record this as the outcome, which
+// will skip the parallel delete.
+outcome = Outcome.DELETED;
+baseDirDeleted = true;
+  } else {
+// failure: log and continue
+LOG.warn("{}: Exception on initial attempt at deleting base dir 
{}\n"
++ "attempting parallel delete",
+getName(), baseDir, exception);
+  }
+}
+  }
+  if (!baseDirDeleted) {
+// no base delete attempted or it failed.
+// Attempt to do a parallel delete of task attempt dirs;
+// don't overreact if a delete fails, but stop trying
+// to delete the others, and fall back to deleting the
+// job dir.
+Path taskSubDir
+= getStageConfig().getJobAttemptTaskSubDir();
+try (DurationInfo info = new DurationInfo(LOG, true,
+"parallel deletion of task attempts in %s",
+taskSubDir)) {
+  RemoteIterator dirs =
+  RemoteIterators.filteringRemoteIterator(
+  listStatusIterator(taskSubDir),
+  FileStatus::isDirectory);
+  TaskPool.foreach(dirs)
+  .executeWith(getIOProcessors())
+  .stopOnFailure()
+  .suppressExceptions(false)
+  .run(this::rmTaskAttemptDir);
+  getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
+
+  if (getLastDeleteException() != null) {
+// one of the task attempts failed.
+throw getLastDeleteException();

Review Comment:
   good q. 
   rmTaskAttemptDir will not trigger a failure, just note and store the error.
   on L213 one of any such IOEs is rethrown, but that just gets to L222 which 
logs and then goes on to the final root dir attempt again
   
   so it should be recovered from...though if the first delete timed out then 
that root delete on L233 is probably doomed too...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569264943


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {
+// success: record this as the outcome, which
+// will skip the parallel delete.
+outcome = Outcome.DELETED;
+baseDirDeleted = true;
+  } else {
+// failure: log and continue
+LOG.warn("{}: Exception on initial attempt at deleting base dir 
{}\n"
++ "attempting parallel delete",
+getName(), baseDir, exception);
+  }
+}
+  }
+  if (!baseDirDeleted) {

Review Comment:
   note that the next stage will fail immediately on the list() call, which is 
caught as a FNFE and treated as success



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569262867


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {
+// success: record this as the outcome, which
+// will skip the parallel delete.
+outcome = Outcome.DELETED;
+baseDirDeleted = true;
+  } else {
+// failure: log and continue
+LOG.warn("{}: Exception on initial attempt at deleting base dir 
{}\n"
++ "attempting parallel delete",
+getName(), baseDir, exception);
+  }
+}
+  }
+  if (!baseDirDeleted) {

Review Comment:
   it gets set on L180; will comment 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569233047


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {

Review Comment:
   ooh, so it's going to be quite a long time to fall back.
   I'm going to make the option default to false for now.
   
   > AzureManifestCommitterFactory could probably set this config to 0 before 
FileSystem.get() call happens.
   
   it'll come from the cache, we don't want to set it for everything else, but 
a low MAX_RETRIES_RECURSIVE_DELETE might make sense everywhere. something to 
consider later. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569225918


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +607,64 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
   protected final  void save(T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+
+int retryCount = 0;
+RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
+getStageConfig().getManifestSaveAttempts(),
+SAVE_SLEEP_INTERVAL,
+TimeUnit.MILLISECONDS);
+boolean success = false;
+while (!success) {
+  try {
+LOG.info("{}: save manifest to {} then rename as {}'); retry count={}",
+getName(), tempPath, finalPath, retryCount);
+
+trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () 
->
+operations.save(manifestData, tempPath, true));

Review Comment:
   so renameFile() has always deleted the destination because we need to do 
that to cope with failures of a previous/concurrent task attempt. Whoever 
commits last wins.
   
   To make this clearer I'm pulling up more of the code into this method and 
adding comments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569228824


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +605,46 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
   protected final  void save(T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+boolean success = false;
+int failures = 0;
+while (!success) {
+  try {
+LOG.trace("{}: attempt {} save('{}, {}, {}')",
+getName(), failures, manifestData, tempPath, finalPath);
+
+trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () 
->
+operations.save(manifestData, tempPath, true));
+renameFile(tempPath, finalPath);

Review Comment:
   any error raised during rename triggers fallback of
   * catch IOE
   * save temp file again
   * delete dest path
   * rename temp path to final path
   
   this is attempted a configurable number of times, with a sleep in between.
   no attempt to be clever about which IOEs are unrecoverable (permissions 
etc), just catch, log, retry



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569210996


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -445,9 +448,29 @@ protected Boolean delete(
   final boolean recursive,
   final String statistic)
   throws IOException {
-return trackDuration(getIOStatistics(), statistic, () -> {
-  return operations.delete(path, recursive);
-});
+if (recursive) {

Review Comment:
   ok, deleteDir will also delete a file. let me highlight that.
   
   I'd done this delete dir/file split to support different capacity requests, 
without that it is a bit over-complex. it does let us collect different 
statistics though, which may be useful



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569203962


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/ManifestCommitterConstants.java:
##
@@ -143,6 +145,20 @@ public final class ManifestCommitterConstants {
*/
   public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT = true;
 
+  /**
+   * Should parallel cleanup try to delete teh base first?
+   * Best for azure as it skips the task attempt deletions unless
+   * the toplevel delete fails.
+   * Value: {@value}.
+   */
+  public static final String OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST =
+  OPT_PREFIX + "cleanup.parallel.delete.base.first";
+
+  /**
+   * Default value of option {@link #OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST}:  
{@value}.
+   */
+  public static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT = 
true;

Review Comment:
   really don't know here. In the docs I try to cover this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569201237


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/impl/ManifestStoreOperations.java:
##
@@ -97,6 +97,36 @@ public boolean isFile(Path path) throws IOException {
   public abstract boolean delete(Path path, boolean recursive)
   throws IOException;
 
+  /**
+   * Forward to {@code delete(Path, true)}
+   * unless overridden.
+   * 
+   * If it returns without an error: there is no file at
+   * the end of the path.
+   * @param path path
+   * @return outcome
+   * @throws IOException failure.
+   */
+  public boolean deleteFile(Path path)
+  throws IOException {
+return delete(path, false);

Review Comment:
   no it won't



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-17 Thread via GitHub


steveloughran commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1569200611


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/impl/ManifestStoreOperations.java:
##
@@ -97,6 +97,36 @@ public boolean isFile(Path path) throws IOException {
   public abstract boolean delete(Path path, boolean recursive)
   throws IOException;
 
+  /**
+   * Forward to {@code delete(Path, true)}
+   * unless overridden.
+   * 
+   * If it returns without an error: there is no file at
+   * the end of the path.
+   * @param path path
+   * @return outcome
+   * @throws IOException failure.
+   */
+  public boolean deleteFile(Path path)
+  throws IOException {
+return delete(path, false);
+  }
+
+  /**
+   * Acquire the delete capacity then call {@code FileSystem#delete(Path, 
true)}

Review Comment:
   aah, I'd cut that. leaving delete capacity out of this PR as it'd need rate 
limiting in abfs *and* guessing about size/depth of directory. which we could 
actually add in future task manifests, leaving only the homework of aggregating 
it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-16 Thread via GitHub


snvijaya commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1567404776


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {

Review Comment:
   As added by you in this logic, when the directory tree is very large and is 
over OAuth authentication, Azure cloud could fail the baseDir delete due to 
exhaustive ACL permissions checks. But this delete will entry the retry loop as 
it was request timeout and for this scenario all the retries too might fail and 
can take a while to report failure with backoff and retry attempts as per 
AZURE_MAX_IO_RETRIES (default value 30). 
   
   Default max retry count is 30 today just to ensure any 5-10 min 
network/service transient failures do not lead failures of long running 
workloads. 
   
   If this logic to attempt basedir delete before falling back to parallel 
deletes, is optimal only for Azure cloud, we could look for ways to fail fast 
for Delete with recursive.
   
   Would this work - Add a new config MAX_RETRIES_RECURSIVE_DELETE which by 
default will be the same as AZURE_MAX_IO_RETRIES in ABFS driver. 
AzureManifestCommitterFactory could probably set this config to 0 before 
FileSystem.get() call happens.
   
   If this sounds ok, we can look into changes needed in AbfsClient, 
AbfsRestOperation and ExponentialRetry to make MAX_RETRIES_RECURSIVE_DELETE 
config effective. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-16 Thread via GitHub


anmolanmol1234 commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1567132937


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/ManifestCommitterConstants.java:
##
@@ -143,6 +145,20 @@ public final class ManifestCommitterConstants {
*/
   public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT = true;
 
+  /**
+   * Should parallel cleanup try to delete teh base first?

Review Comment:
   typo: the



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+// (which the fallback to parallel delete will handle).
+// on GCS it is slow.
+try (DurationInfo info = new DurationInfo(LOG, true,
+"Initial delete of %s", baseDir)) {
+  exception = deleteOneDir(baseDir);
+  if (exception == null) {
+// success: record this as the outcome, which
+// will skip the parallel delete.
+outcome = Outcome.DELETED;
+baseDirDeleted = true;
+  } else {
+// failure: log and continue
+LOG.warn("{}: Exception on initial attempt at deleting base dir 
{}\n"
++ "attempting parallel delete",
+getName(), baseDir, exception);
+  }
+}
+  }
+  if (!baseDirDeleted) {

Review Comment:
   Do we need to change the state of this variable here because if not it will 
go to the next if block



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/CleanupJobStage.java:
##
@@ -142,64 +154,93 @@ protected Result executeStage(
 }
 
 Outcome outcome = null;
-IOException exception;
+IOException exception = null;
+boolean baseDirDeleted = false;
 
 
 // to delete.
 LOG.info("{}: Deleting job directory {}", getName(), baseDir);
 
 if (args.deleteTaskAttemptDirsInParallel) {
-  // Attempt to do a parallel delete of task attempt dirs;
-  // don't overreact if a delete fails, but stop trying
-  // to delete the others, and fall back to deleting the
-  // job dir.
-  Path taskSubDir
-  = getStageConfig().getJobAttemptTaskSubDir();
-  try (DurationInfo info = new DurationInfo(LOG,
-  "parallel deletion of task attempts in %s",
-  taskSubDir)) {
-RemoteIterator dirs =
-RemoteIterators.filteringRemoteIterator(
-listStatusIterator(taskSubDir),
-FileStatus::isDirectory);
-TaskPool.foreach(dirs)
-.executeWith(getIOProcessors())
-.stopOnFailure()
-.suppressExceptions(false)
-.run(this::rmTaskAttemptDir);
-getIOStatistics().aggregate((retrieveIOStatistics(dirs)));
-
-if (getLastDeleteException() != null) {
-  // one of the task attempts failed.
-  throw getLastDeleteException();
+
+  // parallel delete of task attempt dirs.
+
+  if (args.parallelDeleteAttemptBaseDeleteFirst) {
+// attempt to delete the base dir first.
+// This can reduce ABFS delete load but may time out
+  

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-15 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2058026632

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 57s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 43s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 17s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/7/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 22 unchanged - 0 fixed = 23 total (was 
22)  |
   | +1 :green_heart: |  mvnsite  |   1m 52s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 55s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 40s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  4s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 229m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/7/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
xmllint |
   | uname | Linux 34da63c6214c 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9193085fea42ef84e475ed84303fc39c199ffd67 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/7/testReport/ |
   | Max. process+thread count | 1585 (vs. ulimit of 5500) |
   | modules | C

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-15 Thread via GitHub


mukund-thakur commented on code in PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#discussion_r1566388267


##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -582,19 +605,46 @@ protected final Path directoryMustExist(
* Save a task manifest or summary. This will be done by
* writing to a temp path and then renaming.
* If the destination path exists: Delete it.
+   * This will retry so that a rename failure from abfs load or IO errors
+   * will not fail the task.
* @param manifestData the manifest/success file
* @param tempPath temp path for the initial save
* @param finalPath final path for rename.
-   * @throws IOException failure to load/parse
+   * @throws IOException failure to rename after retries.
*/
   @SuppressWarnings("unchecked")
   protected final  void save(T manifestData,
   final Path tempPath,
   final Path finalPath) throws IOException {
-LOG.trace("{}: save('{}, {}, {}')", getName(), manifestData, tempPath, 
finalPath);
-trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () ->
-operations.save(manifestData, tempPath, true));
-renameFile(tempPath, finalPath);
+boolean success = false;
+int failures = 0;
+while (!success) {
+  try {
+LOG.trace("{}: attempt {} save('{}, {}, {}')",
+getName(), failures, manifestData, tempPath, finalPath);
+
+trackDurationOfInvocation(getIOStatistics(), OP_SAVE_TASK_MANIFEST, () 
->
+operations.save(manifestData, tempPath, true));
+renameFile(tempPath, finalPath);

Review Comment:
   rename may fail in second retry with SourcePathNotFound exception but it 
actually succeeded during the first try only. How are we recovering from that? 



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/AbstractJobOrTaskStage.java:
##
@@ -445,9 +448,29 @@ protected Boolean delete(
   final boolean recursive,
   final String statistic)
   throws IOException {
-return trackDuration(getIOStatistics(), statistic, () -> {
-  return operations.delete(path, recursive);
-});
+if (recursive) {

Review Comment:
   unable to understand this. How is a recursive flag determining that it is a 
dir or a file?



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/impl/ManifestStoreOperations.java:
##
@@ -97,6 +97,36 @@ public boolean isFile(Path path) throws IOException {
   public abstract boolean delete(Path path, boolean recursive)
   throws IOException;
 
+  /**
+   * Forward to {@code delete(Path, true)}
+   * unless overridden.
+   * 
+   * If it returns without an error: there is no file at
+   * the end of the path.
+   * @param path path
+   * @return outcome
+   * @throws IOException failure.
+   */
+  public boolean deleteFile(Path path)
+  throws IOException {
+return delete(path, false);
+  }
+
+  /**
+   * Acquire the delete capacity then call {@code FileSystem#delete(Path, 
true)}

Review Comment:
   don't see the ratelimiter acquiring the deletecapacity. 



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/ManifestCommitterConstants.java:
##
@@ -143,6 +145,20 @@ public final class ManifestCommitterConstants {
*/
   public static final boolean OPT_CLEANUP_PARALLEL_DELETE_DIRS_DEFAULT = true;
 
+  /**
+   * Should parallel cleanup try to delete teh base first?
+   * Best for azure as it skips the task attempt deletions unless
+   * the toplevel delete fails.
+   * Value: {@value}.
+   */
+  public static final String OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST =
+  OPT_PREFIX + "cleanup.parallel.delete.base.first";
+
+  /**
+   * Default value of option {@link #OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST}:  
{@value}.
+   */
+  public static final boolean OPT_CLEANUP_PARALLEL_DELETE_BASE_FIRST_DEFAULT = 
true;

Review Comment:
   As it is bad for GCS, shouldn't the default be false?



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/manifest_committer.md:
##
@@ -505,7 +491,7 @@ The core set of Azure-optimized options becomes
 
 
   spark.hadoop.fs.azure.io.rate.limit
-  1
+  1000

Review Comment:
   I guess you reduced this for testing? 



##
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/markdown/manifest_committer.md:
##
@@ -523,7 +509,7 @@ And optional settings for debugging/performance analysis
 
 ```
 spark.hadoop.mapreduce.outputcommitter.factory.sc

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-12 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2052490254

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  2s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 11s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 21s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 55s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 14s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 30s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 52s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 33s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  4s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 230m 36s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 412988d12e6f 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a3117cf5bf7ba3a0cc4ffa0f3b1912241fa13c05 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/5/testReport/ |
   | Max. process+thread count | 1592 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Po

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-12 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2052380306

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 19s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 54s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  4s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 55s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m  2s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 44s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m  2s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 138m 37s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux b03fe1ba958a 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 16e1be4a39e8415ecb45ddd3c43c70c9a35f01b0 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/6/testReport/ |
   | Max. process+thread count | 1530 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-tools/hadoop-azure U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Po

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-11 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2050651681

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 54s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m  9s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m  8s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/3/artifact/out/blanks-eol.txt)
 |  The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   4m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 55s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   7m 47s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 32s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 228m 34s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 5470963506ac 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3eb8b8742375fbdf056f9241b9f3c847cde5034e |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/3/testReport/ |
   | Max. process+thread count | 1367 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hado

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-11 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2050579864

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   6m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 7 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 30s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 35s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m 10s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/4/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   2m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   2m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 46s |  |  hadoop-mapreduce-client-core in 
the patch passed.  |
   | +1 :green_heart: |  unit  |   2m  4s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 145m 11s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 8e0cc7ab2749 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0ed84a98c5e630590f376a445eb28c27c2e8e780 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/4/testReport/ |
   | Max. process+thread count | 1620 (vs. ulimit of 5500) |
   | modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hado

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-11 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2050326218

   Reviews invited from @mukund-thakur @anmolanmol1234 @anujmodi2021 
@HarshitGupta11
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-09 Thread via GitHub


hadoop-yetus commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2046034751

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m 27s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 50s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/1/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  33m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   7m 41s | 
[/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/1/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt)
 |  hadoop-mapreduce-client-core in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 147m 25s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.mapreduce.lib.output.committer.manifest.TestCreateOutputDirectoriesStage 
|
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6716 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 27c1a33d07a2 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4dc5a605e08c88f8529ce7d117ff6033a4e6594c |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6716/1/testReport/ |
   | Max. process+thread count | 1601 (vs. ulimit of 5500) |
   | modules | C: 
hadoop

Re: [PR] MAPREDUCE-7474. Improve Manifest committer resilience [hadoop]

2024-04-09 Thread via GitHub


steveloughran commented on PR #6716:
URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2045879335

   testing: azure cardiff
   -Dparallel-tests=abfs -DtestsThreadCount=8
   lots of tests failed for me, but I've now got an account with very low IO 
threshold. We need to look at those failures in general


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org