steveloughran commented on PR #6716: URL: https://github.com/apache/hadoop/pull/6716#issuecomment-2064541193
@snvijaya we actually know the total number of subdirectories for the deletion! It is propagated via the manifests: each TA manifest includes the number of dirs as an IOStatistic, and the aggregate summary adds these all up. The number of paths under the job dir is that counter (`committer_task_directory_count`) plus any from failed task attempts. This means we could actually have a threshold of subdirectories above which we automatically switch to parallel delete (see the sketch below). I'm just going to pass this down and log it immediately before the cleanup kicks off, so if there are problems we get the diagnostics adjacent to the error.

Note that your details on retry timings imply that on a MapReduce job (rather than a Spark one) the progress() callback will not take place, so there's a risk that the job will actually time out. I don't think that's an issue in MR job actions; it is in task-side actions, where a heartbeat back to the MapReduce AM is required.
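To make the threshold idea concrete, here is a minimal sketch of how the decision could be wired up, assuming the aggregated statistics are available as an `org.apache.hadoop.fs.statistics.IOStatistics` instance. The helper class, the threshold constant, and the `failedTaskAttemptDirs` parameter are hypothetical illustrations, not part of this PR:

```java
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsLogging;

/**
 * Hypothetical helper: decide whether job cleanup should use
 * parallel delete, based on the directory count aggregated
 * from the task attempt manifests.
 */
public final class CleanupPlanner {

  /** Illustrative threshold: above this many dirs, go parallel. */
  private static final long PARALLEL_DELETE_DIR_THRESHOLD = 1000;

  /** Counter name referenced in the comment above. */
  private static final String TASK_DIRECTORY_COUNT =
      "committer_task_directory_count";

  private CleanupPlanner() {
  }

  /**
   * @param aggregate aggregated IOStatistics from all TA manifests
   * @param failedTaskAttemptDirs extra dirs left by failed task attempts
   * @return true if cleanup should delete directories in parallel
   */
  public static boolean useParallelDelete(
      IOStatistics aggregate,
      long failedTaskAttemptDirs) {
    // Log the statistics before cleanup starts, so any failure
    // diagnostics sit adjacent to the error in the job log.
    System.out.println(
        IOStatisticsLogging.ioStatisticsToPrettyString(aggregate));
    long taskDirs = aggregate.counters()
        .getOrDefault(TASK_DIRECTORY_COUNT, 0L);
    long totalDirs = taskDirs + failedTaskAttemptDirs;
    return totalDirs > PARALLEL_DELETE_DIR_THRESHOLD;
  }
}
```

The choice of threshold value would presumably be a tunable option; the key point is only that the aggregated counter makes the switch automatic rather than requiring the user to guess.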