[ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687337#comment-13687337 ]
Jason Lowe commented on MAPREDUCE-5317:
---------------------------------------

Thanks for the patch, Ravi. For the context of other reviewers, the root cause is that in-flight tasks can inadvertently recreate the job output directory when creating their temporary output directory, because it is a subdirectory of the output directory and parent directories are created by default.

Note that I'm not a fan of the let's-wait-for-all-tasks approach, as we had all sorts of problems with hung AMs stuck in the KILL_WAIT state awaiting task completions that would never come. I'd much rather just have the AM die quickly with the correct final status and have the tasks tear themselves down as a result. Therefore I'd rather have the tasks properly *not* create the output directory when creating their temporary output directory and simply fail (as they should) when the output directory is missing.

As I understand it, there's an issue with the HDFS interface, since there's apparently no way via FileSystem to create a directory without also implicitly creating the parent directories. createNonRecursive would work, but it's deprecated. We could switch to FileContext, but that could be just as risky a change given its scope and the potential for extra namenode connections or RPC load. So we might end up having to do the wait-for-all-tasks-when-failing method proposed here. If that happens, a few comments on the patch:

* The FAIL_ABORT state was originally intended to await the response of the committer thread after it processes the abortJob call, and it's now being reused as a FAIL_WAIT state. This is probably workable, but we need to make sure we don't leave the FAIL_ABORT state until the committer thread has signaled the completion of the abortJob processing. Failure to do so means the RM could kill the AM task while it's mid-abort, which is undesirable. Therefore a JOB_TASK_COMPLETED event should never lead us directly to the FAILED state, or we could do just that.
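As a side note, the non-recursive creation behavior the comment asks for can be illustrated with java.nio as a stand-in for the HDFS FileSystem API (the class and method names below are hypothetical, not Hadoop code): Files.createDirectory, unlike Files.createDirectories, refuses to create missing parents, which is exactly the semantics a task would need so it fails instead of resurrecting a deleted job output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class NonRecursiveMkdir {
    // Hypothetical helper: returns true if the attempt directory was created,
    // false if its parent (the job output directory) no longer exists.
    static boolean tryCreateAttemptDir(Path attemptDir) throws IOException {
        try {
            // createDirectory (unlike createDirectories) refuses to create
            // missing parents, so a deleted output directory stays deleted.
            Files.createDirectory(attemptDir);
            return true;
        } catch (NoSuchFileException e) {
            // Parent is gone: the task should fail here, not recreate it.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("out");
        Path attempt = out.resolve("_temporary").resolve("attempt_0");
        // Parent "_temporary" does not exist yet, so creation is refused.
        System.out.println(tryCreateAttemptDir(attempt)); // false
        Files.createDirectory(out.resolve("_temporary"));
        System.out.println(tryCreateAttemptDir(attempt)); // true
    }
}
```

The deprecated FileSystem.createNonRecursive mentioned above provides the analogous guarantee for files on HDFS; the sketch only shows the parent-must-exist contract tasks would rely on.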
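The transition rule described in the comment above can be sketched as a minimal guard (a hypothetical, simplified sketch; the real JobImpl state machine in the MR app master is far richer and these names are illustrative only):

```java
// Hypothetical sketch of the FAIL_ABORT exit rule discussed above.
enum JobState { FAIL_ABORT, FAILED }
enum JobEvent { JOB_TASK_COMPLETED, JOB_ABORT_COMPLETED }

public class FailAbortGuard {
    static JobState next(JobState state, JobEvent event) {
        if (state == JobState.FAIL_ABORT) {
            // Only the committer thread's JOB_ABORT_COMPLETED may move us
            // to FAILED. Task completions arriving mid-abort must not, or
            // the RM could tear the AM down before abortJob finishes.
            return event == JobEvent.JOB_ABORT_COMPLETED
                ? JobState.FAILED
                : JobState.FAIL_ABORT;
        }
        return state;
    }
}
```

The point of the guard is that JOB_TASK_COMPLETED is absorbed while aborting rather than routed toward FAILED.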
The only way to get from FAIL_ABORT to FAILED should be the committer thread sending JOB_ABORT_COMPLETED.

* We should not be sending JOB_ABORT_COMPLETED via a timer, or we could skip calling the committer to clean up entirely. That would leave us with the very symptom we're trying to fix: a leftover output directory after a failed job. We can debate whether a timer is useful or not (it masks bookkeeping problems, for better or worse), but if one is used it should send a separate event indicating that the code should stop waiting for more tasks to complete and signal the committer to clean up.

> Stale files left behind for failed jobs
> ---------------------------------------
>
>                 Key: MAPREDUCE-5317
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.8
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: MAPREDUCE-5317.patch
>
>
> Courtesy [~amar_kamat]!
> {quote}
> We are seeing _temporary files left behind in the output folder if the job fails.
> The jobs failed due to hitting a quota issue.
> I simply ran the randomwriter (from hadoop examples) with the default setting.
> That failed and left behind some stray files.
> {quote}