[ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687337#comment-13687337 ]

Jason Lowe commented on MAPREDUCE-5317:
---------------------------------------

Thanks for the patch, Ravi.  For the benefit of other reviewers: the root cause 
is that in-flight tasks can inadvertently recreate the job output directory when 
creating their temporary output directory, because the temporary directory is a 
subdirectory of the output directory and parent directories are created by 
default.
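
Concretely, the failure mode looks something like this (a minimal sketch of the 
effect; the paths and class name are hypothetical, and this is not the actual 
FileOutputCommitter code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StaleDirSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path jobOutput = new Path("/user/test/out");

    // The job has failed and abortJob has already cleaned up:
    fs.delete(jobOutput, true);

    // ...but an in-flight task now sets up its temporary directory.
    // FileSystem.mkdirs behaves like "mkdir -p", so this silently
    // recreates /user/test/out on the way down, and the _temporary
    // tree is left behind after the job fails.
    fs.mkdirs(new Path(jobOutput, "_temporary/attempt_0"));
  }
}
{code}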

Note that I'm not a fan of the let's-wait-for-all-tasks approach, as we had all 
sorts of problems with hung AMs stuck in the KILL_WAIT state awaiting task 
completions that would never come.  I'd much rather have the AM die quickly 
with the correct final status and have the tasks tear themselves down as a 
result.  That means the tasks should properly *not* create the output directory 
when creating their temporary output directory, and should simply fail (as they 
should) when the output directory is missing.

As I understand it, there's an issue with the HDFS interface: there's apparently 
no way via FileSystem to create a directory without also implicitly creating 
its parent directories.  createNonRecursive would work, but it's deprecated.  
We could switch to FileContext, but that could be just as risky a change given 
its scope and the potential for extra namenode connections or RPC load.  So we 
might end up having to do the wait-for-all-tasks-when-failing method proposed 
here.  If that happens, I have a few comments on the patch; they follow the 
sketch below.
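
For reference, here's roughly what the two non-recursive alternatives look like 
in isolation.  The deprecated FileSystem.createNonRecursive overload and 
FileContext.mkdir with createParent=false are real APIs, but the paths and 
class name are hypothetical; this is a sketch, not FileOutputCommitter code:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NonRecursiveSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path partFile = new Path("/user/test/out/_temporary/attempt_0/part-00000");

    // Deprecated, and it only covers file creation (not directories),
    // but it throws FileNotFoundException when the parent is gone
    // instead of silently recreating it:
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.createNonRecursive(
        partFile, true, 4096, (short) 3, 64L * 1024 * 1024, null);
    out.close();

    // FileContext gives the same control for directories via the
    // createParent flag, at the cost of a much wider change:
    FileContext fc = FileContext.getFileContext(conf);
    fc.mkdir(partFile.getParent(), FsPermission.getDirDefault(), false);
  }
}
{code}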

* The FAIL_ABORT state was originally intended to await the response of the 
committer thread after it processes the abortJob call, and it's now being 
reused for a FAIL_WAIT state.  That can probably work, but we need to make sure 
we don't leave the FAIL_ABORT state until the committer thread has signaled 
that the abortJob processing is complete.  Failing to do so means the RM could 
kill the AM while it's mid-abort, which is undesirable; therefore a 
JOB_TASK_COMPLETED event should never take us to the FAILED state, or we risk 
exactly that.  The only way to get from FAIL_ABORT to FAILED should be the 
committer thread sending JOB_ABORT_COMPLETED.
* We should not send JOB_ABORT_COMPLETED from a timer, or we could end up 
skipping the committer cleanup entirely.  That would leave us with the very 
symptom we're trying to fix: a leftover output directory after a failed job.  
We can debate whether a timer is useful (it masks bookkeeping problems, for 
better or worse), but if one is used it should send a separate event indicating 
that the code should stop waiting for more tasks to complete and should signal 
the committer to clean up (see the sketch after this list).
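
To make the intended transition discipline concrete, here's a minimal, 
illustrative sketch in plain Java.  It is not the actual JobImpl state machine, 
and the FAIL_WAIT_TIMEDOUT event and class/enum names are placeholders for 
whatever the patch ends up defining:

{code:java}
// Drastically simplified model of the transitions discussed above.
enum SketchState { FAIL_WAIT, FAIL_ABORT, FAILED }
enum SketchEvent { JOB_TASK_COMPLETED, JOB_ABORT_COMPLETED, FAIL_WAIT_TIMEDOUT }

class FailWaitSketch {
  private SketchState state = SketchState.FAIL_WAIT;
  private int liveTasks;  // tasks we are still waiting on

  FailWaitSketch(int liveTasks) { this.liveTasks = liveTasks; }

  SketchState handle(SketchEvent event) {
    switch (state) {
      case FAIL_WAIT:
        if (event == SketchEvent.FAIL_WAIT_TIMEDOUT
            || (event == SketchEvent.JOB_TASK_COMPLETED && --liveTasks <= 0)) {
          // Either all in-flight tasks finished, or the timer fired a
          // separate "stop waiting" event.  Both paths still go through
          // the committer; neither fakes JOB_ABORT_COMPLETED.
          state = SketchState.FAIL_ABORT;
          signalCommitterAbort();
        }
        break;
      case FAIL_ABORT:
        // The only arc out of FAIL_ABORT: the committer thread signals
        // that abortJob has finished.  Stray JOB_TASK_COMPLETED events
        // are ignored here so the job can't reach FAILED (and get torn
        // down by the RM) while the abort is still in flight.
        if (event == SketchEvent.JOB_ABORT_COMPLETED) {
          state = SketchState.FAILED;
        }
        break;
      default:
        break;
    }
    return state;
  }

  private void signalCommitterAbort() {
    // Roughly corresponds to posting a job-abort event to the committer
    // event handler in the real AM; omitted in this sketch.
  }
}
{code}

The point is simply that the timer and the committer produce distinct events, 
and only the committer's JOB_ABORT_COMPLETED can complete the 
FAIL_ABORT-to-FAILED arc.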

                
> Stale files left behind for failed jobs
> ---------------------------------------
>
>                 Key: MAPREDUCE-5317
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.8
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: MAPREDUCE-5317.patch
>
>
> Courtesy [~amar_kamat]!
> {quote}
> We are seeing _temporary files left behind in the output folder if the job
> fails.
> The jobs failed due to hitting a quota issue.
> I simply ran the randomwriter (from hadoop examples) with the default setting.
> That failed and left behind some stray files.
> {quote}
