[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054562#comment-14054562
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:
----------------------------------------------------

Here's what I think are the potential solutions and their problems:
# YARN informs AM that it is the last retry as part of AM start-up or the 
register API
# YARN informs the AM that this is the last retry as part of AM unregister
# YARN has a way to run a separate cleanup container after it knows for sure 
that the application finished exhausting all its attempts

(1) is not really possible. At best, the RM can only say that this attempt 
'mayBeTheLastAttempt'. So the AM cannot assume that this is the last retry, 
and therefore cannot do things like cleaning the staging directory.

(2) is good enough for the successful code path. In fact, we already have a way 
of telling the AM that unregister succeeded and that this is indeed the last 
retry, so we don't need a new API. If the RM crashes or fails over before that, 
the app will get a new retry anyway. The downside of this approach is that there 
are many cases where the app's last retry may have crashed (say, with an OOM) 
and so never cleans up stale files. In fact, any solution that relies on such 
RM-AM communication will not solve those corner cases.
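
To make the difference between (1) and (2) concrete, here is a minimal sketch of the cleanup decision. All names in it (StagingCleanupPolicy, UnregisterOutcome, shouldCleanStagingDir) are hypothetical and are not taken from the Hadoop code base. It assumes only what is described above: a start-up hint is never authoritative, while an acknowledged unregister is.

```java
// Hypothetical sketch: these names are illustrative and are not Hadoop APIs.
final class StagingCleanupPolicy {

    /** What the AM knows about its unregister call when it is about to exit. */
    enum UnregisterOutcome {
        ACKNOWLEDGED,      // RM confirmed the unregister, so no further attempt will start
        NOT_ACKNOWLEDGED,  // call failed or RM failed over; a new attempt may follow
        NOT_ATTEMPTED      // AM is exiting without having tried to unregister
    }

    /**
     * Decide whether this attempt may delete the staging directory.
     * The mayBeTheLastAttempt hint is accepted but deliberately ignored,
     * because it is only a hint and cannot justify irreversible cleanup.
     */
    static boolean shouldCleanStagingDir(boolean mayBeTheLastAttempt,
                                         UnregisterOutcome outcome) {
        return outcome == UnregisterOutcome.ACKNOWLEDGED;
    }

    public static void main(String[] args) {
        // Hint says "maybe last", but unregister was not confirmed: keep the files.
        System.out.println(
            shouldCleanStagingDir(true, UnregisterOutcome.NOT_ACKNOWLEDGED));
        // Unregister confirmed: safe to clean up regardless of the hint.
        System.out.println(
            shouldCleanStagingDir(false, UnregisterOutcome.ACKNOWLEDGED));
    }
}
```

Note that an attempt that crashes (say, with an OOM) never reaches this decision at all, which is exactly the corner case that (2) leaves open.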

(3) acknowledges that the problem of cleaning up stale files cannot be solved 
without explicit help from the RM. The more I think about it, the more it 
appears to me that this is the right solution. I am filing a ticket, but this 
will take a while, so we may have to just do (2) for the time being.

> MapReduce AM should not use maxAttempts to determine if this is the last retry
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5956
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Wangda Tan
>            Priority: Blocker
>
> Found this while reviewing YARN-2074. The problem is that after YARN-2074, we 
> don't count AM preemption towards AM failures on RM side, but MapReduce AM 
> itself checks the attempt id against the max-attempt count to determine if 
> this is the last attempt.
> {code}
>     public void computeIsLastAMRetry() {
>       isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
>     }
> {code}
> This causes issues w.r.t. deletion of the staging directory, etc.
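
The mismatch described in the issue can be illustrated with a small simulation. This is only a sketch: LastRetryMismatch and rmAllowsAnotherAttempt are hypothetical names, and the RM-side rule is an assumption based on the description of YARN-2074 above (preempted attempts are not counted as failures). Only the comparison in amThinksLastRetry mirrors the quoted snippet.

```java
// Hypothetical simulation; not Hadoop code.
final class LastRetryMismatch {

    /** The AM-side check from the issue description. */
    static boolean amThinksLastRetry(int attemptId, int maxAppAttempts) {
        return attemptId >= maxAppAttempts;
    }

    /**
     * Assumed RM-side rule after YARN-2074: only attempts that were not
     * preempted count as failures against the maximum.
     */
    static boolean rmAllowsAnotherAttempt(boolean[] preempted, int maxAppAttempts) {
        int countedFailures = 0;
        for (boolean wasPreempted : preempted) {
            if (!wasPreempted) {
                countedFailures++;
            }
        }
        return countedFailures < maxAppAttempts;
    }

    public static void main(String[] args) {
        int maxAppAttempts = 2;
        // Attempt 1 was preempted; attempt 2 then failed (e.g. crashed).
        boolean[] preempted = {true, false};
        int currentAttemptId = 2;

        System.out.println("AM thinks last retry: "
            + amThinksLastRetry(currentAttemptId, maxAppAttempts));
        System.out.println("RM allows another attempt: "
            + rmAllowsAnotherAttempt(preempted, maxAppAttempts));
    }
}
```

Under these assumptions both checks come out true for attempt 2: the AM would treat it as the last retry and delete the staging directory, while the RM would still launch attempt 3, which then finds its staging files gone.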



--
This message was sent by Atlassian JIRA
(v6.2#6252)
