[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982375#comment-14982375
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

Thanks [~bikassaha] for the comments! I agree it makes more sense to move the
retry logic into committer.commitJob() if the committer supports repeatable
commit. My original thinking was to combine this retry for
committer.commitJob() with the handling of other AM exceptions in
handleJobCommit (outside of the committer), such as a failure to write
endCommitSuccessFile. But now I think we should separate committer retry from
AM-specific handling, for the reason you mentioned above. For the AM-specific
case, I would prefer that we just let the AM exit directly instead of failing
the job (if job commit is repeatable). This is mostly the same as what [~nemon]
proposed above, with one slight difference: we should apply an AM failure (not
a job failure) even when committer.commitJob() still fails after retry, to
handle some corner cases, e.g. something goes wrong with the committer in this
AM, but the commit still has a chance to succeed in another AM if we support
repeatable job commit.
I will update the patch soon.
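
As a rough illustration of the behaviour described above, here is a minimal
sketch in Java. It does not use the real Hadoop classes: the Committer
interface, isCommitRepeatable(), the Outcome enum, commitWithRetry() and the
flaky() test helper are all hypothetical names made up for this example, and
the retry count is an assumed parameter rather than an existing setting.

```java
import java.io.IOException;

public class CommitRetrySketch {

    /** Hypothetical stand-in for a committer; NOT the Hadoop OutputCommitter API. */
    public interface Committer {
        void commitJob() throws IOException;
        boolean isCommitRepeatable();
    }

    /** What the AM should do once the commit phase has finished. */
    public enum Outcome { COMMITTED, FAIL_AM, FAIL_JOB }

    /**
     * Retries commitJob() only when the committer declares the commit
     * repeatable. If the commit still fails, a repeatable commit fails the AM
     * (so a later AM attempt may redo the commit), while a non-repeatable
     * commit fails the whole job.
     */
    public static Outcome commitWithRetry(Committer committer, int maxAttempts) {
        boolean repeatable = committer.isCommitRepeatable();
        int attempts = repeatable ? Math.max(1, maxAttempts) : 1;
        for (int i = 0; i < attempts; i++) {
            try {
                committer.commitJob();
                return Outcome.COMMITTED;
            } catch (IOException e) {
                // Swallow the error and try again if attempts remain.
            }
        }
        return repeatable ? Outcome.FAIL_AM : Outcome.FAIL_JOB;
    }

    /** Hypothetical helper: a committer whose first `failures` calls throw. */
    public static Committer flaky(final int failures, final boolean repeatable) {
        return new Committer() {
            private int calls = 0;

            @Override
            public void commitJob() throws IOException {
                if (calls++ < failures) {
                    throw new IOException("simulated commit failure");
                }
            }

            @Override
            public boolean isCommitRepeatable() {
                return repeatable;
            }
        };
    }
}
```

For example, commitWithRetry(flaky(5, true), 3) exhausts its retries and
reports FAIL_AM rather than FAIL_JOB, which is the corner case mentioned
above: the commit may still succeed in another AM attempt.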

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch
>
>
> There is a chance that MRAppMaster crashes during job commit, or that a
> NodeManager restart causes the committing AM to exit due to container
> expiry. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether redoing the commit is allowed is a better
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)