[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996979#comment-14996979
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

bq. doing ++retries here can remove code duplication for the < check in the 
while?
Sorry, I missed this comment in the patch I just uploaded. I will address it in 
the next patch.
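
For readers without the patch at hand, here is a rough sketch of what the 
suggestion could look like. The patch code is not reproduced in this thread, so 
the class name, the CommitAction interface and the maxAttempts parameter are 
all hypothetical. The only point is that a pre-increment keeps the counter 
update and the bound check in one place.

```java
import java.io.IOException;

public class RetryLoopSketch {
    /** Hypothetical stand-in for the commit step; not a Hadoop type. */
    public interface CommitAction {
        void run() throws IOException;
    }

    /**
     * Runs the action until it succeeds or maxAttempts is used up.
     * Returns the number of attempts made; rethrows the last IOException
     * once the budget is exhausted.
     */
    public static int runWithRetries(CommitAction action, int maxAttempts)
            throws IOException {
        int retries = 0;
        while (true) {
            try {
                action.run();
                return retries + 1;
            } catch (IOException e) {
                // Pre-increment: update the counter and check the bound once,
                // instead of repeating the comparison elsewhere in the loop.
                if (++retries >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

With maxAttempts set to 5 and an action that fails twice before succeeding, 
the method returns after the third attempt.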

bq. Even for a non-repeatable committer, if there is a classpath issue (which 
can get fixed by retrying the AM) then the AM should retry, right?
I agree this could be handled as a separate topic. However, it would take more 
time and effort to make sure that retrying with a non-repeatable committer does 
not risk reporting a successful commit whose result is wrong, when the job 
should have failed earlier. A repeatable committer carries no such risk: the 
price is an unnecessary retry in some cases, in exchange for a better chance of 
a successful commit in others, especially since the two cases cannot be told 
apart. Take a failure to delete the temp directory as an example: it could be 
caused by a problem in the AM's connection to HDFS (we should retry) or by HDFS 
being down permanently (we shouldn't retry). I would prefer the current 
trade-off: simple, and best effort for commit success in the repeatable case.
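
To make the trade-off concrete, the decision could be sketched roughly as 
below. This is only an illustration, not the patch: the Committer interface, 
isRepeatable() and shouldRetryCommit() are hypothetical names. A non-repeatable 
committer never retries. A repeatable one retries on a best-effort basis 
without trying to classify the failure.

```java
public class CommitRetryPolicySketch {
    /** Hypothetical view of a committer; not the actual OutputCommitter API. */
    public interface Committer {
        boolean isRepeatable();
    }

    /**
     * Decides whether a failed job commit should be attempted again.
     * The cause of the failure is deliberately ignored, because a transient
     * HDFS connection problem and a permanent outage look the same here.
     */
    public static boolean shouldRetryCommit(Committer committer,
                                            int attemptsSoFar,
                                            int maxAttempts) {
        if (!committer.isRepeatable()) {
            // Redoing the commit might produce a wrong result: fail the job.
            return false;
        }
        // Best effort: retry until the attempt budget is used up.
        return attemptsSoFar < maxAttempts;
    }
}
```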

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch
>
>
> The MRAppMaster may crash during job commit, or a NodeManager restart may 
> cause the committing AM to exit because its container expires. In these 
> cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether redoing the commit is allowed is a better 
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
