[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000472#comment-15000472
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

bq. This introduces duplication of code for checking commit status and can 
cause a bug if the logic changes in either place. It also makes extra RPC 
calls to HDFS for checking file status, which is avoidable. Moving the code to 
the place where we were previously failing due to an in-progress commit will 
allow this method to do exactly as its name suggests: clean up in-progress 
commit markers. Does that clarify?
Thanks for clarifying. That sounds good. Will update in v5 patch.
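
To make sure we mean the same thing, here is a minimal sketch of the single 
decision point as I understand it. It is not code from any patch: the class, 
enum, and parameter names are hypothetical, and the three booleans stand in 
for the commit marker files that the AM would read once from HDFS.

```java
// Illustrative sketch only: names are hypothetical and do not mirror the patch.
final class CommitRecoverySketch {

    /** What a retried AM would do after inspecting the previous attempt's markers. */
    enum Action {
        NOTHING_TO_RECOVER,      // previous attempt never started the commit
        JOB_SUCCEEDED,           // previous attempt finished the commit
        JOB_FAILED,              // previous attempt recorded a commit failure
        REDO_COMMIT,             // commit was in progress and may be repeated
        FAIL_IN_PROGRESS_COMMIT  // commit was in progress and may not be repeated
    }

    /**
     * Single place that maps marker state to an action, so the status check
     * is not duplicated and the markers only need to be read once.
     */
    static Action decide(boolean commitStarted,
                         boolean commitSucceeded,
                         boolean commitFailed,
                         boolean commitRepeatable) {
        if (!commitStarted) {
            return Action.NOTHING_TO_RECOVER;
        }
        if (commitSucceeded) {
            return Action.JOB_SUCCEEDED;
        }
        if (commitFailed) {
            return Action.JOB_FAILED;
        }
        // Started but neither succeeded nor failed: the commit was in progress.
        // Only here would the in-progress markers be cleaned up before a redo.
        return commitRepeatable ? Action.REDO_COMMIT : Action.FAIL_IN_PROGRESS_COMMIT;
    }
}
```

With this shape, the cleanup method is only reached from the REDO_COMMIT 
branch and does not need to re-check file status itself.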

bq. 1) Test the new MR AppMaster functionality that allows commit to proceed 
in a retried AM if the commit is repeatable. 
In theory, I agree it would be nice to have a test with a fully functional 
AM. However, I don't think that is easy to do for this case. Do we have other 
tests on job commit (not retry) that launch a fully functional AppMaster? If 
not, I would prefer to add it later in another JIRA once we have more ideas 
on how to do it.

bq. 2) Test in FileOutputCommitter that for a repeatable commit, a 
FileNotFoundException is not counted as an error (new behavior).
Can you check FileOutputCommitter#testCommitterRepeatableV1() and 
FileOutputCommitter#testCommitterRepeatableV2()?
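
For reference, the behavior under test can be sketched as below. This is an 
assumption-laden illustration, not the FileOutputCommitter code: CommitStep 
and runStep are hypothetical names, and a step stands for any filesystem 
operation in the commit whose source may already have been moved by an 
earlier attempt.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Illustrative sketch only: CommitStep and runStep are hypothetical names.
final class RepeatableCommitSketch {

    /** One filesystem operation performed as part of a job commit. */
    interface CommitStep {
        void run() throws IOException;
    }

    /**
     * Runs a commit step. For a repeatable commit, a missing source is taken
     * to mean an earlier attempt already completed this step, so it is
     * skipped instead of failing the commit.
     *
     * @return true if the step ran, false if it was skipped
     */
    static boolean runStep(CommitStep step, boolean repeatable) throws IOException {
        try {
            step.run();
            return true;
        } catch (FileNotFoundException e) {
            if (repeatable) {
                return false; // tolerated: not counted as an error
            }
            throw e; // non-repeatable commit: still an error
        }
    }
}
```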

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch
>
>
> The MRAppMaster may crash during job commit, or a NodeManager restart may 
> cause the committing AM to exit due to container expiry. In these cases, 
> the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
