[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000916#comment-15000916
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

bq. About the overall test. The main overall change is to allow the retry AM to 
continue after seeing an in-progress commit from the previous AM. It seems 
incomplete to not have a test for that. 
I agree that it is better to cover as many cases as possible in unit tests. But 
due to limitations of our current unit test framework, we could miss many 
functional tests, especially those related to MR AM failure/restart. For 
example, in the rolling upgrade story, we don't have tests that check AM 
failover during NM/RM restart. Instead, we may have to split the whole 
functionality into pieces and test each piece. Sadly, this is sometimes not good 
enough, which is why we still need to verify that the feature works end to end 
on a real cluster.

bq. However, if you think that we don't have existing infra for that code path, 
then we should create a follow-up jira to add that infra and relevant tests. I 
have not followed the MR AM code changes for a while, so I cannot recall off 
the top of my head any existing test cases. Other committers may have some 
ideas.
I just filed MAPREDUCE-6545 to track the additional test work as a follow-up.

bq. With that caveat, the latest patch looks good to me. Thanks for your 
patience through the reviews.
Thanks, Bikas, for your careful review.

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, 
> MAPREDUCE-5485-v5.patch
>
>
> MRAppMaster may crash during job commit, or a NodeManager restart may cause 
> the committing AM to exit because its container expires. In these cases, the 
> job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better 
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.



