[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000916#comment-15000916 ]
Junping Du commented on MAPREDUCE-5485:
---------------------------------------

bq. About the overall test. The main overall change is to allow the retry AM to continue after seeing an in-progress commit from the previous AM. It seems incomplete to not have a test for that.

I agree that it is better to cover as many cases as possible in unit tests. But due to limitations of our current unit test framework, we can miss many functional tests, especially those related to MR AM failure/restart. For example, in the rolling upgrade story, we don't have tests to check AM failover during NM/RM restart. Instead, we may have to split the whole functionality into pieces and test each piece. Sadly, this is sometimes not good enough, which is why we still need to verify that the feature works end to end on a real cluster.

bq. However if you think that we don't have existing infra for that code path then we should create a follow up jira to add that infra and relevant tests. I have not followed the MR AM code changes for a while and so I cannot recall off the top of my head any existing test cases. Maybe other committers may have some ideas.

Just filed MAPREDUCE-6545 to track the additional test effort that comes later.

bq. With that caveat, the latest patch looks good to me. Thanks for your patience through the reviews.

Thanks Bikas for your careful review.

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch,
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch,
> MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch,
> MAPREDUCE-5485-v5.patch
>
>
> MRAppMaster may crash during job commit, or a NodeManager restart may cause
> the committing AM to exit due to container expiry. In these cases, the job
> will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether to allow redoing the commit is a better
> choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)