[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992102#comment-14992102
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

Thanks Bikas for review and comments!
bq. If the commit actually failed then there does not seem any reason to assume 
that retrying it will succeed. IMO if the commit reports a failure then AM 
should fail. Similarly, if a commit failure file exists (from a previous AM 
version) then the new version of the AM should respect that and fail since the 
commit has been reported to be failed.
There are still reasons that related to AM specific, i.e. previous AM cannot 
connect to FS (FS or other CloudFS), committer mis-behavior because of getting 
loaded incorrect (due to classpath or other defect), etc. I think it make sense 
to do the best effort to retry the commit failure (like other reason to cause 
AM failure) given the commit is repeatable and all tasks are done successfully.

bq. Javadoc could be improved. Inline
Yes. I will.

bq. num-retries instead of retries? Also, if its num-retries then default 
should be 0. If its num-attempts then default should be 1.
Ok. I will update to attempts. default to be 1 means no retry to keep 
consistent with previous behavior.

bq. Retry count checking code in the catch block subsumes the check retry count 
check in the while block?
I don't think so. Can you take a look at it again?

bq. The previous operation could delete the path after the if check has 
succeeded. So we probably also need to catch FileNotFoundException exception 
class here and ignore it if repeatableCommit is true.
That's good point. Will fix it.

bq. Do testcases need an @Test annotation?
No. The test class extends TestCase, so all method start with "test" will be 
executed automatically.

bq. firstTimeFail is probably a more clear name for what its doing - failing on 
the first attempt. Would be good to have a test that version 2 and retry = 1 
will fail also. Testcases missing for specific changes in FileOutputCommitter 
for create/delete operation changes?
Sounds good. Will fix/add later.



> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch
>
>
> There are chances MRAppMaster crush during job committing,or NodeManager 
> restart cause the committing AM exit due to container expire.In these cases 
> ,the job will fail.
> However,some jobs can redo commit so failing the job becomes unnecessary.
> Let clients tell AM to allow redo commit or not is a better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to