[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998995#comment-14998995
 ] 

Junping Du commented on MAPREDUCE-5485:
---------------------------------------

Thanks [~bikassaha] for the comments!
bq. Is the above check too early in the code? E.g. IIRC, at this point we have 
not checked whether the previous job commit succeeded or failed - in which case 
we cannot recover and there is nothing to do.
cleanupInterruptedCommit() already checks whether the previous job commit 
succeeded or failed. Am I missing anything here?
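
For reference, here is a minimal sketch of how such an interrupted-commit check 
based on marker files in the staging directory could look. This is not the 
actual patch code; the marker file names and the method signature are 
assumptions made only for illustration.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CommitRecoveryHelper {

  // Hypothetical marker file names; the real patch may use different ones.
  private static final String COMMIT_STARTED = "COMMIT_STARTED";
  private static final String COMMIT_SUCCESS = "COMMIT_SUCCESS";
  private static final String COMMIT_FAIL = "COMMIT_FAIL";

  /**
   * Inspects the marker files left by a previous AM attempt and decides
   * whether the job commit already finished or was interrupted mid-way.
   */
  public static void cleanupInterruptedCommit(FileSystem fs, Path stagingDir)
      throws IOException {
    boolean succeeded = fs.exists(new Path(stagingDir, COMMIT_SUCCESS));
    boolean failed = fs.exists(new Path(stagingDir, COMMIT_FAIL));
    boolean started = fs.exists(new Path(stagingDir, COMMIT_STARTED));

    if (succeeded) {
      // The previous attempt finished the commit; nothing left to do.
      return;
    }
    if (failed) {
      // The previous attempt explicitly failed the commit; we cannot recover.
      throw new IOException("Previous job commit failed; cannot recover the job");
    }
    if (started) {
      // Commit was interrupted: the caller can only repeat it if the
      // OutputCommitter reports that job commit is repeatable.
    }
  }
}
{code}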

bq. Also, we have already changed the startCommit operation to be repeatable 
via the overwrite flag. After that is there a need to delete the files upfront. 
Delete may be an expensive operation on some cloud stores.
I don't see much difference between deleting/writing a small or empty file and 
overwriting an existing file (updating its timestamp and contents) on any cloud 
store. I just prefer not to add more if-else cases to logic that is already 
complicated. If we do observe a performance difference on a real cluster, we 
can optimize it then. What do you think?
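
To make the overwrite point concrete, here is a small sketch (the helper name 
is made up for illustration) of writing a marker file with the overwrite flag, 
so a repeated commit simply replaces the existing file instead of doing a 
delete-then-create round trip:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerFileWriter {

  /**
   * Writes an empty marker file, overwriting any copy left behind by an
   * earlier commit attempt, so no separate delete() call is needed.
   */
  public static void touchMarker(FileSystem fs, Path marker) throws IOException {
    // create(path, true) means "create with overwrite", so repeating is harmless.
    fs.create(marker, true).close();
  }
}
{code}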

bq. Mapred javadoc fixes are missing. Also there are some typos in there. E.g.
Nice catch! Will fix it in the v4 patch.

bq. This part of the code change could use some tests.
OK. I will add a TestFileOutputCommitter for the class in the mapred package.

bq. Tests for repeatable success marker file and FileExistsException for 
repeatable deletes would be good to have.
Previously, the success marker file was already written with overwrite 
(fs.create() defaults to overwrite), so there is not much difference there. I 
will add additional tests for a duplicated job commit in the v4 patch.
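
As a rough sketch of what such a duplicated-commit test could look like 
(written against the new-API committer for brevity; the mapred-package test 
would mirror it, and it assumes the patched committer tolerates a second 
commitJob()):

{code:java}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.task.JobContextImpl;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.junit.Test;

public class TestRepeatedJobCommit {

  @Test
  public void testDuplicateCommitJob() throws Exception {
    Path outDir = new Path(
        System.getProperty("test.build.data", "/tmp"), "repeat-commit-out");
    Configuration conf = new Configuration();

    JobID jobId = new JobID("test", 1);
    TaskAttemptID taskId = new TaskAttemptID(
        jobId.getJtIdentifier(), jobId.getId(), TaskType.MAP, 0, 0);
    JobContext jContext = new JobContextImpl(conf, jobId);
    TaskAttemptContext tContext = new TaskAttemptContextImpl(conf, taskId);

    FileOutputCommitter committer = new FileOutputCommitter(outDir, tContext);
    committer.setupJob(jContext);

    // Committing twice should be harmless: the _SUCCESS marker is written
    // with overwrite and missing temporary directories are tolerated.
    committer.commitJob(jContext);
    committer.commitJob(jContext);

    FileSystem fs = outDir.getFileSystem(conf);
    assertTrue(fs.exists(new Path(outDir, FileOutputCommitter.SUCCEEDED_FILE_NAME)));
    fs.delete(outDir, true);
  }
}
{code}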

> Allow repeating job commit by extending OutputCommitter API
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.1.0-beta
>            Reporter: Nemon Lou
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, 
> MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, 
> MAPREDUCE-5485-v3.patch
>
>
> There are chances that the MRAppMaster crashes during job commit, or a 
> NodeManager restart causes the committing AM to exit because its container 
> expires. In these cases, the job will fail.
> However, some jobs can redo the commit, so failing the job is unnecessary.
> Letting clients tell the AM whether or not to allow redoing the commit is a 
> better choice.
> This idea comes from Jason Lowe's comments in MAPREDUCE-4819.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
