[ https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785177#comment-13785177 ]

Jason Lowe commented on MAPREDUCE-5547:
---------------------------------------

bq. Therefore, IMHO, it's not good to fix a bug in a rare case at the cost of 
troubling the common case.

Exactly, that's what I'm worried about.

There is no general solution to the
client-sees-app-succeed-but-app-subsequently-fails-to-unregister problem.
There are tons of ways a client can learn that a job succeeded besides the
standard JobClient query (e.g., the _SUCCESS file generated by FileOutputFormat).
The output committer is user-defined code and can do arbitrary things.  The
main thing is to ensure that the subsequent AM attempt, if there is one, does not
delete or corrupt the data of the previous successful attempt.  That's why the
subsequent AM checks the jhist file for a successful commit from the previous
attempt and, if it finds one, unregisters with a success code without doing
much else.  I think that's the best we can do.  Clients checking job
status via JobClient or the proxy URL will be redirected to the history server
and see that the job succeeded.  The only oddity arises if the issue occurs
on the last AM attempt: the RM will then report the app as failed even though
the job succeeded in the MR sense.  This should be rare, but it can happen, and
we cannot prevent every case of a client seeing an MR job succeed while the RM
reports it as failed.
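To make the recovery rule above concrete, here is a minimal Java sketch of the decision a subsequent AM attempt makes, plus the RM-reported outcome in the rare last-attempt case. It is a simplified model under stated assumptions, not the actual MRAppMaster code: the class RecoveryDecision, the event name JOB_COMMIT_SUCCEEDED, and the methods decide and rmFinalStateAfterCommit are all hypothetical, and parsing the real .jhist format is left out.

```java
import java.util.List;

/**
 * Hypothetical model of MR AM recovery after a prior attempt. Event names
 * and types are illustrative assumptions, not Hadoop's JobHistory classes.
 */
public class RecoveryDecision {

    /** What the new AM attempt should do. */
    enum Action {
        /** A previous attempt already committed: report success, touch nothing. */
        UNREGISTER_WITH_SUCCESS,
        /** No successful commit recorded: run (or recover) the job normally. */
        RUN_JOB
    }

    /** Assumed marker recorded in the history file after a successful commit. */
    static final String COMMIT_SUCCEEDED = "JOB_COMMIT_SUCCEEDED";

    /** Decides from the event names read out of the previous attempt's history. */
    static Action decide(List<String> previousAttemptEvents) {
        if (previousAttemptEvents.contains(COMMIT_SUCCEEDED)) {
            // Re-running could delete or corrupt the already committed output.
            return Action.UNREGISTER_WITH_SUCCESS;
        }
        return Action.RUN_JOB;
    }

    /**
     * Final app state the RM reports once the job output has been committed.
     * If the committing attempt failed to unregister and no attempts remain,
     * the RM says FAILED even though the MR job succeeded.
     */
    static String rmFinalStateAfterCommit(boolean unregistered, boolean lastAttempt) {
        if (unregistered || !lastAttempt) {
            // Either this attempt unregistered, or the next one will (via decide()).
            return "SUCCEEDED";
        }
        return "FAILED";
    }

    public static void main(String[] args) {
        System.out.println(decide(List.of("JOB_SUBMITTED", COMMIT_SUCCEEDED))); // UNREGISTER_WITH_SUCCESS
        System.out.println(decide(List.of("JOB_SUBMITTED")));                   // RUN_JOB
        System.out.println(rmFinalStateAfterCommit(false, true));               // FAILED
    }
}
```

The last line of main is the oddity described above: the output is committed (so clients may already see _SUCCESS or query the history server), yet the RM has no attempt left to unregister with a success code.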

> Job history should not be flushed to JHS until AM gets unregistered
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5547
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
