[ https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785177#comment-13785177 ]
Jason Lowe commented on MAPREDUCE-5547:
---------------------------------------

bq. Therefore, IMHO, it's not good to fix a bug in a rare case at the cost of troubling the common case.

Exactly, that's what I'm worried about. There is no general solution to the client-sees-app-succeed-but-app-subsequently-fails-to-unregister problem. There are tons of ways a client can be notified of job success besides the standard JobClient query (e.g., the _SUCCESS file generated by FileOutputFormat). The output committer is user-defined code and can do arbitrary things.

The main thing is to ensure that the subsequent AM attempt, if there is one, does not delete or corrupt the data of the previous successful attempt. That's why the subsequent AM checks the jhist file for a successful commit from the previous attempt and, if it finds one, unregisters with a success code without doing much else. I think that's the best we can do. Clients trying to check job status via JobClient or the proxy URL will be redirected to the history server and see that the job succeeded.

The only oddity is that if the issue occurs on the last AM attempt, the RM will report the app as failed even though the job succeeded in the MR sense. It should be a rare case but it can happen, and we cannot prevent every case of a client seeing an MR job succeed while the RM reports it as failed.

> Job history should not be flushed to JHS until AM gets unregistered
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5547
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>

--
This message was sent by Atlassian JIRA
(v6.1#6144)
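
For readers following the thread, the recovery behavior described in the comment above can be modeled as a small decision function. This is only a rough sketch under stated assumptions: the class, enum, and method names are hypothetical and are not Hadoop's actual classes, and the "run the job" branch for the case where no successful commit is found is an assumption, since the comment only describes the successful-commit case.

```java
/** Sketch only: every name here is hypothetical, not part of Hadoop's API. */
final class AmRecoverySketch {

    /** What a restarted AM attempt does after inspecting the previous attempt's jhist file. */
    enum Action { UNREGISTER_SUCCEEDED, RUN_JOB }

    /**
     * If the previous attempt recorded a successful commit, the new attempt
     * unregisters with a success code and does not touch the committed output.
     * The RUN_JOB branch is an assumption; the comment does not spell it out.
     */
    static Action onRestart(boolean previousCommitSucceeded) {
        return previousCommitSucceeded ? Action.UNREGISTER_SUCCEEDED : Action.RUN_JOB;
    }

    /**
     * The oddity from the comment: the job committed, the AM failed to
     * unregister, and no further attempt is left to repair the status, so the
     * RM reports failure while the history server reports success.
     */
    static boolean rmDisagreesWithHistory(boolean commitSucceeded,
                                          boolean unregistered,
                                          boolean wasLastAttempt) {
        return commitSucceeded && !unregistered && wasLastAttempt;
    }

    public static void main(String[] args) {
        System.out.println(onRestart(true));
        System.out.println(rmDisagreesWithHistory(true, false, true));
    }
}
```

The second method is there to show why the mismatch is limited to the last attempt: with another attempt available, `onRestart(true)` lets the new AM unregister successfully on behalf of the previous one.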