[ https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789251#comment-13789251 ]
Robert Joseph Evans edited comment on MAPREDUCE-5547 at 10/8/13 2:45 PM: ------------------------------------------------------------------------- Why is it harmful for the AM and RM to have different knowledge? I can see it can be confusing, but we already live with it for the kill command. The AM does not tell the RM it was killed so that it can update the state, it looks like is succeeded to the RM. Why is this so critically different? What we want is to preserve as consistent a view for the end user as possible. But there are too many APIs that the end user can query to be totally consistent. There is the MR client API (Probably the most critical), the job end notifier (Which is documented as a best effort), The _SUCCESS file from the output format (Which is *not* optional), the AM/History server web service/web UI, and finally the RM UI. bq. Ideally, the 6 steps should be consistent. The only way to do this is to have a single source of truth, where all other interfaces yield to that source of truth in the case of inconsistency, and do not report a final status until it is in a terminal state. Along with that we would ideally want it to be the last thing done, to be sure that everything happened correctly. The problem is the _SUCCESS file. We have no control over it. It is part of the output format and we cannot change it and preserve compatibility. As such it has to be the true indication of success or failure of a Job. To work around this and minimize any possibility of inconsistency didn't we add in a final status file, when we put in the split brain fix? When the AM restarts after a failure I thought it checked for this and if it were in a terminal state it would not rerun anything, it would just try to copy the history file and do the appropriate unregister. What this does is it gives us more chances to recover from whatever failure may have happened the first time (like not being able to move the history file). This is why I think we want to unregister last, or we will only have one chance before we are in an inconsistent state. I thought we tested that the history server can handle a duplicate history file showing up. So the only time that we really run into a situation where the Job succeeded and failed at the same time is when the AM failed to move the history file 4 times. I don't believe we have seen this inconsistency happen since we put in the split brain fix. The only other thing we can do is deprecate the output committer's ability to signal success; have the AM not report success or failure, but delegate that to the RM/History Server, which will slow down jobs significantly; Move Job End notification to the history server; and have the history server query the RM before it reports back anything (along with storing that result somewhere because the RM forgets things faster then the history server does). was (Author: revans2): Why is it harmful for the AM and RM to have different knowledge? I can see it can be confusing, but we already live with it for the kill command. The AM does not tell the RM it was killed so that it can update the state, it looks like is succeeded to the RM. Why is this so critically different? What we want to preserve as consistent a view for the end user as possible. But there are too many APIs that the end user can query to be totally consistent. There is the MR client API (Probably the most critical), the job end notifier (Which is documented as a best effort), The _SUCCESS file from the output format (Which is *not* optional), the AM/History server web service/web UI, and finally the RM UI. bq. Ideally, the 6 steps should be consistent. The only way to do this is to have a single source of truth, where all other interfaces yield to that source of truth in the case of inconsistency, and do not report a final status until it is in a terminal state. Along with that we would ideally want it to be the last thing done, to be sure that everything happened correctly. The problem is the _SUCCESS file. We have no control over it. It is part of the output format and we cannot change it and preserve compatibility. As such it has to be the true indication of success or failure of a Job. To work around this and minimize any possibility of inconsistency didn't we add in a final status file, when we put in the split brain fix? When the AM restarts after a failure I thought it checked for this and if it were in a terminal state it would not rerun anything, it would just try to copy the history file and do the appropriate unregister. What this does is it gives us more chances to recover from whatever failure may have happened the first time (like not being able to move the history file). This is why I think we want to unregister last, or we will only have one chance before we are in an inconsistent state. I thought we tested that the history server can handle a duplicate history file showing up. So the only time that we really run into a situation where the Job succeeded and failed at the same time is when the AM failed to move the history file 4 times. I don't believe we have seen this inconsistency happen since we put in the split brain fix. The only other thing we can do is deprecate the output committer's ability to signal success; have the AM not report success or failure, but delegate that to the RM/History Server, which will slow down jobs significantly; Move Job End notification to the history server; and have the history server query the RM before it reports back anything (along with storing that result somewhere because the RM forgets things faster then the history server does). > Job history should not be flushed to JHS until AM gets unregistered > ------------------------------------------------------------------- > > Key: MAPREDUCE-5547 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Reporter: Zhijie Shen > Assignee: Zhijie Shen > -- This message was sent by Atlassian JIRA (v6.1#6144)