[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789251#comment-13789251
 ] 

Robert Joseph Evans edited comment on MAPREDUCE-5547 at 10/8/13 2:45 PM:
-------------------------------------------------------------------------

Why is it harmful for the AM and RM to have different knowledge?  I can see it 
can be confusing, but we already live with it for the kill command.  The AM 
does not tell the RM it was killed so that it can update the state, it looks 
like is succeeded to the RM.  Why is this so critically different?

What we want is to preserve as consistent a view for the end user as possible.  
But there are too many APIs that the end user can query to be totally 
consistent.  There is the MR client API (Probably the most critical), the job 
end notifier (Which is documented as a best effort),  The _SUCCESS file from 
the output format (Which is *not* optional), the AM/History server web 
service/web UI, and finally the RM UI.

bq. Ideally, the 6 steps should be consistent. 

The only way to do this is to have a single source of truth, where all other 
interfaces yield to that source of truth in the case of inconsistency, and do 
not report a final status until it is in a terminal state.  Along with that we 
would ideally want it to be the last thing done, to be sure that everything 
happened correctly.

The problem is the _SUCCESS file. We have no control over it.  It is part of 
the output format and we cannot change it and preserve compatibility.  As such 
it has to be the true indication of success or failure of a Job.  To work 
around this and minimize any possibility of inconsistency didn't we add in a 
final status file, when we put in the split brain fix?  When the AM restarts 
after a failure I thought it checked for this and if it were in a terminal 
state it would not rerun anything, it would just try to copy the history file 
and do the appropriate unregister.  What this does is it gives us more chances 
to recover from whatever failure may have happened the first time (like not 
being able to move the history file).  This is why I think we want to 
unregister last, or we will only have one chance before we are in an 
inconsistent state. I thought we tested that the history server can handle a 
duplicate history file showing up.  So the only time that we really run into a 
situation where the Job succeeded and failed at the same time is when the AM 
failed to move the history file 4 times.  I don't believe we have seen this 
inconsistency happen since we put in the split brain fix.

The only other thing we can do is deprecate the output committer's ability to 
signal success; have the AM not report success or failure, but delegate that to 
the RM/History Server, which will slow down jobs significantly; Move Job End 
notification to the history server; and have the history server query the RM 
before it reports back anything (along with storing that result somewhere 
because the RM forgets things faster then the history server does). 


was (Author: revans2):
Why is it harmful for the AM and RM to have different knowledge?  I can see it 
can be confusing, but we already live with it for the kill command.  The AM 
does not tell the RM it was killed so that it can update the state, it looks 
like is succeeded to the RM.  Why is this so critically different?

What we want to preserve as consistent a view for the end user as possible.  
But there are too many APIs that the end user can query to be totally 
consistent.  There is the MR client API (Probably the most critical), the job 
end notifier (Which is documented as a best effort),  The _SUCCESS file from 
the output format (Which is *not* optional), the AM/History server web 
service/web UI, and finally the RM UI.

bq. Ideally, the 6 steps should be consistent. 

The only way to do this is to have a single source of truth, where all other 
interfaces yield to that source of truth in the case of inconsistency, and do 
not report a final status until it is in a terminal state.  Along with that we 
would ideally want it to be the last thing done, to be sure that everything 
happened correctly.

The problem is the _SUCCESS file. We have no control over it.  It is part of 
the output format and we cannot change it and preserve compatibility.  As such 
it has to be the true indication of success or failure of a Job.  To work 
around this and minimize any possibility of inconsistency didn't we add in a 
final status file, when we put in the split brain fix?  When the AM restarts 
after a failure I thought it checked for this and if it were in a terminal 
state it would not rerun anything, it would just try to copy the history file 
and do the appropriate unregister.  What this does is it gives us more chances 
to recover from whatever failure may have happened the first time (like not 
being able to move the history file).  This is why I think we want to 
unregister last, or we will only have one chance before we are in an 
inconsistent state. I thought we tested that the history server can handle a 
duplicate history file showing up.  So the only time that we really run into a 
situation where the Job succeeded and failed at the same time is when the AM 
failed to move the history file 4 times.  I don't believe we have seen this 
inconsistency happen since we put in the split brain fix.

The only other thing we can do is deprecate the output committer's ability to 
signal success; have the AM not report success or failure, but delegate that to 
the RM/History Server, which will slow down jobs significantly; Move Job End 
notification to the history server; and have the history server query the RM 
before it reports back anything (along with storing that result somewhere 
because the RM forgets things faster then the history server does). 

> Job history should not be flushed to JHS until AM gets unregistered
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5547
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5547
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to