[ https://issues.apache.org/jira/browse/MAPREDUCE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249164#comment-13249164 ]

Bikas Saha commented on MAPREDUCE-3921:
---------------------------------------

Attached new patch.
1) Cleaned up asserts, logs and minor comments. The javac warnings are the same 
as the pre-existing warnings around the use of raw types for events.
2) Replaced the newly added TaskEventType.T_ATTEMPT_KILLED_AFTER_SUCCESS with 
the existing TaskEventType.T_ATTEMPT_KILLED. The successful attempt was being 
killed, so it makes sense to reuse the existing code flow. There was some 
reason (which is lost in my notes) for which I had added a new event type, but 
after looking at the code I don't see any reason to do so now.
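
To illustrate the reuse, here is a minimal, runnable sketch. The names are 
simplified stand-ins and this is not the real TaskImpl state machine; the 
point is only that the existing T_ATTEMPT_KILLED event can drive the 
SUCCEEDED-to-SCHEDULED transition just as well as a dedicated 
kill-after-success event could.

{code:java}
// Minimal model (simplified names, NOT the real TaskImpl state machine) of
// why no separate T_ATTEMPT_KILLED_AFTER_SUCCESS type is needed: the existing
// T_ATTEMPT_KILLED event also drives SUCCEEDED -> SCHEDULED, so the task is
// simply rerun.
public class TaskEventReuseSketch {
  enum TaskState { SCHEDULED, RUNNING, SUCCEEDED }
  enum TaskEventType { T_ATTEMPT_LAUNCHED, T_ATTEMPT_SUCCEEDED, T_ATTEMPT_KILLED }

  static TaskState transition(TaskState state, TaskEventType event) {
    switch (event) {
      case T_ATTEMPT_LAUNCHED:
        return TaskState.RUNNING;
      case T_ATTEMPT_SUCCEEDED:
        return TaskState.SUCCEEDED;
      case T_ATTEMPT_KILLED:
        // One shared code path: killed while RUNNING or killed after the
        // task already SUCCEEDED (bad node) both reschedule a new attempt.
        return TaskState.SCHEDULED;
      default:
        return state;
    }
  }

  public static void main(String[] args) {
    TaskState s = TaskState.SCHEDULED;
    s = transition(s, TaskEventType.T_ATTEMPT_LAUNCHED);
    s = transition(s, TaskEventType.T_ATTEMPT_SUCCEEDED);
    // The node hosting the successful map output turns bad:
    s = transition(s, TaskEventType.T_ATTEMPT_KILLED);
    System.out.println(s); // SCHEDULED -> the map is rerun
  }
}
{code}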
3) All map task completion events (succeeded, killed, etc.) are synced with 
the reducers. When a map task is killed because of a bad node, that event is 
sent to the reducers; then, when the rerun completes, the reducers learn about 
that too, just like any other change in map outputs. All of this is 
pre-existing functionality, based on my understanding of the code and on 
talking offline with Vinod. So your concerns about informing the reducers 
about the newly killed map task are already addressed by the pre-existing 
code flow.
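
For illustration, a hedged sketch of that pre-existing sync path as I 
understand it: the AM appends every completion-status change to one ordered 
event log that reducers poll incrementally, so a kill-after-success reaches 
them the same way a normal success does. All names here (Status, 
CompletionEvent, fetchEvents) are illustrative, not the real MR classes.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CompletionEventSyncSketch {
  enum Status { SUCCEEDED, OBSOLETE }

  record CompletionEvent(String attemptId, Status status) {}

  // One ordered log of every completion-status change in the job.
  private final List<CompletionEvent> log = new ArrayList<>();

  void mapSucceeded(String attemptId) {
    log.add(new CompletionEvent(attemptId, Status.SUCCEEDED));
  }

  // Called when a bad node invalidates an already-successful map attempt.
  void mapKilledAfterSuccess(String attemptId) {
    log.add(new CompletionEvent(attemptId, Status.OBSOLETE));
  }

  // Reducers poll incrementally: "give me all events starting at index N".
  List<CompletionEvent> fetchEvents(int fromIndex) {
    return new ArrayList<>(log.subList(fromIndex, log.size()));
  }

  public static void main(String[] args) {
    CompletionEventSyncSketch am = new CompletionEventSyncSketch();
    am.mapSucceeded("map_0_attempt_1");
    int seen = am.fetchEvents(0).size();         // reducer sees the success
    am.mapKilledAfterSuccess("map_0_attempt_1"); // node goes bad
    am.mapSucceeded("map_0_attempt_2");          // rerun succeeds
    // The next poll tells the reducer the old output is obsolete and that
    // the rerun succeeded -- no special-case notification is needed.
    System.out.println(am.fetchEvents(seen));
  }
}
{code}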
4) AM recovery. I was having trouble manually creating failures on a real 
cluster, so I went ahead and enhanced the newly added 
TestMRApp.testUpdatedNodes() with AM recovery. The test now checks that 
successful tasks are killed and rerun on node failure. Then the AM is 
restarted and the test verifies that those completed tasks are recovered. 
While that worked and this patch passed the tests, a variant of the test 
exposed a different problem.

In recovery mode, the recovery service assigns a success status to any task 
that has a FINISHED event reported. The only way that status can change is if 
there is a FAILED event for that task, in which case a failed status is 
assigned instead. So once a task is marked with a success status, it remains 
so even when subsequent events kill the successful task attempt and mark it 
invalid.
Next, the recovery service adds all tasks with success status to a 
completedTasks collection. Then it enumerates the events and processes them. 
When it hits a TaskEventType.*_KILLED/FAILED/SUCCEEDED event, it removes the 
corresponding attempt from completedTasks. Recovery does not complete until 
all attempts of all completedTasks have been removed. Now the following 
sequence of events can happen for tasks A and B, where A1 denotes task 
attempt 1 of A.
completedTasks contains A and B. A1 and A2 succeeded, A2 being a rerun of A1. 
B1 succeeded and B2 was running when the AM crashed.
A1 - container request is processed. It uses the nodeid info from A1 to work.
B1 - container request is processed. It uses the nodeid info from B1 to work.
A1 - SUCCEEDED removes A1.
B1 - SUCCEEDED removes B1.
A2 - container request is processed. It uses the nodeid info from A2 to work.
B2 - container request is processed. But there is no such info for B2, since 
it is only populated on task completion. The AM crashed here while trying to 
resolve the nodeid.
Had the AM not crashed, the following would have happened:
A2 - SUCCEEDED removes A2.
There is no FAILED/KILLED/SUCCEEDED event for B2, since it was running when 
the AM crashed. So it seems the AM would never move out of recovery.
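
To make the trace concrete, here is a runnable toy model of the replay 
described above. Every name in it is illustrative (the real logic lives in 
the AM's recovery service); it reproduces both failure modes: the missing 
nodeid info for B2, and completedTasks never draining because B2 has no 
terminal event.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RecoveryReplaySketch {
  record Event(String attemptId, String type) {}

  public static void main(String[] args) {
    // Tasks with a FINISHED event in history get success status.
    Set<String> completedTasks = new HashSet<>(Set.of("A", "B"));

    // Attempts still outstanding for each completed task.
    Map<String, Set<String>> pending = new HashMap<>();
    pending.put("A", new HashSet<>(Set.of("A1", "A2")));
    pending.put("B", new HashSet<>(Set.of("B1", "B2")));

    // Node info exists only for attempts that completed before the crash.
    Map<String, String> nodeOf =
        Map.of("A1", "node1", "B1", "node1", "A2", "node2"); // no B2

    List<Event> events = List.of(
        new Event("A1", "CONTAINER_REQUEST"),
        new Event("B1", "CONTAINER_REQUEST"),
        new Event("A1", "SUCCEEDED"),
        new Event("B1", "SUCCEEDED"),
        new Event("A2", "CONTAINER_REQUEST"),
        new Event("B2", "CONTAINER_REQUEST"), // AM crashed here
        new Event("A2", "SUCCEEDED"));

    for (Event e : events) {
      String task = e.attemptId().substring(0, 1);
      if (e.type().equals("CONTAINER_REQUEST")) {
        if (nodeOf.get(e.attemptId()) == null) {
          // Problem 1: an attempt that never completed is processed anyway,
          // and its node info is missing -> this is where the real AM died.
          System.out.println("would crash: no node info for " + e.attemptId());
        }
      } else { // KILLED/FAILED/SUCCEEDED
        Set<String> left = pending.get(task);
        left.remove(e.attemptId());
        if (left.isEmpty()) completedTasks.remove(task);
      }
    }
    // Problem 2: B2 has no terminal event, so B is never removed and the
    // AM would wait on completedTasks forever.
    System.out.println("still waiting on: " + completedTasks); // [B]
  }
}
{code}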

If the above is correct, there seem to be two problems:
1) While recovery is in progress, events are handled for task attempts that 
are not in a completed state. I am not sure whether the recovery design 
allows this; the current crash may simply be a case of missing info.
2) Every task attempt of a completedTask is expected to have a 
KILLED/FAILED/SUCCEEDED entry. This seems clearly wrong in the current 
scenario.

                
> MR AM should act on the nodes liveliness information when nodes go 
> up/down/unhealthy
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3921
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Bikas Saha
>             Fix For: 0.23.2
>
>         Attachments: MAPREDUCE-3921-1.patch, 
> MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921-branch-0.23.patch, 
> MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
