[ https://issues.apache.org/jira/browse/MAPREDUCE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249164#comment-13249164 ]
Bikas Saha commented on MAPREDUCE-3921:
---------------------------------------

Attached new patch.

1) Cleaned up asserts, logs and minor comments. The javac warnings are the same as the pre-existing warnings around the use of raw types for events.

2) Replaced the newly added TaskEventType.T_ATTEMPT_KILLED_AFTER_SUCCESS with the existing TaskEventType.T_ATTEMPT_KILLED. The successful attempt was being killed, so it makes sense to reuse the existing code flow. There was some reason (which is lost in my notes) for which I had added a new event type, but after looking at the code I don't see any reason to do so now.

3) All map task completion events (succeeded, killed etc.) are synced with the reducers. When a map task is killed because of a bad node, that event is sent to the reducer. Then, when the rerun completes, the reducer will know about it, just like any other case of a change in map outputs. All of this is pre-existing functionality, based on my understanding of the code and on talking offline with Vinod. So your concerns about informing the reducers about the newly killed map task are already addressed by the pre-existing code flow.

4) AM recovery. I was having trouble manually creating failures on a real cluster, so I went ahead and enhanced the newly added TestMRApp.testUpdatedNodes() with AM recovery. The test now checks that successful tasks are killed and rerun on node failure. Then the AM is restarted and the test verifies that those completed tasks are recovered.

While that worked and this patch passed the tests, a variant of the test exposed a different problem. In recovery mode, the recovery service assigns a success status to any task that has a FINISHED event reported. The only way that status can change is if there is a FAILED event for that task, in which case a failed status is assigned to the task. So once a task is marked with a success status, it remains so even when subsequent events kill the successful task attempt and mark it invalid.
Next, the recovery service adds all tasks with a success status to a completedTasks collection. Then it enumerates the events and processes them. When it hits a TaskEventType.*_KILLED/FAILED/SUCCEEDED event, it removes that attempt from completedTasks. Recovery does not complete until all attempts of all completedTasks are removed.

Now the following sequence of events can happen for tasks A and B, where A1 represents task attempt 1 of A. completedTasks contains A and B. A1 and A2 both succeeded; A2 was a rerun of A1. B1 succeeded and B2 was running when the AM crashed.

A1 - container request is processed. It uses the nodeid info from A1 to work.
B1 - container request is processed. It uses the nodeid info from B1 to work.
A1 - Succeeded removes A1.
B1 - Succeeded removes B1.
A2 - container request is processed. It uses the nodeid info from A2 to work.
B2 - container request is processed. It uses the nodeid info from B2 to work, but there is no such info because it is populated on task completion. AM crashed here while trying to resolve the nodeid.

If the AM had not crashed, the following would have happened:

A2 - Succeeded removes A2.

There is no FAILED/KILLED/SUCCEEDED event for B2, since it was running when the AM crashed. So it seems the AM would never move out of recovery.

If the above is correct, there seem to be two problems:

1) While recovery is in progress, events are handled for task attempts that are not in a completed state. I am not sure whether the recovery design allows this and the current crash is simply a case of missing info.

2) Every task attempt of a completedTask is expected to have a KILLED/FAILED/SUCCEEDED entry. This seems to be clearly wrong in the current scenario.
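For illustration only, the stall described above can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the actual recovery service code: the class RecoverySketch, the Kind enum, the Event type and the replay() method are all hypothetical names, and the nodeid lookup that crashed the AM is deliberately not modeled. It only mirrors the rule that an attempt leaves completedTasks when a SUCCEEDED/FAILED/KILLED event for it is replayed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the replay loop described in the comment above.
public class RecoverySketch {
    enum Kind { CONTAINER_REQUEST, SUCCEEDED, FAILED, KILLED }

    static final class Event {
        final String attempt;
        final Kind kind;
        Event(String attempt, Kind kind) {
            this.attempt = attempt;
            this.kind = kind;
        }
    }

    // Replays the event log and returns the attempts of completed tasks that
    // never saw a terminal event. Recovery can only finish if this is empty.
    static Set<String> replay(Map<String, List<String>> completedTasks, List<Event> log) {
        Set<String> pending = new TreeSet<>();
        for (List<String> attempts : completedTasks.values()) {
            pending.addAll(attempts);
        }
        for (Event e : log) {
            // Only terminal events remove an attempt; container requests do not.
            if (e.kind != Kind.CONTAINER_REQUEST) {
                pending.remove(e.attempt);
            }
        }
        return pending;
    }

    // The A/B scenario from the comment: B2 was still running at crash time.
    static Set<String> scenarioFromComment() {
        Map<String, List<String>> completed = new LinkedHashMap<>();
        completed.put("A", Arrays.asList("A1", "A2"));
        completed.put("B", Arrays.asList("B1", "B2"));
        List<Event> log = Arrays.asList(
            new Event("A1", Kind.CONTAINER_REQUEST),
            new Event("B1", Kind.CONTAINER_REQUEST),
            new Event("A1", Kind.SUCCEEDED),
            new Event("B1", Kind.SUCCEEDED),
            new Event("A2", Kind.CONTAINER_REQUEST),
            new Event("B2", Kind.CONTAINER_REQUEST),
            new Event("A2", Kind.SUCCEEDED));
        return replay(completed, log);
    }

    public static void main(String[] args) {
        // Prints: Attempts blocking recovery: [B2]
        System.out.println("Attempts blocking recovery: " + scenarioFromComment());
    }
}
```

In this model B2 is left over because its only logged event is a container request, which matches problem 2) above: an attempt that was still running when the AM crashed has no terminal entry to replay.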
> MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3921
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Bikas Saha
>             Fix For: 0.23.2
>
>         Attachments: MAPREDUCE-3921-1.patch, MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921.patch