[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617336#comment-13617336 ]
Arun C Murthy commented on MAPREDUCE-5110:
------------------------------------------

bq. If we want to strictly guarantee serial execution of task attempts (say, when speculative execution is turned off), we want to kill the task first before re-scheduling on another node.

[~kkambatl] strictly guaranteeing the above is basically impossible.

There are a bunch of other scenarios where we can't guarantee this. For example, you might schedule a task on a TT which is then deemed 'lost' 10 mins later without a single HB after the schedule, when in reality that TT is just having trouble talking to the JT. This means that multiple attempts will be running simultaneously, since the JT will re-schedule all tasks on that TT.

In reality, this is the more common case (lost TT) and there is pretty much nothing we can do about it. However, there are enough checks/balances to ensure there is consistency for the job in the system (longer writeup).

As a result, I'm inclined to close this as 'won't fix'. I think MAPREDUCE-2217 made an important improvement and we should keep it. However, I'm very scared of trying to implement MAPREDUCE-2217 via TT-side changes, particularly when we are adding complexity to already squiggly code on the TT.

Makes sense?

> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules
> another attempt. The earlier attempt can start after the later attempt
> leading to two parallel attempts running at the same time.
> This is particularly an issue if the user turns off speculation and expects a single
> attempt of a task to run at any point in time.
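
The race in the issue description can be illustrated with a small timeline model. This is only a sketch under stated assumptions: the class, method, and parameter names are hypothetical and are not Hadoop APIs, time is in arbitrary units, and the model assumes the expired first attempt is never killed, which is the behaviour the issue reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names are Hadoop APIs.
final class LaunchDelayRace {

    /** Half-open interval [start, end) during which one attempt is running. */
    static final class Run {
        final long start;
        final long end;

        Run(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Run other) {
            return start < other.end && other.start < end;
        }
    }

    /**
     * Returns the running intervals of every attempt the JT ends up with.
     * The first attempt is scheduled at t=0; if it has not launched by
     * t=expiry, the JT schedules a second attempt at that moment.
     */
    static List<Run> simulate(long firstLaunchDelay, long secondLaunchDelay,
                              long expiry, long runTime) {
        List<Run> runs = new ArrayList<>();
        // The first attempt still launches eventually, because in this
        // model nothing kills it when the JT expires it.
        runs.add(new Run(firstLaunchDelay, firstLaunchDelay + runTime));
        if (firstLaunchDelay > expiry) {
            long secondStart = expiry + secondLaunchDelay;
            runs.add(new Run(secondStart, secondStart + runTime));
        }
        return runs;
    }

    /** True if any two attempts are running at the same time. */
    static boolean hasParallelAttempts(List<Run> runs) {
        for (int i = 0; i < runs.size(); i++) {
            for (int j = i + 1; j < runs.size(); j++) {
                if (runs.get(i).overlaps(runs.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Slow launch (15) exceeds the expiry (10): two attempts overlap.
        System.out.println(hasParallelAttempts(simulate(15, 1, 10, 20)));
        // Fast launch (2) stays under the expiry: a single attempt runs.
        System.out.println(hasParallelAttempts(simulate(2, 1, 10, 20)));
    }
}
```

In the first call the re-scheduled attempt starts at t=11 and the original one at t=15, so the earlier attempt starts after the later one, as the description says. The same overlap would arise in the lost-TT case from the comment, where the JT re-schedules every task on a TT that is in fact still running them.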