[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617336#comment-13617336
 ] 

Arun C Murthy commented on MAPREDUCE-5110:
------------------------------------------

bq. If we want to strictly guarantee serial execution of task attempts (say, 
when speculative execution is turned off), we want to kill the task first 
before re-scheduling on another node.

[~kkambatl] strictly guaranteeing the above is basically impossible. There 
are a bunch of other scenarios where we won't be able to guarantee this. For 
example, you might schedule a task on a TT which is then deemed 'lost' 10 mins 
later, without a single HB after the schedule, when in reality that TT is just 
having trouble talking to the JT. This means that multiple attempts of the 
same task will be running simultaneously, since the JT will re-schedule all 
tasks on that TT. In practice, this (lost TT) is the more common case and 
there is pretty much nothing we can do about it.
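
To make the lost-TT scenario concrete, here is a minimal toy simulation. It 
is only a sketch and not Hadoop code: ToyJobTracker, EXPIRY_SECS, the tracker 
names and the attempt-id format are all hypothetical, and the 600 second 
expiry simply mirrors the "10 mins" above.

```python
from dataclasses import dataclass, field

# Assumed expiry interval: 10 minutes, matching the scenario described above.
EXPIRY_SECS = 600


@dataclass
class ToyJobTracker:
    """Hypothetical, heavily simplified stand-in for the JT's bookkeeping."""
    last_heartbeat: dict = field(default_factory=dict)  # tracker -> last HB time
    assignments: dict = field(default_factory=dict)     # attempt id -> tracker
    next_attempt: int = 0

    def heartbeat(self, tracker, now):
        self.last_heartbeat[tracker] = now

    def schedule(self, task, tracker):
        attempt = f"{task}_{self.next_attempt}"
        self.next_attempt += 1
        self.assignments[attempt] = tracker
        return attempt

    def expire_lost_trackers(self, now, spare_tracker):
        """Re-schedule every attempt assigned to a tracker with a stale HB."""
        rescheduled = []
        for attempt, tracker in list(self.assignments.items()):
            if now - self.last_heartbeat[tracker] >= EXPIRY_SECS:
                del self.assignments[attempt]
                task = attempt.rsplit("_", 1)[0]
                rescheduled.append(self.schedule(task, spare_tracker))
        return rescheduled


def simulate():
    jt = ToyJobTracker()
    jt.heartbeat("tt1", now=0)
    jt.heartbeat("tt2", now=0)
    first = jt.schedule("task_m_000001", "tt1")

    # tt1 can no longer reach the JT, but it keeps running `first` locally.
    actually_running = {first}

    # Ten minutes later only tt2 has sent a heartbeat, so tt1 is deemed lost.
    jt.heartbeat("tt2", now=600)
    second = jt.expire_lost_trackers(now=600, spare_tracker="tt2")
    actually_running.update(second)
    return first, second, actually_running
```

In this toy model the JT's own view stays consistent (it tracks only the new 
attempt), yet two attempts of the same task are running on the cluster, and 
nothing on the JT side can stop the first one while tt1 is unreachable.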

However, there are enough checks/balances to ensure there is consistency for 
the job in the system (longer writeup).

As a result, I'm inclined to close this as 'Won't Fix'. I think MAPREDUCE-2217 
made an important improvement and we should keep it. However, I'm very wary 
of trying to implement MAPREDUCE-2217 via TT-side changes, particularly when 
that means adding complexity to already squiggly code on the TT.

Makes sense?
                
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules 
> another attempt. The earlier attempt can start after the later attempt 
> leading to two parallel attempts running at the same time. This is 
> particularly an issue if the user turns off speculation and expects a single 
> attempt of a task to run at any point in time.

