[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632543#comment-13632543
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5110:
----------------------------------------------------

Trying to understand this; I mostly agree with what Arun said. To summarize:
 - Strictly guaranteeing serial execution of task attempts is not possible in 
general and is a non-requirement.
 - The JT already deals with all kinds of slowness in tasks and, irrespective of 
this patch, clients have to deal with that slowness.

bq. Where possible (i.e., not transient network partitions), run a single task 
attempt for a task when speculation is turned off
This seems like an arbitrary non-requirement; I don't see what we gain from it.

The JIRA started with the above goal, which isn't worth pursuing from what I see, 
but now it seems to have transformed into something more benign. I looked at the 
patch. It looks like you want quicker failure while tasks are getting 
launched/localized, to meet some kind of SLA? If that is the case, instead of 
calling it a 'TT-side implementation', we could call it an aggressive timeout 
enforced on TTs for tasks and make it job-configurable; that should do. Right?
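
To make the suggestion concrete, here is a rough sketch of what a job-configurable launch timeout checked on the TT could look like. The property name, the default value, and the classes are all made up for illustration; none of them come from the attached patches or from the Hadoop code base, and a plain Map stands in for the job configuration.

```java
import java.util.Map;

/** Hypothetical sketch only; these names do not exist in Hadoop. */
class LaunchTimeoutSketch {
    // Assumed job-level property name, invented for this example.
    static final String LAUNCH_TIMEOUT_KEY = "mapred.task.launch.timeout.ms";
    // Assumed default, used when the job does not set the property.
    static final long DEFAULT_LAUNCH_TIMEOUT_MS = 600_000L;

    /** Reads the per-job launch timeout, falling back to the default. */
    static long readLaunchTimeoutMs(Map<String, String> jobConf) {
        String value = jobConf.get(LAUNCH_TIMEOUT_KEY);
        return value == null ? DEFAULT_LAUNCH_TIMEOUT_MS : Long.parseLong(value.trim());
    }

    /** A task attempt that has been assigned to the TT but is not running yet. */
    static final class PendingLaunch {
        final String attemptId;
        final long assignedAtMs;
        final long timeoutMs;

        PendingLaunch(String attemptId, long assignedAtMs, long timeoutMs) {
            this.attemptId = attemptId;
            this.assignedAtMs = assignedAtMs;
            this.timeoutMs = timeoutMs;
        }

        /** True once the attempt has spent longer than the timeout in launch/localization. */
        boolean isExpired(long nowMs) {
            return nowMs - assignedAtMs > timeoutMs;
        }
    }
}
```

With something along these lines, the TT would fail an attempt locally as soon as isExpired() turns true, instead of letting it start late after the JT has already scheduled another attempt.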
                
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, 
> mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules 
> another attempt. The earlier attempt can start after the later attempt 
> leading to two parallel attempts running at the same time. This is 
> particularly an issue if the user turns off speculation and expects a single 
> attempt of a task to run at any point in time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira