[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task

Karthik Kambatla (JIRA) Fri, 05 Apr 2013 11:43:17 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623925#comment-13623925
 ]


Karthik Kambatla commented on MAPREDUCE-5110:
---------------------------------------------

Hey Arun, sorry for the delay. I was trying to figure out the root cause behind 
these occasional launch delays; we encounter them once in a while on a highly 
loaded cluster. It looks like a node-specific hardware/OS issue. When this 
happens, the task in question delays the entire job. 

I still believe limiting the task launch time is helpful, particularly in the 
case of node-specific hardware issues such as failing disks or slow networks. 
Also, I discussed this offline with Alejandro and Tom, and they suggested we 
might not want to introduce a new config for this, but could instead use half 
of mapred.task.timeout as the launch limit. What do you think of that? 
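
To make the suggestion concrete, here is a rough sketch of how a launch limit 
could be derived from mapred.task.timeout instead of a new config. This is 
not the code in the attached patches: the class and method names are 
hypothetical, java.util.Properties stands in for the job configuration, and 
treating a timeout of 0 as "no launch limit" is an assumption carried over 
from how mapred.task.timeout itself treats 0 as disabled.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop or of the attached patches.
class LaunchTimeoutSketch {
    // Default value of mapred.task.timeout in Hadoop 1.x: 10 minutes, in ms.
    static final long DEFAULT_TASK_TIMEOUT_MS = 600_000L;

    // Read mapred.task.timeout, falling back to the default when unset.
    static long taskTimeoutMs(Properties conf) {
        String value = conf.getProperty("mapred.task.timeout");
        return value == null ? DEFAULT_TASK_TIMEOUT_MS : Long.parseLong(value.trim());
    }

    // Proposed rule: the launch limit is half of the task timeout.
    static long launchTimeoutMs(long taskTimeoutMs) {
        return taskTimeoutMs / 2;
    }

    // True if an attempt assigned at assignedAtMs has still not launched by
    // nowMs and has exceeded the derived launch limit.
    static boolean launchExpired(long assignedAtMs, long nowMs, long taskTimeoutMs) {
        if (taskTimeoutMs <= 0) {
            // Assumption: a disabled task timeout also disables the launch limit.
            return false;
        }
        return nowMs - assignedAtMs > launchTimeoutMs(taskTimeoutMs);
    }
}
```

With the default 10-minute task timeout, an attempt that has not launched 
within 5 minutes of assignment would be treated as failed to launch.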
                
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, 
> mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules 
> another attempt. The earlier attempt can start after the later attempt, 
> leading to two parallel attempts running at the same time. This is 
> particularly an issue if the user turns off speculation and expects only a 
> single attempt of a task to run at any point in time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
