[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623925#comment-13623925 ]
Karthik Kambatla commented on MAPREDUCE-5110:
---------------------------------------------

Hey Arun, sorry for the delay. I was trying to figure out the root cause behind these occasional launch delays; we encounter them once in a while on a highly loaded cluster. It looks like a node-specific hardware/OS issue. When this happens, the task in question delays the entire job.

I still believe limiting the task launch time is helpful, particularly in the case of node-specific hardware issues such as failing disks and slow networks.

Also, I discussed this offline with Alejandro and Tom, and they suggested we might not want to introduce a new config for this, but maybe use half of the mapred.task.timeout instead. What do you think of that?

> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules
> another attempt. The earlier attempt can start after the later attempt,
> leading to two parallel attempts running at the same time. This is
> particularly an issue if the user turns off speculation and expects a single
> attempt of a task to run at any point in time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira