[jira] [Updated] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-5110:

    Attachment: mr-5110-half-tt-expiry.patch

[~vinodkv], here is a new patch that uses half the tt-expiry-interval as the timeout for task launch. Do you think this is a reasonable way to go about it, or do you think it is better to add a job-specific parameter? I'll validate the patch we finalize on a cluster.

Long task launch delays can lead to multiple parallel attempts of the task

Key: MAPREDUCE-5110
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 1.1.2
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: expose-mr-5110.patch, mr-5110-half-tt-expiry.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch

If a task takes too long to launch, the JT expires the task and schedules another attempt. The earlier attempt can start after the later attempt, leading to two parallel attempts running at the same time. This is particularly an issue if the user turns off speculation and expects a single attempt of a task to run at any point in time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
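The half-expiry idea in the comment above can be illustrated with a minimal sketch. It is not taken from mr-5110-half-tt-expiry.patch: the class and method names are hypothetical, a plain java.util.Properties stands in for the Hadoop configuration object, and the key {{mapred.tasktracker.expiry.interval}} with a 10-minute default is an assumption about the Hadoop 1.x setting the comment refers to.

```java
import java.util.Properties;

// Hypothetical sketch: derive the task launch timeout as half of the
// tasktracker expiry interval instead of adding a separate parameter.
public class LaunchTimeoutSketch {
    // Assumed Hadoop 1.x key and default (10 minutes); not confirmed by the thread.
    static final String TT_EXPIRY_KEY = "mapred.tasktracker.expiry.interval";
    static final long DEFAULT_TT_EXPIRY_MS = 10 * 60 * 1000L;

    // Returns half of the configured expiry interval, in milliseconds.
    static long launchTimeoutMs(Properties conf) {
        String raw = conf.getProperty(TT_EXPIRY_KEY, Long.toString(DEFAULT_TT_EXPIRY_MS));
        return Long.parseLong(raw) / 2;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(TT_EXPIRY_KEY, "240000"); // 4-minute expiry interval
        System.out.println(launchTimeoutMs(conf)); // half of the value set above
    }
}
```

Tying the launch timeout to the expiry interval avoids a new configuration knob, at the cost of not being tunable per job, which is the trade-off the comment asks about.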
[jira] [Updated] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-5110:

    Status: Open (was: Patch Available)

Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
[jira] [Updated] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-5110:

    Attachment: mr-5110.patch

Uploading a patch that fixes the issue. It
# adds a config parameter {{mapred.tasktracker.task.launch.timeout}} with a default value of 2 minutes. Also, adds this to mapred-default.xml
# updates {{TT#markUnresponsiveTasks()}} to address tasks in UNASSIGNED state for longer than the timeout above.
# modifies JT to not expire UNASSIGNED tasks; MAPREDUCE-2217 added this to address the case where the task launch would hang, but that doesn't help in the case where task launch just takes really long. Leaving the check there can lead to inappropriate error messages for the tasks. Also, {{markUnresponsiveTasks()}} and {{transmitHeartBeat()}} are in the same thread: if the TT were unable to fail the UNASSIGNED task, it wouldn't be able to send a heartbeat either and will eventually be marked lost.

To validate the patch, I ran the same setup as above and verified that the first attempt is killed before launching the subsequent attempt.

Attachments: expose-mr-5110.patch, mr-5110.patch
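The second item in the patch description (failing tasks that sit in UNASSIGNED longer than the launch timeout) can be sketched roughly as follows. This is an illustration only, not the code in mr-5110.patch: the class, the TaskInProgress stand-in, and its fields are hypothetical, and only the 2-minute default comes from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a TT-side check for tasks stuck in UNASSIGNED.
public class UnassignedTaskReaper {
    enum State { UNASSIGNED, RUNNING }

    // Minimal stand-in for a tracked task attempt; not the Hadoop class.
    static final class TaskInProgress {
        final String attemptId;
        final State state;
        final long assignedAtMs; // when the TT accepted the task

        TaskInProgress(String attemptId, State state, long assignedAtMs) {
            this.attemptId = attemptId;
            this.state = state;
            this.assignedAtMs = assignedAtMs;
        }
    }

    // Default launch timeout of 2 minutes, as stated in the comment.
    static final long DEFAULT_LAUNCH_TIMEOUT_MS = 2 * 60 * 1000L;

    // Returns the ids of attempts that have been UNASSIGNED for longer than
    // the timeout; the caller would fail these before they can launch.
    static List<String> findLaunchTimeouts(List<TaskInProgress> tasks,
                                           long nowMs, long launchTimeoutMs) {
        List<String> expired = new ArrayList<>();
        for (TaskInProgress t : tasks) {
            if (t.state == State.UNASSIGNED && nowMs - t.assignedAtMs > launchTimeoutMs) {
                expired.add(t.attemptId);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<TaskInProgress> tasks = Arrays.asList(
            new TaskInProgress("attempt_a", State.UNASSIGNED, 0L),
            new TaskInProgress("attempt_b", State.UNASSIGNED, 120_000L),
            new TaskInProgress("attempt_c", State.RUNNING, 0L));
        System.out.println(findLaunchTimeouts(tasks, 180_000L, DEFAULT_LAUNCH_TIMEOUT_MS));
    }
}
```

Doing the check on the TT side means the stale attempt is failed locally before the JT schedules a replacement, which is the ordering the validation step in the comment verifies.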