[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969572#action_12969572 ]
Joydeep Sen Sarma commented on MAPREDUCE-2214:
----------------------------------------------

I think what happened in our case was something like this:
# The task was requested to be killed.
# The TT performed the kill action and reported back to the JT.
# But the task reported back as done, at which point the TT promptly moved it into the SUCCEEDED state.
# Meanwhile, the JT scheduled a cleanup, and the cleanup failed to launch without returning the slot.

The criss-crossing of #2 and #3 was what was unexpected, I think (something the code doesn't anticipate).

We don't hit this problem with speculation because we never request speculation when the task is about to complete: there's a check on the remaining time of the task, and if the remaining time is less than N minutes, we don't speculate. (There's a JIRA for this; I don't remember which.)

> TaskTracker should release slot if task is not launched
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2214
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>
> TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not
> in an expected state. However, in the case where the task is not launched,
> the slot is not released. We have observed this in production: the task was
> in SUCCEEDED state by the time launchTask() got to it, and then the slot was
> never released. It is not clear how the task got into that state, but it is
> better to handle the case.
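To make the failure mode concrete, here is a toy Python model of the sequence described in the comment (kill, kill acknowledged, late "done" report, cleanup launch) and of the proposed behavior of giving the slot back when the launch is skipped. Everything in it is hypothetical: `SlotPool`, the `State` values, the `LAUNCHABLE` set and the `release_on_skip` switch are illustrative names, not Hadoop's actual TaskTracker code, and the real patch for this issue may look quite different.

```python
import threading
from enum import Enum, auto


class State(Enum):
    UNASSIGNED = auto()
    RUNNING = auto()
    KILLED_UNCLEAN = auto()  # attempt killed, cleanup attempt still pending
    SUCCEEDED = auto()


# Assumption for illustration: states from which launch_task may start an attempt.
LAUNCHABLE = {State.UNASSIGNED, State.KILLED_UNCLEAN}


class SlotPool:
    """Hypothetical free-slot counter for one TaskTracker."""

    def __init__(self, total):
        self.total = total
        self.free = total
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.free == 0:
                raise RuntimeError("no free slot")
            self.free -= 1

    def release(self):
        with self._lock:
            if self.free == self.total:
                raise RuntimeError("slot released more than once")
            self.free += 1


class TaskInProgress:
    """Hypothetical, heavily simplified stand-in for TaskTracker.TaskInProgress."""

    def __init__(self, slots, release_on_skip=True):
        self.slots = slots
        self.state = State.UNASSIGNED
        self.release_on_skip = release_on_skip
        self._lock = threading.Lock()

    def launch_task(self):
        """Called after the tracker has already reserved a slot for this attempt."""
        with self._lock:
            if self.state in LAUNCHABLE:
                self.state = State.RUNNING
                return True
            # The attempt will never run, so nothing else will free its slot.
            if self.release_on_skip:
                self.slots.release()
            return False

    def kill(self):
        with self._lock:
            if self.state == State.RUNNING:
                self.state = State.KILLED_UNCLEAN
                self.slots.release()  # the killed attempt gives its slot back

    def report_done(self):
        with self._lock:
            if self.state == State.RUNNING:
                self.slots.release()  # normal completion frees the slot
            # A late "done" after a kill just flips the state (the criss-cross).
            self.state = State.SUCCEEDED


def simulate(release_on_skip):
    slots = SlotPool(total=1)
    tip = TaskInProgress(slots, release_on_skip)
    slots.acquire()        # JT assigns the task; TT reserves a slot
    tip.launch_task()      # attempt starts running
    tip.kill()             # steps 1-2: kill requested and performed
    tip.report_done()      # step 3: "done" report crosses the kill
    slots.acquire()        # step 4: slot reserved for the cleanup attempt
    launched = tip.launch_task()
    return launched, slots.free


if __name__ == "__main__":
    print(simulate(release_on_skip=False))  # (False, 0): slot leaked
    print(simulate(release_on_skip=True))   # (False, 1): slot returned
```

With `release_on_skip=False` the cleanup attempt is refused and the tracker ends with zero free slots even though nothing is running, which matches the symptom in the issue; any further `acquire()` on that tracker would fail. Releasing inside the same locked check that decides not to launch keeps the "skip" and the "give back" atomic with respect to concurrent state changes.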