[ https://issues.apache.org/jira/browse/MAPREDUCE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969572#action_12969572 ]

Joydeep Sen Sarma commented on MAPREDUCE-2214:
----------------------------------------------

I think what happened in our case was something like this:
# the task was requested to be killed
# the TT performed the kill action and reported it back to the JT
# but the task then reported back as done, at which point the TT promptly moved it into the SUCCEEDED state
# meanwhile the JT scheduled a cleanup, and the cleanup failed to launch without its slot ever being returned

The criss-crossing of #2 and #3 is what was unexpected, I think: it is something the code doesn't anticipate.

We don't hit this problem with speculation because we never request speculation when the task is about to complete: there's a check on the task's remaining time, and if the remaining time is less than N minutes we don't speculate (there's a JIRA for this; I don't remember which).
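
The remaining-time check mentioned above can be sketched as a simple threshold guard. This is not the actual speculation code: `SpeculationGuard` and `mayStartSpeculativeAttempt` are hypothetical names, the remaining-time estimate is assumed to be computed elsewhere, and N is left as a constructor parameter because the comment doesn't give its value.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a "don't speculate near completion" guard. */
final class SpeculationGuard {
    // The "N min" threshold from the comment; its real value is not stated there.
    private final long minRemainingMillis;

    SpeculationGuard(long minRemainingMinutes) {
        this.minRemainingMillis = TimeUnit.MINUTES.toMillis(minRemainingMinutes);
    }

    /**
     * Returns true only if the task is estimated to need at least N more minutes,
     * so a task that is about to finish never gets a speculative attempt.
     */
    boolean mayStartSpeculativeAttempt(long estimatedRemainingMillis) {
        return estimatedRemainingMillis >= minRemainingMillis;
    }
}
```

With `new SpeculationGuard(5)`, a task estimated to finish in one minute is not speculated, while one estimated to need ten more minutes may be.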

> TaskTracker should release slot if task is not launched
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2214
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2214
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>
> TaskTracker.TaskInProgress.launchTask() does not launch a task if it is not 
> in an expected state. However, in the case where the task is not launched, 
> the slot is not released. We have observed this in production - the task was 
> in SUCCEEDED state by the time launchTask() got to it and then the slot was 
> never released. It is not clear how the task got into that state, but it is 
> better to handle the case.
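
A minimal sketch of the change the summary asks for, under a simplified model: `LaunchState`, `SlotManager` and `TaskSlotLauncher` are hypothetical names rather than TaskTracker classes, and the set of launchable states is an assumption loosely based on the description above. The point is that every path out of `launchTask()` either launches the task or hands its reserved slot back.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, simplified task states (not Hadoop's TaskStatus.State). */
enum LaunchState { UNASSIGNED, RUNNING, KILLED_UNCLEAN, SUCCEEDED, KILLED }

/** Hypothetical free-slot counter standing in for the TaskTracker's slot bookkeeping. */
final class SlotManager {
    private int freeSlots;

    SlotManager(int freeSlots) { this.freeSlots = freeSlots; }

    synchronized void reserve() {
        if (freeSlots == 0) throw new IllegalStateException("no free slots");
        freeSlots--;
    }

    synchronized void release() { freeSlots++; }

    synchronized int freeSlots() { return freeSlots; }
}

/** Launches a task into an already-reserved slot, releasing the slot if the launch is skipped. */
final class TaskSlotLauncher {
    // Assumed set of states from which a launch is allowed.
    private static final Set<LaunchState> LAUNCHABLE =
        EnumSet.of(LaunchState.UNASSIGNED, LaunchState.KILLED_UNCLEAN);

    private final SlotManager slots;
    private LaunchState state;

    TaskSlotLauncher(SlotManager slots, LaunchState initial) {
        this.slots = slots;
        this.state = initial;
    }

    /** Returns true if the task was launched; otherwise the reserved slot goes back to the pool. */
    synchronized boolean launchTask() {
        if (!LAUNCHABLE.contains(state)) {
            // The issue describes the slot leaking on this path; release it instead.
            slots.release();
            return false;
        }
        if (state == LaunchState.UNASSIGNED) {
            state = LaunchState.RUNNING;
        }
        // A real implementation would start the task runner here; omitted in this sketch.
        return true;
    }

    synchronized LaunchState state() { return state; }
}
```

For the scenario in the comment, a `TaskSlotLauncher` created in `SUCCEEDED` with one reserved slot returns `false` from `launchTask()` and leaves the pool with that slot free again. A complete fix would also need to release the slot if starting the runner itself throws; that path is not modeled here.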

-- 
This message is automatically generated by JIRA.