[ https://issues.apache.org/jira/browse/HADOOP-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475936 ]

Arun C Murthy commented on HADOOP-1036:
---------------------------------------

Two causes, in combination, produce this scenario:

a) TaskTracker.startNewTask() doesn't catch RuntimeException (it only
catches IOException), so the task is never killed via
TaskInProgress.killAndCleanup()

b) TaskTracker.startNewTask() adds the taskid & tip to 'runningTasks' before
calling localizeJob (which fails, as described right above), so the JobTracker
gets the 'status' for the non-existent task, removes it from
ExpireLaunchingTasks's queue and is generally in a state of bliss...

This issue can be solved by fixing either a) or b). I'd guess we want to fix
the exception handling, since it doesn't make sense to wait for the 10-minute
timeout for a task we already know has failed to initialize...
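
To make a) and b) concrete, here is a minimal, self-contained Java sketch. It
is NOT the actual Hadoop code: TaskInProgress, runningTasks, localizeJob() and
killAndCleanup() below are hypothetical stand-ins that only borrow names from
the comment above. localizeJob() always throws an unchecked exception to
simulate the init failure, and the sketch assumes that killAndCleanup() marks
the task FAILED so that its next status report would tell the JobTracker.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of the TaskTracker launch path.
 * Names mirror TaskTracker.startNewTask(), localizeJob(), runningTasks and
 * TaskInProgress.killAndCleanup(), but these are NOT the real Hadoop
 * signatures or implementations.
 */
public class StartNewTaskSketch {

    enum State { RUNNING, FAILED }

    /** Stand-in for TaskInProgress: only tracks the state it would report. */
    static class TaskInProgress {
        final String taskId;
        State state = State.RUNNING;
        TaskInProgress(String taskId) { this.taskId = taskId; }
        // Assumption: cleanup marks the task FAILED so the next status
        // report tells the JobTracker the task failed.
        void killAndCleanup() { state = State.FAILED; }
    }

    /** Stand-in for the TaskTracker's runningTasks map. */
    final Map<String, TaskInProgress> runningTasks = new HashMap<>();

    /** Hypothetical localizeJob() that fails with an unchecked exception. */
    void localizeJob(TaskInProgress tip) throws IOException {
        throw new IllegalStateException("cannot localize job for " + tip.taskId);
    }

    /** Behaviour described in a) and b): only IOException is handled. */
    void startNewTaskBuggy(String taskId) {
        TaskInProgress tip = new TaskInProgress(taskId);
        runningTasks.put(taskId, tip);   // b) registered before localizeJob
        try {
            localizeJob(tip);
        } catch (IOException e) {        // a) a RuntimeException skips this
            tip.killAndCleanup();
        }
    }

    /** Fix for a): any failure during initialization cleans up the task. */
    void startNewTaskFixed(String taskId) {
        TaskInProgress tip = new TaskInProgress(taskId);
        runningTasks.put(taskId, tip);
        try {
            localizeJob(tip);
        } catch (Throwable t) {          // also catches unchecked exceptions
            tip.killAndCleanup();
        }
    }

    public static void main(String[] args) {
        StartNewTaskSketch buggy = new StartNewTaskSketch();
        try {
            buggy.startNewTaskBuggy("task_m_0");
        } catch (RuntimeException e) {
            // The exception escapes, yet the task stays registered as RUNNING.
            System.out.println("buggy: " + buggy.runningTasks.get("task_m_0").state);
        }

        StartNewTaskSketch fixed = new StartNewTaskSketch();
        fixed.startNewTaskFixed("task_m_0");
        System.out.println("fixed: " + fixed.runningTasks.get("task_m_0").state);
    }
}
```

Running main() prints "buggy: RUNNING" followed by "fixed: FAILED". Catching
Throwable is deliberately broad here for illustration; a real patch might
narrow it, or instead fix b) by registering the task only after localizeJob
succeeds.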


> task gets lost during assignment
> --------------------------------
>
>                 Key: HADOOP-1036
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1036
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.12.0
>
>
> I ran a unit test (TestMRClassPath) that had a problem (likely in task
> initialization) that caused one of the maps to get "lost". The job tracker had
> the task as "assigned" but the task tracker did not know about it. It did not
> time out even after 30+ minutes.

-- 
This message is automatically generated by JIRA.