[ 
https://issues.apache.org/jira/browse/HADOOP-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526123
 ] 

Arun C Murthy commented on HADOOP-1862:
---------------------------------------

Hmm... one straw to clutch:

{noformat}


$ cat 1862-event.log | grep task_200709041519_0023_m_001149
OBSOLETE task_200709041519_0023_m_001149_0 
http://a.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_0
FAILED task_200709041519_0023_m_001149_0 null
SUCCEEDED task_200709041519_0023_m_001149_1 
http://b.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_1
SUCCEEDED task_200709041519_0023_m_001149_2 
http://c.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_2


$ cat 1862-event.log | grep task_200709041519_0023_m_001816
OBSOLETE task_200709041519_0023_m_001816_0 
http://x.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_0
FAILED task_200709041519_0023_m_001816_0 null
SUCCEEDED task_200709041519_0023_m_001816_1 
http://y.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_1
SUCCEEDED task_200709041519_0023_m_001816_2 
http://z.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_2


{noformat}


Essentially, in {{JobInProgress.updateTaskStatuses(TaskInProgress, TaskStatus, 
JobTrackerMetrics)}} a {{TaskCompletionEvent.Status.SUCCEEDED}} event is added 
whether or not the TIP is already complete, so each reducer sees 2 
{{TaskCompletionEvent.Status.SUCCEEDED}} events for the same map, as above. 
Clearly the fetch from one of them will fail, since either _1 or _2 will have 
been {{KILLED}}; not a happy situation. 
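
To make the suspected behaviour concrete, here is a minimal, self-contained sketch. It is not Hadoop's code: {{CompletionEventSketch}}, its {{checkTipComplete}} flag and the helper methods are hypothetical names invented for illustration, and reporting the late attempt as {{KILLED}} is only an assumption about what a guard might do. It simply shows that without a "TIP already complete?" check two attempts of the same map each produce a SUCCEEDED event, while with the check only the first one does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the event log; none of these types are Hadoop classes.
class CompletionEventSketch {
    enum Status { SUCCEEDED, KILLED }

    static final class Event {
        final String attemptId;
        final Status status;

        Event(String attemptId, Status status) {
            this.attemptId = attemptId;
            this.status = status;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private final Set<String> completedTips = new HashSet<>();
    private final boolean checkTipComplete;

    CompletionEventSketch(boolean checkTipComplete) {
        this.checkTipComplete = checkTipComplete;
    }

    // Attempt ids in the log look like <tip id>_<attempt number>; drop the last suffix.
    static String tipOf(String attemptId) {
        return attemptId.substring(0, attemptId.lastIndexOf('_'));
    }

    void attemptSucceeded(String attemptId) {
        // Set.add returns false when the TIP was already marked complete.
        boolean alreadyComplete = !completedTips.add(tipOf(attemptId));
        if (checkTipComplete && alreadyComplete) {
            // Assumption: a late duplicate is recorded as KILLED so no reducer fetches from it.
            events.add(new Event(attemptId, Status.KILLED));
            return;
        }
        events.add(new Event(attemptId, Status.SUCCEEDED));
    }

    // Number of SUCCEEDED events recorded for the given TIP.
    int succeededCount(String tipId) {
        int n = 0;
        for (Event e : events) {
            if (e.status == Status.SUCCEEDED && tipOf(e.attemptId).equals(tipId)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String tip = "task_200709041519_0023_m_001149";
        for (boolean check : new boolean[] {false, true}) {
            CompletionEventSketch log = new CompletionEventSketch(check);
            log.attemptSucceeded(tip + "_1");
            log.attemptSucceeded(tip + "_2");
            System.out.println("check=" + check
                + " SUCCEEDED events=" + log.succeededCount(tip));
        }
    }
}
```

Whether the real fix belongs in {{updateTaskStatuses}} or elsewhere, and what status the duplicate should get, still needs to be confirmed against the actual code.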

Like I said, I'll try to dig deeper; maybe this will help someone beat me to 
it. *smile*

> reduces are getting stuck trying to find map outputs
> ----------------------------------------------------
>
>                 Key: HADOOP-1862
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1862
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.1
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.15.0
>
>
> Some of the reduces have been stuck for hours looking for 137 map outputs. 
> When I look at the job events all 2600 of the maps have succeeded. There have 
> been lots of lost task trackers and shuffle failures. The maps have been run 
> between 1 and 6 times each. I do see some of the events in the task event log 
> are marked OBSOLETE.
