[ 
https://issues.apache.org/jira/browse/HADOOP-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated HADOOP-4716:
-------------------------------

    Attachment: HADOOP-4716-v1.1.patch

It looks like {{TestJobTrackerRestartWithLostTracker}} can still fail. Consider a 
map for which
- the hosting tracker will be lost 
- the map completion event has not been logged to the job history (i.e. it is 
still in the buffer)

Call it a _hanging-map_. Once the reducer reaches the _hanging-map_, it will be 
stuck there forever, because the map location is not known to the jobtracker and 
hence the map won't be added to the _ignore-list_. The previous fix solves this issue. 
It does the following:
- clears the old/stale mapping of map output locations
- reduces the backoff time so that the reducer doesn't back off for long
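
To make the two steps above concrete, here is a minimal sketch in Java. It is not code from the patch: the class {{MapLocationCache}}, its methods, and the backoff constants are all hypothetical and chosen only for illustration. It assumes the reducer keeps a map-id to tracker-host table and uses a capped exponential backoff between fetch attempts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a reducer's view of map output locations.
// None of these names come from the Hadoop code base or from the patch.
class MapLocationCache {
    // Assumed values, for illustration only.
    static final long INITIAL_BACKOFF_MS = 1_000L;
    static final long MAX_BACKOFF_MS = 4_000L;

    private final Map<String, String> locationByMap = new HashMap<>();
    private final Map<String, Integer> failuresByMap = new HashMap<>();

    void recordLocation(String mapId, String trackerHost) {
        locationByMap.put(mapId, trackerHost);
    }

    // Returns null when no location is known for the map.
    String locationOf(String mapId) {
        return locationByMap.get(mapId);
    }

    // Step 1: drop the old mapping so that locations are learned again
    // from fresh completion events. Resetting the failure counts as well
    // is an assumption of this sketch.
    void clearStaleLocations() {
        locationByMap.clear();
        failuresByMap.clear();
    }

    // Step 2: exponential backoff per map, capped at MAX_BACKOFF_MS so the
    // reducer never waits long before retrying a fetch.
    long nextBackoffMs(String mapId) {
        int failures = failuresByMap.merge(mapId, 1, Integer::sum);
        long delay = INITIAL_BACKOFF_MS << Math.min(failures - 1, 20);
        return Math.min(delay, MAX_BACKOFF_MS);
    }
}
```

In this model a _hanging-map_ is an entry whose recorded host is the lost tracker. Clearing the table removes that entry, and the cap keeps the reducer retrying at short intervals instead of backing off indefinitely.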

The attached patch updates the fix to trunk.


> testRestartWithLostTracker frequently times out
> -----------------------------------------------
>
>                 Key: HADOOP-4716
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4716
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Johan Oskarsson
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4716-v1.1.patch, HADOOP-4716-v1.patch, log.txt
>
>
> This test frequently times out: 
> org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRestartWithLostTracker
> Example: 
> http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3637/testReport/org.apache.hadoop.mapred/TestJobTrackerRestartWithLostTracker/testRestartWithLostTracker/

