[ 
https://issues.apache.org/jira/browse/HADOOP-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525560
 ] 

Hadoop QA commented on HADOOP-1018:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12365243/HADOOP-1018_1_20070906.patch
 applied and successfully tested against trunk revision r573383.

Test results:   
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/699/testReport/
Console output: 
http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/699/console

> Single lost heartbeat leads to a "Lost task tracker"
> ----------------------------------------------------
>
>                 Key: HADOOP-1018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1018
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.0, 0.11.2, 0.12.0
>         Environment: Nutch trunk/ (Hadoop 0.10.0), Linux, JDK 1.5, a cluster 
> of 9 machines.
>            Reporter: Andrzej Bialecki 
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1018_1_20070906.patch
>
>
> Under heavy load, a task tracker may lose the heartbeat response from the 
> JobTracker. The task tracker then tries to resend the last heartbeat message, 
> which the job tracker treats as a "duplicate" and ignores. Since the task 
> tracker keeps resending the same heartbeat message, with the same id, over 
> and over again, no "valid" messages reach the job tracker, so after a while 
> it considers the task tracker to be lost. The task tracker cannot recover 
> from this state and needs to be restarted.
> Looking at Hadoop trunk/ I believe this problem may still occur: in 
> JobTracker.java.heartbeat():992 the JobTracker should not ignore duplicate 
> messages but acknowledge them without processing. This would cause the task 
> tracker to sync its last heartbeat id back up with the last heartbeat id 
> remembered in the job tracker.
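
The behaviour suggested in the description (acknowledge a duplicate heartbeat
without processing it again) could look roughly like the sketch below. This is
an illustration only, not the code in JobTracker.java and not the attached
patch: the class, the Response type, the integer ids and the "actions" string
are all hypothetical, and it assumes each heartbeat carries the id of the last
response the task tracker actually received.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual JobTracker code or the attached patch. */
public class HeartbeatSketch {

    /** Reply sent back to a task tracker (hypothetical type). */
    static final class Response {
        final int responseId;
        final String actions;

        Response(int responseId, String actions) {
            this.responseId = responseId;
            this.actions = actions;
        }
    }

    /** Minimal stand-in for the job tracker's heartbeat bookkeeping. */
    static final class Tracker {
        // Last response generated for each task tracker, keyed by tracker name.
        private final Map<String, Response> lastResponse =
            new HashMap<String, Response>();
        // Counts heartbeats that were really processed (not replayed).
        int processed = 0;

        /**
         * @param trackerName name of the reporting task tracker
         * @param lastAckedId id of the last response that tracker received
         */
        synchronized Response heartbeat(String trackerName, int lastAckedId) {
            Response prev = lastResponse.get(trackerName);
            if (prev != null && prev.responseId != lastAckedId) {
                // The tracker never saw 'prev', so this is a resend.
                // Acknowledge it by replaying 'prev' without reprocessing,
                // which lets the tracker catch up to our id.
                return prev;
            }
            // Fresh heartbeat: process it and issue the next response id.
            processed++;
            int nextId = (prev == null) ? 1 : prev.responseId + 1;
            Response next = new Response(nextId, "actions-for-" + nextId);
            lastResponse.put(trackerName, next);
            return next;
        }
    }

    public static void main(String[] args) {
        Tracker jt = new Tracker();
        jt.heartbeat("tt1", 0);                   // response 1 is lost in transit
        Response replay = jt.heartbeat("tt1", 0); // resend with the same id
        Response fresh = jt.heartbeat("tt1", replay.responseId);
        System.out.println(replay.responseId + " " + fresh.responseId
            + " " + jt.processed);
    }
}
```

Replaying the stored response rather than dropping the duplicate means a lost
reply costs one extra round trip instead of leaving the task tracker stuck. The
sketch ignores restarts of the task tracker, which a real implementation would
need to distinguish from a resend.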

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
