Too many fetch-failures issue
-----------------------------

                 Key: HADOOP-1930
                 URL: https://issues.apache.org/jira/browse/HADOOP-1930
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.15.0
            Reporter: Christian Kunz


A job with 4000 maps on a 1400-node cluster (3 tasks allowed per node) had a 
large number (150) of map failures reported as 'Too many fetch-failures'.

From the jobtracker log, it looks as if the jobtracker got confused about 
which tasktracker actually ran the task:

(In the following log output, I replaced the two tasktracker node names with 
***node_assigned*** and ***node_fetch_attempt***; the two nodes are different.)

grep task_200709170247_0018_m_000009_0 hadoop-xxx-jobtracker-node.log.2007-09-19:

2007-09-19 15:52:26,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
'task_200709170247_0018_m_000009_0' to tip tip_200709170247_0018_m_000009, for 
tracker 'tracker_***node_assigned_***:/127.0.0.1:54523'
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskRunner: Saved output 
of task 'task_200709170247_0018_m_000009_0' to hdfs://location
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.JobInProgress: Task 
'task_200709170247_0018_m_000009_0' has completed 
tip_200709170247_0018_m_000009 successfully.
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskInProgress: Task 
'task_200709170247_0018_m_000009_0' has completed succesfully
2007-09-19 16:21:07,825 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
fetch notification #1 for task task_200709170247_0018_m_000009_0
2007-09-19 16:23:23,483 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
fetch notification #2 for task task_200709170247_0018_m_000009_0
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
fetch notification #3 for task task_200709170247_0018_m_000009_0
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Too many 
fetch-failures for output of task: task_200709170247_0018_m_000009_0 ... 
killing it
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from task_200709170247_0018_m_000009_0: Too many fetch-failures
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Task 
'task_200709170247_0018_m_000009_0' has been lost.
2007-09-19 16:25:07,184 INFO org.apache.hadoop.mapred.JobTracker: Removed 
completed task 'task_200709170247_0018_m_000009_0' from 
'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
2007-09-19 21:40:00,235 INFO org.apache.hadoop.mapred.JobTracker: Removed 
completed task 'task_200709170247_0018_m_000009_0' from 
'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
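
The comparison made by hand above (the tracker a task was added to versus the 
tracker it was later removed from) could be automated across the whole log. 
The following is a minimal sketch, not part of Hadoop: the function names are 
hypothetical, the regular expressions are modeled only on the log lines quoted 
in this report, and it assumes tracker names have the form 
'tracker_<host>:/<address>:<port>', so that only the part before the first 
colon is compared.

```python
import re
from collections import defaultdict

# Patterns modeled on the log lines quoted in this report (assumption:
# other Hadoop versions may word these messages differently).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
ADDED = re.compile(
    r"Adding task '(?P<task>task_\w+)' to tip \w+, "
    r"for tracker '(?P<tracker>[^']+)'"
)
REMOVED = re.compile(
    r"Removed completed task '(?P<task>task_\w+)' from '(?P<tracker>[^']+)'"
)


def join_wrapped(lines):
    """Re-join records that were hard-wrapped, as they are in this email.

    A new record starts at every line that begins with a timestamp; any
    other non-empty line is treated as a continuation of the previous one.
    """
    record = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line):
            if record is not None:
                yield record
            record = line
        elif record is not None:
            record += " " + line
    if record is not None:
        yield record


def host_of(tracker):
    """Keep 'tracker_<host>', dropping the ':/<address>:<port>' suffix."""
    return tracker.split(":", 1)[0]


def find_tracker_mismatches(lines):
    """Map each task to (assigned tracker, other trackers it was removed from)."""
    assigned = {}
    removed = defaultdict(set)
    for record in join_wrapped(lines):
        match = ADDED.search(record)
        if match:
            assigned[match["task"]] = host_of(match["tracker"])
            continue
        match = REMOVED.search(record)
        if match:
            removed[match["task"]].add(host_of(match["tracker"]))

    mismatches = {}
    for task, hosts in removed.items():
        if task not in assigned:
            continue  # assignment was logged elsewhere; nothing to compare
        others = sorted(hosts - {assigned[task]})
        if others:
            mismatches[task] = (assigned[task], others)
    return mismatches
```

Passing the lines of the jobtracker log (for example, an open file object) to 
find_tracker_mismatches would list every task whose "Removed completed task" 
entry names a different tracker than its "Adding task" entry. An empty result 
would suggest the mismatch is specific to the tasks that hit fetch failures.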



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
