Reduce TIPs complete 100%, but job does not complete, saying reduces are still running.
----------------------------------------------------------------------------------

                 Key: HADOOP-2167
                 URL: https://issues.apache.org/jira/browse/HADOOP-2167
             Project: Hadoop
          Issue Type: Bug
            Reporter: Amareshwari Sri Ramadasu
            Assignee: Arun C Murthy
            Priority: Critical


The job's reduce progress is stuck at 99.43%, with 2 reduces still shown in the running state, and the job does not complete.
However, the reduce task list on the JobTracker (JT) shows those reduces as 100% complete and marked SUCCEEDED, and their finish times are available in both jobtasks.jsp and the job history.

With ipc.client.timeout = 600000 (10 minutes), the TaskTrackers (TTs) running these reduces logged the following exceptions.
On one of the TTs:
2007-11-07 08:34:16,092 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000150_0 is done.
2007-11-07 08:35:34,013 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000156_0 is done.
2007-11-07 08:42:44,751 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:484)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)

2007-11-07 08:42:44,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to .................

On the other TT:
2007-11-07 08:40:30,484 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000160_0 is done.
2007-11-07 08:42:45,508 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:484)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
        at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)

2007-11-07 08:42:45,508 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to ..........
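
As a rough cross-check of the two TT excerpts, the sketch below works backwards from each heartbeat timeout. It assumes (the logs do not state this) that ipc.client.timeout is in milliseconds and is counted from the moment the heartbeat call was issued. Under that assumption each failing heartbeat was sent about ten minutes before its exception, which is before any of the three reduces on those TTs logged "is done". The class and method names are hypothetical and exist only for this illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the TT log timestamps quoted in this report.
public class HeartbeatTimeline {
    // Matches log timestamps such as "2007-11-07 08:42:44,751".
    public static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // ipc.client.timeout from the report; ASSUMED to be in milliseconds.
    public static final Duration IPC_CLIENT_TIMEOUT = Duration.ofMillis(600000L);

    // ASSUMPTION: the timeout is counted from when the heartbeat call was issued,
    // so the call started roughly one timeout before the logged exception.
    public static LocalDateTime impliedSendTime(String timeoutLogged) {
        return LocalDateTime.parse(timeoutLogged, FMT).minus(IPC_CLIENT_TIMEOUT);
    }

    // True if the task logged "is done" while the failing heartbeat was outstanding,
    // i.e. after it was (estimated to be) sent and before it timed out.
    public static boolean doneWhileInFlight(String doneLogged, String timeoutLogged) {
        LocalDateTime done = LocalDateTime.parse(doneLogged, FMT);
        LocalDateTime failed = LocalDateTime.parse(timeoutLogged, FMT);
        return done.isAfter(impliedSendTime(timeoutLogged)) && done.isBefore(failed);
    }

    public static void main(String[] args) {
        String tt1Timeout = "2007-11-07 08:42:44,751";
        String tt2Timeout = "2007-11-07 08:42:45,508";

        System.out.println("TT 1 heartbeat sent ~" + impliedSendTime(tt1Timeout).format(FMT));
        System.out.println("  r_000150_0 done while in flight: "
                + doneWhileInFlight("2007-11-07 08:34:16,092", tt1Timeout));
        System.out.println("  r_000156_0 done while in flight: "
                + doneWhileInFlight("2007-11-07 08:35:34,013", tt1Timeout));

        System.out.println("TT 2 heartbeat sent ~" + impliedSendTime(tt2Timeout).format(FMT));
        System.out.println("  r_000160_0 done while in flight: "
                + doneWhileInFlight("2007-11-07 08:40:30,484", tt2Timeout));
    }
}
```

Run as-is, it estimates the heartbeats were sent at about 08:32:44,751 and 08:32:45,508 and reports all three reduces as done while those calls were outstanding. Under the stated assumption, their completion could only reach the JT through a later or resent status.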

In the JT logs, the reduce tasks are recorded as completed successfully, e.g.:
2007-11-07 06:39:09,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200711070637_0001_r_000160_0' to tip tip_200711070637_0001_r_000160, for tracker 'x'
2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200711070637_0001_r_000160_0' to 'y'
2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200711070637_0001_r_000160_0' has completed tip_200711070637_0001_r_000160 successfully.

This suggests that when tasks finish before the heartbeat RPC times out, the problem lies in the progress update. The problem is also not consistent, since other reduce tasks in the same situation complete normally.
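
The timestamps for task_200711070637_0001_r_000160_0 can be lined up directly. The sketch below (hypothetical class name, using the same log timestamp format) computes how long after the task logged "is done" on its TT the JT recorded the TIP as completed, and how long after that TT's "Resending 'status'" line.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: time gaps between log lines quoted in this report.
public class CompletionLag {
    // Matches log timestamps such as "2007-11-07 08:42:45,708".
    public static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed from one logged timestamp to a later one.
    public static long lagMillis(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT),
                                LocalDateTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        String ttDone = "2007-11-07 08:40:30,484";      // TT: r_000160_0 is done
        String ttResend = "2007-11-07 08:42:45,508";    // TT: Resending 'status'
        String jtCompleted = "2007-11-07 08:42:45,708"; // JT: has completed tip ... successfully

        System.out.println("TT done -> JT completed: " + lagMillis(ttDone, jtCompleted) + " ms");
        System.out.println("TT resend -> JT completed: " + lagMillis(ttResend, jtCompleted) + " ms");
    }
}
```

It prints 135224 ms for the first gap and 200 ms for the second. The JT recorded the completion more than two minutes after the task finished, but only 200 ms after the TT resent its status. This is consistent with, though it does not prove, the completion reaching the JT only via the resent heartbeat.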

-- 
This message is automatically generated by JIRA.