[ https://issues.apache.org/jira/browse/HADOOP-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-2167:
--------------------------------
Component/s: mapred
> Reduce tips complete 100%, but job does not complete saying reduces still
> running.
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-2167
> URL: https://issues.apache.org/jira/browse/HADOOP-2167
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Amareshwari Sri Ramadasu
> Assignee: Arun C Murthy
> Priority: Critical
> Fix For: 0.16.0
>
>
> The job's reduces are stuck at 99.43% progress, with 2 reduces still shown in
> the running state, and the job does not complete.
> However, the reduce task list on the job tracker shows them as 100% complete
> and marked SUCCEEDED, and a finish time is available in jobtasks.jsp and in
> the job history as well.
> With ipc.client.timeout = 600000, the TTs running the reduces log the
> exceptions shown below.
> On one of the TTs, the logs show the following:
> 2007-11-07 08:34:16,092 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000150_0 is done.
> 2007-11-07 08:35:34,013 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000156_0 is done.
> 2007-11-07 08:42:44,751 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:484)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>     at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:44,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to .................
> On the other TT,
> 2007-11-07 08:40:30,484 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000160_0 is done.
> 2007-11-07 08:42:45,508 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:484)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>     at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:45,508 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to ..........
> In the JT logs, the reduce tasks complete successfully:
> 2007-11-07 06:39:09,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200711070637_0001_r_000160_0' to tip tip_200711070637_0001_r_000160, for tracker 'x'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200711070637_0001_r_000160_0' to 'y'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200711070637_0001_r_000160_0' has completed tip_200711070637_0001_r_000160 successfully.
> This suggests that when a task finishes before the heartbeat RPC times out,
> the problem occurs in the progress update. The behaviour is not consistent,
> however, since other reduce tasks in the same situation complete successfully.
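
The timing claim above can be checked against the quoted timestamps. The following is a minimal sketch, not code from Hadoop: the class HeartbeatGap and its methods are hypothetical names introduced only for this illustration. It assumes the heartbeat call blocked for the full ipc.client.timeout (600000 ms) before the SocketTimeoutException was logged, which the logs do not confirm. Under that assumption, a task whose "is done" line falls less than one timeout before the error finished while the heartbeat was still waiting for a response.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the log excerpts; not part of Hadoop.
final class HeartbeatGap {
    // Timestamp layout used in the quoted logs (comma before the milliseconds).
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long gapMillis(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, LOG_TS);
        LocalDateTime b = LocalDateTime.parse(later, LOG_TS);
        return Duration.between(a, b).toMillis();
    }

    /**
     * True if the task was reported done less than one RPC timeout before the
     * heartbeat error was logged. Assumes the call waited the full timeout.
     */
    static boolean doneWithinTimeoutWindow(String doneAt, String errorAt, long timeoutMillis) {
        long gap = gapMillis(doneAt, errorAt);
        return gap >= 0 && gap < timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 600_000L; // ipc.client.timeout from the report
        // {task, "is done" timestamp, heartbeat error timestamp}, copied from the logs
        String[][] samples = {
            {"r_000150", "2007-11-07 08:34:16,092", "2007-11-07 08:42:44,751"},
            {"r_000156", "2007-11-07 08:35:34,013", "2007-11-07 08:42:44,751"},
            {"r_000160", "2007-11-07 08:40:30,484", "2007-11-07 08:42:45,508"},
        };
        for (String[] s : samples) {
            System.out.println(s[0] + ": gap=" + gapMillis(s[1], s[2]) + " ms, within window="
                + doneWithinTimeoutWindow(s[1], s[2], timeoutMillis));
        }
    }
}
```

For the three tasks quoted above the gaps are 508659 ms, 430738 ms and 135024 ms, all shorter than the 600000 ms timeout. This only shows that the timestamps are compatible with the suggestion; it does not identify the cause.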
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.