[ https://issues.apache.org/jira/browse/HADOOP-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-2167.
-----------------------------------

    Resolution: Cannot Reproduce

We haven't seen this, nor can we seem to repro it. Also, HADOOP-2216 led us astray...

I'm closing this for now; please re-open if required.

> Reduce tips complete 100%, but job does not complete saying reduces still running.
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-2167
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2167
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.16.0
>
>
> The job's reduces are stuck at 99.43% progress, with 2 reduces in the running state, and the job does not complete.
> But the reduce task list on the job tracker shows them as 100% complete and marked as SUCCEEDED, and a finish time is available in jobtasks.jsp and in the job history as well.
> With ipc.client.timeout = 600000, the TTs running the reduces log the following exceptions.
> On one of the TTs, the logs show the following:
> 2007-11-07 08:34:16,092 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000150_0 is done.
> 2007-11-07 08:35:34,013 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000156_0 is done.
> 2007-11-07 08:42:44,751 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:484)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>         at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:44,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to .................
> On the other TT,
> 2007-11-07 08:40:30,484 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000160_0 is done.
> 2007-11-07 08:42:45,508 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:484)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
>         at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
> 2007-11-07 08:42:45,508 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to ..........
> The JT logs show that the reduce tasks completed successfully:
> 2007-11-07 06:39:09,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200711070637_0001_r_000160_0' to tip tip_200711070637_0001_r_000160, for tracker 'x'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200711070637_0001_r_000160_0' to 'y'
> 2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200711070637_0001_r_000160_0' has completed tip_200711070637_0001_r_000160 successfully.
> This would suggest that if tasks are done before the timeout, the problem occurs in the progress update. The behavior is also not consistent, since other reduce tasks in the same situation are successful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.