very long cleanup after a job fails
-----------------------------------
Key: HADOOP-244
URL: http://issues.apache.org/jira/browse/HADOOP-244
Project: Hadoop
Type: Bug
Components: mapred
Reporter: Yoram Arnon
Assigned to: Sameer Paranjpye
Eight hours after a job failed (it had run for about 14 hours before
failing), many task trackers are still throwing the exceptions below:
060523 121732 Server handler 0 on 50040 caught: java.io.FileNotFoundException: LocalFS
java.io.FileNotFoundException: LocalFS
at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121814 task_0006_r_000123_0 copy failed: task_0006_m_046105_0 from node5:50040
java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:305)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:150)
at org.apache.hadoop.mapred.$Proxy2.getFile(Unknown Source)
at org.apache.hadoop.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:112)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:67)
060523 121814 task_0006_r_000123_0 0.13023989% reduce > copy > [EMAIL PROTECTED]:50040
060523 121814 task_0006_r_000123_0 Copying task_0006_m_048815_0 output from node6
060523 121817 SEVERE Can't open map output:/hadoop/mapred/local/task_0006_m_031921_0/part-152.out
java.io.FileNotFoundException: LocalFS
at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121817 Unknown child with bad map output: task_0006_m_031921_0. Ignored.
060523 121817 Server handler 1 on 50040 caught: java.io.FileNotFoundException: LocalFS
java.io.FileNotFoundException: LocalFS
at org.apache.hadoop.fs.LocalFileSystem.openRaw(LocalFileSystem.java:123)
at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:116)
at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:151)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:230)
060523 121914 task_0006_r_000123_0 copy failed: task_0006_m_048815_0 from node6:50040
java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:305)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:150)
at org.apache.hadoop.mapred.$Proxy2.getFile(Unknown Source)
at org.apache.hadoop.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:112)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:67)