Hi,
We are using Hadoop 0.16, and on our heavy-I/O job we are seeing a lot of the
exceptions below. More than 50% of our tasks are failing :(. The logs show two
reasons:
a) Task task_200810092310_0003_m_000020_0 failed to report status for 600 seconds. Killing!
b) java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
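For context on (a): the 600 seconds matches the default `mapred.task.timeout` (600000 ms), after which the TaskTracker kills any task that has not reported progress. A task blocked for a long time on slow HDFS writes can hit this even if it is not truly hung. One common mitigation is to report progress periodically from a background thread while the slow work runs. The sketch below is illustrative only: `ProgressHeartbeat` is a hypothetical helper, and the `Runnable` callback stands in for the `progress()` method of the old-API `Reporter` (via `Progressable`). It uses Java 8 syntax, which is newer than what a Hadoop 0.16 cluster typically runs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Hadoop): invokes a progress callback on a
 * fixed schedule while long-running work is in progress, so the framework
 * does not treat the task as hung.
 */
public final class ProgressHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    public ProgressHeartbeat(Runnable progress, long intervalMillis) {
        // One daemon thread, so a forgotten heartbeat cannot keep the JVM alive.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true);
            return t;
        });
        // Fire once immediately, then every intervalMillis.
        // Note: if the callback throws, later runs are suppressed.
        scheduler.scheduleAtFixedRate(progress, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        // Cancel the periodic task, then wait for any in-flight call to finish
        // so no callback runs after close() returns.
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Assuming `reporter` is the `Reporter` passed to `map()`, usage would look like `try (ProgressHeartbeat hb = new ProgressHeartbeat(reporter::progress, 60_000L)) { /* slow write */ }`. The trade-off is that a task that is genuinely stuck will no longer be killed by the timeout, and this does nothing about the underlying DataNode socket errors in (b).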
TaskTracker log:
Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
2008-10-10 05:50:10,485 INFO org.apache.hadoop.fs.DFSClient: Abandoning block blk_-5660296346325180487
...
Parent Died.
DataNode log:
2008-10-10 00:00:23,066 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_6562287961399683551 1 Exception java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
    at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1823)
    at java.lang.Thread.run(Thread.java:619)
2008-10-10 00:00:23,067 ERROR org.apache.hadoop.dfs.DataNode: /<localhost ip>:50010:DataXceiver: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
    at java.lang.Thread.run(Thread.java:619)
2008-10-10 00:53:53,790 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_-3482274249842371655 java.net.SocketException: Connection reset
2008-10-10 00:53:53,791 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-3482274249842371655 received exception java.net.SocketException: Connection reset
2008-10-10 00:53:53,791 ERROR org.apache.hadoop.dfs.DataNode: /<localhost ip>:50010:DataXceiver: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
    at java.lang.Thread.run(Thread.java:619)
Any pointers would help us a lot.
-Sagar