Hi,

We are using Hadoop 0.16, and on our heavy-IO job we are seeing a lot of these
exceptions.
We are also seeing a lot of task failures, more than 50% :(. There are two
reasons given in the log:
a) Task task_200810092310_0003_m_000020_0 failed to report status for 600 seconds. Killing!
b) java.io.IOException: Could not get block locations. Aborting...
       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)



Tasktracker log:

Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
2008-10-10 05:50:10,485 INFO org.apache.hadoop.fs.DFSClient: Abandoning block blk_-5660296346325180487
.
..
.
Parent Died.

Datanode log:

2008-10-10 00:00:23,066 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_6562287961399683551 1 Exception java.net.SocketException: Broken pipe
       at java.net.SocketOutputStream.socketWrite0(Native Method)
       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
       at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
       at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1823)
       at java.lang.Thread.run(Thread.java:619)

2008-10-10 00:00:23,067 ERROR org.apache.hadoop.dfs.DataNode: <localhost ip>:50010:DataXceiver: java.io.EOFException
       at java.io.DataInputStream.readInt(DataInputStream.java:375)
       at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
       at java.lang.Thread.run(Thread.java:619)


2008-10-10 00:53:53,790 INFO org.apache.hadoop.dfs.DataNode: Exception in receiveBlock for block blk_-3482274249842371655 java.net.SocketException: Connection reset
2008-10-10 00:53:53,791 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-3482274249842371655 received exception java.net.SocketException: Connection reset
2008-10-10 00:53:53,791 ERROR org.apache.hadoop.dfs.DataNode: <localhost ip>:50010:DataXceiver: java.net.SocketException: Connection reset
       at java.net.SocketInputStream.read(SocketInputStream.java:168)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
       at java.io.DataInputStream.readInt(DataInputStream.java:370)
       at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
       at java.lang.Thread.run(Thread.java:619)



Any pointers would help us a lot.

-Sagar
