Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-07 Thread Raghu Angadi

Albert Sunwoo wrote:

Thanks for the info!

I was hoping to get some more specific information though. 


In short: we need more info.

There are typically 4 machines/processes involved in a write: the 
client and 3 datanodes writing the replicas. To see what really 
happened, you need to provide the error message(s) for this block from 
the other parties (the logs from all 3 datanodes, at least, would be useful).


This particular error just implies that this datanode was the 2nd of the 3 
datanodes (assuming replication of 3) in the write pipeline, and that its 
connection from the 1st datanode was closed. To deduce more we need more 
info... starting with what happened to that block on the first datanode.


Also, the 3rd datanode is 10.102.0.106, and the block to grep for in the 
other logs is blk_-7056150840276493498, etc.
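
For context, here is a minimal sketch (not from this thread; the path, sizes, 
and the FileSystem.create() overload used are only illustrative of the 
0.18/0.19-era API) of the client side of such a write. The single 
create/write/close below is what fans out into the client-plus-three-datanode 
pipeline described above; if the client process dies before close(), the 
downstream datanodes can see this kind of abrupt EOF mid-packet:

import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PipelineWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    // create(path, overwrite, bufferSize, replication, blockSize):
    // replication = 3 means the client streams ~64KB packets to datanode 1,
    // which forwards them to datanode 2, which forwards them to datanode 3.
    OutputStream out = fs.create(new Path("/tmp/pipeline-example"), true,
                                 64 * 1024, (short) 3, 64L * 1024 * 1024);
    out.write(new byte[1 << 20]);   // 1 MB of data flowing down the pipeline
    out.close();                    // if the client is killed before this point, each
                                    // datanode sees its upstream socket close abruptly
    fs.close();
  }
}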


You should try to work out what information would help others diagnose 
the problem... more than likely you will find the cause yourself in the 
process.
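
And if the bursty-traffic theory does pan out, the socket timeouts are the 
usual first knobs to experiment with. A sketch only: the property names below 
are the 0.18/0.19-era defaults as I remember them, normally set cluster-wide 
in hadoop-site.xml rather than in code, and worth double-checking against 
your version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class TimeoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Socket read timeout used by DFS clients and datanodes (default 60000 ms).
    conf.setInt("dfs.socket.timeout", 120000);
    // Write timeout on the datanode pipeline sockets (default 480000 ms).
    conf.setInt("dfs.datanode.socket.write.timeout", 960000);
    FileSystem fs = FileSystem.get(conf);   // this client picks up the longer timeouts
    // ... writes as usual ...
    fs.close();
  }
}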


Raghu.


We are seeing these occur during every run, and as such it's not leaving some 
folks in our organization with a good feeling about the reliability of HDFS.
Do these occur as a result of resources being unavailable?  Perhaps the nodes 
are too busy and can no longer service reads from other nodes?  Or are the jobs 
causing too much network traffic?  At first glance the machines do not seem to 
be pinned; however, I am wondering if sudden bursts of jobs could be causing 
these as well.  If so, does anyone have configuration recommendations to 
minimize or eliminate these errors under any of these circumstances, or is 
there perhaps another explanation?

Thanks,
Albert

On 5/5/09 11:34 AM, Raghu Angadi rang...@yahoo-inc.com wrote:



This can happen, for example, when a client is killed while it has some
files open for write. In that case it is an expected error (the log
should really be at WARN or INFO level).

Raghu.

Albert Sunwoo wrote:

Hello Everyone,

I know there's been some chatter about this before, but I am seeing the errors 
below on just about every one of our nodes.  Is there a definitive reason why 
these are occurring, and is there something we can do to prevent them?

2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.102.0.105:50010, 
storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

Followed by:
2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7056150840276493498_10885 1 Exception 
java.io.InterruptedIOException: Interruped while waiting for IO on channel 
java.nio.channels.SocketChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Thread.java:619)

Thanks,
Albert









Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-06 Thread Albert Sunwoo
Thanks for the info!

I was hoping to get some more specific information though.  We are seeing these 
occur during every run, and as such it's not leaving some folks in our 
organization with a good feeling about the reliability of HDFS.
Do these occur as a result of resources being unavailable?  Perhaps the nodes 
are too busy and can no longer service reads from other nodes?  Or are the jobs 
causing too much network traffic?  At first glance the machines do not seem to 
be pinned; however, I am wondering if sudden bursts of jobs could be causing 
these as well.  If so, does anyone have configuration recommendations to 
minimize or eliminate these errors under any of these circumstances, or is 
there perhaps another explanation?

Thanks,
Albert

On 5/5/09 11:34 AM, Raghu Angadi rang...@yahoo-inc.com wrote:



This can happen, for example, when a client is killed while it has some
files open for write. In that case it is an expected error (the log
should really be at WARN or INFO level).

Raghu.

Albert Sunwoo wrote:
 Hello Everyone,

 I know there's been some chatter about this before, but I am seeing the errors 
 below on just about every one of our nodes.  Is there a definitive reason why 
 these are occurring, and is there something we can do to prevent them?

 2009-05-04 21:35:11,764 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 DatanodeRegistration(10.102.0.105:50010, 
 storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
 ipcPort=50020):DataXceiver
 java.io.EOFException: while trying to read 65557 bytes
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
 at java.lang.Thread.run(Thread.java:619)

 Followed by:
 2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
 PacketResponder blk_-7056150840276493498_10885 1 Exception 
 java.io.InterruptedIOException: Interruped while waiting for IO on channel 
 java.nio.channels.SocketChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left.
 at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
 at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
 at java.io.DataInputStream.readFully(DataInputStream.java:178)
 at java.io.DataInputStream.readLong(DataInputStream.java:399)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
 at java.lang.Thread.run(Thread.java:619)

 Thanks,
 Albert





Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-05 Thread Raghu Angadi


This can happen, for example, when a client is killed while it has some 
files open for write. In that case it is an expected error (the log 
should really be at WARN or INFO level).


Raghu.

Albert Sunwoo wrote:

Hello Everyone,

I know there's been some chatter about this before, but I am seeing the errors 
below on just about every one of our nodes.  Is there a definitive reason why 
these are occurring, and is there something we can do to prevent them?

2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.102.0.105:50010, 
storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

Followed by:
2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7056150840276493498_10885 1 Exception 
java.io.InterruptedIOException: Interruped while waiting for IO on channel 
java.nio.channels.SocketChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Thread.java:619)

Thanks,
Albert





java.io.EOFException: while trying to read 65557 bytes

2009-05-04 Thread Albert Sunwoo
Hello Everyone,

I know there's been some chatter about this before, but I am seeing the errors 
below on just about every one of our nodes.  Is there a definitive reason why 
these are occurring, and is there something we can do to prevent them?

2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.102.0.105:50010, 
storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:619)

Followed by:
2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_-7056150840276493498_10885 1 Exception 
java.io.InterruptedIOException: Interruped while waiting for IO on channel 
java.nio.channels.SocketChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
at java.lang.Thread.run(Thread.java:619)

Thanks,
Albert