Thanks for the info! I was hoping to get some more specific information, though. We are seeing these errors during every run, and as a result some folks in our organization are not left with a good feeling about the reliability of HDFS. Do these occur because resources are unavailable? Perhaps the nodes are too busy and can no longer service reads from other nodes? Or are the jobs causing too much network traffic? At first glance the machines do not seem to be pinned; however, I am wondering whether sudden bursts of jobs could be causing these as well. If so, does anyone have configuration recommendations to minimize or eliminate these errors under any of these circumstances? Or perhaps there is another explanation?
Thanks,
Albert

On 5/5/09 11:34 AM, "Raghu Angadi" <rang...@yahoo-inc.com> wrote:

This can happen, for example, when a client is killed while it has some files open for write. In that case it is an expected error (the log should really be at WARN or INFO level).

Raghu.

Albert Sunwoo wrote:
> Hello Everyone,
>
> I know there's been some chatter about this before, but I am seeing the errors
> below on just about every one of our nodes. Is there a definitive reason why
> these are occurring? Is there something that we can do to prevent them?
>
> 2009-05-04 21:35:11,764 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.102.0.105:50010, storageID=DS-991582569-127.0.0.1-50010-1240886381606, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 65557 bytes
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>     at java.lang.Thread.run(Thread.java:619)
>
> Followed by:
> 2009-05-04 21:35:20,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_-7056150840276493498_10885 1 Exception
> java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.102.0.105:37293 remote=/10.102.0.106:50010]. 59756 millis timeout left.
>     at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:277)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:853)
>     at java.lang.Thread.run(Thread.java:619)
>
> Thanks,
> Albert
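Raghu's explanation (a client killed while it still has a file open for write) can be illustrated with a small, self-contained Java sketch of the general mechanism. This is not Hadoop code: the class and method names (TruncatedPacketDemo, readPacket, failsWithEof) are hypothetical, a ByteArrayInputStream stands in for the DataNode's socket, and DataInputStream.readFully stands in for whatever fixed-length read loop BlockReceiver.readToBuf actually uses. The point is only that a receiver expecting a packet of a known size (65557 bytes in the log above) gets java.io.EOFException when the sender disappears partway through that packet.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

/**
 * Hypothetical sketch, not Hadoop code: a receiver expects a fixed-size packet,
 * but the sender is killed partway through, so the read hits end-of-stream.
 */
public class TruncatedPacketDemo {

    // Assumed packet size, chosen only to mirror the "65557 bytes" in the log.
    static final int PACKET_SIZE = 65557;

    /** Reads exactly 'size' bytes; throws EOFException if the stream ends early. */
    static byte[] readPacket(DataInputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        in.readFully(buf); // keeps reading until buf is full or the stream ends
        return buf;
    }

    /**
     * Simulates a client that wrote only 'bytesSent' bytes before dying and
     * reports whether the receiver's packet read failed with EOFException.
     */
    static boolean failsWithEof(int bytesSent) {
        byte[] partial = new byte[bytesSent]; // stand-in for data already on the wire
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(partial))) {
            readPacket(in, PACKET_SIZE);
            return false; // the whole packet arrived
        } catch (EOFException e) {
            return true;  // the sender vanished mid-packet
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("killed after 1000 bytes -> EOF? " + failsWithEof(1000));
        System.out.println("full packet sent        -> EOF? " + failsWithEof(PACKET_SIZE));
    }
}
```

Running main prints true for the truncated sender and false when the full packet is delivered. In this sketch the exception says nothing about load or network saturation on the receiving side; it only records that the writer stopped sending, which is consistent with Raghu's point that the error can be expected when clients are killed.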