We have been seeing the following Hadoop error about 100 times a day, spread
throughout the day, on each RegionServer/DataNode in our always-on
HBase/Hadoop cluster.
From hadoop-gumgum-datanode-xxxxxxxxxxxx.log:

2009-12-23 09:58:29,717 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.255.9.187:50010, storageID=DS-1057956046-10.255.9.187-50010-1248395287725, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
Are other people seeing this error too? How serious is it? Can it be
prevented?
I found a few things that seem related, but I'm not sure how they apply to
the HBase environment:
http://issues.apache.org/jira/browse/HDFS-693
https://issues.apache.org/jira/browse/HADOOP-3831
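One thing I noticed, though I may be off here: the 480000 millis in the exception appears to match the DataNode's default write timeout of 8 minutes, which I believe is controlled by dfs.datanode.socket.write.timeout in hdfs-site.xml. If that's right, one option would be to raise that value, or set it to 0 to disable the write timeout entirely, along these lines (the value shown is just an illustration, not something we've tried):

<!-- hdfs-site.xml on the DataNodes (sketch, assuming this property governs the 480000 ms timeout above) -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- default is 480000 ms (8 minutes); 0 disables the write timeout -->
  <value>0</value>
</property>

I'm not sure whether disabling the timeout is advisable for an HBase workload, or whether it just hides a slow-reader problem, so I'd appreciate any guidance.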
Info on our environment:
1 Node: Master/NameNode/JobTracker (EC2 m1.large)
3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)
Thanks!
-Ken Weiner
GumGum & BEDROCK