We have seen the following Hadoop error occur about 100 times a day, spread
throughout the day, on each RegionServer/DataNode in our always-on
HBase/Hadoop cluster.

From hadoop-gumgum-datanode-xxxxxxxxxxxx.log:

2009-12-23 09:58:29,717 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.255.9.187:50010,
storageID=DS-1057956046-10.255.9.187-50010-1248395287725,
infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)


Are other people seeing this error too?  How serious is it?  Can it be
prevented?

I found a few things that seem related, but I'm not sure how they apply to
the HBase environment:
http://issues.apache.org/jira/browse/HDFS-693
https://issues.apache.org/jira/browse/HADOOP-3831
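
If I'm reading HADOOP-3831 correctly, the 480000 ms in the exception is the
default value of dfs.datanode.socket.write.timeout, so I'm guessing one
workaround would be to raise that value (or set it to 0 to disable the write
timeout) in hdfs-site.xml on the DataNodes, something like:

    <property>
      <!-- DataNode write timeout in milliseconds; default is 480000 (8 minutes).
           0 disables the timeout entirely. -->
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value>
    </property>

I don't know whether disabling that timeout is actually advisable under an
HBase workload, though, which is partly why I'm asking here.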

Info on our environment:
1 Node: Master/NameNode/JobTracker (EC2 m1.large)
3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)

Thanks!

-Ken Weiner
 GumGum & BEDROCK
