Hi all,

I've got an installation of Hadoop up and running with a Nutch crawler, and recently the jobs have all been halting in the middle of the reduce phase. This is on Hadoop 0.19.1.

Here's what I'm seeing in the datanode logs (there were a few of these errors, but the last one was almost a day ago):

2009-05-04 17:02:24,889 ERROR datanode.DataNode - DatanodeRegistration(10.9.17.206:50010, storageID=DS-1024739802-10.9.17.206-50010-1238445482034, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.9.17.206:50010 remote=/10.9.17.206:50537]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
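
For what it's worth, the 480000 ms figure looks like the default dfs.datanode.socket.write.timeout (8 minutes). If that setting turns out to be relevant, my understanding is that it could be raised, or set to 0 to disable it, with something like the following in hadoop-site.xml. I haven't actually tried this, so treat it as a sketch rather than a known fix:

    <!-- untested sketch: 0 disables the datanode socket write timeout;
         a larger millisecond value would raise it from the 480000 ms default -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>0</value>
    </property>

I'm also not sure whether the timeout is the actual cause here or just a symptom of the reducers stalling.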

I searched for the error message and it turned up a few potential bugs involving HBase, but I don't think HBase is in play here, since I can't find any mention of it in our configuration files. Or, if it turns out I do need to change HBase settings, would that involve creating an hbase-site.xml config file in the Hadoop conf directory, or do those properties go directly in hadoop-site.xml?
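
To make that question concrete, here's the sort of thing I mean (the hbase.rootdir property and the hostname are just examples I've seen referenced elsewhere, not anything we currently set):

    <!-- hypothetical conf/hbase-site.xml, for illustration only -->
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://namenode:9000/hbase</value>
      </property>
    </configuration>

...or would those same <property> blocks just go into hadoop-site.xml alongside the HDFS settings?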

Otherwise, I can't seem to track down what might be causing this. All of the job status information I can find reports that it's fine and running normally, but it hasn't made any progress in almost a day. (It should be a 3-5 hour job when everything goes well, and it used to finish in that time.)

Ideas? Can I provide more info?

Thanks,
Mark
