Hi all,
I've got an installation of Hadoop up and running with a Nutch crawler,
and recently the jobs have all been halting in the middle of the
reduce phase. This is on Hadoop 0.19.1.
Here's what I'm seeing in the datanode logs (there were a few of
these, but the last one was logged almost a day ago):
2009-05-04 17:02:24,889 ERROR datanode.DataNode - DatanodeRegistration(10.9.17.206:50010, storageID=DS-1024739802-10.9.17.206-50010-1238445482034, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.9.17.206:50010 remote=/10.9.17.206:50537]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
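One thing I noticed: the 480000 ms in the trace matches the default
HDFS datanode write timeout (8 minutes), which as far as I can tell is
controlled by dfs.datanode.socket.write.timeout. I haven't confirmed
that raising it actually fixes anything, but I'm tempted to try
something like this in conf/hadoop-site.xml (the value here is just a
guess on my part; 0 is supposed to disable the timeout entirely):

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>1800000</value>
    <description>Raised from the 480000 ms default while debugging;
    set to 0 to disable the write timeout altogether.</description>
  </property>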
I also searched for the error message, and it turned up a few
potential bugs in HBase, but I don't think HBase is in play here,
since I can't find any mention of it in the configuration files for
our setup. If it does turn out that I need to change HBase settings,
would that involve creating an hbase-site.xml config file in the
Hadoop conf directory, or do those properties go directly in
hadoop-site.xml?
Otherwise, I can't seem to track down what might be causing this. All
of the status information about the job that I can find reports that
it's fine and normal, but it hasn't progressed in almost a day now.
(It should be a 3-5 hour job when all goes well, and it used to be.)
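For reference, here's how I've been checking on it (the job ID below
is made up for this example):

  hadoop job -list
  hadoop job -status job_200905040001_0002

plus the jobtracker web UI, and they all report the job as running
normally.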
Ideas? Can I provide more info?
Thanks,
Mark