Hi, I'm trying to set up a Nutch crawl on Hadoop, but the crawl normally stops at depth 0 (occasionally it gets to depth 1), when it should continue to depth 3.
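For reference, I launch the crawl with a command along these lines (the seed and output directory names here are placeholders for my actual paths):

    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

so a depth of 3 is requested explicitly; the crawl just never gets there.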
I think the problem may be in Hadoop, since I'm seeing various errors in the datanode log files, such as:

    2008-01-30 10:27:51,487 WARN dfs.DataNode - Failed to transfer blk_3160625876530276979 to 129.215.164.52:51010 got java.net.SocketException: Connection reset

I can telnet to that ip/port (telnet 129.215.164.52 51010 connects), so I don't think it's firewalled. I'm also seeing:

    2008-01-30 10:27:17,157 ERROR dfs.DataNode - DataXceiver: java.io.IOException: Block blk_-3070006959369401863 has already been started (though not completed), and thus cannot be created.
    2008-01-30 10:27:56,217 ERROR dfs.DataNode - DataXceiver: java.io.IOException: Block blk_-712543843244766261 is valid, and cannot be written to.
    2008-01-30 10:34:59,510 ERROR dfs.DataNode - DataXceiver: java.io.EOFException

I assume these errors point to some problem in my Hadoop configuration, but I can't see what. I'm using Hadoop 0.15.0, as distributed with the Nutch nightly build of 2008-01-25.
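In case it's relevant, my hadoop-site.xml overrides are minimal; a rough sketch of them follows (the hostname, ports, and replication value are placeholders, not my exact settings):

    <configuration>
      <!-- placeholder namenode host:port -->
      <property>
        <name>fs.default.name</name>
        <value>master:9000</value>
      </property>
      <!-- placeholder jobtracker host:port -->
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
      <!-- placeholder replication factor -->
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>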
Any suggestions?

thanks and regards
Barry