Hello, I am currently running Nutch 0.9 on 4 nodes: one is the master, in addition to being a slave.
I originally set up Nutch to run locally. Nutch would output log messages similar to the following:

-----
fetch of http://www.example.com/path/to/script1.asp failed with: java.net.SocketTimeoutException: Read timed out
fetch of http://www.example.com/path/to/script2.asp failed with: java.lang.NullPointerException
-----

Now, using the hadoop/mapred configuration with multiple nodes, I am not seeing anything like this. I just see the normal output:

-----
Fetcher: starting
Fetcher: segment: /var/nutch/crawl/segments/20080116220010
Fetcher: done
-----

I have made one change to the Fetcher.java code, changing the default logging of every URL from the info level to the debug level:

-----
--- archive/Fetcher.java.20070402.2044  Fri Sep  7 17:47:25 2007
+++ Fetcher.java                        Fri Sep  7 17:48:06 2007
@@ -131,7 +131,7 @@
       Text url = new Text();
       url.set(key);
       try {
-        if (LOG.isInfoEnabled()) { LOG.info("fetching " + url); }
+        if (LOG.isDebugEnabled()) { LOG.debug("fetching " + url); }

         // fetch the page
         boolean redirecting;
-----

I have searched my output logs, as well as all the hadoop logs, but I am unable to find the failures I normally see when running the fetch command. Please let me know where else I should be checking for these logs. If you need any additional information, please let me know and I'll send it.

Thanks!

JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
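P.S. For anyone reproducing the patch above: I believe a similar effect could be had without modifying the source, by lowering the Fetcher logger's level in Nutch's conf/log4j.properties. The logger name below is my assumption based on the Fetcher class's package, and note that this would also hide the Fetcher's other INFO output, so it is only a sketch, not what I actually did:

-----
# Assumed logger name (Nutch 0.9's Fetcher is in org.apache.nutch.fetcher).
# WARN suppresses the per-URL "fetching ..." INFO lines without a code change,
# but it also suppresses any other INFO messages this logger emits.
log4j.logger.org.apache.nutch.fetcher.Fetcher=WARN
-----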