Hello,

I am running nutch 0.9 currently.
I am running on 4 nodes; one is the master,
which also doubles as a slave.

I originally set up nutch to run locally.
Nutch would output log messages similar to the
following:

-----
fetch of http://www.example.com/path/to/script1.asp failed with: 
java.net.SocketTimeoutException: Read timed out
fetch of http://www.example.com/path/to/script2.asp failed with: 
java.lang.NullPointerException
-----

Now, using the hadoop/mapred configuration
with multiple nodes, I am not seeing anything
like this.  I just see the normal output:

-----
Fetcher: starting
Fetcher: segment: /var/nutch/crawl/segments/20080116220010
Fetcher: done
-----

I have made one change to the Fetcher.java
code, changing the default logging of every
URL from the info level to the debug level:

-----
--- archive/Fetcher.java.20070402.2044  Fri Sep  7 17:47:25 2007
+++ Fetcher.java        Fri Sep  7 17:48:06 2007
@@ -131,7 +131,7 @@
           Text url = new Text();
           url.set(key);
           try {
-            if (LOG.isInfoEnabled()) { LOG.info("fetching " + url); }
+            if (LOG.isDebugEnabled()) { LOG.debug("fetching " + url); }
 
             // fetch the page
             boolean redirecting;
-----
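
For what it's worth, I believe the same effect could
be achieved (or reversed) without recompiling, via
conf/log4j.properties.  The logger name below is my
assumption based on the Fetcher's package:

-----
# Assumed logger name for the fetcher class; setting it to DEBUG
# should surface the per-URL "fetching ..." lines again.
log4j.logger.org.apache.nutch.fetcher.Fetcher=DEBUG
-----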

I have searched my output logs, as well as
all of the hadoop logs, but I cannot find
the failure messages I normally see when
running the fetch command.
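
For reference, this is roughly how I am searching
(the paths are assumptions for a default layout;
adjust to your install):

-----
# Search the client/daemon logs for fetch failures.
grep -ri "failed with" "$HADOOP_HOME/logs" 2>/dev/null || true
# In distributed mode, per-task output may land under
# userlogs on each slave node rather than on the master:
grep -ri "failed with" "$HADOOP_HOME/logs/userlogs" 2>/dev/null || true
-----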

Please let me know where else I should be
checking for these logs.

If you need any additional information,
please let me know and I'll send it.

Thanks!

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
