While putting together a test case today for the "refetching all
pages to update anchor text" problem, I ran into a small bug in
ProtocolFactory when no protocol handler is found:

----
--FETCH-- /home/kangas/nutch-cvs/TEST/segments/20050115004458
expr: syntax error
050115 004459 loading file:/home/kangas/nutch-cvs/nutch/conf/nutch-default.xml
050115 004459 loading file:/home/kangas/nutch-cvs/nutch/conf/nutch-site.xml
050115 004459 Plugins: looking in: /home/kangas/nutch-cvs/nutch/build/plugins
050115 004459 not including: /home/kangas/nutch-cvs/nutch/build/plugins/protocol-file
...
050115 004459 logging at FINE
050115 004459 fetching file:/home/kangas/nutch-cvs/TEST/htdocs/index.html
050115 004459 fetch of file:/home/kangas/nutch-cvs/TEST/htdocs/index.html failed with: java.lang.NullPointerException
050115 004459 stack
java.lang.NullPointerException
        at java.util.Hashtable.put(Hashtable.java:393)
        at net.nutch.protocol.ProtocolFactory.getExtension(ProtocolFactory.java:66)
        at net.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:44)
        at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:108)
050115 004500 number of active threads: 0
----

Digging down into ProtocolFactory.getExtension(), the cause is
pretty straightforward: the CACHE.put(name, extension) line throws
when 'extension' is null, because java.util.Hashtable (see the stack
trace above) does not accept null keys or values.
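
To make the failure mode concrete, here is a minimal standalone sketch
(not Nutch code; the class name, method name and the "file" key are made
up for illustration, and it uses generics so it compiles on a current
JDK). It shows java.util.Hashtable rejecting a null value, while
java.util.HashMap accepts one:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class HashtableNullDemo {
    /** Returns true if storing a null value in the given map throws NPE. */
    static boolean putNullThrows(Map<String, Object> map) {
        try {
            map.put("file", null); // null value, as when no extension is found
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Hashtable throws: "
            + putNullThrows(new Hashtable<String, Object>()));
        System.out.println("HashMap throws:   "
            + putNullThrows(new HashMap<String, Object>()));
    }
}
```

If CACHE is a Hashtable, as the stack trace suggests, any lookup that
finds no extension will hit this path.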

Did this work in some earlier JVM? It definitely does not with
1.4.2-p6, on either Darwin or FreeBSD.

Here is a trivial patch that fixes the problem, so ProtocolNotFound is
thrown instead of NullPointerException:



Index: ProtocolFactory.java
===================================================================
RCS file: /cvsroot/nutch/nutch/src/java/net/nutch/protocol/ProtocolFactory.java,v
retrieving revision 1.3
diff -u -r1.3 ProtocolFactory.java
--- ProtocolFactory.java        14 Jul 2004 23:02:07 -0000      1.3
+++ ProtocolFactory.java        15 Jan 2005 00:54:27 -0000
@@ -61,8 +61,9 @@
       return (Extension)CACHE.get(name);

     Extension extension = findExtension(name);
-
-    CACHE.put(name, extension);
+
+    if (extension != null)
+       CACHE.put(name, extension);

     return extension;
   }
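
For anyone who wants to try the idea outside the Nutch tree, the patched
logic can be sketched as a self-contained class. Everything here is a
stand-in and an assumption on my part: NotFound plays the role of
ProtocolNotFound, findExtension() is a dummy that only knows "http", the
containsKey() check is a guess at the condition above the hunk, and the
null check in getProtocol() is a guess at how the caller reacts. Only
the guarded CACHE.put() mirrors the patch itself.

```java
import java.util.Hashtable;

public class GuardedCacheSketch {
    /** Hypothetical stand-in for Nutch's ProtocolNotFound. */
    static class NotFound extends Exception {
        NotFound(String name) { super("no handler for: " + name); }
    }

    private static final Hashtable<String, Object> CACHE =
        new Hashtable<String, Object>();

    /** Hypothetical lookup: only "http" has a handler. */
    static Object findExtension(String name) {
        return "http".equals(name) ? "http-handler" : null;
    }

    /** Mirrors the patched method: cache only non-null results. */
    static Object getExtension(String name) {
        if (CACHE.containsKey(name))      // assumed cache-hit check
            return CACHE.get(name);

        Object extension = findExtension(name);

        if (extension != null)            // the guard added by the patch
            CACHE.put(name, extension);

        return extension;
    }

    /** Assumed caller behaviour: turn a null result into NotFound. */
    static Object getProtocol(String name) throws NotFound {
        Object extension = getExtension(name);
        if (extension == null)
            throw new NotFound(name);
        return extension;
    }

    public static void main(String[] args) {
        try {
            getProtocol("file");
        } catch (NotFound e) {
            System.out.println(e.getMessage());
        }
    }
}
```

One consequence of this design is that misses are never cached, so a
missing handler is looked up again on every call. That seems acceptable
for an error path.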


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers