060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/crawl-tool.xml
060307 141033 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
060307 141033 SEVERE bad conf file: top-level element not <nutch-conf>
060307 141033 No FS indicated, using default:local
060307 141033 crawl started in: ../SpectraSearch/crawl/
060307 141033 rootUrlFile = ../SpectraSearch/urls
060307 141033 threads = 3
060307 141033 depth = 2
060307 141033 Created webdb at LocalFS,/home/hdiwan/SpectraSearch/crawl/db
060307 141033 Starting URL processing
060307 141033 Plugins: looking in: /home/hdiwan/nutch-0.7.1/build/plugins
060307 141033 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
060307 141033 not including: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-http/plugin.xml
060307 141033 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060307 141033 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
060307 141034 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060307 141034 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-js/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter
060307 141034 impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/parse-text/plugin.xml
060307 141034 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-pdf
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-rss
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-msword
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/parse-ext
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/index-basic/plugin.xml
060307 141034 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/index-more/plugin.xml
060307 141034 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.more.MoreIndexingFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-basic/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-more/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.TypeQueryFilter
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.more.DateQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-site/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
060307 141034 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060307 141034 parsing: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex/plugin.xml
060307 141034 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
060307 141034 not including: /home/hdiwan/nutch-0.7.1/build/plugins/ontology
060307 141034 SEVERE org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:147)
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:40)
        ... 4 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: extension point: org.apache.nutch.protocol.Protocol does not exist.
        at org.apache.nutch.plugin.PluginRepository.installExtensions(PluginRepository.java:78)
        at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:61)
        at org.apache.nutch.plugin.PluginRepository.getInstance(PluginRepository.java:144)
        ... 5 more

That's from my log. A preliminary investigation follows, with steps and results pasted:

1. Check the nutch-0.7.1 jar file for the relevant class:

server: 2:14pm % jar tvf ./nutch-0.7.1.jar | grep Protocol.class
   756 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/InterTrackerProtocol.class
   491 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/JobSubmissionProtocol.class
   324 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/MapOutputProtocol.class
   409 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/mapReduce/TaskUmbilicalProtocol.class
   517 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/protocol/Protocol.class
   469 Tue Mar 07 13:17:04 PST 2006 org/apache/nutch/searcher/DistributedSearch$Protocol.class

So org/apache/nutch/protocol/Protocol.class does indeed exist in the jar.

2. Perhaps it wasn't found in the source tree, so look for the source file (run from ~/nutch-0.7.1):

server: 2:14pm % find ./src -name 'Protocol.java' -print
./src/java/org/apache/nutch/protocol/Protocol.java

It is there too. Now I'm stumped... Help!

-- 
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>
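
One more check that might be worth running: the log's first SEVERE line says that one of the three conf files has a top-level element other than <nutch-conf>, but not which one. Below is a minimal sketch for finding it, assuming only the JDK's standard XML parser. The class name ConfRootCheck and its methods are hypothetical helpers, not part of Nutch, and whether the bad conf file is related to the extension-point error is not established here.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper (not part of Nutch): reports the top-level element
// of each XML file named on the command line.
class ConfRootCheck {
    // Root element name taken from the log message
    // "bad conf file: top-level element not <nutch-conf>".
    static final String EXPECTED_ROOT = "nutch-conf";

    // Parse the stream as XML and return the name of its root element.
    static String rootElementName(InputStream in) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        return doc.getDocumentElement().getTagName();
    }

    // Convenience overload for XML held in a string.
    static String rootElementName(String xml) throws Exception {
        return rootElementName(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                String root = rootElementName(in);
                String verdict = EXPECTED_ROOT.equals(root) ? "ok" : "UNEXPECTED";
                System.out.println(path + ": <" + root + "> " + verdict);
            }
        }
    }
}
```

From ~/nutch-0.7.1 this could be run as `java ConfRootCheck conf/nutch-default.xml conf/crawl-tool.xml conf/nutch-site.xml`; any file flagged UNEXPECTED would be the one the SEVERE line is complaining about.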