I apologize in advance for the lengthy e-mail, but I tried to provide as much
info as I could...
Everything, including the crawler, works fine right up until I go to my
localhost, enter something, and click search; it always says "Hits 0-0
(out of about 0 total matching pages):". I have already spent about 18 hours
in total reading everything related to Nutch and trying to find my mistake,
but no luck. (Before that, I spent almost a week trying to set it up with the
tutorials on the Nutch wiki.) I have to have Nutch working by Sunday afternoon,
and right now I'm very stressed out about it. So if anyone would please help,
I would be so very, very thankful.
OK, my computer runs Fedora Core 5, and I have jdk1.6.0_06 and
apache-tomcat-5.5.16, and I am currently trying to get Nutch 0.9 running (after
having been extremely unsuccessful with versions 0.7 and 0.8).
Here is my nutch-site.xml, from
/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>searcher.dir</name>
<value>/opt/nutch-0.9/crawl/segments</value>
</property>
</configuration>
Here is my nutch-site.xml from /opt/nutch-0.9/conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>http.agent.name</name>
<value>User</value>
<description>User
</description>
</property>
<property>
<name>http.agent.description</name>
<value>Nutch spiderman</value>
<description> Nutch spiderman
</description>
</property>
</configuration>
Here is my shoppinglist.txt from /opt/nutch-0.9/urls:
http://www.google.com/
Here is the part of crawl-urlfilter.txt from /opt/nutch-0.9/conf that I
modified:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*google.com
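As a sanity check on the filter line above, here is a minimal Python sketch (not Nutch code) that tries the same pattern against a few sample URLs. It assumes the rule is applied as a "find"-style match anchored by ^, and it uses Python's re module as a stand-in for Java's regex engine; the helper name accepts and the sample URLs are made up for illustration.

```python
import re

# Pattern copied from crawl-urlfilter.txt, minus the leading "+"
# (the "+" only marks the line as an accept rule).
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*google.com")


def accepts(url):
    """Return True if the accept rule would match this URL.

    Assumption: the filter searches for the pattern in the URL
    (the ^ anchor pins it to the start of the string).
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.google.com/",         # the seed URL from shoppinglist.txt
        "http://images.google.com/search",
        "http://www.example.com/",
        "https://www.google.com/",        # https, not http
        "http://www.googlexcom/",         # the "." before "com" is unescaped
    ]
    for url in samples:
        print(url, accepts(url))
```

The "." in "google.com" is not escaped, so it matches any character; writing it as "google\.com" would be stricter, although this is unlikely to affect the seed URL itself.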
And here is a recent piece from my catalina.out:
Jun 13, 2008 2:13:49 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 1376 ms
2008-06-13 14:14:09,617 INFO PluginRepository - Plugins: looking in:
/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes/plugins
2008-06-13 14:14:09,762 INFO PluginRepository - Plugin Auto-activation
mode: [true]
2008-06-13 14:14:09,762 INFO PluginRepository - Registered Plugins:
2008-06-13 14:14:09,762 INFO PluginRepository - the nutch core extension
points (nutch-extensionpoints)
2008-06-13 14:14:09,762 INFO PluginRepository - Basic Query Filter
(query-basic)
2008-06-13 14:14:09,762 INFO PluginRepository - Basic URL Normalizer
(urlnormalizer-basic)
2008-06-13 14:14:09,762 INFO PluginRepository - Html Parse Plug-in
(parse-html)
2008-06-13 14:14:09,762 INFO PluginRepository - Basic Indexing Filter
(index-basic)
2008-06-13 14:14:09,762 INFO PluginRepository - Basic Summarizer Plug-in
(summary-basic)
2008-06-13 14:14:09,763 INFO PluginRepository - Site Query Filter
(query-site)
2008-06-13 14:14:09,763 INFO PluginRepository - HTTP Framework
(lib-http)
2008-06-13 14:14:09,763 INFO PluginRepository - Text Parse Plug-in
(parse-text)
2008-06-13 14:14:09,763 INFO PluginRepository - Regex URL Filter
(urlfilter-regex)
2008-06-13 14:14:09,763 INFO PluginRepository - Pass-through URL
Normalizer (urlnormalizer-pass)
2008-06-13 14:14:09,763 INFO PluginRepository - Http Protocol Plug-in
(protocol-http)
2008-06-13 14:14:09,763 INFO PluginRepository - Regex URL Normalizer
(urlnormalizer-regex)
2008-06-13 14:14:09,763 INFO PluginRepository - OPIC Scoring Plug-in
(scoring-opic)
2008-06-13 14:14:09,763 INFO PluginRepository - CyberNeko HTML Parser
(lib-nekohtml)
2008-06-13 14:14:09,763 INFO PluginRepository - JavaScript Parser
(parse-js)
2008-06-13 14:14:09,764 INFO PluginRepository - URL Query Filter
(query-url)
2008-06-13 14:14:09,764 INFO PluginRepository - Regex URL Filter Framework
(lib-regex-filter)
2008-06-13 14:14:09,764 INFO PluginRepository - Registered
Extension-Points:
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2008-06-13 14:14:09,764 INFO PluginRepository - Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-06-13 14:14:09,764 INFO PluginRepository - HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2008-06-13 14:14:09,765 INFO PluginRepository - Nutch Content Parser
(org.apache.nutch.parse.Parser)
2008-06-13 14:14:09,765 INFO PluginRepository - Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2008-06-13 14:14:09,765 INFO PluginRepository - Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2008-06-13 14:14:09,765 INFO PluginRepository - Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2008-06-13 14:14:09,774 INFO NutchBean - creating new bean
2008-06-13 14:14:09,791 INFO NutchBean - opening indexes in
/opt/nutch-0.9/crawl/segments/indexes
2008-06-13 14:14:09,833 INFO Configuration - found resource
common-terms.utf8 at
file:/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes/common-terms.utf8
2008-06-13 14:14:09,839 INFO NutchBean - opening segments in
/opt/nutch-0.9/crawl/segments/segments
2008-06-13 14:14:09,851 INFO SummarizerFactory - Using the first summarizer
extension found: Basic Summarizer
2008-06-13 14:14:09,851 INFO NutchBean - opening linkdb in
/opt/nutch-0.9/crawl/segments/linkdb
2008-06-13 14:14:09,858 INFO NutchBean - query request from 127.0.0.1
2008-06-13 14:14:09,870 INFO NutchBean - query: bananas
2008-06-13 14:14:09,871 INFO NutchBean - lang: en
2008-06-13 14:14:09,901 INFO NutchBean - searching for 20 raw hits
2008-06-13 14:14:09,942 INFO NutchBean - total hits: 0
2008-06-13 14:15:16,336 INFO NutchBean - query request from 127.0.0.1
2008-06-13 14:15:16,337 INFO NutchBean - query: HTTP Status 500 root cause
2008-06-13 14:15:16,338 INFO NutchBean - lang: en
2008-06-13 14:15:16,341 INFO NutchBean - searching for 20 raw hits
2008-06-13 14:15:16,353 INFO NutchBean - total hits: 0
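
The three "opening ... in" lines in the log suggest that each path is the searcher.dir value with a fixed subdirectory name appended (indexes, segments, linkdb). Under that assumption, here is a minimal Python sketch that lists those paths and checks whether they exist on disk. The function names are hypothetical and this is not part of Nutch.

```python
import os
import posixpath

# Subdirectory names taken from the NutchBean lines in catalina.out.
SUBDIRS = ("indexes", "segments", "linkdb")


def searcher_paths(searcher_dir):
    """Build the paths the log shows being opened for a given searcher.dir."""
    return {name: posixpath.join(searcher_dir, name) for name in SUBDIRS}


def report(searcher_dir):
    """Return one line per expected directory, noting whether it exists."""
    lines = []
    for name, path in searcher_paths(searcher_dir).items():
        status = "found" if os.path.isdir(path) else "MISSING"
        lines.append(f"{name}: {path} [{status}]")
    return lines


if __name__ == "__main__":
    # searcher.dir value from the Tomcat copy of nutch-site.xml
    for line in report("/opt/nutch-0.9/crawl/segments"):
        print(line)
```

If any of the three comes back as MISSING, the search page is probably looking in a directory that the crawl never wrote to, which might explain a hit count of 0.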
--
View this message in context:
http://www.nabble.com/Please-help-me-find-my-mistake--Searching-tp17830512p17830512.html
Sent from the Nutch - User mailing list archive at Nabble.com.