I apologize in advance for the lengthy e-mail, but I tried to provide as much
info as I could.
Everything, including the crawler, works fine until I go to my localhost,
enter something, and click search; it always says "Hits 0-0 (out of about 0
total matching pages):". I have already spent about 18 hours in total reading
everything related to Nutch and trying to find my mistake, but no luck.
(Before that I spent almost a week trying to set it up with the tutorials on
the Nutch wiki.) I have to have Nutch working by Sunday afternoon, and right
now I'm very stressed out about it. So if anyone would please help, I would
be very, very thankful.


OK, my computer runs Fedora Core 5, and I have jdk1.6.0_06 and
apache-tomcat-5.5.16, and I am currently trying to get Nutch 0.9 running
(after having been extremely unsuccessful with versions 0.7 and 0.8).

Here is my nutch-site.xml, from
/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>searcher.dir</name>
        <value>/opt/nutch-0.9/crawl/segments</value>
</property>
</configuration>

Here is my nutch-site.xml from /opt/nutch-0.9/conf

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>http.agent.name</name>
                <value>User</value>
                <description>User</description>
        </property>

        <property>
                <name>http.agent.description</name>
                <value>Nutch spiderman</value>
                <description>Nutch spiderman</description>
        </property>
</configuration>



Here is my shoppinglist.txt from /opt/nutch-0.9/urls

http://www.google.com/ 

Here is the part of crawl-urlfilter.txt from /opt/nutch-0.9/conf that I
modified.

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*google.com
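
For readers who want to see what the accept line above actually lets through, here is a minimal Python sketch. It assumes the filter treats the text after the leading "+" as a regular expression and accepts a URL when the pattern is found anywhere in it; the helper name `accepted` and the test URLs other than the seed are made up for illustration. Python's `re` is used only as a stand-in for the Java regex engine, which should behave the same for this simple pattern.

```python
import re

# Pattern copied from the crawl-urlfilter.txt line above, without the
# leading "+" (assumption: "+" marks an accept rule, "-" a reject rule).
ACCEPT = re.compile(r"^http://([a-z0-9]*\.)*google.com")


def accepted(url: str) -> bool:
    """Return True if the accept pattern is found in the URL."""
    return ACCEPT.search(url) is not None


if __name__ == "__main__":
    # The first URL is the seed from shoppinglist.txt; the others are
    # hypothetical examples.
    for url in (
        "http://www.google.com/",
        "http://google.com/",
        "http://www.example.com/",
        "http://googleXcom/",  # passes because the dots in "google.com" are unescaped
    ):
        print(url, accepted(url))
```

The seed URL passes the filter in this sketch, so the filter line is probably not what keeps the search from returning hits. Writing `google\.com` would make the dot literal.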

And here is a recent piece from my catalina.out


Jun 13, 2008 2:13:49 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 1376 ms
2008-06-13 14:14:09,617 INFO  PluginRepository - Plugins: looking in:
/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes/plugins
2008-06-13 14:14:09,762 INFO  PluginRepository - Plugin Auto-activation
mode: [true]
2008-06-13 14:14:09,762 INFO  PluginRepository - Registered Plugins:
2008-06-13 14:14:09,762 INFO  PluginRepository -        the nutch core extension
points (nutch-extensionpoints)
2008-06-13 14:14:09,762 INFO  PluginRepository -        Basic Query Filter
(query-basic)
2008-06-13 14:14:09,762 INFO  PluginRepository -        Basic URL Normalizer
(urlnormalizer-basic)
2008-06-13 14:14:09,762 INFO  PluginRepository -        Html Parse Plug-in
(parse-html)
2008-06-13 14:14:09,762 INFO  PluginRepository -        Basic Indexing Filter
(index-basic)
2008-06-13 14:14:09,762 INFO  PluginRepository -        Basic Summarizer Plug-in
(summary-basic)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Site Query Filter
(query-site)
2008-06-13 14:14:09,763 INFO  PluginRepository -        HTTP Framework 
(lib-http)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Text Parse Plug-in
(parse-text)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Regex URL Filter
(urlfilter-regex)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Pass-through URL
Normalizer (urlnormalizer-pass)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Http Protocol Plug-in
(protocol-http)
2008-06-13 14:14:09,763 INFO  PluginRepository -        Regex URL Normalizer
(urlnormalizer-regex)
2008-06-13 14:14:09,763 INFO  PluginRepository -        OPIC Scoring Plug-in
(scoring-opic)
2008-06-13 14:14:09,763 INFO  PluginRepository -        CyberNeko HTML Parser
(lib-nekohtml)
2008-06-13 14:14:09,763 INFO  PluginRepository -        JavaScript Parser
(parse-js)
2008-06-13 14:14:09,764 INFO  PluginRepository -        URL Query Filter
(query-url)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Regex URL Filter 
Framework
(lib-regex-filter)
2008-06-13 14:14:09,764 INFO  PluginRepository - Registered
Extension-Points:
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch URL Normalizer
(org.apache.nutch.net.URLNormalizer)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch Protocol
(org.apache.nutch.protocol.Protocol)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch URL Filter
(org.apache.nutch.net.URLFilter)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch Indexing Filter
(org.apache.nutch.indexer.IndexingFilter)
2008-06-13 14:14:09,764 INFO  PluginRepository -        Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2008-06-13 14:14:09,764 INFO  PluginRepository -        HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
2008-06-13 14:14:09,765 INFO  PluginRepository -        Nutch Content Parser
(org.apache.nutch.parse.Parser)
2008-06-13 14:14:09,765 INFO  PluginRepository -        Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
2008-06-13 14:14:09,765 INFO  PluginRepository -        Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
2008-06-13 14:14:09,765 INFO  PluginRepository -        Ontology Model Loader
(org.apache.nutch.ontology.Ontology)
2008-06-13 14:14:09,774 INFO  NutchBean - creating new bean
2008-06-13 14:14:09,791 INFO  NutchBean - opening indexes in
/opt/nutch-0.9/crawl/segments/indexes
2008-06-13 14:14:09,833 INFO  Configuration - found resource
common-terms.utf8 at
file:/opt/apache-tomcat-5.5.16/webapps/nutch-0.9/WEB-INF/classes/common-terms.utf8
2008-06-13 14:14:09,839 INFO  NutchBean - opening segments in
/opt/nutch-0.9/crawl/segments/segments
2008-06-13 14:14:09,851 INFO  SummarizerFactory - Using the first summarizer
extension found: Basic Summarizer
2008-06-13 14:14:09,851 INFO  NutchBean - opening linkdb in
/opt/nutch-0.9/crawl/segments/linkdb
2008-06-13 14:14:09,858 INFO  NutchBean - query request from 127.0.0.1
2008-06-13 14:14:09,870 INFO  NutchBean - query: bananas
2008-06-13 14:14:09,871 INFO  NutchBean - lang: en
2008-06-13 14:14:09,901 INFO  NutchBean - searching for 20 raw hits
2008-06-13 14:14:09,942 INFO  NutchBean - total hits: 0
2008-06-13 14:15:16,336 INFO  NutchBean - query request from 127.0.0.1
2008-06-13 14:15:16,337 INFO  NutchBean - query: HTTP Status 500 root cause
2008-06-13 14:15:16,338 INFO  NutchBean - lang: en
2008-06-13 14:15:16,341 INFO  NutchBean - searching for 20 raw hits
2008-06-13 14:15:16,353 INFO  NutchBean - total hits: 0
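
One thing that can be checked from the log alone: NutchBean reports opening `indexes`, `segments`, and `linkdb` directly underneath the configured `searcher.dir`. The sketch below is only a diagnostic aid, not part of Nutch. It assumes those three subdirectory names (taken from the log lines above) and reports whether each resulting path exists on disk; the function names are hypothetical.

```python
import os

# Subdirectories the log above shows NutchBean opening under searcher.dir.
SUBDIRS = ("indexes", "segments", "linkdb")


def expected_paths(searcher_dir):
    """Build the paths NutchBean would open for a given searcher.dir."""
    return [os.path.join(searcher_dir, sub) for sub in SUBDIRS]


def report(searcher_dir):
    """Map each expected path to True if it exists as a directory."""
    return {path: os.path.isdir(path) for path in expected_paths(searcher_dir)}


if __name__ == "__main__":
    # Value of searcher.dir from the webapp's nutch-site.xml shown earlier.
    for path, exists in report("/opt/nutch-0.9/crawl/segments").items():
        print(("ok       " if exists else "MISSING  ") + path)
```

If any of the three paths is reported missing, the webapp is looking somewhere the crawl did not write, which would be consistent with "total hits: 0".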


-- 
View this message in context: 
http://www.nabble.com/Please-help-me-find-my-mistake--Searching-tp17830512p17830512.html
Sent from the Nutch - User mailing list archive at Nabble.com.
