For debugging purposes, could you re-fetch that segment, or at least create a
small new segment and fetch it under Linux?
I want to see whether you can get search results from it or not. It might help
us determine if it's a problem with Nutch or something else more specific.
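
Off the top of my head, a quick test crawl on the Linux box would look something
like this with Nutch 0.8 (the seed URL and directory names are just placeholders,
so adjust them to your setup, and remember that conf/crawl-urlfilter.txt has to
allow the seed URL, just like on the Windows side):

  mkdir urls
  echo "http://www.example.com/" > urls/seed.txt
  bin/nutch crawl urls -dir crawl-test -depth 1 -topN 5

Then point searcher.dir at the new crawl-test directory, restart Tomcat, and see
whether a query against that fresh index returns hits. If it does, the problem
is most likely in the data copied over from Windows rather than in the Linux
install itself.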
----- Original Message ----
From: kan001 <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, March 6, 2007 11:05:04 AM
Subject: Re: [SOLVED] moving crawled db from windows to linux
I crawled on Windows and searched with the Tomcat installed on Windows, and it
works perfectly fine.
Then I moved the same crawl directory and files to Linux and searched with the
Tomcat installed on that Linux machine. It gives 0 hits. I have changed the
searcher.dir property and I think it is connecting, because the following
statements have been printed in the logs. Any idea?
INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in
/home/nutch-0.8/crawl/indexes
INFO [TP-Processor1] (Configuration.java:360) - found resource
common-terms.utf8 at
file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
INFO [TP-Processor1] (NutchBean.java:143) - opening segments in
/home/nutch-0.8/crawl/segments
INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first
summarizer extension found: Basic Summarizer
INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in
/home/nutch-0.8/crawl/linkdb
INFO [TP-Processor1] (search_jsp.java:108) - query request from
192.168.1.64
INFO [TP-Processor1] (search_jsp.java:151) - query:
INFO [TP-Processor1] (search_jsp.java:152) - lang:
INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
INFO [TP-Processor5] (search_jsp.java:108) - query request from
192.168.1.64
INFO [TP-Processor5] (search_jsp.java:151) - query: ads
INFO [TP-Processor5] (search_jsp.java:152) - lang: en
INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
Sean Dean-3 wrote:
>
> Everything looks okay in terms of the files.
>
> When you copied everything over from Windows, is anything different with the
> software other than the operating system?
>
> Maybe you have an old Windows-style path somewhere (C:\Nutch\Crawl)? Also,
> double-check that the "searcher.dir" property inside your nutch-site.xml file
> is correct.
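>
> As a quick sanity check, something along these lines should show what the
> webapp is actually reading (the nutch-site.xml path below is taken from your
> logs, so adjust it if yours differs):
>
>   grep -A 2 searcher.dir /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
>
> The value should be a plain Linux path to the crawl directory, for example
> /home/nutch-0.8/crawl, not a leftover Windows path like C:\Nutch\Crawl.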
>
>
> ----- Original Message ----
> From: kan001 <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, March 5, 2007 11:48:56 PM
> Subject: Re: [SOLVED] moving crawled db from windows to linux
>
>
> Thanks for the immediate reply.
>
> Please find the output of the du -h crawl/ command and the logs below:
> 32K crawl/crawldb/current/part-00000
> 36K crawl/crawldb/current
> 40K crawl/crawldb
> 120K crawl/index
> 128K crawl/indexes/part-00000
> 132K crawl/indexes
> 52K crawl/linkdb/current/part-00000
> 56K crawl/linkdb/current
> 60K crawl/linkdb
> 40K crawl/segments/20070228143239/content/part-00000
> 44K crawl/segments/20070228143239/content
> 20K crawl/segments/20070228143239/crawl_fetch/part-00000
> 24K crawl/segments/20070228143239/crawl_fetch
> 12K crawl/segments/20070228143239/crawl_generate
> 12K crawl/segments/20070228143239/crawl_parse
> 20K crawl/segments/20070228143239/parse_data/part-00000
> 24K crawl/segments/20070228143239/parse_data
> 24K crawl/segments/20070228143239/parse_text/part-00000
> 28K crawl/segments/20070228143239/parse_text
> 148K crawl/segments/20070228143239
> 136K crawl/segments/20070228143249/content/part-00000
> 140K crawl/segments/20070228143249/content
> 20K crawl/segments/20070228143249/crawl_fetch/part-00000
> 24K crawl/segments/20070228143249/crawl_fetch
> 12K crawl/segments/20070228143249/crawl_generate
> 28K crawl/segments/20070228143249/crawl_parse
> 32K crawl/segments/20070228143249/parse_data/part-00000
> 36K crawl/segments/20070228143249/parse_data
> 44K crawl/segments/20070228143249/parse_text/part-00000
> 48K crawl/segments/20070228143249/parse_text
> 292K crawl/segments/20070228143249
> 20K crawl/segments/20070228143327/content/part-00000
> 24K crawl/segments/20070228143327/content
> 20K crawl/segments/20070228143327/crawl_fetch/part-00000
> 24K crawl/segments/20070228143327/crawl_fetch
> 16K crawl/segments/20070228143327/crawl_generate
> 12K crawl/segments/20070228143327/crawl_parse
> 20K crawl/segments/20070228143327/parse_data/part-00000
> 24K crawl/segments/20070228143327/parse_data
> 20K crawl/segments/20070228143327/parse_text/part-00000
> 24K crawl/segments/20070228143327/parse_text
> 128K crawl/segments/20070228143327
> 20K crawl/segments/20070228143434/content/part-00000
> 24K crawl/segments/20070228143434/content
> 20K crawl/segments/20070228143434/crawl_fetch/part-00000
> 24K crawl/segments/20070228143434/crawl_fetch
> 16K crawl/segments/20070228143434/crawl_generate
> 12K crawl/segments/20070228143434/crawl_parse
> 20K crawl/segments/20070228143434/parse_data/part-00000
> 24K crawl/segments/20070228143434/parse_data
> 20K crawl/segments/20070228143434/parse_text/part-00000
> 24K crawl/segments/20070228143434/parse_text
> 128K crawl/segments/20070228143434
> 700K crawl/segments
> 1.1M crawl/
>
> INFO [TP-Processor1] (Configuration.java:397) - parsing
> jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing
> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing
> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
> INFO [TP-Processor1] (Configuration.java:397) - parsing
> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
> INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking in:
> /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
> INFO [TP-Processor1] (PluginRepository.java:333) - Plugin Auto-activation
> mode: [true]
> INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
> INFO [TP-Processor1] (PluginRepository.java:341) - CyberNeko HTML
> Parser (lib-nekohtml)
> INFO [TP-Processor1] (PluginRepository.java:341) - Site Query Filter
> (query-site)
> INFO [TP-Processor1] (PluginRepository.java:341) - Html Parse Plug-in
> (parse-html)
> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter
> Framework (lib-regex-filter)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Indexing
> Filter (index-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Summarizer
> Plug-in (summary-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - Text Parse Plug-in
> (parse-text)
> INFO [TP-Processor1] (PluginRepository.java:341) - JavaScript Parser
> (parse-js)
> INFO [TP-Processor1] (PluginRepository.java:341) - Regex URL Filter
> (urlfilter-regex)
> INFO [TP-Processor1] (PluginRepository.java:341) - Basic Query Filter
> (query-basic)
> INFO [TP-Processor1] (PluginRepository.java:341) - HTTP Framework
> (lib-http)
> INFO [TP-Processor1] (PluginRepository.java:341) - URL Query Filter
> (query-url)
> INFO [TP-Processor1] (PluginRepository.java:341) - Http Protocol
> Plug-in (protocol-http)
> INFO [TP-Processor1] (PluginRepository.java:341) - the nutch core
> extension points (nutch-extensionpoints)
> INFO [TP-Processor1] (PluginRepository.java:341) - OPIC Scoring
> Plug-in
> (scoring-opic)
> INFO [TP-Processor1] (PluginRepository.java:345) - Registered
> Extension-Points:
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Summarizer
> (org.apache.nutch.searcher.Summarizer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch URL Filter
> (org.apache.nutch.net.URLFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - HTML Parse Filter
> (org.apache.nutch.parse.HtmlParseFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Online Search
> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Indexing
> Filter (org.apache.nutch.indexer.IndexingFilter)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Content
> Parser
> (org.apache.nutch.parse.Parser)
> INFO [TP-Processor1] (PluginRepository.java:352) - Ontology Model
> Loader (org.apache.nutch.ontology.Ontology)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Analysis
> (org.apache.nutch.analysis.NutchAnalyzer)
> INFO [TP-Processor1] (PluginRepository.java:352) - Nutch Query Filter
> (org.apache.nutch.searcher.QueryFilter)
> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in
> /home/nutch-0.8/crawl/indexes
> INFO [TP-Processor1] (Configuration.java:360) - found resource
> common-terms.utf8 at
> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in
> /home/nutch-0.8/crawl/segments
> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first
> summarizer extension found: Basic Summarizer
> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in
> /home/nutch-0.8/crawl/linkdb
> INFO [TP-Processor1] (search_jsp.java:108) - query request from
> 192.168.1.64
> INFO [TP-Processor1] (search_jsp.java:151) - query:
> INFO [TP-Processor1] (search_jsp.java:152) - lang:
> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>
> INFO [TP-Processor5] (search_jsp.java:108) - query request from
> 192.168.1.64
> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>
>
>
>
>
> kan001 wrote:
>>
>> When I copied the crawled db from Windows to Linux and tried to search
>> through Tomcat on Linux, it returned 0 hits.
>> But on Windows it gets results from the search screen. Any idea? I have
>> given root permissions to the crawled db.
>> In the logs it shows "opening segments...", but 0 hits!
>>
>
--
View this message in context:
http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9335094
Sent from the Nutch - User mailing list archive at Nabble.com.