I will update you once I am done with that testing... just stuck there :(
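In the meantime, here is roughly how I plan to move the crawl directory over intact (a minimal sketch; the hostname and destination path are placeholders, not my real setup):

```shell
# Pack the crawl directory on the Windows side (e.g. from Cygwin),
# preserving the directory layout Nutch expects.
tar czf crawl.tar.gz crawl/

# Copy the archive to the Linux box (host/path are placeholders):
# scp crawl.tar.gz user@linuxbox:/home/nutch-0.8/

# ... then unpack it there and eyeball the sizes.
tar xzf crawl.tar.gz
du -h crawl/
```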


kan001 wrote:
> 
> As my Linux server is a virtual dedicated server and it often runs into
> out-of-memory errors, I won't be able to run the fetch there right now. I
> need to either upgrade the server or stop all the applications running on
> it before testing, and that will take time. That is why I was trying to
> fetch from Windows and move the crawled db into the Linux box.
> 
> Thanks for the responses.
> 
> 
> 
> Sean Dean-3 wrote:
>> 
>> For debugging purposes, could you re-fetch that segment or at least
>> create a small new segment and fetch it under Linux?
>>  
>> I want to see if you can get search results from it or not. It might help
>> us determine whether it's a problem with Nutch or something else more
>> specific.
>> 
>> 
>> ----- Original Message ----
>> From: kan001 <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Tuesday, March 6, 2007 11:05:04 AM
>> Subject: Re: [SOLVED] moving crawled db from windows to linux
>> 
>> 
>> I have crawled in Windows and searched with the Tomcat that is installed
>> in Windows, and it works perfectly fine.
>> Then I moved the same crawled directory and files to Linux and searched
>> with the Tomcat that is installed on that Linux machine. It gives 0 hits.
>> I have changed the searcher.dir property, and I think it is connecting,
>> because the following statements have been printed in the logs... Any
>> idea??
>> 
>> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
>> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in
>> /home/nutch-0.8/crawl/indexes
>> INFO [TP-Processor1] (Configuration.java:360) - found resource
>> common-terms.utf8 at
>> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
>> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in
>> /home/nutch-0.8/crawl/segments
>> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first
>> summarizer extension found: Basic Summarizer
>> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in
>> /home/nutch-0.8/crawl/linkdb
>> INFO [TP-Processor1] (search_jsp.java:108) - query request from
>> 192.168.1.64
>> INFO [TP-Processor1] (search_jsp.java:151) - query:
>> INFO [TP-Processor1] (search_jsp.java:152) - lang:
>> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>> 
>> INFO [TP-Processor5] (search_jsp.java:108) - query request from
>> 192.168.1.64
>> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
>> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
>> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
>> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>> 
>> 
>> 
>> 
>> Sean Dean-3 wrote:
>>> 
>>> Everything looks okay in terms of the files.
>>>  
>>> When you copied everything over from Windows, is there anything
>>> different with the software other than the operating system?
>>>  
>>> Maybe you have an old Windows-style path somewhere (C:\Nutch\Crawl)?
>>> Also double-check that the "searcher.dir" property inside your
>>> nutch-site.xml file is correct.
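For reference, a searcher.dir override in nutch-site.xml would look something like the fragment below (the value matches the crawl path shown in the logs in this thread; adjust it for the actual install):

```xml
<!-- In WEB-INF/classes/nutch-site.xml of the search webapp -->
<property>
  <name>searcher.dir</name>
  <value>/home/nutch-0.8/crawl</value>
  <description>Directory containing crawldb, indexes, segments and linkdb.</description>
</property>
```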
>>> 
>>> 
>>> ----- Original Message ----
>>> From: kan001 <[EMAIL PROTECTED]>
>>> To: [email protected]
>>> Sent: Monday, March 5, 2007 11:48:56 PM
>>> Subject: Re: [SOLVED] moving crawled db from windows to linux
>>> 
>>> 
>>> Thanks for the immediate reply.
>>> 
>>> Please find the result of the "du -h crawl/" command, and the logs, below:
>>> 32K     crawl/crawldb/current/part-00000
>>> 36K     crawl/crawldb/current
>>> 40K     crawl/crawldb
>>> 120K    crawl/index
>>> 128K    crawl/indexes/part-00000
>>> 132K    crawl/indexes
>>> 52K     crawl/linkdb/current/part-00000
>>> 56K     crawl/linkdb/current
>>> 60K     crawl/linkdb
>>> 40K     crawl/segments/20070228143239/content/part-00000
>>> 44K     crawl/segments/20070228143239/content
>>> 20K     crawl/segments/20070228143239/crawl_fetch/part-00000
>>> 24K     crawl/segments/20070228143239/crawl_fetch
>>> 12K     crawl/segments/20070228143239/crawl_generate
>>> 12K     crawl/segments/20070228143239/crawl_parse
>>> 20K     crawl/segments/20070228143239/parse_data/part-00000
>>> 24K     crawl/segments/20070228143239/parse_data
>>> 24K     crawl/segments/20070228143239/parse_text/part-00000
>>> 28K     crawl/segments/20070228143239/parse_text
>>> 148K    crawl/segments/20070228143239
>>> 136K    crawl/segments/20070228143249/content/part-00000
>>> 140K    crawl/segments/20070228143249/content
>>> 20K     crawl/segments/20070228143249/crawl_fetch/part-00000
>>> 24K     crawl/segments/20070228143249/crawl_fetch
>>> 12K     crawl/segments/20070228143249/crawl_generate
>>> 28K     crawl/segments/20070228143249/crawl_parse
>>> 32K     crawl/segments/20070228143249/parse_data/part-00000
>>> 36K     crawl/segments/20070228143249/parse_data
>>> 44K     crawl/segments/20070228143249/parse_text/part-00000
>>> 48K     crawl/segments/20070228143249/parse_text
>>> 292K    crawl/segments/20070228143249
>>> 20K     crawl/segments/20070228143327/content/part-00000
>>> 24K     crawl/segments/20070228143327/content
>>> 20K     crawl/segments/20070228143327/crawl_fetch/part-00000
>>> 24K     crawl/segments/20070228143327/crawl_fetch
>>> 16K     crawl/segments/20070228143327/crawl_generate
>>> 12K     crawl/segments/20070228143327/crawl_parse
>>> 20K     crawl/segments/20070228143327/parse_data/part-00000
>>> 24K     crawl/segments/20070228143327/parse_data
>>> 20K     crawl/segments/20070228143327/parse_text/part-00000
>>> 24K     crawl/segments/20070228143327/parse_text
>>> 128K    crawl/segments/20070228143327
>>> 20K     crawl/segments/20070228143434/content/part-00000
>>> 24K     crawl/segments/20070228143434/content
>>> 20K     crawl/segments/20070228143434/crawl_fetch/part-00000
>>> 24K     crawl/segments/20070228143434/crawl_fetch
>>> 16K     crawl/segments/20070228143434/crawl_generate
>>> 12K     crawl/segments/20070228143434/crawl_parse
>>> 20K     crawl/segments/20070228143434/parse_data/part-00000
>>> 24K     crawl/segments/20070228143434/parse_data
>>> 20K     crawl/segments/20070228143434/parse_text/part-00000
>>> 24K     crawl/segments/20070228143434/parse_text
>>> 128K    crawl/segments/20070228143434
>>> 700K    crawl/segments
>>> 1.1M    crawl/
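As a quick sanity check after copying, one could confirm the same directories exist on the Linux side (names taken from the du output above; this is just an illustrative snippet, not a Nutch command):

```shell
# Verify the top-level pieces of the crawl are present after the copy.
for d in crawldb/current indexes linkdb/current segments; do
  if [ -d "crawl/$d" ]; then
    echo "ok: crawl/$d"
  else
    echo "MISSING: crawl/$d"
  fi
done
```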
>>> 
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing
>>> jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing
>>> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing
>>> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
>>> INFO [TP-Processor1] (Configuration.java:397) - parsing
>>> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
>>> INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking
>>> in:
>>> /usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
>>> INFO [TP-Processor1] (PluginRepository.java:333) - Plugin
>>> Auto-activation
>>> mode: [true]
>>> INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     CyberNeko HTML
>>> Parser (lib-nekohtml)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Site Query Filter
>>> (query-site)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Html Parse
>>> Plug-in
>>> (parse-html)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Regex URL Filter
>>> Framework (lib-regex-filter)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Indexing
>>> Filter (index-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Summarizer
>>> Plug-in (summary-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Text Parse
>>> Plug-in
>>> (parse-text)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     JavaScript Parser
>>> (parse-js)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Regex URL Filter
>>> (urlfilter-regex)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Query
>>> Filter
>>> (query-basic)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     HTTP Framework
>>> (lib-http)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     URL Query Filter
>>> (query-url)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     Http Protocol
>>> Plug-in (protocol-http)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     the nutch core
>>> extension points (nutch-extensionpoints)
>>> INFO [TP-Processor1] (PluginRepository.java:341) -     OPIC Scoring
>>> Plug-in
>>> (scoring-opic)
>>> INFO [TP-Processor1] (PluginRepository.java:345) - Registered
>>> Extension-Points:
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Summarizer
>>> (org.apache.nutch.searcher.Summarizer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Scoring
>>> (org.apache.nutch.scoring.ScoringFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Protocol
>>> (org.apache.nutch.protocol.Protocol)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch URL Filter
>>> (org.apache.nutch.net.URLFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     HTML Parse Filter
>>> (org.apache.nutch.parse.HtmlParseFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Online
>>> Search
>>> Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Indexing
>>> Filter (org.apache.nutch.indexer.IndexingFilter)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Content
>>> Parser
>>> (org.apache.nutch.parse.Parser)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Ontology Model
>>> Loader (org.apache.nutch.ontology.Ontology)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Analysis
>>> (org.apache.nutch.analysis.NutchAnalyzer)
>>> INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Query
>>> Filter
>>> (org.apache.nutch.searcher.QueryFilter)
>>> INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
>>> INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in
>>> /home/nutch-0.8/crawl/indexes
>>> INFO [TP-Processor1] (Configuration.java:360) - found resource
>>> common-terms.utf8 at
>>> file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
>>> INFO [TP-Processor1] (NutchBean.java:143) - opening segments in
>>> /home/nutch-0.8/crawl/segments
>>> INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first
>>> summarizer extension found: Basic Summarizer
>>> INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in
>>> /home/nutch-0.8/crawl/linkdb
>>> INFO [TP-Processor1] (search_jsp.java:108) - query request from
>>> 192.168.1.64
>>> INFO [TP-Processor1] (search_jsp.java:151) - query:
>>> INFO [TP-Processor1] (search_jsp.java:152) - lang:
>>> INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
>>> INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
>>> 
>>> INFO [TP-Processor5] (search_jsp.java:108) - query request from
>>> 192.168.1.64
>>> INFO [TP-Processor5] (search_jsp.java:151) - query: ads
>>> INFO [TP-Processor5] (search_jsp.java:152) - lang: en
>>> INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
>>> INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0
>>> 
>>> 
>>> 
>>> 
>>> 
>>> kan001 wrote:
>>>> 
>>>> When I copied the crawled db from Windows to Linux and tried to search
>>>> through Tomcat on Linux, it returned 0 hits.
>>>> But on Windows it gets results from the search screen. Any idea?? I
>>>> have given root permissions to the crawled db.
>>>> In the logs it is showing "opening segments"... but 0 hits!!!
>>>> 
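On the permissions point: Tomcat normally runs as a non-root user, so giving the files to root is not enough by itself; the Tomcat user needs read access to every file and execute (traverse) access to every directory. A quick, illustrative way to open the tree up for reading (path as in this thread):

```shell
# Make every directory traversable and every file readable by all users,
# without marking plain files executable (capital X only affects
# directories and files that already have an execute bit set).
chmod -R a+rX crawl/
```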
>>> 
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9326034
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
> 
> 



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
