Add a line to the shell script that runs your crawl cycle which simply touches
the web.xml of the webapp context.
It's not pretty, but it works.
Stefan
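Stefan's touch trick can be sketched as the final step of the cycle script. This is a minimal sketch, assuming a default Tomcat layout; the TOMCAT_HOME path and the webapp name ("nutch") are assumptions, not from the thread:

```shell
#!/bin/sh
# Hypothetical layout -- adjust TOMCAT_HOME and the webapp name
# ("nutch" here) to your own install.
TOMCAT_HOME="${TOMCAT_HOME:-/usr/local/tomcat}"
WEB_XML="$TOMCAT_HOME/webapps/nutch/WEB-INF/web.xml"

# ... run the fetch/index cycle and swap in the new index here ...

# Bumping web.xml's modification time makes Tomcat reload the
# webapp context, so the search servlet reopens the new index
# without a full server restart.
if [ -f "$WEB_XML" ]; then
    touch "$WEB_XML"
fi
```

Note that the context must be deployed with reloadable="true" for Tomcat to pick up the changed web.xml automatically.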
On 02.01.2006 at 20:13, Chetan Sahasrabudhe wrote:
Hello Doug,
In our Nutch setup, whenever we replace the index with a new one, the
search results are not updated accordingly.
Only once we restart Tomcat do the new indices show up in the search
results.
I know many sites run Nutch in the background, and I am sure
they don't restart the web server every time the index changes.
Any pointers for avoiding the restart?
Regards
Chetan
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 03, 2006 12:40 AM
To: [email protected]
Subject: Re: Is any one able to successfully run Distributed Crawl?
Pushpesh Kr. Rajwanshi wrote:
I want to know if anyone has successfully run a distributed
crawl
on multiple machines, crawling millions of pages, and how
hard it is to do. Do I just have to do some configuration and
setup,
or is some implementation work required as well?
I recently performed a four-level-deep crawl, starting from the urls in
DMOZ and limiting each level to 16M urls. It ran on 20 machines, took
around 24 hours at about 100Mbit, and retrieved around 50M pages. I
used Nutch unmodified, specifying only a few configuration
options. So,
yes, it is possible.
Doug
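For context, a crawl like the one Doug describes is typically driven by a loop over the Nutch generate/fetch/updatedb tools, one pass per level. The sketch below is a dry run that only prints the commands; the paths, the segment placeholder, and the -topN cap are illustrative assumptions, not Doug's actual script:

```shell
#!/bin/sh
# Dry-run sketch of a depth-limited crawl loop (prints commands only).
# Tool names follow the classic bin/nutch command line; check your
# install's `bin/nutch` usage for the exact forms.
DEPTH=4
TOPN=16000000          # cap each level at 16M urls, as in Doug's run
DB=crawl/crawldb
SEGS=crawl/segments

level=1
while [ "$level" -le "$DEPTH" ]; do
    # Each level: select the top-scoring unfetched urls, fetch them,
    # then fold the newly discovered links back into the crawl db.
    echo "level $level:"
    echo "  bin/nutch generate $DB $SEGS -topN $TOPN"
    echo "  bin/nutch fetch $SEGS/<newest-segment>"
    echo "  bin/nutch updatedb $DB $SEGS/<newest-segment>"
    level=$((level + 1))
done
```

Running the printed commands for real also requires a seed url list injected into the crawl db first and, for a 20-machine run, the distributed (NDFS/MapReduce) configuration.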
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general