Hi,

A while back Andrzej posted something related to your issue in Jira. Try looking it up there.
Found it :)
http://issues.apache.org/jira/browse/NUTCH-368

HTH,
Gal

> -----Original Message-----
> From: Ian Holsman [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 01, 2007 2:38 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Interrupting a nutch crawl -- or use topN?
>
> Kai_testing Middleton wrote:
> > I am running a nutch crawl of 19 sites. I wish to let this crawl go for
> > about two days then gracefully stop it (I don't expect it to complete by
> > then). Is there a way to do this? I want it to stop crawling then build
> > the lucene index. Note that I used a simple nutch crawl command, rather
> > than the "whole web" crawling methodology:
> >
> > nutch crawl urls.txt -dir /usr/tmp/19sites -depth 10
>
> I use an iterative approach, using a script similar to what Sami blogs
> about here:
> http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
>
> I then issue a crawl of 10,000 URLs at a time, and just repeat the
> process for as long as the window is available. Because I use Solr to
> store the crawl results, the index is available during the crawl window.
>
> But I'm a relative newbie as well, so I look forward to what the experts say.
>
> regards
> Ian

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
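
Ian's approach (fetch a fixed batch of URLs, repeat until the time window closes, then index) could be sketched roughly as below. This is not Ian's or Sami's actual script. The `NUTCH` variable, the function names `crawl_until` and `build_index`, and the directory layout are assumptions made for illustration. The subcommands follow the step-by-step "whole web" sequence from the Nutch 0.8/0.9 tutorial, so check them against your installed version. The sketch also assumes the crawldb has already been seeded with `nutch inject`.

```shell
#!/usr/bin/env bash
# Sketch only: time-boxed incremental crawl in batches of -topN URLs.
# NUTCH, the function names and the directory layout are assumptions.

NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher (assumed location)

# crawl_until DEADLINE_EPOCH CRAWL_DIR TOPN
# Repeats generate -> fetch -> updatedb until the deadline has passed
# or generate stops producing new segments.
crawl_until() {
  local deadline="$1" dir="$2" topn="$3" segment="" prev=""
  while [ "$(date +%s)" -lt "$deadline" ]; do
    "$NUTCH" generate "$dir/crawldb" "$dir/segments" -topN "$topn" || break
    # Segment directories are named by timestamp, so the last one is the newest.
    segment="$(ls -d "$dir"/segments/* 2>/dev/null | tail -n 1)"
    # Stop if nothing new was generated (no URLs left that are due for fetching).
    if [ -z "$segment" ] || [ "$segment" = "$prev" ]; then
      break
    fi
    "$NUTCH" fetch "$segment" || break
    "$NUTCH" updatedb "$dir/crawldb" "$segment" || break
    prev="$segment"
  done
}

# build_index CRAWL_DIR
# Inverts links and builds the Lucene index from all fetched segments.
build_index() {
  local dir="$1"
  "$NUTCH" invertlinks "$dir/linkdb" -dir "$dir/segments" &&
    "$NUTCH" index "$dir/indexes" "$dir/crawldb" "$dir/linkdb" "$dir"/segments/*
}
```

For a two-day window one might call `crawl_until "$(( $(date +%s) + 172800 ))" /usr/tmp/19sites 10000` followed by `build_index /usr/tmp/19sites`. The deadline is only checked between rounds, so a round that is already running finishes before the loop stops. That is what makes the stop graceful, but it also means the crawl can overrun the window by up to one batch.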
