Re: [Nutch-general] Interrupting a nutch crawl -- or use topN?

Gal Nitzan Sun, 01 Jul 2007 11:32:15 -0700

Hi,

Andrzej filed something related to your issue in Jira a while back. Try
looking it up there.


Found it :) http://issues.apache.org/jira/browse/NUTCH-368

HTH,

Gal

> -----Original Message-----
> From: Ian Holsman [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 01, 2007 2:38 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Interrupting a nutch crawl -- or use topN?
>
> Kai_testing Middleton wrote:
> > I am running a nutch crawl of 19 sites.  I wish to let this crawl go for
> > about two days then gracefully stop it (I don't expect it to complete by
> > then).  Is there a way to do this?  I want it to stop crawling then build
> > the lucene index.  Note that I used a simple nutch crawl command, rather
> > than the "whole web" crawling methodology:
> >
> > nutch crawl urls.txt -dir /usr/tmp/19sites -depth 10
> >
> I use an iterative approach, with a script similar to the one Sami blogs
> about here:
> http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
>
> I then issue a crawl of 10,000 URLs at a time, and just repeat the
> process for as long as the window is available. Because I use Solr to
> store the crawl results, the index is available during the crawl window.
>
> But I'm a relative newbie as well, so I look forward to what the experts say.
>
>
> regards
> Ian
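
For reference, the batch-at-a-time approach Ian describes could be sketched roughly as below. This is not Ian's or Sami's script, only a minimal illustration under some assumptions: that the crawl is driven with the step-by-step commands (generate, fetch, updatedb) rather than the one-shot crawl command, that the crawldb has already been populated with inject, and that it runs from the Nutch install directory. The names crawl_in_batches and newest_segment and the run/now/latest parameters are hypothetical, introduced here only for the example.

```python
import os
import subprocess
import time

NUTCH = "bin/nutch"  # assumption: run from the Nutch install directory


def newest_segment(segments_dir):
    """Return the most recent segment path (segment names are timestamps)."""
    names = sorted(os.listdir(segments_dir))
    return os.path.join(segments_dir, names[-1])


def crawl_in_batches(crawldb, segments_dir, deadline, top_n=10000,
                     run=subprocess.check_call, now=time.time,
                     latest=newest_segment):
    """Run generate/fetch/updatedb rounds until `deadline` (epoch seconds).

    The deadline is checked only between rounds, so a round that is
    already running is allowed to finish. Returns the number of
    completed rounds. Errors from the commands are not handled here.
    """
    rounds = 0
    while now() < deadline:
        # Select at most top_n URLs for this round.
        run([NUTCH, "generate", crawldb, segments_dir, "-topN", str(top_n)])
        segment = latest(segments_dir)
        run([NUTCH, "fetch", segment])
        # Fold the fetch results back into the crawldb for the next round.
        run([NUTCH, "updatedb", crawldb, segment])
        rounds += 1
    return rounds
```

For a two-day window, the deadline would be time.time() + 2 * 24 * 3600. Because the clock is only consulted between rounds, the crawl stops at a clean boundary and the index can be built afterwards; that indexing step is left out of the sketch.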


