On Mon, Aug 24, 2009 at 01:15:53PM +0300, Doğacan Güney wrote:
> 2009/8/24 Hannu Väisänen <[email protected]>
> > DEBUG crawl.Generator - -shouldFetch rejected [file name here]
> > fetchTime=1253697537652, curTime=1251105859942
>
> fetchTime is ahead of curTime, that's why it is rejected.
> I would suggest playing around with conf options in nutch-site.xml.
> Depending on which scheduler you use, you should modify
> db.fetch.schedule.adaptive.* or db.fetch.interval.default.
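
Just to check that I understand the suggestion: I assume it means adding properties like the following to conf/nutch-site.xml. The values and the schedule class name below are only my guesses from reading nutch-default.xml, so please correct me if I picked the wrong ones.

    <property>
      <name>db.fetch.schedule.class</name>
      <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
    </property>
    <property>
      <name>db.fetch.schedule.adaptive.min_interval</name>
      <value>60.0</value>
    </property>
    <property>
      <name>db.fetch.schedule.adaptive.max_interval</name>
      <value>86400.0</value>
    </property>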
If I use Nutch (version 1.0) to index some directories on my hard disk like this

    bin/nutch crawl urls -dir crawl -depth 300 >&crawl.log

how many times should it fetch a file in one run?

If I set db.fetch.interval.default and db.fetch.interval.max to 1 second, Nutch seems to fetch files again and again and again... And if the numbers are too big, Nutch rejects all files.

Obviously I don't understand how the scheduler works. (-:
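
For reference, this is roughly what I had in conf/nutch-site.xml when Nutch kept refetching the same files (values in seconds, if I understand the units correctly):

    <property>
      <name>db.fetch.interval.default</name>
      <value>1</value>
    </property>
    <property>
      <name>db.fetch.interval.max</name>
      <value>1</value>
    </property>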
