On Mon, Aug 24, 2009 at 01:15:53PM +0300, Doğacan Güney wrote:
> 2009/8/24 Hannu Väisänen <[email protected]>
> > DEBUG crawl.Generator - -shouldFetch rejected [file name here]
> > fetchTime=1253697537652, curTime=1251105859942
>
> fetchTime is ahead of curTime, that's why it is rejected.
> I would suggest playing around with conf options in nutch-site.xml.
> Depending on which scheduler you use, you should modify
> db.fetch.schedule.adaptive.* or db.fetch.interval.default.
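
Just to check that I understand the suggestion: I assume it means adding properties like the following to conf/nutch-site.xml. The values and the schedule class name below are only my guesses from reading nutch-default.xml, so please correct me if I picked the wrong ones.

    <property>
      <name>db.fetch.schedule.class</name>
      <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
    </property>
    <property>
      <name>db.fetch.schedule.adaptive.min_interval</name>
      <value>60.0</value>
    </property>
    <property>
      <name>db.fetch.schedule.adaptive.max_interval</name>
      <value>86400.0</value>
    </property>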
If I use Nutch (version 1.0) to index some directories on my hard disk like this

    bin/nutch crawl urls -dir crawl -depth 300 >&crawl.log

how many times should it fetch a file in one run?

If I set db.fetch.interval.default and db.fetch.interval.max to 1 second, Nutch seems to fetch files again and again and again... And if the numbers are too big, Nutch rejects all files.

Obviously I don't understand how the scheduler works. (-:
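
For reference, this is roughly what I had in conf/nutch-site.xml when Nutch kept refetching the same files (values in seconds, if I understand the units correctly):

    <property>
      <name>db.fetch.interval.default</name>
      <value>1</value>
    </property>
    <property>
      <name>db.fetch.interval.max</name>
      <value>1</value>
    </property>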
