2009/8/24 Hannu Väisänen <[email protected]>

> I am using Nutch to index some directories on my hard disk. It used to
> work but now Nutch rejects all files.
>
> File logs/hadoop.log has this
>
> DEBUG crawl.Generator - -shouldFetch rejected [file name here]
> fetchTime=1253697537652, curTime=1251105859942
>
> for every file in directories I want to index.
>
>
> How can I start to debug the problem?
>

fetchTime is later than curTime, which is why every file is rejected. After
fetching a file, Nutch sets a next fetch time (i.e. the earliest time at which
it will fetch the file again), and won't refetch it until then.
I would suggest adjusting the fetch interval options in nutch-site.xml.
Depending on which scheduler you use, you should modify
db.fetch.schedule.adaptive.* or db.fetch.interval.default.
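
To see how far in the future the next fetch is scheduled, the two values from
the log line can be compared directly. The following is only a rough sketch in
Python, not Nutch code: it assumes both values are milliseconds since the Unix
epoch, and the helper names (to_utc, is_due) are made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the hadoop.log line quoted above.
# Assumption: both are milliseconds since the Unix epoch.
FETCH_TIME_MS = 1253697537652
CUR_TIME_MS = 1251105859942


def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def is_due(fetch_time_ms, cur_time_ms):
    """Hypothetical helper mirroring the comparison described above:
    an entry is due only once its next fetch time has been reached."""
    return fetch_time_ms <= cur_time_ms


delta_ms = FETCH_TIME_MS - CUR_TIME_MS
delta_days = delta_ms / (24 * 60 * 60 * 1000)

print("curTime:  ", to_utc(CUR_TIME_MS).isoformat())
print("fetchTime:", to_utc(FETCH_TIME_MS).isoformat())
print("due now?  ", is_due(FETCH_TIME_MS, CUR_TIME_MS))
print("days until next fetch: %.2f" % delta_days)
```

The gap works out to just under 30 days, which would be consistent with the
files having been fetched a few minutes earlier under a 30-day fetch interval.
That is a guess from the numbers alone, so check the interval actually set in
your configuration.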

-- 
Doğacan Güney
