2009/8/24 Hannu Väisänen <[email protected]>
> I am using Nutch to index some directories on my hard disk. It used to
> work but now Nutch rejects all files.
>
> File logs/hadoop.log has this
>
> DEBUG crawl.Generator - -shouldFetch rejected [file name here]
> fetchTime=1253697537652, curTime=1251105859942
>
> for every file in directories I want to index.
>
>
> How can I start to debug the problem?
>
fetchTime is ahead of curTime, which is why the file is rejected. After fetching a file, Nutch sets a next fetch time (i.e. the earliest time at which it will fetch the file again) and won't refetch the file until then.

I would suggest experimenting with the configuration options in nutch-site.xml. Depending on which scheduler you use, you should modify db.fetch.schedule.adaptive.* or db.fetch.interval.default.

--
Doğacan Güney
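
To see how far ahead fetchTime is, the two values from the log line can be decoded by hand. This is only a rough sketch: it assumes both numbers are milliseconds since the Unix epoch, and the variable and function names are made up for illustration, not taken from Nutch.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the DEBUG line in logs/hadoop.log.
# Assumption: both are milliseconds since the Unix epoch.
fetch_time_ms = 1253697537652
cur_time_ms = 1251105859942


def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Time remaining until the generator would accept the file again.
wait = timedelta(milliseconds=fetch_time_ms - cur_time_ms)

print("curTime  :", to_utc(cur_time_ms))
print("fetchTime:", to_utc(fetch_time_ms))
print("remaining:", wait)
```

The remaining time comes out a few minutes short of 30 days. That would be consistent with the files having been fetched a few minutes earlier with a 30-day fetch interval (I believe 2592000 seconds is the stock value of db.fetch.interval.default, but please check your own nutch-default.xml).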
