On Wed, Apr 08, 2009 at 08:54:37AM +0200, Andrzej Bialecki wrote:
> Most likely this is related to the setting db.max.outlinks.per.page. The
> default is 1000. In case of file:// URLs this means that directory
> listings with more than 1000 entries will be truncated. Solution: simply
> increase the limit.

That helped a little. Nutch is now fetching more files, but it is still skipping some.

I have more questions. How does Nutch select the files it fetches? Does it read every file name in a directory and then choose which ones to fetch? Is it possible to output the file names Nutch considers for fetching? Where should I look in the code? (-:
