Re: Nutch can't find all files

Hannu Väisänen Wed, 08 Apr 2009 21:43:13 -0700

On Wed, Apr 08, 2009 at 08:54:37AM +0200, Andrzej Bialecki wrote:
> Most likely this is related to the setting db.max.outlinks.per.page. The  
> default is 1000. In case of file:// URLs this means that directory  
> listings with more than 1000 entries will be truncated. Solution: simply  
> increase the limit.


That helped a little. Nutch now fetches more files, but it is still
skipping some.
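
For anyone following along, the change suggested above would look roughly
like the fragment below. This is only a sketch: it assumes the override goes
into conf/nutch-site.xml (the usual place for site-specific settings), and
the value 10000 is an arbitrary illustration, not the value actually used
here. Pick something larger than the biggest directory listing.

```xml
<!-- conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- illustrative value only; must exceed the largest directory's entry count -->
    <value>10000</value>
  </property>
</configuration>
```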

I have more questions.

How does Nutch select the files it fetches?

Does it read every file name in a directory and then select which ones
to fetch?

Is it possible to output the file names Nutch considers for fetching?

Where do I look in the code? (-:
