Hannu Väisänen wrote:
On Mon, Apr 06, 2009 at 11:18:59PM +0800, yanky young wrote:
Maybe it has to do with Windows path names and file names.
On Windows, path and file names can contain whitespace.

I am running Linux and I have no whitespace in my file names.


log4j.logger.org.apache.nutch.protocol.file=DEBUG,cmdstdout

This did not show the files Nutch is skipping.

Most likely this is related to the db.max.outlinks.per.page setting. Its default is 1000, so for file:// URLs any directory listing with more than 1000 entries will be truncated, and the entries past that point are skipped. Solution: simply increase the limit.
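A rough sketch of the effect, for illustration only. kept_outlinks and site_override are hypothetical helpers, and the file:///data/docs/ path is made up. The sketch assumes that the first db.max.outlinks.per.page entries of a listing are kept and the rest dropped, and that a negative value means "no limit". Check the description in nutch-default.xml for the exact semantics in your Nutch version. The override itself goes in conf/nutch-site.xml as a <property> inside <configuration>.

```python
# Hypothetical model of a per-page outlink cap truncating a file:// directory listing.
# Assumptions: entries beyond the cap are dropped in listing order, and a negative
# cap means "no limit" (verify against nutch-default.xml for your Nutch version).

DEFAULT_MAX_OUTLINKS = 1000  # default of db.max.outlinks.per.page mentioned above


def kept_outlinks(entries, max_outlinks=DEFAULT_MAX_OUTLINKS):
    """Return the directory entries that would survive the cap (hypothetical helper)."""
    entries = list(entries)
    if max_outlinks < 0:
        return entries  # assumed: negative value disables the cap
    return entries[:max_outlinks]


def site_override(max_outlinks):
    """Build a <property> element overriding db.max.outlinks.per.page,
    to be pasted inside <configuration> in conf/nutch-site.xml."""
    return (
        "<property>\n"
        "  <name>db.max.outlinks.per.page</name>\n"
        f"  <value>{max_outlinks}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Made-up directory with 2500 files.
    listing = [f"file:///data/docs/file{i:04d}.txt" for i in range(2500)]
    print(len(kept_outlinks(listing)))        # 1000: the other 1500 files are skipped
    print(len(kept_outlinks(listing, 5000)))  # 2500: raised limit covers the directory
    print(len(kept_outlinks(listing, -1)))    # 2500: no cap (assumed semantics)
    print(site_override(5000))
```

Pick a value larger than the biggest directory you expect to crawl, or a negative value if your version treats that as unlimited.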

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
