Hannu Väisänen wrote:
> On Mon, Apr 06, 2009 at 11:18:59PM +0800, yanky young wrote:
>> Maybe it is about Windows path names and file names.
>> In Windows, path names and file names can have whitespace.
>
> I am running Linux and I have no whitespace in my file names.
>
> log4j.logger.org.apache.nutch.protocol.file=DEBUG,cmdstdout
>
> This did not show the files Nutch is skipping.

Most likely this is related to the setting db.max.outlinks.per.page. The
default is 100. In case of file:// URLs each entry in a directory listing
counts as an outlink, so listings with more than 100 entries will be
truncated. Solution: simply increase the limit in conf/nutch-site.xml.
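
For example, something along these lines in conf/nutch-site.xml should
override the default. This is only a sketch: the value 10000 is an
arbitrary illustration, not a recommendation, so pick a number larger
than your biggest directory listing.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the value shipped in conf/nutch-default.xml. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- Illustrative value only; adjust to your largest directory. -->
    <value>10000</value>
  </property>
</configuration>
```

The property description in nutch-default.xml also says that a negative
value disables the limit altogether, but check the copy that ships with
your version before relying on that.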
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com