Crawling all links on a page

Steven Yelton Sat, 14 Jan 2006 06:17:14 -0800

I have a page that has an enormous number of links on it:


http://www.devdaily.com/unix/man/longlist.shtml

I would like Nutch to fetch and index all the pages, but it stops after 80 or so. I have made sure that the http.content.limit setting exceeds the size of the page. I surveyed all the other settings and don't see one that seems applicable.

This appears to be the last hurdle before I can replace a proprietary search with Nutch.


Thanks in advance,
Steven