On Saturday, March 23, 2002, at 02:02 PM, sascha mantscheff wrote:
> rather not let htdig spider the pages, but prefer to list them explicitly in
> start_url.
>
> 1) Is there a limit on the number of urls in the start_url attribute? What
> about the system load when I start htdig with 1,25 mio urls?

No, there's no limit. However, one reason to spider pages is that the memory load is lower--you don't have to have all those URLs in an assembled list at once.

> 2) How can I prevent htdig to spider any link in general? Is there something
> like an "no-follow" attribute in the config? (I did not find anything like
> it.) I could include a limit_urls list with the same content as the start_url
> list, but this would mean 1,25 mio urls for htdig to parse with each link.

You probably want to see the max_hop_count attribute:
http://www.htdig.org/attrs.html#max_hop_count

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
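
For illustration, a minimal sketch of how the two attributes discussed here might be combined in an htdig configuration file. The file path is hypothetical. It also rests on two assumptions that should be checked against the attribute documentation for the installed version: that a backquoted filename pulls the attribute value from that file, and that a hop count of 0 restricts the dig to the start URLs themselves.

```
# Hypothetical htdig.conf fragment; the path below is a placeholder.

# Assumption: a backquoted filename makes htdig read the value from that
# file, so the URL list does not have to be written inline.
start_url:      `/path/to/url_list.txt`

# Assumption: 0 hops means only the start URLs are indexed and no links
# found on those pages are followed.
max_hop_count:  0
```

With this approach no separate URL-restriction list is needed, since the hop limit rather than URL matching is what stops the spidering.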

