At 9:18 PM +0100 12/1/01, Roman Maeder wrote:
>But the index pages are excluded from the database, so it never checks
>those for new links.

Yes, this could be a problem.

>Shouldn't it either traverse the document space starting with the
>start URLs also for update digs (in the same way it does with an initial
>dig), or maybe keep a list of excluded documents and check those as well?

You don't want to traverse the document space if you can help it--that
would require parsing all the documents again or keeping the entire link
structure of the site. The latter may be useful for other purposes, but
that could involve some serious additional storage for some sites.

As far as keeping a list of excluded documents goes, this may be the right
way to go. Right now htmerge/htpurge completely remove all traces of a
document if it was marked "noindex." Probably the solution is to leave
the document (still marked "noindex") but make sure all words are removed
from the word db. This way it would never come up in a search.

This is a very good point you've raised, thanks.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
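To make the proposal above concrete, here is a rough Python sketch of the
idea: keep "noindex" documents in the document db so an update dig can
still revisit them for new links, but strip their words from the word db
so they never match a search. This is not ht://Dig code. The `Doc` and
`Index` classes, the in-memory dicts standing in for the two databases,
and the example.com URLs are all hypothetical, chosen only for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    url: str
    noindex: bool = False
    links: list = field(default_factory=list)


class Index:
    """Toy model: `docs` stands in for the document db, `words` for the word db."""

    def __init__(self):
        self.docs = {}   # url -> Doc
        self.words = {}  # word -> set of urls containing that word

    def add(self, doc, text):
        self.docs[doc.url] = doc
        for word in text.lower().split():
            self.words.setdefault(word, set()).add(doc.url)

    def purge(self):
        """Keep noindex docs in `docs`, but remove all of their words."""
        hidden = {url for url, doc in self.docs.items() if doc.noindex}
        for word in list(self.words):
            self.words[word] -= hidden
            if not self.words[word]:
                del self.words[word]

    def search(self, word):
        return sorted(self.words.get(word.lower(), set()))

    def update_candidates(self):
        """URLs an update dig would re-check: every known doc, noindex included."""
        return sorted(self.docs)


idx = Index()
idx.add(Doc("http://example.com/index.html", noindex=True,
            links=["http://example.com/a.html"]), "site index page")
idx.add(Doc("http://example.com/a.html"), "article page")
idx.purge()

print(idx.search("page"))       # only the indexable document matches
print(idx.update_candidates())  # the noindex page is still revisited
```

The cost of this approach is one document-db record per excluded page,
which should be far less than storing the full link structure of the site.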

