On Fri, 2 Feb 2001, tk wrote:

> sorry if these questions have been asked many times before, i tried 
> to find recent mail regarding this but couldn't. oh by the way, i'm
> running snapshot 2001/01/07 of htdig.

OK. There is a fundamental difference between the 3.2 code and the 3.1
code in terms of operation. Versions before 3.2.0b1 had htdig generate an
intermediate form, which needed to be run through htmerge to produce
searchable databases. In the process, htmerge did things like removing bad
URLs, etc.

The 3.2 code (as in the snapshots) generates databases that can be
searched directly. In versions 3.2.0b1 and 3.2.0b2, you still needed to
run htmerge to remove bad URLs; it's just that you could technically
search the databases without doing so.

After 3.2.0b2, the syntax of htmerge changed: it now only merges
databases, which makes more sense to newcomers. Cleaning out bad URLs is
now the job of the new htpurge program, which can also delete any
specific URLs you name.
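A minimal sketch of the post-3.2.0b2 sequence, written in Python for illustration: dig first, then purge. htmerge only enters the picture if you need to combine several databases, so it is left out here. The config path is hypothetical, and the assumption that both tools accept `-c <configfile>` should be checked against the documentation for your snapshot. By default the script only prints the commands.

```python
import subprocess
from typing import List

# Hypothetical config path; substitute your own htdig configuration file.
CONFIG = "/etc/htdig/htdig.conf"


def index_commands(config: str) -> List[List[str]]:
    """Return the indexing steps, in order, for a 3.2 snapshot after 3.2.0b2.

    Assumes both tools accept "-c <config>"; verify against your version's docs.
    """
    return [
        ["htdig", "-c", config],    # dig: databases are searchable right away
        ["htpurge", "-c", config],  # purge: drop stubs, noindex docs, error URLs
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so htpurge never runs against a failed dig.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(index_commands(CONFIG))
```

Keeping the command list separate from execution makes it easy to review the steps before running them against a live database.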

But even so, you will *still* want to run htpurge on your databases.
After a dig, there will assuredly be stubs for unretrieved URLs, documents
marked "noindex", URLs that returned error codes, etc. Purging these
shrinks the database's internal index and improves performance, since
htsearch no longer has to filter out "bad" documents before displaying
results.
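To make the purge argument concrete, here is a conceptual Python sketch of what removing bad entries once, up front, buys you. The `DocRecord` type and the `is_bad` rules are hypothetical stand-ins for the categories listed above (unretrieved stubs, "noindex" documents, error codes); they are not htpurge's actual data structures or logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DocRecord:
    # Hypothetical record shape; htdig's real document database differs.
    url: str
    retrieved: bool           # False for a stub left by an unretrieved URL
    status: Optional[int]     # HTTP status code, None if never fetched
    noindex: bool = False     # True if the page was marked "noindex"


def is_bad(doc: DocRecord) -> bool:
    """Flag the kinds of leftover entries a dig can leave behind."""
    if not doc.retrieved or doc.noindex:
        return True
    # Assumption for illustration: treat 4xx/5xx (or no status) as errors.
    return doc.status is None or doc.status >= 400


def purge(docs: List[DocRecord]) -> Tuple[List[DocRecord], int]:
    """Drop bad entries once and report how many were removed."""
    kept = [d for d in docs if not is_bad(d)]
    return kept, len(docs) - len(kept)


if __name__ == "__main__":
    docs = [
        DocRecord("http://a/", True, 200),
        DocRecord("http://b/", False, None),             # unretrieved stub
        DocRecord("http://c/", True, 404),               # error code
        DocRecord("http://d/", True, 200, noindex=True), # marked noindex
    ]
    kept, removed = purge(docs)
    print([d.url for d in kept], removed)  # prints ['http://a/'] 3
```

Filtering once at purge time means every later search walks only the kept entries, instead of re-checking each candidate hit at query time.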

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-general
