At 7:59 PM -0500 12/3/99, Tom Metro wrote:
> > But beyond using Berkeley DB2, the code now encodes/compresses URLs
> > as well as excerpts.
>Right, good point. I noticed that when browsing through the attributes
>documentation. You use zlib now, but previously you used
>url_part_aliases and common_url_parts as a simple form of compression -
>right? Although I didn't see any evidence that the existing Perl
>scripts handled any form of compression. Do they predate all forms of
>compression or are they written with the assumption that compression
>is turned off?
You don't have that quite right. The existing Perl scripts didn't
handle compression because compression, url_part_aliases, and
common_url_parts all came in after they were written.
But to clarify your point, zlib is used on different things than
url_part_aliases and common_url_parts. zlib is used *solely* on
document excerpts (the DocHead field), while the other two are used
on URLs in both the document database and the document index (the
URL->DocID list).
So there are two steps to decoding an entry--first decoding the URL
based on url_part_aliases and common_url_parts, then decompressing
the DocHead field if it's compressed.
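
As a rough illustration of those two steps, here is a minimal Python
sketch. It is not the actual ht://Dig code: the code table, the
function names, and the explicit "compressed" flag are all
hypothetical, and the real on-disk encoding of URLs is not shown
here. It only assumes that URL fragments are stored as short codes
and that the excerpt may be zlib-compressed.

```python
import zlib

# Hypothetical table mapping short codes to the URL fragments they
# stand for. This stands in for whatever url_part_aliases and
# common_url_parts produce; the real encoding may differ.
URL_PART_CODES = {
    "\x01": "http://",
    "\x02": "www.",
    "\x03": ".html",
}


def decode_url(encoded, codes=URL_PART_CODES):
    """Step 1: expand each short code back into its URL fragment."""
    for code, part in codes.items():
        encoded = encoded.replace(code, part)
    return encoded


def decode_doc_head(raw, compressed):
    """Step 2: decompress the excerpt only if it was stored compressed.

    How a real record signals compression is an assumption here; the
    caller passes it in as a flag.
    """
    data = zlib.decompress(raw) if compressed else raw
    return data.decode("latin-1")


def decode_entry(encoded_url, raw_head, head_compressed):
    """Run both steps and return (url, excerpt)."""
    return decode_url(encoded_url), decode_doc_head(raw_head, head_compressed)
```

The point of keeping the steps separate is that the URL decoding
applies to both the document database and the URL->DocID index,
while the zlib step only ever touches DocHead.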
-Geoff
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.