Geoff Hutchison <[EMAIL PROTECTED]> writes:
> >If I can get things to work, I'll update those Perl scripts to
> >Berkeley DB2 and submit them to you guys. They appear to be pretty
> They'll need more updating than that. As I mentioned on a few
> threads, they need some XS magic to hook into the database.
Yup. They need the Perl module that provides the interface to the
Berkeley DB2 library. That can be obtained here:
http://search.cpan.org/search?dist=BerkeleyDB
and should be easy for any Perl developer to find. I have yet to
complete the install, as the Perl module requires a newer version of
the Berkeley DB2 library than the one on my target system, so I'll
have to upgrade the underlying library first. (Supposedly, upgrading
from Berkeley DB2 2.4.14 to 2.7.7 shouldn't affect already-built
applications, like htdig.)
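For what it's worth, once that module is installed, hooking a script
up to one of the generated databases might look roughly like the
sketch below. This is untested on my end: the db.docdb filename, the
Btree access method, and the read-only flag are my assumptions about
what htdig produces, so treat them as placeholders to verify.

```perl
use strict;
use warnings;
use BerkeleyDB;    # the CPAN module discussed above

# Assumption: htdig writes a Berkeley DB Btree file named db.docdb.
my $dbfile = 'db.docdb';

tie my %docs, 'BerkeleyDB::Btree',
    -Filename => $dbfile,
    -Flags    => DB_RDONLY
    or die "Cannot open $dbfile: $BerkeleyDB::Error\n";

# Walk the database; the values are htdig's (possibly compressed)
# document records, so just report their sizes for now.
while ( my ( $key, $value ) = each %docs ) {
    printf "key=%s  value length=%d\n", $key, length $value;
}
untie %docs;
```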
> But beyond using Berkeley DB2, the code now encodes/compresses URLs
> as well as excerpts.
Right, good point. I noticed that while browsing through the
attributes documentation. You use zlib now, but previously you used
url_part_aliases and common_url_parts as a simple form of compression,
right? That said, I didn't see any evidence that the existing Perl
scripts handle any form of compression. Do they predate all forms of
compression, or were they written with the assumption that compression
is turned off?
So, in a nutshell, how is the compression used? Is it applied only to
the stored document excerpts, or to URLs too? If so, even extracting a
list of indexed URLs will require using zlib and, to be fully
accurate, parsing url_part_aliases and common_url_parts from htdig.conf?
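For the htdig.conf side, at least, a naive attribute reader is easy to
sketch. This assumes the plain "name: value" syntax and ignores
`include:` directives and backslash continuation lines, which a real
script would need to handle; the sample values are hypothetical.

```perl
use strict;
use warnings;

# Naive parser for htdig.conf-style "name: value" attributes.
sub parse_conf {
    my ($text) = @_;
    my %conf;
    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*#/ or $line =~ /^\s*$/;   # comments, blanks
        $conf{$1} = $2 if $line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/;
    }
    return %conf;
}

# Hypothetical fragment of an htdig.conf:
my %conf = parse_conf(<<'CONF');
database_dir:      /opt/www/htdig/db
url_part_aliases:  http://www.example.com/ *site
CONF

print "$conf{url_part_aliases}\n";
```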
Fortunately, a Perl interface to zlib is available:
http://search.cpan.org/search?dist=Compress-Zlib
So the hard part will be figuring out how to use the library to undo
what htdig does.
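If htdig stores ordinary zlib streams, Compress::Zlib makes the round
trip simple. A minimal sketch, with the caveat that whether htdig uses
raw zlib (as opposed to some framing of its own) is an assumption to
verify against the htdig source:

```perl
use strict;
use warnings;
use Compress::Zlib;    # the CPAN module linked above

# Stand-in for an excerpt as htdig might store it.
my $excerpt    = 'An example excerpt as htdig might store it.';
my $compressed = compress($excerpt);
die "compress failed\n" unless defined $compressed;

# Undoing the compression is a single call.
my $restored = uncompress($compressed);
die "uncompress failed\n" unless defined $restored;

print $restored eq $excerpt ? "round-trip ok\n" : "mismatch\n";
```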
-Tom
--
Tom Metro
Venture Logic [EMAIL PROTECTED]
Newton, MA, USA