Geoff Hutchison <[EMAIL PROTECTED]> writes:
> >If I can get things to work, I'll update those Perl scripts to
> >Berkeley DB2 and submit them to you guys. They appear to be pretty
> They'll need more updating than that. As I mentioned on a few 
> threads, they need some XS magic to hook into the database.
Yup. They need the Perl module that provides the interface to the 
Berkeley DB2 library. That can be obtained here:
http://search.cpan.org/search?dist=BerkeleyDB

and should be easy for any Perl developer to find. I've yet to 
complete the install, as the Perl module requires a newer version of 
the Berkeley DB2 library than the one on my target system, so I'll 
have to upgrade the underlying library first. (Supposedly, upgrading 
from Berkeley DB2 2.4.14 to 2.7.7 won't affect already-built 
applications such as htdig.)

> But beyond using Berkeley DB2, the code now encodes/compresses URLs 
> as well as excerpts.
Right, good point. I noticed that when browsing through the attributes 
documentation. You use zlib now, but previously you used 
url_part_aliases and common_url_parts as a simple form of compression, 
right? I didn't see any evidence that the existing Perl scripts handle 
any form of compression, though. Do they predate all forms of 
compression, or were they written with the assumption that compression 
is turned off?

So in a nutshell, how is the compression used? Is it applied to just 
the stored document fragments or to URLs too? If it covers URLs, then 
even extracting a list of indexed URLs will require zlib and, to be 
fully accurate, parsing url_part_aliases and common_url_parts from 
htdig.conf. Is that right?
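
To make the question concrete, here is a rough sketch of what reversing a 
substitution-style URL encoding might involve. It is in Python rather than 
Perl, purely to show the shape of the problem. The table, the one-byte 
placeholder codes, and the function names are all invented for illustration; 
I don't know what htdig actually writes to the database, so treat this as a 
guess at the general idea and not as its real format.

```python
# HYPOTHETICAL table mapping short codes to common URL parts.
# These codes are made up; they are NOT htdig's real on-disk encoding.
COMMON_PARTS = {
    "\x01": "http://www.",
    "\x02": ".com/",
    "\x03": ".html",
}


def encode_url(url, table=COMMON_PARTS):
    """Replace each common part with its short code, longest part first."""
    for code, part in sorted(table.items(), key=lambda kv: -len(kv[1])):
        url = url.replace(part, code)
    return url


def decode_url(encoded, table=COMMON_PARTS):
    """Expand each short code back into the URL part it stands for."""
    for code, part in table.items():
        encoded = encoded.replace(code, part)
    return encoded


if __name__ == "__main__":
    original = "http://www.example.com/index.html"
    stored = encode_url(original)
    # The stored form is shorter, and decoding restores the original URL.
    print(len(original), len(stored), decode_url(stored) == original)
```

The point of the sketch is that a reader of the database would need the same 
table the writer used, which is why the settings in htdig.conf would matter. 
It also assumes the URLs never contain the placeholder bytes themselves.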

Fortunately a Perl interface to zlib is available:
http://search.cpan.org/search?dist=Compress-Zlib

So the hard part will be figuring out how to use the library to undo 
what htdig does.
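
For what it's worth, the inflate step on its own may be the easy part. Below 
is a minimal sketch in Python (again only to show the shape; the Perl module 
should offer an equivalent). It assumes a stored value is either a plain 
zlib stream or uncompressed bytes. That is my assumption, not something I've 
confirmed against the htdig source, and the function name is made up.

```python
import zlib


def inflate_if_compressed(raw: bytes) -> bytes:
    """Return the decompressed bytes, or raw unchanged if it is not zlib data.

    ASSUMPTION: the value is a standard zlib stream with no extra framing.
    """
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Not a valid zlib stream, so treat it as stored uncompressed.
        return raw


if __name__ == "__main__":
    excerpt = b"An excerpt of an indexed document."
    print(inflate_if_compressed(zlib.compress(excerpt)) == excerpt)
    print(inflate_if_compressed(excerpt) == excerpt)
```

Falling back to the raw bytes lets the same reader cope with databases built 
with compression turned off, at the cost of silently passing through data 
that is corrupt or uses some other framing.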

 -Tom

-- 
Tom Metro
Venture Logic                                     [EMAIL PROTECTED]
Newton, MA, USA


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 