> Just curious...what benefits does a Perl based htsearch provide?
Currently, to use htdig from perl you have to make a system call to run
htsearch, capture its output on STDOUT, and then parse it. That's what the
currently existing perl wrapper does. While this does work fine, there is
always overhead involved in making system calls, and parsing the
output takes time as well. What I'm trying to do is give someone the
ability to do this from perlspace:
my $htdig = new HtDig::Searcher();
$htdig->setSearchWords("James Tillman");
my $result_list = $htdig->execute();
foreach my $doc_match ($result_list->all()) {
    print "Document titled " . $doc_match->getTitle() . " matched your query\n";
}
or something similar. This gives perl a direct link to the htsearch
parsing/querying engine without having to hit the database itself. This
will be a big plus when htdig is extended to support SQL backends, which I'm
pretty sure will happen soon. The perl interface will require almost no
rewriting to support this, because htsearch will still be doing the work.
> What dependency problems are you having?
The problems mainly have to do with the large number of header files
included in the htdig source, many of which have names that clash with the
perl XS macros. There's also a lot of operator overloading going on that
seems to wreak havoc with the stream classes. So I can't just say
#include "Searcher.h" and be done with it.
What I was able to do in the last few days is compile htsearch into a
loadable library, which seems to have solved these compile-time problems
but has revealed several run-time problems, such as inlined member
functions in certain htsearch libraries that make them unreachable through
an object-oriented ".so" library. So my next step is going to be to ask
the mailing list for opinions about making the member functions
that I need into "non-inlined" versions, and what sort of performance hit
that will cause.
If you have any suggestions, tips, or complaints about all this mucking
about that I'm doing, let me know. I'm certainly open to ideas right now,
especially when it comes to solutions to the clashing-macros problem. Using
the include files would be preferable to the loadable library, I think, in
terms of performance and complexity reduction, although the library does
have a certain appeal when you consider the possibility of easily creating
interfaces for other programming languages like python.
> I assume you've written code to decompress the URLs. That would be
> handy to have if I try updating the contributed reporting scripts.
I'm not sure what you mean about decompressing the URLs. I haven't looked
closely yet at the database access portions of htsearch, since my perl
interface is trying very hard to use as much of the current codebase as
possible. The URLs are coming out of the DocumentRef object uncompressed,
I'm fairly certain.
I have recently wondered what sort of a task it would be to implement a perl
module that bypasses htsearch altogether, and simply recodes the entire
htsearch engine into pure perl and uses DBI to go to the database. If I
were to end up taking on that very complicated task, I'm certain I'd opt to
do my own pure perl search engine and move away from htdig altogether,
because at that point the only thing we would have in common is the database
and a query syntax! While this may happen, I'm interested in seeing how
htdig develops first, because it looks like such a great project with
great programmers working together.
Jamie
>
> -Tom
>
> --
> Tom Metro
> Venture Logic [EMAIL PROTECTED]
> Newton, MA, USA
>