I realize everyone's busy working on the new beta of htdig, but I need to
solicit some opinions on something. I've recently modified htsearch in the
non-beta CVS branch to allow object-oriented access to the parsing
and searching mechanism. This was mainly with a view to creating a Perl
interface, but it doesn't preclude someone creating their own custom
htsearch program in C++ or anything else capable of using C++ objects.
In creating the Perl interface, I had difficulties using the include files
and decided to create a loadable library called htsearch.so instead. It
seemed to work fine, but I soon discovered that a lot of the C++ class
definitions have many member functions defined inline, making them
inaccessible to programs that access these classes through the loadable
module. So I wanted to ask what the general opinion was on making these
functions non-inline. I understand that inlining methods increases
efficiency in many cases, especially for simple accessors, so I would expect
that only the members that really need to be accessed externally should be
modified in this way. But that's only if general opinion supports this kind
of mucking about with the interfaces at all. I did try compiling with the
-fno-default-inline flag set, which supposedly keeps functions from being
inlined unless you explicitly request it, but it generated a lot of error
messages that had nothing to do with inlining, so I gave up on that track
for lack of better understanding.
For my own purposes, certain classes in the htlib and htcommon libraries
might need to be modified as well, so this extends beyond the htsearch
branch of the source code.
What's the opinion on this? Suggestions for other solutions are also
welcome, of course. If there's a way to get -fno-default-inline to work, I'd
prefer to use that and allow a custom compile of htsearch and its
dependencies through the htdig Makefiles instead, but I'm just not sure how
to get it to work.
See below for a more in-depth explanation of how I got to this point, and
keep in mind what I say about my being new to large-scale development. I'm
probably only a couple steps above a newbie.
Jamie
----
> Just curious...what benefits does a Perl based htsearch provide?
Currently, to use htdig from Perl you have to make a system call to run
htsearch, capture its output, and then parse it. That's what the currently
existing perl-wrapper does. While this does work fine, there is always
overhead involved in making system calls, and parsing the output takes time
as well. What I'm trying to do is give someone the ability to do this from
Perl:
  my $htdig = new HtDig::Searcher();
  $htdig->setSearchWords("James Tillman");
  my $result_list = $htdig->execute();
  foreach my $doc_match ($result_list->all()) {
      print "Document titled " . $doc_match->getTitle() . " matched your query\n";
  }
or something similar. This gives Perl a direct link to the htsearch
parsing/querying engine without having to hit the database itself. This
will be a big plus when htdig is extended to support SQL backends, which I'm
pretty sure will happen soon. The Perl interface will require almost no
rewriting to support this because htsearch will still be doing the work.
> What dependency problems are you having?
The problems mainly have to do with the large number of header files
included in the htdig source, many of which have names that clash with the
Perl XS macros. There's also a lot of operator overloading going on that
seems to create havoc with the stream classes. So I can't just say
'#include "Searcher.h"' and be done with it.
What I was able to do in the last few days is compile htsearch into a
loadable library, which seems to have solved these compile-time problems
but has revealed several run-time problems, such as inlined member
functions in certain htsearch classes that make them unreachable through
an object-oriented ".so" library. So my next step is going to be to query
the mailing list to see what opinions are about making the member functions
I need non-inline, and what sort of performance hit that will cause.
If you have any suggestions, tips, or complaints about all this mucking
about that I'm doing, let me know. I'm certainly open to ideas right now,
especially when it comes to solutions to the clashing-macros problem. Using
the include files would be preferable to the loadable library, I think, in
terms of performance and complexity reduction, although the library does
have a certain appeal when you consider the possibility of easily creating
interfaces for other programming languages like Python.
> I assume you've written code to decompress the URLs. That would be
> handy to have if I try updating the contributed reporting scripts.
I'm not sure what you mean about decompressing the URLs. I haven't looked
closely yet at the database access portions of htsearch, since my perl
interface is trying very hard to use as much of the current codebase as
possible. The URLs are coming out of the DocumentRef object uncompressed,
I'm fairly certain.
I have recently wondered what sort of a task it would be to implement a Perl
module that bypasses htsearch altogether, recoding the entire htsearch
engine into pure Perl and using DBI to go to the database. If I
were to end up taking on that very complicated task, I'm certain I'd opt to
do my own pure perl search engine and move away from htdig altogether,
because at that point the only thing we would have in common is the database
and a query syntax! While this may happen, I'm interested in seeing how
htdig develops first, because it looks like such a great project with
great programmers working together.
Jamie
>
> -Tom
>
> --
> Tom Metro
> Venture Logic [EMAIL PROTECTED]
> Newton, MA, USA
>
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.