Wow! I wish you had been able to respond earlier! I think we can
definitely do some work together. I'm currently approaching the problem
from the other direction. Rather than writing new code that replicates what
htdig already does, I'm recoding parts of htsearch to be more
object-oriented, and was planning on hooking into the classes from XS. It
sounds like you're getting much more low-level than I am right now.
What I've worked on so far is reorganizing the htsearch program (what you're
calling "c" and "d" in your list, I believe), so that the CGI becomes a
wrapper around a C++ class. As you correctly noted, the c and d portions
are intermingled terribly, and not only that, but the CGI is mixed up with
c&d. I had hoped that by breaking the parse, search, and retrieve functions
into distinct classes, my XS code would have very little work to do.
Also, future changes to the htdig spec wouldn't require serious rewrites on
my part.
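
To make the split concrete, here is a rough sketch of the parse, search, and
retrieve pieces as separate classes behind one facade. Every name in it
(QueryParser, Searcher, Retriever, HtSearch) is hypothetical and none of it is
actual htsearch code; the in-memory maps only stand in for the word and
document databases, and the query is treated as a plain AND of words.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: turns a raw query string into lowercase words.
class QueryParser {
public:
    std::vector<std::string> parse(const std::string& raw) const {
        std::vector<std::string> words;
        std::istringstream in(raw);
        std::string w;
        while (in >> w) {
            std::transform(w.begin(), w.end(), w.begin(), [](unsigned char c) {
                return static_cast<char>(std::tolower(c));
            });
            words.push_back(w);
        }
        return words;
    }
};

// Hypothetical: stand-in for the word database (word -> document ids).
class Searcher {
public:
    void add(const std::string& word, int doc_id) {
        assert(doc_id >= 0);
        index_[word].insert(doc_id);
    }
    // Returns the ids of documents that contain every word (implicit AND).
    std::set<int> search(const std::vector<std::string>& words) const {
        std::set<int> result;
        bool first = true;
        for (const auto& w : words) {
            auto it = index_.find(w);
            if (it == index_.end()) return {};
            if (first) { result = it->second; first = false; continue; }
            std::set<int> both;
            std::set_intersection(result.begin(), result.end(),
                                  it->second.begin(), it->second.end(),
                                  std::inserter(both, both.begin()));
            result.swap(both);
        }
        return result;
    }
private:
    std::map<std::string, std::set<int>> index_;
};

// Hypothetical: stand-in for the document database (doc id -> title).
class Retriever {
public:
    void add(int doc_id, const std::string& title) { titles_[doc_id] = title; }
    std::vector<std::string> retrieve(const std::set<int>& ids) const {
        std::vector<std::string> out;
        for (int id : ids) {
            auto it = titles_.find(id);
            if (it != titles_.end()) out.push_back(it->second);
        }
        return out;
    }
private:
    std::map<int, std::string> titles_;
};

// Facade that a CGI wrapper or an XS binding could call into.
class HtSearch {
public:
    HtSearch(const Searcher& s, const Retriever& r) : searcher_(s), retriever_(r) {}
    std::vector<std::string> query(const std::string& raw) const {
        return retriever_.retrieve(searcher_.search(parser_.parse(raw)));
    }
private:
    QueryParser parser_;
    const Searcher& searcher_;
    const Retriever& retriever_;
};
```

With something of this shape, the CGI would only read the form input, call
query(), and format the results, and the XS layer would wrap the same call.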
I hadn't looked yet at the possibility of letting perl actually do the
document parsing and insertion of data. I was mainly concentrating on a
search interface, but I'm certainly interested in that!
I guess as far as source goes, I'll show you mine if you'll show me yours!
Mine's not really ready for even casual viewing yet, because as you stated,
the intermingling is pretty ugly and it's a bitch getting the functionality
separated out into classes.
Having talked to Geoff Hutchison about all this, I got a positive response
from him and had already begun the work. I hope you and I can benefit from
each other's efforts! One other thing you might think about is that Geoff
has promised the use of loadable modules for document parsing, so we could
probably provide some guidance on how best to allow Perl module developers
to write their parsing modules as loadables. Then you'd be able to hook
your doc parsers right into the current htdig "digger" without having to
write your own "digger". He's also promised URL handlers which would allow
us to define our own handlers for certain "dig" locations, another
possibility for perl, and even for DBI to index databases.
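
As a rough illustration of what a pluggable parser hook might look like, here
is a sketch of a registry keyed by content type. This is purely an assumption
on my part, not the interface that has been promised: ParserRegistry,
register_parser, and parse_plain_text are all made-up names, and a real
loadable module would be loaded at run time rather than registered in-process
like this.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Words = std::vector<std::string>;
// A parser takes a document body and returns the words to index.
using ParserFn = std::function<Words(const std::string&)>;

// Hypothetical registry the "digger" could consult for each document.
class ParserRegistry {
public:
    void register_parser(const std::string& content_type, ParserFn fn) {
        parsers_[content_type] = std::move(fn);
    }
    bool has_parser(const std::string& content_type) const {
        return parsers_.count(content_type) != 0;
    }
    // Returns an empty list when no parser is registered for the type.
    Words parse(const std::string& content_type, const std::string& body) const {
        auto it = parsers_.find(content_type);
        if (it == parsers_.end()) return {};
        return it->second(body);
    }
private:
    std::map<std::string, ParserFn> parsers_;
};

// Example parser: splits plain text on whitespace.
Words parse_plain_text(const std::string& body) {
    Words words;
    std::istringstream in(body);
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}
```

A Perl-backed parser would then be one more ParserFn that calls into an
embedded interpreter, and URL handlers could follow the same pattern keyed by
URL scheme instead of content type.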
Jamie
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, September 19, 1999 4:41 PM
> To: Tillman, James
> Cc: [EMAIL PROTECTED]
> Subject: [htdig3-dev] Perl interface
>
>
> Tillman, James writes:
> > Is someone already working on a perl interface to htsearch? I and a friend
> > of mine are interested in doing the work, but don't want to duplicate anyone
> > else's effort.
> >
> > What we really want is an XS module.
> >
>
> Hi,
>
> I'm going to do that, in a way. Let me explain.
>
> In the search/index methods there are a few different levels:
>
> Data:
>
> . The word database
> . The document database
>
> Functions:
>
> a The word insertion/update/delete (indexing)
> b The document parsing
> c The search query parsing (building a query syntax tree)
> d The query resolution (using the syntax tree to match words)
> d The information retrieval (given top N matches for a query
> retrieve the relevant document information)
> e The information display
>
> I'm currently working hard on 'a' and will provide a perl XS interface
> to it. It will define a set of primitives to access the word database.
> I won't do anything (yet) concerning the document database. The next
> step is to implement 'd'. This requires defining the syntax tree. At
> present c/d are intermixed, which is a very confusing thing. For one thing
> it prevents easy implementation of a new query syntax. Many people would
> love to have AltaVista-like syntax :-)
>
> I plan to release 'a' by Wednesday (including unit tests). Being a
> co-author of the Text::Query CPAN module and author of the Text::Query-SQL
> CPAN module, I already have a syntax tree structure in mind. My idea is to
> be compatible with it in htdig so that the Perl search interface has the
> same semantics as the htdig C++ search library.
>
> If you could explain what you have in mind and what you need, we can work
> together for the time needed to release the beast :-)
>
> Cheers,
>
> --
> Loic Dachary
>
> ECILA
> 100 av. du Gal Leclerc
> 93500 Pantin - France
> Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
> e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/
>