Hi,
I'm about to release a package named mifluz that will contain
a standalone version of the inverted index code of htdig. The idea is
that many applications need to be able to handle scalable inverted
indexes and currently have no low-level support for it. Many of them
already have a well-defined context for searching, parsing, etc.
The package basically contains htlib and htword, some glue and
testing. It provides only very basic functionality and follows the rule
'keep it simple, stupid' :-) I state very clearly in the
documentation that the authors of the package are the htdig group and
myself. If, for some reason, you find that this is not enough, I'll
change it to satisfy the htdig group.
Why do I need that? Mainly because I maintain a Perl module named
Catalog that desperately needs full-text indexing capabilities that are fast
and scalable. Besides, I also have a crawler library that will use it.
I'd like to merge my crawler functionalities with htdig functionalities and
put everything in the htdig tree. But this is a bit more complex because
my crawler uses a MySQL database at present and does not support all the
functionalities of htdig. Its main advantage is that it can crawl millions
of documents; its drawback is that it's bound to MySQL and has fewer
features than htdig.
In the following weeks I'll spend most of my time integrating
mifluz with Catalog and the crawler, debugging and benchmarking. It
will be a relief for people tired of my daily commits to the CVS tree
:-) This is a good time because the current development release of
htdig will go to beta testing/functionality freeze next week.
I'm not sure all this is interesting for all of you, but hopefully it
was. All the software mentioned in this mail is on http://www.senga.org/.
Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.