Your original message got through the first time, but your email bounced.
I think what you are looking for is called mifluz; it is the indexing
library that htdig uses. The link is http://www.gnu.org/software/mifluz/ .
If you develop any kind of bindings for using mifluz to index a MySQL
database, let me know. I would definitely be interested.
ryan
----- Original Message -----
From: "Christian Jaeger" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 14, 2001 12:43 AM
Subject: Fulltext indexing libraries (perl/C/C++)
> Hello
>
> [ It seems the post didn't make it through the first time ]
>
> While programming a journal in perl/axkit, I realized that the problems
> of both creating useful indexes for searching content efficiently and
> parsing user input to create the right SQL queries from it are sooo
> common that there *must* be some good library already. :-) So I
> headed over to CPAN, but didn't really find what I was looking for.
>
> It should create indexes that are efficiently searchable in mysql,
> i.e. using only <select ... where ... like "abcd%"> queries, not "%abc%".
> It should allow searching for word parts (i.e. find "fulltext" when
> entering "text") and searching several form fields at once (e.g. one
> field for title words, one for author names, etc.). Preferably it
> should support some sort of query rules (AND/NOT/OR or something), do
> some relevance sorting, and allow hooking some numbers (link or access
> counts, etc.) into that relevance sorting.
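One way to satisfy both the prefix-only constraint and the word-part requirement is to index every suffix of each word: "fulltext" is stored as fulltext, ulltext, lltext, ltext, text, ext, so a prefix query for "text%" finds it. Below is a minimal sketch of that idea, using Python's standard sqlite3 module as a stand-in for MySQL; the table name fragment_index, its columns, and the 3-character minimum fragment length are illustrative assumptions. In MySQL, a LIKE pattern that does not start with a wildcard can use an ordinary index on the fragment column, which is the point of avoiding "%abc%".

```python
import re
import sqlite3

MIN_FRAGMENT = 3  # assumption: suffixes shorter than this are not indexed


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def suffixes(word, min_len=MIN_FRAGMENT):
    """Every suffix of word that is at least min_len long.
    Words shorter than min_len are kept whole."""
    return [word[i:] for i in range(max(1, len(word) - min_len + 1))]


def build_index(conn, docs):
    """docs maps doc_id -> {field_name: text}. One row is stored per distinct
    (fragment, field, doc_id), so each form field is searchable on its own."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragment_index "
        "(fragment TEXT, field TEXT, doc_id INTEGER)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_fragment "
        "ON fragment_index (field, fragment)"
    )
    rows = set()
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in tokenize(text):
                for fragment in suffixes(word):
                    rows.add((fragment, field, doc_id))
    conn.executemany("INSERT INTO fragment_index VALUES (?, ?, ?)", sorted(rows))


def search(conn, term, field):
    """Doc ids whose field contains a word having term as a substring,
    found with a prefix-only LIKE on the stored suffix fragments."""
    # Escape LIKE wildcards in user input, then append the trailing %.
    escaped = term.lower().replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM fragment_index "
        "WHERE field = ? AND fragment LIKE ? ESCAPE '\\' ORDER BY doc_id",
        (field, escaped + "%"),
    )
    return [row[0] for row in cur]
```

For example, after indexing document 1 with the title "Fulltext indexing in MySQL", search(conn, "text", "title") returns [1]. The price is index size: with a 3-character minimum, a word of length n produces n - 2 rows.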
>
> I think three tough parts are needed:
> 1. creation of sophisticated index structures (inverted indexes)
> 2. recognition of sub-word boundaries to split words on. Maybe use
> some form of thesaurus? Or syllables? (I suspect the rules should be
> the same as for hyphenating words at line breaks.)
> 3. user input parser / query creator
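Parts 1 and 3 can be sketched together: an in-memory inverted index (term -> documents containing it) plus a very small evaluator for AND/OR/NOT queries. This is a minimal Python illustration, not how Lucene or Search::InvertedIndex work internally; the query grammar (uppercase operators, implicit AND between adjacent terms, strict left-to-right evaluation, no parentheses) is an assumption chosen to keep it short.

```python
import re
from collections import defaultdict

OPERATORS = ("AND", "OR", "NOT")


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to {doc_id: number of occurrences}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.all_docs = set()

    def add(self, doc_id, text):
        self.all_docs.add(doc_id)
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def docs(self, term):
        # .get() on a defaultdict does not create the missing key
        return set(self.postings.get(term.lower(), {}))


def search(index, query):
    """Evaluate a query like 'perl AND mysql NOT java' strictly left to right.
    Operators must be uppercase; adjacent terms are implicitly ANDed;
    there are no parentheses or precedence rules."""
    result = None
    op = "AND"
    for token in query.split():
        if token in OPERATORS:
            op = token
            continue
        matches = index.docs(token)
        if result is None:
            # A leading NOT means "every document except these".
            result = index.all_docs - matches if op == "NOT" else matches
        elif op == "AND":
            result &= matches
        elif op == "OR":
            result |= matches
        else:  # NOT
            result -= matches
        op = "AND"  # default operator for the next term
    return sorted(result) if result else []
```

A real query creator would add precedence and parentheses, and would pass each term through the sub-word splitting of part 2 before the lookup. The per-document counts kept in postings are what a relevance score would start from.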
>
> Why not:
>
> - use mysql's fulltext indexes? Because I think they are currently
> too limited (e.g. see the user comments about them at
> www.mysql.com/doc/). They should be better in mysql-4, I read, but we
> need this in a few weeks already... They are also not supported in
> InnoDB, which we want to use.
>
> - use indexing robots? Because we work with XML documents and would
> like both to keep the index up to date immediately and to split the
> XML contents into several parts (e.g. there's a title, byline, etc.,
> which should be searchable separately or weighted differently). We
> want a *library*, not a finished product.
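Weighting fields differently and hooking link or access counts into the ranking can start out as a simple weighted sum. The sketch below assumes hypothetical weights (title 3, byline 2, body 1) and adds log(1 + access count), so that popularity nudges the text score rather than swamping it; both choices are illustrative and would need tuning.

```python
import math
import re

# Hypothetical per-field weights; real values would need tuning.
FIELD_WEIGHTS = {"title": 3.0, "byline": 2.0, "body": 1.0}


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def score(doc, term, access_count=0):
    """Weighted count of term in each field of doc (field -> text),
    plus log(1 + access_count) as a damped popularity boost.
    Documents that do not contain the term score 0."""
    term = term.lower()
    text_score = sum(
        FIELD_WEIGHTS.get(field, 1.0) * tokenize(text).count(term)
        for field, text in doc.items()
    )
    if text_score == 0:
        return 0.0
    return text_score + math.log1p(access_count)


def rank(docs, term, access_counts):
    """Doc ids containing term, best score first (ties broken by doc id)."""
    scored = [
        (score(doc, term, access_counts.get(doc_id, 0)), doc_id)
        for doc_id, doc in docs.items()
    ]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]
```

With a title hit worth 3 and two body hits worth 2, the title match ranks first until the other document has been accessed often enough (log(1 + 100) is about 4.6) to overtake it.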
>
> There's Lucene (www.lucene.com) in Java, which I think does exactly
> what I want. Anyone want to help me port it to perl or to
> C(++) with perl bindings (-; ? (It would have to be ready in a few
> weeks, and it's about 500k of source code :-().)
>
> (Something in C/C++ loaded as a UDF or similar would be nice too,
> but as I understand it (from the recent discussion about stored
> procedures) that's not possible, since these UDFs would have to start
> other queries (e.g. to insert each word fragment into an index
> table).)
>
> As Daniel Gardner has pointed out to me, one could perhaps use
> Search::InvertedIndex as a basis and complement it with Lingua::Stem
> (English only) or Text::German (German), though both seem quite
> imperfect, or with some word list processing. (I don't understand
> Search::InvertedIndex well enough yet.) I think it would still be a
> lot of work.
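Whichever stemmer is used, the key detail is applying the same normalization when indexing and when looking up a query word. The sketch below uses a deliberately crude suffix stripper (an assumption for illustration, not the rules of Lingua::Stem or Text::German) and shows both the benefit and the kind of imperfection mentioned above.

```python
import re
from collections import defaultdict

# Deliberately crude English suffix list, longest first (an assumption,
# not the rules used by Lingua::Stem or Text::German).
SUFFIXES = ("ing", "es", "ed", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def build_stemmed_index(docs):
    """docs maps doc_id -> text; returns stem -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[crude_stem(word)].add(doc_id)
    return index


def lookup(index, word):
    """Stem the query word the same way before looking it up."""
    return sorted(index.get(crude_stem(word.lower()), set()))
```

Here "indexing", "indexes", "indexed" and "index" all meet at the stem "index". But "modules" is cut to "modul" while "module" stays unchanged, so a search for "module" misses a document containing "modules".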
>
>
> Has anyone finished something like this? Any more info about mysql4?
>
> Thx
> Christian.
>
> ---------------------------------------------------------------------
> Before posting, please check:
> http://www.mysql.com/manual.php (the manual)
> http://lists.mysql.com/ (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>