Your original message got through the first time, but your email bounced.

I think what you are looking for is called mifluz and is the indexing
library that htdig uses. The link is http://www.gnu.org/software/mifluz/ .

If you develop any kind of bindings to use mifluz to index a mysql database,
let me know; I would definitely be interested.

ryan

----- Original Message ----- 
From: "Christian Jaeger" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, September 14, 2001 12:43 AM
Subject: Fulltext indexing libraries (perl/C/C++)


> Hello
> 
> [ It seems the post didn't make it through the first time ]
> 
> While programming a journal in perl/axkit I realized that the problems 
> of both creating useful indexes for searching content efficiently and 
> parsing user input to create the right sql queries from it are sooo 
> common that there *must* be some good library already. :-) So I 
> headed over to CPAN, but didn't really find what I was looking for.
> 
> It should create indexes that are efficiently searchable in mysql, 
> i.e. only <select ... where .. like "abcd%"> queries, not "%abc%". 
> It should allow searching for word parts (i.e. find "fulltext" when 
> entering "text") and searching multiple form fields (i.e. one field 
> for title words, one for author names, etc.) at once. Preferably it 
> allows some sort of query rules (AND/NOT/OR or something).
> Preferably it does some relevance sorting, and allows hooking some 
> numbers (link or access counts etc.) into the relevance sorting.
> 
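One way to get word-part matches out of prefix-only LIKE queries is to store every suffix of each word in the index table, so that the pattern "text%" matches the suffix "text" of "fulltext". What follows is only a minimal sketch of that idea and not something from the original post: it uses Python with an in-memory SQLite database standing in for mysql, and the table word_part, its columns, the min_len cutoff and the function names are all hypothetical.

```python
import sqlite3


def suffixes(word, min_len=3):
    """Return every suffix of `word` that has at least `min_len` characters.

    Words shorter than `min_len` yield an empty list and are not indexed.
    """
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


def build_index(conn, docs):
    """Fill a hypothetical word_part table from {doc_id: {field: text}}."""
    conn.execute(
        "CREATE TABLE word_part (part TEXT, field TEXT, doc_id INTEGER)"
    )
    # An ordinary index on `part` is what makes prefix patterns cheap.
    conn.execute("CREATE INDEX word_part_idx ON word_part (part)")
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.split():
                rows = [(s, field, doc_id) for s in suffixes(word)]
                conn.executemany(
                    "INSERT INTO word_part VALUES (?, ?, ?)", rows
                )


def search(conn, field, term):
    """Find doc ids whose `field` has a word containing `term`.

    Only a prefix pattern ("term%") is issued, never "%term%".
    Assumes `term` contains no LIKE wildcards (% or _).
    """
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM word_part "
        "WHERE field = ? AND part LIKE ? ORDER BY doc_id",
        (field, term.lower() + "%"),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
build_index(conn, {
    1: {"title": "Fulltext indexing", "author": "Jaeger"},
    2: {"title": "Plain search", "author": "Gardner"},
})
print(search(conn, "title", "text"))
```

The separate field column is one way to cover the "multiple form fields" requirement. The cost is index size: each word contributes roughly one row per character, which is the trade-off for avoiding "%abc%" scans.
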
> I think there are 3 tough parts which are needed:
> 1. creation of sophisticated index structures (inverted indexes)
> 2. somehow recognize sub-word boundaries to split words on. Maybe use 
> some form of thesaurus? Or syllables? (I suspect it should be the 
> same rules as for splitting words on line boundaries)
> 3. user input parser / query creator
> 
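For the third part, a small recursive-descent parser is one possible starting point. The sketch below is an illustration under stated assumptions, not a finished query creator: it evaluates AND/OR/NOT directly against an in-memory inverted index (a hypothetical dict mapping each term to a set of document ids) instead of generating sql, it gives NOT the highest precedence and OR the lowest, and it supports neither parentheses nor implicit AND.

```python
def evaluate(query, index, all_docs):
    """Evaluate e.g. 'perl AND NOT java OR mysql' to a set of doc ids.

    `index` maps lowercase terms to sets of doc ids (hypothetical layout);
    `all_docs` is the set of every doc id, needed to evaluate NOT.
    """
    tokens = query.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Lowest precedence: union of AND-expressions.
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        # Intersection of NOT-expressions.
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        # Highest precedence: complement, or a plain search term.
        if peek() == "NOT":
            take()
            return all_docs - parse_not()
        if peek() is None or peek() in ("AND", "OR"):
            raise ValueError("expected a search term")
        return set(index.get(take().lower(), set()))

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected token: " + peek())
    return result


index = {"perl": {1, 2}, "java": {2, 3}, "mysql": {3, 4}}
all_docs = {1, 2, 3, 4}
print(evaluate("perl AND NOT java OR mysql", index, all_docs))
```

Operators must be written in upper case here, so that lowercase "and" or "or" typed by a user is still treated as an ordinary search term.
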
> Why not:
> 
> - use mysql's fulltext indexes? Because I think that currently they 
> are too limited (see the user comments about them at 
> www.mysql.com/doc/) (should be better in mysql-4, I read, but we need 
> it in a few weeks already...). And they are also not supported in 
> InnoDB, which we want to use.
> 
> - use indexing robots? Because we work with XML documents, and would 
> like to both keep the index up to date immediately, as well as split 
> the XML contents into several parts (i.e. there's a title, byline, 
> etc., which should be searchable or weighted differently). We want a 
> *library*, not a finished product.
> 
> There's Lucene (www.lucene.com) in Java that I think does exactly 
> what I want. Would anyone help me port that to perl or to 
> C(++) with perl bindings (-; ? (It should be ready in a few weeks, and 
> it's about 500k source code :-().
> 
> (Something in C/C++ that would be loaded as a UDF or so would be nice 
> too, but as I understand it (from recent discussion about stored 
> procedures) it's not possible, since these UDFs would have to start 
> other queries (i.e. to insert each word fragment into an index 
> table).)
> 
> As Daniel Gardner has pointed out to me, one could maybe use 
> Search::InvertedIndex as a basis and complement it with Lingua::Stem 
> (only English) or Text::German (German) (both seem to be quite 
> imperfect though) or with some word list processing. (I don't 
> understand Search::InvertedIndex well enough yet.) I think it would 
> still be much work.
> 
> 
> Has anyone finished something like this? Any more info about mysql4?
> 
> Thx
> Christian.
> 
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
> 
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
> 

