On Tue, 2003-03-25 at 18:11, Brian wrote:

> 
> What mechanism do you recommend?
> 
> Something in perl, python or php?
> 

Well... I tend to be a Perl bigot so I'd choose Perl. I would do a
couple of things. 

1) I'd develop a list of words to ignore, such as "and", "if", "but", etc.
This may take time and several iterations. 

2) Read each file in, split on word boundaries, and tally the words
that are not in the exclusion list. In theory, what is left will be
keywords. 

3) Use the number of times that a keyword is found in each flat text
file as a "weight" to be used later as a scoring mechanism for the
search to determine relevance. 

4) Write all this to a table. Once all the documents are scanned THEN 
build your index. 
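
As a rough illustration of steps 1 through 4, here is a minimal sketch. It
is written in Python rather than Perl, and it uses the standard-library
sqlite3 module as a stand-in for a MySQL table. The stop-word list, the
table and column names, and every function name below are assumptions
made up for this example, not anything prescribed above.

```python
import re
import sqlite3
from collections import Counter

# Step 1: a hypothetical exclusion list; a real one would grow over iterations.
STOP_WORDS = {"and", "if", "but", "the", "a", "of", "to"}

def keyword_weights(text, stop_words=STOP_WORDS):
    """Steps 2 and 3: split into words and tally those not excluded."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

def store_weights(conn, doc_name, weights):
    """Step 4: write one row per (document, keyword, weight)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword "
        "(doc TEXT, word TEXT, weight INTEGER)"
    )
    conn.executemany(
        "INSERT INTO keyword (doc, word, weight) VALUES (?, ?, ?)",
        [(doc_name, word, count) for word, count in weights.items()],
    )

def build_index(conn):
    """Build the index only after every document has been loaded."""
    conn.execute("CREATE INDEX IF NOT EXISTS keyword_word ON keyword (word)")

def search(conn, word):
    """Return (doc, weight) rows for a keyword, highest weight first."""
    return conn.execute(
        "SELECT doc, weight FROM keyword WHERE word = ? "
        "ORDER BY weight DESC, doc",
        (word.lower(),),
    ).fetchall()
```

To use it, you would call keyword_weights() on the contents of each flat
text file, pass the result to store_weights(), and call build_index() once
at the end. Deferring the index until after the bulk load is usually
cheaper than maintaining it on every insert.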

> Are there prebuilt modules that would develop such an index?
> 

I don't know for sure. Check CPAN (www.cpan.org) and see. There may well
be, as I'm sure someone else has had to do this before. 


-- 
Peter L. Berghold <[EMAIL PROTECTED]>
The New Jersey Bergholds


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
