On Tue, 2003-03-25 at 18:11, Brian wrote:
> What mechanism do you recommend?
> Something in perl, python or php?

Well... I tend to be a Perl bigot, so I'd choose Perl. I would do a couple of things:

1) Develop a list of words to ignore, such as "and", "if", "but", etc. This may take time and several iterations.
2) Read each file in, split on word boundaries, and tally the words that are not in the exclusion list. In theory, what is left will be keywords.
3) Use the number of times a keyword is found in each flat text file as a "weight", to be used later as a scoring mechanism for the search to determine relevance.
4) Write all this to a table.

Once all the documents are scanned, THEN build your index.

> Are there prebuilt modules that would develop such an index?

I don't know for sure; check CPAN (www.cpan.org) and see. There may well be, as I'm sure someone else has had to do this before.

--
Peter L. Berghold <[EMAIL PROTECTED]>
The New Jersey Bergholds

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
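
For illustration, steps 1) through 3) above can be sketched roughly as follows. This is only a minimal sketch, written in Python since that was one of the languages asked about; the same shape carries over to Perl. The names STOP_WORDS, keyword_weights and index_rows are hypothetical, the starter exclusion list is a placeholder, and "word boundary" is assumed here to mean runs of ASCII letters and digits.

```python
import re
from collections import Counter

# Hypothetical starter exclusion list (step 1); expect to extend it over
# several iterations as noise words show up in the results.
STOP_WORDS = {"and", "if", "but", "the", "a", "of", "to", "in"}

# Assumption: a "word" is a run of ASCII letters or digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def keyword_weights(text, stop_words=STOP_WORDS):
    """Steps 2 and 3: split text into words, drop excluded words,
    and count how often each remaining word occurs."""
    words = WORD_RE.findall(text.lower())
    return Counter(w for w in words if w not in stop_words)


def index_rows(documents, stop_words=STOP_WORDS):
    """Turn {document name: text} into (name, keyword, weight) tuples,
    one per keyword per document, in a shape that could be written
    to a table (step 4)."""
    rows = []
    for name, text in documents.items():
        counts = keyword_weights(text, stop_words)
        for word, count in sorted(counts.items()):
            rows.append((name, word, count))
    return rows
```

Each tuple from index_rows could then be inserted into a table with columns along the lines of document, keyword and weight, with the index built only after every file has been scanned.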