Brian,

Here are some hints on an efficient way to index the data.

Regular Expression:
  ([\w\d]{5,64}) - matches every run of 5 to 64 word or numeric
  characters in a given string

Database Tables:
  file : [int id][char*255 name]
         (Populate this with file names)
  word : [int id][char*64 name]
         (Populate this with *unique* words)
  map  : [int id][int word][int file]
         (Populate this with `word`.`id`, `file`.`id` where
          `word`.`name` is found in the file named by `file`.`name`)

Query to find files containing the given words
(returns every file that contains at least one of them):
  SELECT `file`.`name`
  FROM `file`, `word`, `map`
  WHERE (`word`.`name` IN ('word1', 'word2', 'word3'))
    AND (`map`.`word` = `word`.`id` AND `map`.`file` = `file`.`id`)
  GROUP BY `file`.`name`;

Room for Improvement:
  - Add a field to the MAP table that gives the offset (in words)
    where the word was found. This would prove useful for
    "quoted queries" (i.e. phrase searching).
  - Add a blob column to the FILE table for easier access to the data
    (very optional, _will_ bloat your database).

If you're willing to pay for it, I'll write it for you.

BTW, I recommend Java for writing the reader program (regular
expressions are much easier and cleaner to do there), and PHP (v4.x)
for the search program (easier UI).

Mark C. Roduner, Jr.
Medical Systematics Research

-----Original Message-----
From: Brian [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 25, 2003 3:12 PM
To: Peter L. Berghold
Cc: MySQL
Subject: Re: Your professional opinion Please...

> On Mon, Mar 24, 2003 at 06:41:07PM -0800, Brian wrote:
> > I have a client with approximately 2 gigabytes of un-indexed
> > document files (includes text and graphics).
> > He wants to be able to enter a few parameters and bring up
> > a list of all...
>
> If they are flat text files this should not be too big an issue
> although a very large project nonetheless. Develop an index by yanking
> out keywords of interest and developing a table to index them either by
> filename title or whatever.

What mechanism do you recommend? Something in perl, python or php?
Are there prebuilt modules that would develop such an index?

> I'd leave them as flat text files and go from there. If they are
> adding or removing from the "library" then do a re-index at an
> interval that makes sense.

Understood - could be done once a night during slow time.

Thanks for the feedback.

Best regards,
Brian

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
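
For illustration only, the indexing scheme from Mark's hints (the regular expression, the file/word/map tables, and the lookup query) could be sketched roughly as below. This is a sketch under assumptions, not the implementation Mark offered to write: Python and an in-memory SQLite database stand in for the suggested Java reader and MySQL, the function names `build_index` and `find_files` are hypothetical, words are lower-cased before indexing, and the `require_all` option (a HAVING clause) is an addition that the original query does not have.

```python
import re
import sqlite3

# Same pattern as in the hints: runs of 5 to 64 word characters.
# (\d is already covered by \w, so [\w\d] behaves like \w.)
TOKEN = re.compile(r"[\w\d]{5,64}")


def build_index(docs):
    """Build the file/word/map tables from a {file name: text} dict (hypothetical input)."""
    db = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    db.executescript(
        """
        CREATE TABLE file (id INTEGER PRIMARY KEY, name VARCHAR(255) UNIQUE);
        CREATE TABLE word (id INTEGER PRIMARY KEY, name VARCHAR(64) UNIQUE);
        CREATE TABLE map  (id INTEGER PRIMARY KEY, word INTEGER, file INTEGER,
                           UNIQUE (word, file));
        """
    )
    for name, text in docs.items():
        file_id = db.execute("INSERT INTO file (name) VALUES (?)", (name,)).lastrowid
        # Lower-casing is an assumption; the hints do not say how case is handled.
        for token in set(TOKEN.findall(text.lower())):
            db.execute("INSERT OR IGNORE INTO word (name) VALUES (?)", (token,))
            (word_id,) = db.execute(
                "SELECT id FROM word WHERE name = ?", (token,)
            ).fetchone()
            db.execute("INSERT INTO map (word, file) VALUES (?, ?)", (word_id, file_id))
    db.commit()
    return db


def find_files(db, words, require_all=False):
    """Return sorted file names containing any (or, optionally, all) of the words."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = (
        "SELECT file.name FROM file, word, map "
        f"WHERE word.name IN ({placeholders}) "
        "AND map.word = word.id AND map.file = file.id "
        "GROUP BY file.name"
    )
    params = list(words)
    if require_all:
        # Not in the original query: keep only files matching every word.
        sql += " HAVING COUNT(DISTINCT word.name) = ?"
        params.append(len(words))
    sql += " ORDER BY file.name"
    return [row[0] for row in db.execute(sql, params)]
```

Note that, because of the {5,64} bound in the pattern, words shorter than five characters never reach the word table, so searching for them returns nothing. A nightly re-index, as discussed in the thread, would amount to rebuilding these tables from the current set of files.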