RE: Your professional opinion Please...

Mark C. Roduner, Jr. Tue, 25 Mar 2003 15:45:27 -0800

Brian,
        Here are some hints on an efficient way to index the data:


        Regular Expressions:
                ([\w\d]{5,64})  - Matches every run of 5 to 64 word or numeric
                                  characters in a given string
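As a rough illustration of the pattern above, here is a minimal sketch in Python (one of the languages asked about in this thread). The function name extract_words and the lower-casing step are assumptions made for the example, not part of the original suggestion.

```python
import re

# Same pattern as above. In Python 3, \w already covers digits,
# so the \d is redundant but harmless.
WORD_RE = re.compile(r"([\w\d]{5,64})")

def extract_words(text):
    """Return the unique tokens of 5 to 64 word characters found in text.

    Lower-casing is an assumption, so that 'Index' and 'index' count as one word.
    """
    return {token.lower() for token in WORD_RE.findall(text)}

sample = "Patient record 2003-03-25: indexing medical documents"
print(sorted(extract_words(sample)))
```

Shorter runs such as "2003" are skipped because of the {5,64} lower bound, which keeps very common short words out of the index.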
        Database
                Tables
                        file    : [int id][char*255 name]
                                (Populate this with file names)
                        word    : [int id][char*64 name]
                                (Populate this with *unique* words)
                        map     : [int id][int word][int file]
                                (Populate this with `file`.`id`, `word`.`id`
                                where `word`.`name` is found in the file named by
                                `file`.`name`)
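A minimal sketch of how the three tables might be created and filled, using the table and column names the query refers to (`file`, `word`, `map`). It assumes Python with an in-memory SQLite database standing in for MySQL; the helpers create_schema and index_text, the UNIQUE constraints and the lower-casing are hypothetical choices for the example.

```python
import re
import sqlite3

WORD_RE = re.compile(r"[\w\d]{5,64}")

def create_schema(conn):
    # SQLite stand-in for the MySQL tables; UNIQUE constraints are an assumption.
    conn.executescript("""
        CREATE TABLE file (id INTEGER PRIMARY KEY, name VARCHAR(255) UNIQUE);
        CREATE TABLE word (id INTEGER PRIMARY KEY, name VARCHAR(64) UNIQUE);
        CREATE TABLE map  (id INTEGER PRIMARY KEY, word INTEGER, file INTEGER,
                           UNIQUE (word, file));
    """)

def index_text(conn, file_name, text):
    """Record file_name and link every unique word in text to it."""
    conn.execute("INSERT OR IGNORE INTO file (name) VALUES (?)", (file_name,))
    file_id = conn.execute(
        "SELECT id FROM file WHERE name = ?", (file_name,)).fetchone()[0]
    for token in {t.lower() for t in WORD_RE.findall(text)}:
        conn.execute("INSERT OR IGNORE INTO word (name) VALUES (?)", (token,))
        word_id = conn.execute(
            "SELECT id FROM word WHERE name = ?", (token,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO map (word, file) VALUES (?, ?)",
            (word_id, file_id))

conn = sqlite3.connect(":memory:")
create_schema(conn)
index_text(conn, "report.txt", "Quarterly report about database indexing")
index_text(conn, "notes.txt", "Notes about indexing documents")
print(conn.execute("SELECT COUNT(*) FROM word").fetchone()[0])
```

Because of the UNIQUE constraints, running index_text on the same file again adds no duplicate rows, which suits a periodic re-index.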
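                Queries
                        To find files containing any of the given words
                                SELECT `file`.`name` FROM `file`, `word`, `map`
                                WHERE (`word`.`name` IN ('word1', 'word2', 'word3')) AND
                                (`map`.`word`=`word`.`id` AND `map`.`file`=`file`.`id`)
                                GROUP BY `file`.`name`;
                        (To require all three words, add
                        HAVING COUNT(DISTINCT `word`.`id`) = 3 after the GROUP BY.)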
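A small runnable sketch of the lookup, again assuming Python with an in-memory SQLite database in place of MySQL. The function find_files, its require_all flag and the sample rows are made up for illustration. With require_all set, a HAVING clause keeps only files that contain every word.

```python
import sqlite3

def find_files(conn, words, require_all=False):
    """Return sorted file names containing any (or all) of the given words."""
    words = sorted(set(words))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = (
        "SELECT file.name FROM file, word, map "
        f"WHERE word.name IN ({placeholders}) "
        "AND map.word = word.id AND map.file = file.id "
        "GROUP BY file.name"
    )
    params = list(words)
    if require_all:
        # Keep only files matched by every distinct search word.
        sql += " HAVING COUNT(DISTINCT word.id) = ?"
        params.append(len(words))
    sql += " ORDER BY file.name"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE word (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE map  (id INTEGER PRIMARY KEY, word INTEGER, file INTEGER);
    INSERT INTO file VALUES (1, 'report.txt'), (2, 'notes.txt');
    INSERT INTO word VALUES (1, 'database'), (2, 'indexing');
    INSERT INTO map (word, file) VALUES (1, 1), (2, 1), (2, 2);
""")
print(find_files(conn, ["database", "indexing"]))
print(find_files(conn, ["database", "indexing"], require_all=True))
```

Passing the words as bound parameters rather than pasting them into the SQL string avoids quoting problems with user input.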
                Room for Improvement
                        Add a field to the `map` table that gives the offset
                        (in words) where the word was found. This would prove
                        useful for "quoted queries" (i.e. phrase searching).
                        Add a blob column to the `file` table for easier access
                        to the data (very optional, _will_ bloat your database).
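To make the offset idea concrete, here is a hedged sketch of phrase searching, assuming Python with SQLite in place of MySQL. The column name pos, the function find_phrase and the sample rows are hypothetical. It also assumes the map table holds one row per occurrence of a word, and that offsets count indexed words only.

```python
import sqlite3

def find_phrase(conn, phrase_words):
    """Return sorted names of files where phrase_words occur at consecutive offsets."""
    if not phrase_words:
        return []
    tables, conditions, params = ["file"], [], []
    for i, token in enumerate(phrase_words):
        # One map/word pair per phrase word, each tied to the same file.
        tables += [f"map AS m{i}", f"word AS w{i}"]
        conditions += [f"m{i}.file = file.id",
                       f"m{i}.word = w{i}.id",
                       f"w{i}.name = ?"]
        params.append(token)
        if i > 0:
            # Word i must sit exactly i positions after the first word.
            conditions.append(f"m{i}.pos = m0.pos + {i}")
    sql = (f"SELECT DISTINCT file.name FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(conditions)} ORDER BY file.name")
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE word (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE map  (id INTEGER PRIMARY KEY, word INTEGER, file INTEGER,
                       pos INTEGER);
    INSERT INTO file VALUES (1, 'a.txt'), (2, 'b.txt');
    INSERT INTO word VALUES (1, 'medical'), (2, 'record'), (3, 'search');
    -- a.txt: "medical record search", b.txt: "search medical record"
    INSERT INTO map (word, file, pos) VALUES
        (1, 1, 0), (2, 1, 1), (3, 1, 2),
        (3, 2, 0), (1, 2, 1), (2, 2, 2);
""")
print(find_phrase(conn, ["record", "search"]))
print(find_phrase(conn, ["medical", "record"]))
```

The self-join grows with the phrase length, so for long phrases it may be cheaper to match the first two words in SQL and verify the rest in the application.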

If you're willing to pay for it, I'll write it for you.
BTW, I recommend Java for writing the reader program (regular expressions
are much easier and cleaner to do there) and PHP (v4.x) for the search
program (easier UI).

Mark C. Roduner, Jr.
Medical Systematics Research 
-----Original Message-----
From: Brian [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 25, 2003 3:12 PM
To: Peter L. Berghold
Cc: MySQL
Subject: Re: Your professional opinion Please...


> On Mon, Mar 24, 2003 at 06:41:07PM -0800, Brian wrote:
> > I have a client with approximately 2 gigabytes of un-indexed
> > document files (includes text and graphics).

> > He wants to be able to enter a few parameters and bring up
> > a list of all...

> If they are flat text files this should not be too big an issue
> although a very large project nonetheless. Develop an index by yanking
> out keywords of interest and developing a table to index them either by
> filename, title or whatever.

What mechanism do you recommend?

Something in perl, python or php?

Are there prebuilt modules that would develop such an index?

> I'd leave them as flat text files and go from there. If they are 
> adding or removing from the "library" then do a re-index at an 
> interval that makes sense.

Understood - could be done once a night during slow time.

Thanks for the feedback.

Best regards,

Brian



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

