I have no experience with htdig, other than using it, but I know a bit
about databases. :)

What about something like this:

CREATE TABLE words (
        wordID int PRIMARY KEY,      -- a unique id for each word
        word varchar(<wordlength>)   -- the word itself
);

CREATE TABLE "references" (          -- quoted: REFERENCES is a reserved word
        wordID int,                  -- reference to the words table
        docID int,                   -- reference to the document
        location int                 -- position within the document
);

There would be no primary keys in the references table, but you could
create keys (or indices) on the wordID and docID columns.
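
Here is a rough sketch of the two tables plus those indices. It uses Python's built-in sqlite3 module purely as a convenient engine to try the SQL in; that choice, the index names (ref_word_idx, ref_doc_idx) and the varchar(32) length are all my own assumptions, not anything htdig does.

```python
import sqlite3

# In-memory database, used only to try the schema out (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (
    wordID INTEGER PRIMARY KEY,   -- a unique id for each word
    word   VARCHAR(32)            -- 32 is a placeholder for <wordlength>
);
CREATE TABLE "references" (       -- quoted because REFERENCES is reserved
    wordID   INT,                 -- reference to the words table
    docID    INT,                 -- reference to the document
    location INT                  -- position within the document
);
-- No primary key here, just plain indices on the two lookup columns.
CREATE INDEX ref_word_idx ON "references" (wordID);
CREATE INDEX ref_doc_idx  ON "references" (docID);
""")

# List the indices that now exist on the references table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'references'"
    )
)
print(index_names)  # ['ref_doc_idx', 'ref_word_idx']
```

Whether a combined index on (wordID, docID) would serve better depends on the query mix, so treat the two single-column indices as a starting point only.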

Searching algorithms seem pretty straightforward now:

- Doing a quick search on the words table will tell you if those words even
exist.
- For a phrase search, you'd check that the first word has a location of x,
that the second word has a location of x+1, etc., all with the same docID.
- For a near search, you'd check that the first word has a location of x,
that the second word has a location between x-5 and x+5 (or however close),
both with the same docID.
- For a before/after search, you'd check that the first word has a location
of x, that the second word has a location greater than x (second word after
the first) or less than x (second word before it), again with the same docID.
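
The searches above could be sketched as one self-join on the references table, where only the location condition changes from search to search. Again this is Python's sqlite3 used as a test bed; the two sample documents and the helper names (search, words_exist) are hypothetical, and I haven't thought about how this would perform on a real index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (wordID INTEGER PRIMARY KEY, word VARCHAR(32));
CREATE TABLE "references" (wordID INT, docID INT, location INT);
""")

# Hypothetical sample documents, indexed word by word.
docs = {1: "duct tape holds the universe together",
        2: "the universe holds more tape than duct"}
word_ids = {}
for doc_id, text in docs.items():
    for location, word in enumerate(text.split()):
        if word not in word_ids:
            word_ids[word] = len(word_ids) + 1
            conn.execute("INSERT INTO words VALUES (?, ?)",
                         (word_ids[word], word))
        conn.execute('INSERT INTO "references" VALUES (?, ?, ?)',
                     (word_ids[word], doc_id, location))

# r1 is an occurrence of the first word, r2 of the second word,
# both in the same document. Each search adds its own location test.
BASE = """
SELECT DISTINCT r1.docID
FROM "references" r1
JOIN words w1 ON w1.wordID = r1.wordID
JOIN "references" r2 ON r2.docID = r1.docID
JOIN words w2 ON w2.wordID = r2.wordID
WHERE w1.word = ? AND w2.word = ? AND {condition}
ORDER BY r1.docID
"""

def search(first, second, condition, *params):
    """Return the docIDs where both words occur and `condition` holds."""
    sql = BASE.format(condition=condition)
    return [row[0] for row in conn.execute(sql, (first, second, *params))]

def words_exist(*words):
    """Quick check against the words table before doing any joins."""
    marks = ",".join("?" * len(words))
    sql = f"SELECT COUNT(*) FROM words WHERE word IN ({marks})"
    return conn.execute(sql, words).fetchone()[0] == len(set(words))

# Phrase search: second word sits at location x+1.
phrase = search("duct", "tape", "r2.location = r1.location + 1")
# Near search: second word within 2 positions either side of x.
near = search("duct", "tape",
              "r2.location BETWEEN r1.location - ? AND r1.location + ?", 2, 2)
# Before search: first word occurs before the second word.
before = search("tape", "universe", "r2.location > r1.location")

print(phrase, near, before)  # [1] [1, 2] [1]
```

A longer phrase would need one more join per extra word (r3 at x+2, and so on), which is probably where the real cost of this layout would show up.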

Hope these have been useful comments.

.........................................................................
Colin Viebrock           Creative Director - Private World Communications
[EMAIL PROTECTED]                                331 - 67 Mowat Avenue
http://www.privateworld.com             Toronto, Ontario, CANADA, M6K 3E3
ICQ: 11386088

                                           "Duct tape is like the force.
                                   It has a light side, and a dark side,
                                    and it holds the universe together."
                                                          - Carl Zwanzig