I have no experience with htdig, other than using it, but I know a bit
about databases. :)
What about something like this:
CREATE TABLE words (
    wordID int PRIMARY KEY,        -- a unique id for each word
    word   varchar(<wordlength>)   -- the word itself
);
CREATE TABLE word_references (     -- "references" is a reserved word in SQL
    wordID   int,                  -- reference to the words table
    docID    int,                  -- reference to the document
    location int                   -- position within the document
);
There would be no primary keys in the references table, but you could
create keys (or indices) on the wordID and docID columns.
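Here is a minimal sketch of the two tables in practice. It uses Python's built-in sqlite3 module, so it assumes SQLite as the database. Several details are my own illustrative choices, not part of the proposal: the table name `word_refs` (REFERENCES is an SQL keyword), the index names, the hypothetical `index_document` helper, its naive tokenizer (lowercase alphanumeric runs), and 0-based positions.

```python
import re
import sqlite3

# SQLite version of the two tables. "word_refs" is a hypothetical name:
# REFERENCES is an SQL keyword, so "references" cannot be used unquoted.
SCHEMA = """
CREATE TABLE words (
    wordID INTEGER PRIMARY KEY,    -- unique id for each word (assigned by SQLite)
    word   TEXT NOT NULL UNIQUE    -- the word itself, stored once
);
CREATE TABLE word_refs (
    wordID   INTEGER NOT NULL REFERENCES words (wordID),
    docID    INTEGER NOT NULL,     -- which document the word occurs in
    location INTEGER NOT NULL      -- 0-based word position within that document
);
CREATE INDEX idx_refs_word ON word_refs (wordID);
CREATE INDEX idx_refs_doc  ON word_refs (docID);
"""


def index_document(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    """Add one word_refs row per token of `text` (naive tokenizer: an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for position, token in enumerate(tokens):
        # New words get an id; existing ones are skipped thanks to UNIQUE.
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (token,))
        conn.execute(
            "INSERT INTO word_refs (wordID, docID, location) "
            "SELECT wordID, ?, ? FROM words WHERE word = ?",
            (doc_id, position, token),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_document(conn, 1, "Duct tape is like the force.")
index_document(conn, 2, "The force holds the universe together.")

print(conn.execute("SELECT COUNT(*) FROM words").fetchone()[0])      # 9
print(conn.execute("SELECT COUNT(*) FROM word_refs").fetchone()[0])  # 12
```

Each distinct word is stored once in `words`, while `word_refs` gets one row per occurrence. For example, "the" appears at positions 0 and 3 of document 2.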
The search algorithms then seem pretty straightforward:
- Doing a quick search on the words table will tell you if those words even
exist.
- For a phrase search, you'd check that the first word has a location of x,
that the second word has a location of x+1, etc., all with the same docID.
- For a near search, you'd check that the first word has a location of x,
that the second word has a location between x-5 and x+5 (or however close),
both with the same docID.
- For a before/after search, you'd check that the first word has a location
of x, and that the second word has a location less than x (before) or
greater than x (after), both with the same docID.
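The searches above map naturally onto self-joins of the occurrence table, one alias per word. Below is a rough sketch under the same assumptions as before (SQLite via Python's sqlite3, a hypothetical `word_refs` table, a naive tokenizer, 0-based positions). All function names here are illustrative. The near search also excludes a word matching its own occurrence, which matters if both search terms are the same word.

```python
import re
import sqlite3

# Hypothetical names: SQLite, table "word_refs" ("references" is an SQL keyword).
SCHEMA = """
CREATE TABLE words (wordID INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE word_refs (wordID INTEGER NOT NULL, docID INTEGER NOT NULL,
                        location INTEGER NOT NULL);
CREATE INDEX idx_refs_word ON word_refs (wordID);
CREATE INDEX idx_refs_doc ON word_refs (docID);
"""


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    """Store each token of `text` with its 0-based position."""
    for position, token in enumerate(tokenize(text)):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (token,))
        conn.execute("INSERT INTO word_refs SELECT wordID, ?, ? FROM words WHERE word = ?",
                     (doc_id, position, token))


def word_exists(conn: sqlite3.Connection, word: str) -> bool:
    """Quick check against the words table alone."""
    row = conn.execute("SELECT 1 FROM words WHERE word = ?", (word.lower(),)).fetchone()
    return row is not None


def phrase_search(conn: sqlite3.Connection, phrase: str) -> list[int]:
    """Docs where word i of the phrase sits at location x + i for some x."""
    words = tokenize(phrase)
    if not words:
        return []
    tables = ", ".join(f"word_refs r{i}, words w{i}" for i in range(len(words)))
    conds = []
    for i in range(len(words)):
        conds.append(f"w{i}.wordID = r{i}.wordID AND w{i}.word = ?")
        if i:  # every later word: same docID, position offset by i
            conds.append(f"r{i}.docID = r0.docID AND r{i}.location = r0.location + {i}")
    sql = (f"SELECT DISTINCT r0.docID FROM {tables} "
           f"WHERE {' AND '.join(conds)} ORDER BY r0.docID")
    return [row[0] for row in conn.execute(sql, words)]


# Occurrences r1 and r2 of two words in the same document; {cond} relates them.
PAIR_SQL = """
SELECT DISTINCT r1.docID
FROM word_refs r1
JOIN words w1 ON w1.wordID = r1.wordID
JOIN word_refs r2 ON r2.docID = r1.docID
JOIN words w2 ON w2.wordID = r2.wordID
WHERE w1.word = ? AND w2.word = ? AND {cond}
ORDER BY r1.docID
"""


def near_search(conn: sqlite3.Connection, first: str, second: str,
                distance: int = 5) -> list[int]:
    """Docs where `second` lies within +/- distance positions of `first`."""
    sql = PAIR_SQL.format(cond="r2.location <> r1.location AND "
                               "r2.location BETWEEN r1.location - ? AND r1.location + ?")
    params = (first.lower(), second.lower(), distance, distance)
    return [row[0] for row in conn.execute(sql, params)]


def order_search(conn: sqlite3.Connection, first: str, second: str,
                 second_after: bool = True) -> list[int]:
    """Docs where `second` occurs after (or, if second_after is False, before) `first`."""
    op = ">" if second_after else "<"
    sql = PAIR_SQL.format(cond=f"r2.location {op} r1.location")
    return [row[0] for row in conn.execute(sql, (first.lower(), second.lower()))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for doc_id, text in enumerate([
    "Duct tape is like the force.",
    "The force holds the universe together.",
    "It has a light side, and a dark side",
], start=1):
    index_document(conn, doc_id, text)

print(phrase_search(conn, "the force"))         # [1, 2]
print(near_search(conn, "light", "dark", 5))    # [3]
print(order_search(conn, "holds", "universe"))  # [2]
```

Word values are always passed as bound parameters. Only the integer offsets and the comparison operator, both chosen by the code itself, are formatted into the SQL text.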
Hope these have been useful comments.
.........................................................................
Colin Viebrock Creative Director - Private World Communications
[EMAIL PROTECTED] 331 - 67 Mowat Avenue
http://www.privateworld.com Toronto, Ontario, CANADA, M6K 3E3
ICQ: 11386088
"Duct tape is like the force.
It has a light side, and a dark side,
and it holds the universe together."
- Carl Zwanzig