[EMAIL PROTECTED] wrote:
> Each word of each document has an entry in a B-Tree structured
> DB file. The key is the word + DocId, the data is a WordRecord, i.e.
> encoded DocId + Flags + Anchor + Location.
Yes, this is correct. The DocID in the WordRecord is probably a bit
redundant, since it is already part of the key, but I included it for
the time being.
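As a rough illustration of the layout described above, here is a minimal Python sketch. It is not the ht://Dig code: the function names, the NUL separator, the fixed 32-bit fields, and the use of a sorted list in place of a real B-Tree file are all assumptions made for the example.

```python
import struct
from typing import NamedTuple


class WordRecord(NamedTuple):
    """Hypothetical stand-in for the WordRecord fields named in the thread."""
    doc_id: int
    flags: int
    anchor: int
    location: int


def make_key(word: str, doc_id: int) -> bytes:
    # Key = word + DocId. The NUL separator and big-endian integer are
    # assumptions; they make byte order group entries by word, then by DocId.
    return word.encode("utf-8") + b"\x00" + struct.pack(">I", doc_id)


def encode_record(rec: WordRecord) -> bytes:
    # Assumed encoding: four unsigned 32-bit integers, big-endian.
    return struct.pack(">IIII", *rec)


def decode_record(data: bytes) -> WordRecord:
    return WordRecord(*struct.unpack(">IIII", data))


def lookup(entries, word: str):
    """Return every record whose key starts with word + separator."""
    prefix = word.encode("utf-8") + b"\x00"
    return [decode_record(value) for key, value in entries if key.startswith(prefix)]


# A sorted list of (key, value) pairs stands in for the B-Tree's key order.
entries = sorted([
    (make_key("tree", 2), encode_record(WordRecord(2, 0, 1, 17))),
    (make_key("trees", 1), encode_record(WordRecord(1, 0, 0, 5))),
    (make_key("tree", 1), encode_record(WordRecord(1, 0, 0, 3))),
])
```

With this key layout, all entries for one word sit next to each other in key order, so a search for a word is a scan over one contiguous range.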
> If I have 100 documents containing 100 words, I'll have
> 10 000 entries in the B-Tree.
Yes. Interestingly, in "real-world" tests, this seems to increase the
file size by only about 40% over the previous organization, which had
one entry per unique word whose data was a list of weight/DocID pairs
(also compressed).
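To make the two organizations concrete, here is a small Python sketch that counts entries for the 100 documents x 100 words case. The function names are hypothetical, the example assumes every document contains the same 100 distinct words, and using the occurrence count as the weight is only a placeholder. It says nothing about compressed file size.

```python
from collections import Counter


def per_occurrence_entries(docs):
    """New organization: one (word, DocId) entry per word of each document."""
    return [(word, doc_id) for doc_id, words in docs.items() for word in words]


def per_unique_word_entries(docs):
    """Previous organization: one entry per unique word, holding a list of
    (weight, DocId) pairs. The weight here is just an occurrence count."""
    index = {}
    for doc_id, words in docs.items():
        for word, weight in Counter(words).items():
            index.setdefault(word, []).append((weight, doc_id))
    return index


# 100 documents, each containing the same 100 distinct words (an assumption).
docs = {doc_id: [f"word{i}" for i in range(100)] for doc_id in range(100)}

new_layout = per_occurrence_entries(docs)
old_layout = per_unique_word_entries(docs)
```

Here `new_layout` has 10,000 entries, while `old_layout` has 100 entries that each carry a list of 100 pairs.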
> In addition a hint about the semantic of Anchor would be appreciated.
In an HTML document, the nearest <a name> tag is recorded. This allows
the user to jump to the closest section in documents such as this.
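A minimal sketch of the idea, using Python's standard html.parser. It assumes "nearest" means the most recent <a name> seen before the word, which the message does not spell out; the class and function names are hypothetical and this is not how ht://Dig parses documents.

```python
from html.parser import HTMLParser


class AnchorTracker(HTMLParser):
    """Pairs each word with the most recent <a name="..."> seen before it."""

    def __init__(self):
        super().__init__()
        self.current_anchor = None  # no named anchor seen yet
        self.words = []             # list of (word, anchor) pairs

    def handle_starttag(self, tag, attrs):
        # Only <a> tags that carry a name attribute count as anchors.
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.current_anchor = name

    def handle_data(self, data):
        for word in data.split():
            self.words.append((word.lower(), self.current_anchor))


def words_with_anchors(html: str):
    parser = AnchorTracker()
    parser.feed(html)
    parser.close()
    return parser.words


sample = ('<p>intro</p><a name="setup"></a><h2>Setup</h2>'
          '<p>install it</p><a name="usage"></a><p>run</p>')
pairs = words_with_anchors(sample)
```

A search hit on "install" could then link to the document URL followed by "#setup" rather than to the top of the page.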
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/