[EMAIL PROTECTED] wrote:
>          Each word of each document has an entry in a B-Tree structured
> DB file. The key is the word + DocId, the data is a WordRecord, i.e.
> encoded DocId + Flags + Anchor + Location.

Yes, this is correct. The DocID in the WordRecord is probably a bit
redundant, but I included it for the time being.
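
To make the layout concrete, here is a rough sketch in Python rather than the actual htdig code. The names WordRecord, make_key and WordIndex, the NUL separator and the 4-byte DocId width are all assumptions made for illustration, and a sorted list stands in for the B-Tree file. It shows why a key of word + DocId keeps all entries for one word adjacent, so a lookup becomes a short range scan.

```python
import bisect
from typing import NamedTuple


class WordRecord(NamedTuple):
    # Field names follow the description above; the types are assumed.
    doc_id: int
    flags: int
    anchor: int
    location: int


def make_key(word: str, doc_id: int) -> bytes:
    # Hypothetical encoding: word, NUL separator, fixed-width big-endian DocId.
    # Byte order then sorts by word first and by DocId second.
    return word.encode("utf-8") + b"\x00" + doc_id.to_bytes(4, "big")


class WordIndex:
    """Sorted list standing in for the B-Tree file (illustration only)."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._records: list[WordRecord] = []

    def add(self, word: str, record: WordRecord) -> None:
        key = make_key(word, record.doc_id)
        pos = bisect.bisect_right(self._keys, key)  # duplicates are allowed
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def lookup(self, word: str) -> list[WordRecord]:
        # Range scan: start at the first key with this word's prefix and
        # stop as soon as the prefix no longer matches.
        prefix = word.encode("utf-8") + b"\x00"
        start = bisect.bisect_left(self._keys, prefix)
        found = []
        for i in range(start, len(self._keys)):
            if not self._keys[i].startswith(prefix):
                break
            found.append(self._records[i])
        return found

    def __len__(self) -> int:
        return len(self._keys)


index = WordIndex()
index.add("tree", WordRecord(doc_id=2, flags=0, anchor=0, location=5))
index.add("tree", WordRecord(doc_id=1, flags=0, anchor=3, location=9))
index.add("trees", WordRecord(doc_id=1, flags=0, anchor=0, location=1))
print([r.doc_id for r in index.lookup("tree")])
```

Since the DocId is already part of the key in this sketch, the copy inside the record carries no extra information, which is the redundancy mentioned above.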

>         If I have 100 documents containing 100 words, I'll have
> 10 000 entries in the B-Tree.

Yes. Interestingly, in "real-world" tests this seems to increase the
file size by only about 40% over the previous organization, which had
one entry per unique word whose data was a list of weight/DocID pairs
(also compressed).
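
For comparison, the two organizations can be sketched side by side in Python. The function names are hypothetical, and treating the weight as a simple occurrence count is an assumption; the sketch only counts entries and says nothing about the on-disk size or the 40% figure.

```python
from collections import defaultdict


def per_occurrence_entries(docs):
    """New layout: one (word, doc_id) entry for each word of each document."""
    return [(word, doc_id) for doc_id, words in docs.items() for word in words]


def per_word_entries(docs):
    """Previous layout: one entry per unique word, holding (weight, doc_id) pairs.

    The weight here is just an occurrence count, which is an assumption.
    """
    counts = defaultdict(dict)
    for doc_id, words in docs.items():
        for word in words:
            counts[word][doc_id] = counts[word].get(doc_id, 0) + 1
    return {
        word: sorted((weight, doc_id) for doc_id, weight in by_doc.items())
        for word, by_doc in counts.items()
    }


# The example from the question: 100 documents containing 100 words each.
docs = {doc_id: [f"word{i}" for i in range(100)] for doc_id in range(100)}
print(len(per_occurrence_entries(docs)))  # 100 * 100 entries
print(len(per_word_entries(docs)))        # one entry per unique word
```

The old layout has far fewer entries, but each one carries a list that grows with the number of documents containing the word, so the total amount of data is closer than the entry counts suggest.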

>         In addition a hint about the semantic of Anchor would be appreciated.

In an HTML document, the nearest <a name> tag is recorded for each word.
This allows the user to jump to the closest section in documents such as
this.
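
As an illustration only, the idea can be sketched with Python's standard html.parser module; this is not the htdig parser. The class and function names are hypothetical, and "nearest" is taken here to mean the most recent <a name> seen before the word, which is an assumption.

```python
from html.parser import HTMLParser


class AnchorTracker(HTMLParser):
    """Tags every word with the most recent <a name="..."> seen before it."""

    def __init__(self) -> None:
        super().__init__()
        self.current_anchor = None  # no named anchor seen yet
        self.words = []             # list of (word, anchor) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:  # plain links (<a href=...>) do not change the anchor
                self.current_anchor = name

    def handle_data(self, data):
        for word in data.split():
            self.words.append((word.lower(), self.current_anchor))


def words_with_anchors(html: str):
    parser = AnchorTracker()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.words


page = '<p>Intro</p><a name="setup"></a><h2>Setup steps</h2>'
for word, anchor in words_with_anchors(page):
    print(word, anchor)
```

A search result for a word could then link to the document URL plus "#" and the recorded anchor name, which takes the user to that section instead of the top of the page.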

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
