On Tue, 21 Mar 2000 [EMAIL PROTECTED] wrote:

>     db.words.db
>     db.docdb
>     db.docs.index
> 
> Presumably, these are in some fairly-standard database format; if I could 
> determine what this is, and obtain field lists, it would be a major step 
> forward.  

You'll be *much* happier parsing db.wordlist for the word database, which
is an ASCII file. You'll also be much happier using the -t flag for htdig
and parsing the resulting db.docs text file.

Both files have records separated by \n characters and fields separated by
tabs, with a label before each field (label:field).

The wordlist format is:
word <tab> i:DocID <tab> l:location <tab> w:weight <tab> c:count <tab> a:anchor

Note that count and anchor are optional and are dropped if they're the
default.
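
As a rough illustration of reading that format, here is a minimal Python
sketch. The function name, the dictionary keys, and the choice to convert
values to integers where possible are my own assumptions, not anything taken
from the htdig source. Since the default values for count and anchor are not
stated here, missing fields are simply reported as None.

```python
# Hypothetical helper: parse one line of db.wordlist into a dict.
# Mapping from the one-letter labels described above to readable names.
LABELS = {"i": "doc_id", "l": "location", "w": "weight",
          "c": "count", "a": "anchor"}

def parse_wordlist_line(line):
    fields = line.rstrip("\n").split("\t")
    # The first field is the word itself and carries no label.
    record = {"word": fields[0]}
    # count and anchor may be absent; None marks "not present in the file".
    for name in LABELS.values():
        record[name] = None
    for field in fields[1:]:
        label, sep, value = field.partition(":")
        if not sep or label not in LABELS:
            continue  # skip anything that does not look like label:field
        try:
            record[LABELS[label]] = int(value)  # assumption: numeric values
        except ValueError:
            record[LABELS[label]] = value       # keep the raw text otherwise
    return record
```

Reading the whole file is then a matter of calling this once per line, for
example `[parse_wordlist_line(l) for l in open("db.wordlist")]`.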

The fields in db.docs are a bit more complex, but if you're willing to
read the source, they're listed in DocumentDB.cc under "CreateSearchDB".
The key fields are the DocID and the URL (the first two).
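
Until you have the field list from the source, a generic splitter can at
least separate the labels from the values. This is only a sketch: the
function name is made up, and it assumes that labels are single characters,
as they are in the wordlist, which may not hold for every db.docs field.
Splitting on the first colon only keeps URLs intact.

```python
# Hypothetical helper: split one tab-separated record into unlabeled
# fields and a {label: value} dict, without knowing the field list.
def split_labeled_record(line):
    unlabeled, labeled = [], {}
    for field in line.rstrip("\n").split("\t"):
        # partition splits at the FIRST colon, so "u:http://host/" keeps
        # the rest of the URL in the value.
        label, sep, value = field.partition(":")
        if sep and len(label) == 1:   # assumption: one-character labels
            labeled[label] = value
        else:
            unlabeled.append(field)   # e.g. a leading field with no label
    return unlabeled, labeled
```

The labels found this way can then be matched against the names used in
DocumentDB.cc.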

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

