Hey,

So here's a very weird observation.

It looks to me like db.words.db stores everything in the 'key' and has
a blank 'value' for each and every key!

How did I find this?

1.  I indexed a single web page consisting of the 'Gettysburg Address'
(250+ words).
2.  I added printfs to htPack/htUnpack and to WordDB:Get & WordDB:Put.

This is what I see during an htdig run:

HtPack []
db->put key=[people] data=[] flags=[0]
HtPack []
db->put key=[people] data=[] flags=[0]
HtPack []
db->put key=[people] data=[] flags=[0]
HtPack []
db->put key=[perish] data=[] flags=[0]
HtPack []
db->put key=[earth] data=[] flags=[0]
HtPack []
db->put key=[abraham] data=[] flags=[0]
HtPack []
db->put key=[lincoln] data=[] flags=[0]

Nothing in the data-value!

This seems to contradict (in spirit) what's in the db.worddump produced by
htdump!

I also downloaded and built the 3.0.55 BDB and used the db_dump utility to
dump the db.words.db.

This is what I get (The first line is a key, the following line is the value):

%db_dump_3055 -pk db.words.db
 people\02\00\00\00\00\c9\00
 \00
 people\02\00\00\00\00\cb\00
 \00
 people\02\00\00\00\00\ce\00
 \00
 perish\02\00\00\00\00\d1\00
 \00
 poor\02\00\00\00\00j\00
 \00
 portion\02\00\00\00\009\00
 \00
 power\02\00\00\00\00k\00
 \00

Note the Zeros in the VALUE!!

Here are the relevant entries in db.worddump:

people  2   0   201 0
people  2   0   203 0
people  2   0   206 0
perish  2   0   209 0
poor    2   0   106 0

c9 = 201
cb = 203
ce = 206
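
To double-check the hex-to-decimal mapping above, here is a rough Python sketch that decodes the escaped key lines from the db_dump output. It assumes db_dump -p prints a non-printable byte as a backslash plus two hex digits and a literal backslash as a doubled backslash. The function names and the guess that the last 7 bytes of each key are the non-word part are mine, taken only from the samples shown here, not from the htdig source.

```python
def decode_db_dump(line: str) -> bytes:
    """Turn one 'db_dump -p' style line into raw bytes (assumed escape rules)."""
    out = bytearray()
    i = 0
    while i < len(line):
        if line[i] == "\\":
            if line[i + 1] == "\\":
                # Doubled backslash stands for one literal backslash.
                out.append(0x5C)
                i += 2
            else:
                # Backslash followed by exactly two hex digits.
                out.append(int(line[i + 1:i + 3], 16))
                i += 3
        else:
            out.append(ord(line[i]))
            i += 1
    return bytes(out)


def split_key(line: str, tail_len: int = 7):
    """Split a decoded key into (word, trailing bytes).

    tail_len=7 is an assumption based on the sample keys only.
    """
    raw = decode_db_dump(line)
    return raw[:-tail_len].decode("ascii"), list(raw[-tail_len:])


for key in (r"people\02\00\00\00\00\c9\00",
            r"poor\02\00\00\00\00j\00",
            r"portion\02\00\00\00\009\00"):
    print(*split_key(key))
# people [2, 0, 0, 0, 0, 201, 0]
# poor [2, 0, 0, 0, 0, 106, 0]
# portion [2, 0, 0, 0, 0, 57, 0]
```

The 201 and 106 line up with the 'people' and 'poor' rows in db.worddump, so the number next to the end of each key does look like the word location.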


This is brain dead for an inverted index!

It should at least be

key = 'people\02', value = '\00\00\00\00\c9\00'

A more efficient solution to make the index smaller would be this:

key = 'people\02', value = '\00\00\00\00\c9\cb\ce\00'
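
To make the idea concrete, here is a small Python sketch of the grouping I have in mind. It is an illustration only: a plain dict stands in for the Berkeley DB file, the helper name is hypothetical, and the (word, document, location) triples are simply the first, second and fourth columns of the db.worddump rows above.

```python
from collections import defaultdict

# (word, doc_id, location) taken from the db.worddump excerpt above.
entries = [
    ("people", 2, 201),
    ("people", 2, 203),
    ("people", 2, 206),
    ("perish", 2, 209),
    ("poor", 2, 106),
]


def group_postings(rows):
    """Collect all locations of a word in a document under one key."""
    postings = defaultdict(list)
    for word, doc_id, location in rows:
        postings[(word, doc_id)].append(location)
    return dict(postings)


grouped = group_postings(entries)
print(len(entries), "records ->", len(grouped), "records")
print(grouped[("people", 2)])
# 5 records -> 3 records
# [201, 203, 206]
```

With one record per (word, document) pair, the word itself is stored once instead of once per occurrence.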


Eh?

Thanks.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485



