> Not necessarily. If you're only interested in space savings and rarely
> update your databases, then you can delete db.wordlist and do what you
> described and you'd probably be OK.
>
> >     Is it true that 3.2 produces indexes 1/2 the size ? So, would it be
> > 15% of the total size of the htmls ?
>


        I'd like to know more details on how the inverted text is stored in
Berkeley DB (at a higher level, without having to look at the C code).

        For instance, is it keyed by word, with the associated value being a
list of document pointers? Do you use numbers to identify these document
pointers? Do you do any compression of the list, like just storing
the delta value from the previous number/pointer, and then using
variable-bit encoding to represent these deltas? etc.
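
        To make the question concrete, here is a minimal Python sketch of the
kind of scheme I have in mind. It is purely illustrative and says nothing
about how ht://Dig or Berkeley DB actually store things; the function names
are hypothetical, and I use a byte-aligned variable-length code (7 payload
bits per byte) as a simpler stand-in for a true variable-bit code.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 payload bits per byte.

    The high bit of each byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte of this number
            return bytes(out)


def encode_postings(doc_ids):
    """Store a word's document numbers as varint-encoded gaps (deltas)."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(set(doc_ids)):
        out += encode_varint(doc_id - prev)  # gap from the previous doc number
        prev = doc_id
    return bytes(out)


def decode_postings(data):
    """Rebuild the sorted document numbers from the encoded gaps."""
    doc_ids = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7               # number continues in the next byte
        else:
            prev += value            # gap complete: add it to the running total
            doc_ids.append(prev)
            value = 0
            shift = 0
    return doc_ids


# Hypothetical in-memory index: word -> encoded list of document numbers.
index = {"search": encode_postings([3, 7, 300, 301])}
print(decode_postings(index["search"]))
```

        The point is that small gaps between consecutive document numbers fit
in a single byte each, so frequent words with dense posting lists cost much
less than storing full-width numbers.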


> Probably not quite. (And I'm not sure where you got the 1/2
> figure from.)

1.10. What are the practical and/or theoretical limits of ht://Dig?
The 3.2 development code helps with many of these limitations. In
particular, it generates the databases on the fly, which means you
don't have to sort them before searching. Additionally, the new
databases are compressed significantly, making them usually around 50%
the size of those in previous versions.


marcio


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
