> The expected *space* gain is small.  However, wasted space on

 Yes, I focused on the overall space saving. I understand that saving space
in internal pages gives better performance.

 > There is a default prefix compression routine:
 > If your application-specified prefix compression function does not
 > perform as well as the default one, this would be the expected outcome.

 I was using the default prefix function. 

 > Why is this feature absolutely critical for htdig?

 It is because the approach chosen at present requires creating an entry
for every word occurrence in every document. That is, if we have 100 documents
containing 100 words each, we will create 10 000 entries in the btree. Each
key is the word + document number. Each existing word will therefore be the
prefix of a large number of keys. Tests show that it takes 500 MB to store
11 million entries with the current db implementation. If we assume that
each document contains 100 words on average, indexing 100 000 documents
(about 10 million entries) will take roughly 500 MB. It's too much.
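
 To make the arithmetic above concrete, here is a rough sketch in Python.
The key layout (word, a NUL separator, then a 4-byte document number) and
all the helper names are assumptions made up for illustration; they are not
the actual htdig or db formats. It only shows why keys that share the word
as a prefix should shrink under front coding (prefix compression).

```python
import struct

def entries(documents, words_per_document):
    """One btree entry per word occurrence in every document."""
    return documents * words_per_document

def make_key(word, doc_id):
    # Hypothetical layout: word bytes, NUL separator, 4-byte big-endian doc number.
    return word.encode("ascii") + b"\0" + struct.pack(">I", doc_id)

def shared_prefix_len(a, b):
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def front_code(sorted_keys):
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    coded, prev = [], b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        coded.append((shared, key[shared:]))
        prev = key
    return coded

def front_decode(coded):
    """Rebuild the full keys from their front-coded form."""
    keys, prev = [], b""
    for shared, suffix in coded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys

# Figures from the message: 100 documents of 100 words, then 100 000 documents.
small_index = entries(100, 100)
large_index = entries(100_000, 100)

# Observed cost per entry, from "500 MB for 11 million entries".
approx_bytes_per_entry = 500 * 1024 * 1024 / 11_000_000

# Two words, each appearing in 100 documents.
keys = sorted(make_key(w, d) for w in ("compression", "prefix") for d in range(100))
coded = front_code(keys)
raw_size = sum(len(k) for k in keys)
coded_size = sum(1 + len(suffix) for _, suffix in coded)  # 1 byte for the shared length
print(small_index, large_index, round(approx_bytes_per_entry), raw_size, coded_size)
```

 The saving on real data depends on the word length distribution and on how
the document number is encoded, so this is only an indication.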

 > We're happy to provide snapshots of our source tree, we don't currently
 > export CVS access, although I could probably be talked into doing that
 > in September (we use SCCS internally, but will probably be switching to
 > CVS in late August).

 This is good news :-) Could you tell me where to download the latest
snapshot? I'll have to discuss the leaf page compression you suggested
with Geoff. If we take this approach it may be convenient to just
compress the pages (as a feature of the mpool, for instance). Although this
is a more general solution to the space/time problem, I have no idea
how hard it would be for the buffer pool to manage pages that have a
different size on disk (compressed) and in memory (uncompressed).
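
 To illustrate the difficulty I mean, here is a minimal sketch in Python
using zlib. It is not the mpool API; PAGE_SIZE, write_pages and read_page
are hypothetical names. Pages have a fixed size in memory but a variable
size once compressed, so something has to remember an (offset, length)
pair for each page on disk.

```python
import zlib

PAGE_SIZE = 8192  # assumed in-memory page size, for illustration only

def write_pages(pages):
    """Compress each fixed-size page; return the blob and an (offset, length) index."""
    blob, index = bytearray(), []
    for page in pages:
        if len(page) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes in memory")
        packed = zlib.compress(page)
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index

def read_page(blob, index, page_no):
    """Locate a compressed page through the index and expand it to PAGE_SIZE."""
    offset, length = index[page_no]
    return zlib.decompress(blob[offset:offset + length])

pages = [
    b"\0" * PAGE_SIZE,                        # empty page, compresses very well
    bytes(range(256)) * (PAGE_SIZE // 256),   # repeating pattern
    (b"word\0" * PAGE_SIZE)[:PAGE_SIZE],      # repeated key-like data
]
blob, index = write_pages(pages)
on_disk_sizes = [length for _, length in index]
print(on_disk_sizes)
```

 Rewriting a page whose compressed size has grown no longer fits in its old
slot, which is the part I expect to be hard for the buffer pool.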

 > 
 > I'd be very interested in working with someone to try either of those two
 > approaches.
 > 

 That's good news too :-)

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/

