Sleepycat Software writes:
>
> Not in a page, in each key/data pair. Do the compression and
> decompression in the interfaces to Berkeley DB, and leave the
> Btree (and Hash, for that matter) algorithms untouched.
>
OK, I got it this time ;-)
> 1. To use any adaptive compression algorithms, you need more
> than a few bytes to compress, so compressing individual
> key or data items isn't worthwhile.
Agree.
> 2. Since the datastore is page-oriented, the largest chunk we
> can compress is a page (well, we could try and group pages
> and I/O, but that doesn't change the problem much).
Agree.
> 3. Adaptive algorithms can't guarantee any amount of compression,
> and so while a compressed 8K page might only be 2K bytes in
> length, it might be 8191 bytes in length, which means that we
> have variable sized pages that we need to write to the backing
> store.
Neither can static ones, BTW, since the result depends on the input.
> 4. Variable sized pages are hard.
Oh yes. That's why I thought that making the Btree code think it has
8K pages, and leaving it to the cache to compress each one to 4K, would
(maybe) be a good solution. The cache would have to handle an 'overflow'
page whenever compression fails to reduce the content of a page to 4K
or less. Hopefully this case will be rare. Anyway, if we can make
this work we still have fixed-size pages. The cache hides the fact
that disk pages are half the size of the memory pages when it compresses
them.
> And, given how fast CPU is compared to disk these days, I think it's
> the right trade-off to make.
We have CPU that can be used for compression, that's for sure (I'm thinking
of the fact that my process manipulating the db spends 80% of its time waiting
for I/O, in the worst case). However, having to decompress a key each time
the comparison function is called freaks me out. This morning I built
my own comparison function (to handle the key format invented by Geoff
in htdig). I did not care about efficiency, and it used 25% of the CPU time of
the process. What will it be if decompression is involved?
Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/