Thanks for the description of the prefix situation -- it sounds
wrong to me; I'll try to have someone look at it.  For future
reference, I opened a new Support-Request, #950, to track it.

>> That's hard.  Writing a buffer pool to have variable sized pages is
>> a nasty little problem.  I remember one paper that's probably worth
>> finding: I think it was SOSP 1993, and the work was done at DECSRC.
>> The title was something about compression in a log-structured filesystem.
>> (If you can't find it, let me know, I can probably dig up my copy.)
>
> Never mind. I was already half sure that handling transparent compression
> was beyond the capabilities of the current db. It would certainly be 
> worth reading the code of file systems that compress data on the fly
> (there is one somewhere).

I've never seen any algorithms for applications other than
things like log-structured file systems, and I can't imagine
how those could be reasonably applied to a B+tree datastore.
If anyone knows of a way to compress blocks in a randomly
ordered, page-oriented cache, I'd sure like to hear about
it.

> Let me see if I understand well. You think that compressing the data
> in a page using Huffman with a static pre-determined frequency table based
> on English character frequencies will be easier.

Not in a page, in each key/data pair.  Do the compression and
decompression in the interfaces to Berkeley DB, and leave the
Btree (and Hash, for that matter) algorithms untouched.
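
For example (just a sketch, not working code): compress on the way into
the put interface and expand on the way out of the get interface.  The
huff_encode()/huff_decode() helpers and store_put() call below are
hypothetical stand-ins for a static-table Huffman coder and the
underlying Berkeley DB put; the point is only that each key/data pair
is handled on its own, before the access method ever sees it.

    #include <stdlib.h>

    /* Hypothetical static-table Huffman helpers; both return the output
     * length in bytes.  The output buffers must cover the worst case. */
    size_t huff_encode(const unsigned char *src, size_t srclen, unsigned char *dst);
    size_t huff_decode(const unsigned char *src, size_t srclen, unsigned char *dst);

    /* Stand-in for the underlying Berkeley DB put call. */
    int store_put(const void *key, size_t klen, const void *data, size_t dlen);

    int
    compressed_put(const void *key, size_t klen, const void *data, size_t dlen)
    {
        void *ckey, *cdata;
        size_t cklen, cdlen;
        int ret;

        /* 2x is only a placeholder bound for the sketch; a real encoder
         * needs a stated worst case or a raw-byte escape. */
        if ((ckey = malloc(klen * 2)) == NULL)
            return (-1);
        if ((cdata = malloc(dlen * 2)) == NULL) {
            free(ckey);
            return (-1);
        }

        /* Each pair is compressed independently, so its stored size is
         * known up front and the Btree/Hash code is untouched. */
        cklen = huff_encode(key, klen, ckey);
        cdlen = huff_encode(data, dlen, cdata);
        ret = store_put(ckey, cklen, cdata, cdlen);

        free(ckey);
        free(cdata);
        return (ret);
    }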

> The reason you think it
> will be easier is that the size of a compressed entry will be the same, 
> regardless of its context. 

Yes -- the reasoning went something like this:

    1. To use any adaptive compression algorithms, you need more
       than a few bytes to compress, so compressing individual
       key or data items isn't worthwhile.

    2. Since the datastore is page-oriented, the largest chunk we
       can compress is a page (well, we could try to group pages
       for I/O, but that doesn't change the problem much).

    3. Adaptive algorithms can't guarantee any amount of compression,
       so while a compressed 8K page might only be 2K bytes in length,
       it might also be 8191 bytes in length, which means we end up
       with variable-sized pages to write to the backing store.

    4. Variable-sized pages are hard.

Since I don't want to try to deal with variable-sized pages, consider
compression algorithms that will work on short strings, e.g., Huffman.
In that case we do the compression/decompression at the interface level,
and the B+tree algorithms don't have to be modified, which is a nice
side-effect.

> I agree with you on this point. When we have this functionality it should
> be quite straightforward to also implement a function that builds a 
> frequency table based on the actual content of the data and rebuild the
> db file with this table. It would allow someone to do the following:
>
>       . Build a db file using default frequency table
>       . When the db file is built calculate the actual frequency table
>       . Rebuild the db file from scratch using the actual frequency table

Yes.
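
(Counting the frequencies is the easy part; a dump-style pass over the
existing file just tallies bytes across every key and data item,
something like the sketch below, and feeds the counts to whatever
builds the static table.)

    #include <stddef.h>
    #include <stdint.h>

    /* Accumulate byte frequencies over one key or data item; call this
     * for every item as the dump pass walks the database. */
    void
    count_frequencies(const unsigned char *buf, size_t len, uint64_t freq[256])
    {
        size_t i;

        for (i = 0; i < len; i++)
            freq[buf[i]]++;
    }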

> In other words we are likely to have a near optimal compression rate
> if we can recalculate the frequency table from time to time. Of course this
> operation will cost a lot, but the savings will be big. It could be a
> functionality of db_dump and db_load.

Yes.

> Using this technique we can probably expect a 50% compression rate.

I don't know what people get for Huffman encoding of English, but I
wouldn't be surprised.
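
(Back-of-the-envelope: the commonly quoted single-character entropy of
English text is a little over 4 bits per character, so a byte-at-a-time
Huffman code should land somewhere in the 40-50% range, which makes that
number sound plausible.)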

And, given how fast CPUs are compared to disks these days, I think it's
the right trade-off to make.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic
Sleepycat Software Inc.         [EMAIL PROTECTED]
394 E. Riding Dr.               +1-978-287-4781
Carlisle, MA 01741-1601         http://www.sleepycat.com
