Sleepycat Software writes:
 > The question that I would suggest asking is to select a Huffman
 > encoding scheme, and then see what percentage of compression
 > you would expect to see on the data that you're storing.  That
 > is completely orthogonal to the fact that those key/data items
 > are stored in a Btree.  Huffman encoding will not affect the
 > Btree storage algorithms at all, and you'll see the same
 > page-fill factors both before and after encoding (see below).

 I should have been more explicit. I assumed 50% compression and
stored data that was half the size of the original. I found out (correct
me if I'm wrong) that the overhead of the Btree structure is around 12
bytes per entry, assuming there are no duplicates and that entry
sizes vary from 8 to 35 bytes with a page size of 4k.
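
 As a back-of-envelope check (the ~12 bytes of overhead is my own
estimate, not a number from the documentation), the fixed per-entry
overhead means 50% data compression saves noticeably less than 50% of
the page space:

#include <stdio.h>

int main(void)
{
    const double overhead = 12.0;             /* assumed per-entry Btree overhead */
    const double avg_pair = (8 + 35) / 2.0;   /* average uncompressed entry size */
    const double compressed = avg_pair * 0.5; /* assuming 50% Huffman compression */

    double before = avg_pair + overhead;      /* bytes consumed per entry */
    double after  = compressed + overhead;

    printf("bytes/entry before: %.1f, after: %.1f\n", before, after);
    printf("effective saving: %.0f%% (not 50%%, the overhead is fixed)\n",
           100.0 * (1.0 - after / before));
    return 0;
}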

 > The trick is to insert sorted data.  If you split a full Btree
 > page, half of the original page is put on one page and half on
 > another, giving you a 50% page fill factor.  If you insert
 > sorted data, the Berkeley DB implementation splits in a special
 > way, putting almost all the old data on one page and only a
 > single key on the new page.  From the Berkeley DB documentation:

 Ok, thanks :-) I finally found that today too. I was misled
because I thought I was inserting sorted entries but they were not:
the keys contained a number that was not zero-padded, and the
comparison function was purely lexicographic.
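
 For the record, a tiny illustration of the trap (the key format is
invented for the example): with a memcmp-style comparison "doc2" sorts
after "doc10", and zero-padding the numeric part restores the intended
order.

#include <stdio.h>

int main(void)
{
    char bad[32], good[32];
    unsigned id = 2;

    sprintf(bad,  "doc%u",    id);  /* "doc2"          -> sorts after "doc10" */
    sprintf(good, "doc%010u", id);  /* "doc0000000002" -> sorts before "doc0000000010" */
    printf("%s / %s\n", bad, good);
    return 0;
}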

 > Compression doesn't matter, it's whether or not the keys that
 > you insert are sorted.  My expectation for a Huffman compression
 > scheme would be that all key comparisons would still have to be
 > done with the uncompressed keys (so that user-specified
 > comparison routines would continue to work).

 This is a concern. I've checked how many times the comparison routine
is called. If it has to uncompress the key for each comparison we will
either need an uncompression cache or accept spending a lot of CPU
repeatedly uncompressing keys.
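
 Something like the sketch below is what worries me: if keys are stored
compressed, every comparison has to decode both of them first.
huffman_decode() is made up, and the bt_compare signature is the one
taken by DB->set_bt_compare in recent Berkeley DB releases, so it may
need adjusting for the version we use.

#include <string.h>
#include <db.h>

/* hypothetical: decode 'len' compressed bytes into 'out', return decoded length */
extern size_t huffman_decode(const void *in, size_t len, char *out, size_t outmax);

static int
compressed_bt_compare(DB *dbp, const DBT *a, const DBT *b)
{
    char abuf[256], bbuf[256];   /* plenty for 8-35 byte keys */
    size_t alen, blen;
    int r;

    alen = huffman_decode(a->data, a->size, abuf, sizeof(abuf));
    blen = huffman_decode(b->data, b->size, bbuf, sizeof(bbuf));

    r = memcmp(abuf, bbuf, alen < blen ? alen : blen);
    if (r != 0)
        return (r);
    return ((int)alen - (int)blen);
}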

 > If you get N% compression from Huffman encoding, that's the
 > level of compression you should see from your data, but the
 > page-fill factor won't change much -- smaller key/data pairs do
 > result in better page-fill factors in Btrees, but not N%!  So,
 > if you do Huffman encoding you get N% plus a little bit due to
 > the side-effect of generally storing smaller objects.

 I fully agree. 

 > There are a lot of other algorithms to keep the page-fill factor
 > in the Btrees higher, mostly involving page-balancing when keys
 > are added or deleted in the tree.  The reason that Berkeley DB
 > does not implement them is because they greatly increase the
 > chance of deadlock during the operation because you're locking
 > logically some number of unrelated leaf pages during an operation.

 Thanks (again) for this hint. You made it very clear why balancing is
not done on the fly. The reiserfs file system does real-time balancing
but the cost is high, indeed. There are two separate issues regarding
space management in Berkeley DB, as far as I understand.
      1) The order of the pages.
         When inserting keys randomly (I just did some tests with
         20 million keys), pages are not ordered according to
         their order of appearance in the tree.
      2) The balance of the tree.
         When inserting keys randomly the tree tends to be deeper
         and to contain more internal pages (60% fill).
  The straightforward solution to both issues is db_dump + db_load
(taking care to compile a db_load with the appropriate comparison
function): it solves 1) because we definitely want pages to be ordered
the same way the keys are, and 2) because saving internal pages and
tree depth speeds up searches a lot, as you highlighted in a previous
mail. A sketch of doing the same reorganization in-process follows
below.
  If we are forced to do that every so often, we partially lose the
advantage of having a dynamic index. It would be nice to have a daemon
process that does the job in the background, when the machine is
otherwise idle, or when a threshold is reached (let's say 40% free
space). I understand that this may be problematic for balancing the
tree, as you pointed out. What if we say that balancing is only done
after stopping the processes that want to write to the db? It won't be
a problem then. Or will it? For 2) I guess this can be done at any
time for leaf nodes, just by locking/swapping pages to sort them. Can
you confirm? Or say it's stupid ?-)
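
  Here is the sketch of the dump/reload pass done in-process instead of
with db_dump/db_load: reading the old file with a cursor returns
entries in key order, and inserting them sorted into a fresh Btree
should give the ~99% fill you describe. Error handling is omitted and
the DB->open call is the one from recent Berkeley DB releases, so it
may need adjusting.

#include <string.h>
#include <db.h>

extern int my_bt_compare(DB *, const DBT *, const DBT *); /* same routine as the live db */

int
reorganize(const char *oldfile, const char *newfile)
{
    DB *src, *dst;
    DBC *cursor;
    DBT key, data;

    db_create(&src, NULL, 0);
    src->set_bt_compare(src, my_bt_compare);
    src->open(src, NULL, oldfile, NULL, DB_BTREE, DB_RDONLY, 0);

    db_create(&dst, NULL, 0);
    dst->set_bt_compare(dst, my_bt_compare);
    dst->open(dst, NULL, newfile, NULL, DB_BTREE, DB_CREATE, 0644);

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    src->cursor(src, NULL, &cursor, 0);
    while (cursor->c_get(cursor, &key, &data, DB_NEXT) == 0)
        dst->put(dst, NULL, &key, &data, 0);  /* sorted inserts -> dense pages */

    cursor->c_close(cursor);
    src->close(src, 0);
    dst->close(dst, 0);
    return (0);
}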

     Just one last word concerning space efficiency. It also matters
because IO is slow. For instance inserting 20 million entries (a 1Gb
file) in random order took 1h on my machine and spent 80% of its time
waiting for IO. I had a 50Mb cache size, enough for all internal pages.
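
     (The cache was set with something like the following; same caveat
as above about the exact open signature in the Berkeley DB version
used.)

#include <db.h>

int
open_with_cache(DB *dbp, const char *file)
{
    /* 0 gigabytes + 50 * 2^20 bytes, in a single cache region,
     * set before DB->open; an environment-wide cache through
     * DB_ENV->set_cachesize is the other option. */
    dbp->set_cachesize(dbp, 0, 50 * 1024 * 1024, 1);
    return (dbp->open(dbp, NULL, file, NULL, DB_BTREE, DB_CREATE, 0644));
}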

     I've tried something else today: gzip -v on db files, just to
check how well they compressed. I got 50% compression on db files
created with keys in random order (60% fill) and 80% compression on db
files created with sorted keys (99% fill). This makes me come back to
the idea of compressing the pages themselves, transparently at the
cache/disk boundary. I don't mean we should use adaptive compression,
that won't work. But we could use semi-static compression, with three
frequency tables (old, current, new). Each page is marked as compressed
with either the old or the current table. When a page goes from disk to
cache it is uncompressed according to the frequency table used to
compress it. When it goes from cache to disk it is compressed using the
'current' frequency table and its data is used to feed the 'new'
frequency table. When no 'old' compressed pages remain (a background
process can take care of that, at idle time), we remove old, rename
current to old, and copy new to current. A sketch of that page in/out
path follows below.
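
     To make the old/current/new idea concrete, here is the sketch.
Everything in it is made up (the codec, the table layout); pages carry
the generation number of the table that compressed them and only the
last two generations are kept, so the background pass must have
recompressed the oldest pages before each rotation.

#include <stddef.h>

struct freq_table { unsigned long count[256]; /* plus the derived Huffman codes */ };

/* hypothetical codec driven by a fixed frequency table */
extern size_t huffman_compress(const struct freq_table *, const void *, size_t,
                               void *, size_t);
extern size_t huffman_decompress(const struct freq_table *, const void *, size_t,
                                 void *, size_t);
extern void   huffman_build_codes(struct freq_table *);

static unsigned current_gen = 1;      /* generation of the 'current' table */
static struct freq_table tables[2];   /* tables[gen % 2] holds 'old' and 'current' */
static struct freq_table collecting;  /* the 'new' table being fed */

/* disk -> cache: uncompress with the table that compressed the page */
size_t
page_in(unsigned page_gen, const void *disk, size_t len, void *mem, size_t memmax)
{
    return huffman_decompress(&tables[page_gen % 2], disk, len, mem, memmax);
}

/* cache -> disk: compress with 'current', feed the byte counts to 'new' */
size_t
page_out(const void *mem, size_t len, void *disk, size_t diskmax, unsigned *page_gen)
{
    const unsigned char *p = mem;
    size_t i;

    for (i = 0; i < len; i++)
        collecting.count[p[i]]++;
    *page_gen = current_gen;
    return huffman_compress(&tables[current_gen % 2], mem, len, disk, diskmax);
}

/* called once no page of generation current_gen - 1 remains on disk:
 * 'current' becomes 'old', 'new' becomes 'current' */
void
rotate_tables(void)
{
    static const struct freq_table empty;

    current_gen++;
    huffman_build_codes(&collecting);
    tables[current_gen % 2] = collecting;
    collecting = empty;                 /* start a fresh 'new' table */
}
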
     The hard part is not this, however. It's how to modify the
Berkeley DB code to manipulate pages whose size in memory is variable.
A simple trick would be to assume that we always achieve at least 50%
compression: the cache would store 4k pages on disk and return 8k pages
in memory. Of course if a page compresses better than 50% we still only
save 50%, and if a full page does not compress down to 50%, the cache
has to handle the exception (a sketch follows below). I would
definitely like to read a paper about file systems that do compression
on the fly.
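
     The exception handling could look like this (all names made up):
the cache hands out 8k pages in memory, the file keeps 4k pages on
disk, and a page that does not compress down to 4k is spilled
uncompressed into two consecutive 4k disk pages.

#include <string.h>
#include <stddef.h>

#define MEM_PAGE   8192
#define DISK_PAGE  4096

#define PAGE_COMPRESSED 0x01
#define PAGE_RAW        0x02    /* the exception: stored as two raw 4k disk pages */

/* hypothetical: returns the compressed size, or 0 if it does not fit outmax */
extern size_t compress_page(const void *in, size_t len, void *out, size_t outmax);

/* returns PAGE_COMPRESSED or PAGE_RAW; 'disk' must hold 2 * DISK_PAGE bytes */
int
write_page(const void *mem, void *disk)
{
    size_t n = compress_page(mem, MEM_PAGE, disk, DISK_PAGE);

    if (n > 0 && n <= DISK_PAGE)
        return PAGE_COMPRESSED;     /* fits in a single 4k disk page */

    memcpy(disk, mem, MEM_PAGE);    /* overflow: spill to two raw 4k pages */
    return PAGE_RAW;
}
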
     The reward is high if we can do that. I fear that the solution of
compressing individual entries will cost a lot of CPU because each time
we manipulate an entry we must compress/uncompress it. The comparison
function is the scariest part in that respect. But you will probably
say that uncompressing a full page just to look up one key costs about
the same. Let me know if you have a firm opinion on the subject and
I'll start experimenting. I would like to find out:
     whole-page compression: CPU cost, space saving
     individual-entry compression: CPU cost, space saving
     (my intuition is that both will cost about the same, maybe
      individual-entry compression will cost more, and that whole-page
      compression will save significantly more space).

     Cheers,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/

