>> Compression doesn't matter; what matters is whether or not the keys
>> you insert are sorted. My expectation for a Huffman compression
>> scheme would be that all key comparisons would still have to be
>> done with the uncompressed keys (so that user-specified
>> comparison routines would continue to work).
>
> This is a concern. I've checked how many times the comparison routine
> is called. If it has to uncompress the key for each comparison, we will
> either need an uncompression cache or accept burning a lot of CPU to
> repeatedly uncompress keys.
I agree that this is a possible concern. Maybe we could
instrument Berkeley DB to find out how many times the key
comparison function is called over time, and estimate how
much slower it will be to decode on each comparison. That
should give us a rough idea if it will be a problem or not.
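Something as simple as a counting wrapper around the application's
comparator would tell us that much; here is a rough sketch against the
DB 3.x-style bt_compare interface (the comparator name is made up):

    #include <db.h>

    static unsigned long compare_calls;

    /* app_compare stands in for whatever comparator the application
     * already registers; the name is invented for this sketch. */
    extern int app_compare(DB *dbp, const DBT *a, const DBT *b);

    static int
    counting_compare(DB *dbp, const DBT *a, const DBT *b)
    {
        ++compare_calls;                 /* count every key comparison */
        return (app_compare(dbp, a, b));
    }

    /* Register with dbp->set_bt_compare(dbp, counting_compare) before
     * the open, then print compare_calls at the end of the run. */
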
> The straightforward solution to both issues (1) because we definitely
> want pages to be ordered the same way keys are, and 2) because saving
> internal pages and tree depth speeds up searches a lot, as you
> highlighted in a previous mail) is to db_dump + db_load (taking care
> to compile db_load with the appropriate comparison function).
I'm not sure why you want pages ordered the same way keys are?
I don't see why that's an advantage.
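That said, the fill-factor part (your point 2) doesn't really need
db_dump/db_load at all: a few lines of cursor code reading the old tree
in key order and inserting into a fresh one do the same job. Untested
sketch, DB 3.x-style calls, error handling omitted:

    #include <string.h>
    #include <db.h>

    /*
     * Copy olddb into newdb in key order.  Because the inserts arrive
     * sorted, the new btree's leaf pages end up nearly full.  newdb is
     * assumed to have been created with the application's comparison
     * function already set via set_bt_compare.
     */
    void
    rebuild(DB *olddb, DB *newdb)
    {
        DBC *dbc;
        DBT key, data;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        olddb->cursor(olddb, NULL, &dbc, 0);
        while (dbc->c_get(dbc, &key, &data, DB_NEXT) == 0)
            newdb->put(newdb, NULL, &key, &data, 0);
        dbc->c_close(dbc);
    }
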
> If we are forced to do that every so often, we partially lose the
> advantage of having a dynamic index. It would be nice to have a daemon
> process that does the job in the background, when the machine is
> otherwise idle, or when a threshold is reached (let's say 40% free
> space). I understand that this may be problematic for balancing the
> tree, as you pointed out. What if we say that balancing must be done
> only after stopping the processes that want to write to the db? It
> won't be a problem then. Or will it be a problem? For 2) I guess this
> can be done at any time for leaf nodes, just by locking/swapping pages
> to sort them. Can you confirm? Or say it's stupid ?-)
No, it's reasonable to have a thread that wanders through the
database packing keys. It wouldn't even be that hard to write
one, we've just never bothered.
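The skeleton would just be a thread that wakes up periodically and
repacks when the tree looks loose enough. Both helpers in this sketch
are invented: free_space_ratio() would come from btree statistics, and
pack_database() would be something like the cursor rebuild above:

    #include <pthread.h>
    #include <unistd.h>

    /* Hypothetical helpers, neither exists in Berkeley DB today. */
    extern double free_space_ratio(void);   /* fraction of wasted space */
    extern void   pack_database(void);      /* e.g. the cursor rebuild */

    static void *
    packer(void *arg)
    {
        for (;;) {
            sleep(10 * 60);                  /* wake up every 10 minutes */
            if (free_space_ratio() > 0.40)   /* your 40% threshold */
                pack_database();
        }
        /* NOTREACHED */
        return (NULL);
    }

    /* Start it with: pthread_create(&tid, NULL, packer, NULL); */
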
> Just a last word concerning space efficiency. It also matters because
> I/O is slow. For instance, inserting 20 million entries (a 1 GB file)
> in random order took 1 hour on my machine and spent 80% of its time
> waiting for I/O. I had a 50 MB cache size, enough for all internal pages.
Memory is almost as cheap as disk... ;-)
> I've tried something else today: gzip -v on db files, just to check
> how well they compressed. I got 50% compression on db files created
> with keys in random order (60% fill) and 80% compression on db files
> created with sorted keys (99% fill). This makes me come back to the
> idea of compressing the pages themselves, transparently, at the cache
> level. I don't mean we should use adaptive compression; that won't work.
Actually, I would think that adaptive compression will work just
fine if you're compressing page-size quantities.
> But we could use semi-static compression. We could have three frequency
> tables (old, current, new). Pages are marked as using either old or
> current. When a page goes from disk to cache, it is uncompressed
> according to the frequency table that was used to compress it. When it
> goes from cache to disk, it is compressed using the 'current' frequency
> table and its data is used to feed the 'new' frequency table. When we
> have no more 'old' compressed pages (a background process can take care
> of that, at idle time), we remove old, rename current to old, and copy
> new to current.
Yuck. This is going to be complex -- do we have any data that
justifies using anything other than static tables? My bet,
based on no data at all, is that it's not worth the effort, that
a single, application-specified table gets us most of what can
be gotten, and the application can always figure out a better
table and dump/load to use it. I also note that you just stole
bits from my database page on-disk format, and that's not a nice
thing to do.
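Just to be concrete about what I understand the rotation to involve,
here is a sketch; everything in it (the page flag, compress_page(),
decompress_page(), update_stats()) is invented:

    #include <string.h>

    enum table_id { TBL_OLD, TBL_CURRENT };  /* the bits stolen from my page */

    struct freq_table {
        unsigned long freq[256];             /* byte frequencies feeding the coder */
    };

    static struct freq_table tbl_old, tbl_current, tbl_new;
    static unsigned long     pages_marked_old; /* pages still compressed with 'old' */

    /* Invented compression primitives. */
    extern void compress_page(void *page, const struct freq_table *);
    extern void decompress_page(void *page, const struct freq_table *);
    extern void update_stats(struct freq_table *, const void *page);

    /* Disk -> cache: decode with whichever table the page was written with. */
    void
    page_in(void *page, enum table_id id)
    {
        decompress_page(page, id == TBL_OLD ? &tbl_old : &tbl_current);
    }

    /* Cache -> disk: always encode with 'current', feed the bytes to 'new'. */
    void
    page_out(void *page)
    {
        update_stats(&tbl_new, page);
        compress_page(page, &tbl_current);   /* page is now marked TBL_CURRENT */
    }

    /* Once the background pass has rewritten every TBL_OLD page... */
    void
    rotate_tables(void)
    {
        if (pages_marked_old == 0) {
            tbl_old = tbl_current;           /* "rename current to old" */
            tbl_current = tbl_new;           /* "copy new to current" */
            memset(&tbl_new, 0, sizeof(tbl_new));
        }
    }
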
> The hard part is not this, however. It's how to modify the Berkeley
> DB code to manipulate pages whose size in memory is variable. A simple
> trick would be to assume that we always achieve at least 50% compression.
> The cache would store 4K pages on disk and return 8K pages in memory. Of
> course, if we compress by more than 50% we still only gain 50%. If a full
> page doesn't compress down to 50%, the cache has to handle the exception.
> I definitely would like to read a paper about file systems that do
> compression on the fly.
Yeah, this is probably possible.
I have no idea how nasty a solution it would be. The nastiest
part of this is probably that you'd have to encode information
about the compression in the on-disk page format, and have the memory pool
know about it. Right now, the memory pool is blissfully ignorant
of on-page formats, and that's a very nice layering.
As you say, the great thing about this is that we could do
serious compression on the database without doing any kind
of repeated de-compression of key/data items.
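For what it's worth, the per-page mechanics are simple enough with
something like zlib; the layering question above is the real work. A
sketch of the 50% trick, with illustrative sizes and invented names
(note that the compressed length would have to be recorded on the
on-disk page somewhere, which is exactly the format issue):

    #include <zlib.h>

    #define MEM_PAGE_SIZE   8192                 /* page size the cache hands out */
    #define DISK_PAGE_SIZE  (MEM_PAGE_SIZE / 2)  /* assumes >= 50% compression */

    /*
     * Compress an 8K in-memory page into a 4K on-disk slot.  Returns 0
     * and sets *disk_lenp on success; returns non-zero when the page
     * doesn't reach 50%, which is the exception the cache has to handle
     * (overflow slot, store it uncompressed, whatever we decide).
     */
    int
    page_to_disk(const unsigned char *mem, unsigned char *disk, uLongf *disk_lenp)
    {
        *disk_lenp = DISK_PAGE_SIZE;
        return (compress(disk, disk_lenp, mem, MEM_PAGE_SIZE) == Z_OK ? 0 : 1);
    }

    int
    disk_to_page(const unsigned char *disk, uLong disk_len, unsigned char *mem)
    {
        uLongf mem_len = MEM_PAGE_SIZE;

        return (uncompress(mem, &mem_len, disk, disk_len) == Z_OK ? 0 : 1);
    }
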
> The reward is high if we can do that. I fear that the solution of
> compressing individual entries will cost a lot of CPU, because each time
> we manipulate an entry we must compress/uncompress it. The comparison
> function is the scariest part in that regard. But you will probably say
> that uncompressing a full page just to look up one key costs about the same.
Yes, but the cache had better be working. If the data is so
large that you're doing I/O on every key lookup, you've already
lost the performance contest. So, the decompression in your
scheme is amortized over many lookups, whereas in my Huffman
scheme, it isn't. If you can make your scheme work, it's likely
to outperform mine. :-)
--keith