Sleepycat Software writes:
>
> I'm not sure why you want pages ordered the same way keys are?
> I don't see why that's an advantage.
>
Maybe it's a narrow, htdig-oriented view, but I think it will be useful
to others. Walking a cursor with DB_NEXT goes a lot faster when the pages
are ordered the same way the keys are, because moving from one page to the
next then needs little or no extra disk access. It's related to the idea that
internal pages should be as close as possible to the pages they reference
(locality of reference), but that one is much more complex and I can't see
that you win much from it. You could say that page ordering is not a db
problem but a file system problem, and that the 'elevator algorithm' that
serializes I/O to optimize disk access should take care of it.
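
To make the access pattern concrete, here is roughly the loop I have in
mind. It's only a minimal sketch, and I'm using the db_create/DB->open names
of the newer Berkeley DB interface (the exact open() arguments vary between
releases, the 2.x db_open() call looks a little different):

#include <string.h>
#include <db.h>

/* Walk every key in key order with DB_NEXT.  Every time the cursor
 * crosses a leaf-page boundary a new page must be fetched; if the leaf
 * pages are laid out on disk in key order those fetches are mostly
 * sequential reads instead of seeks. */
int walk(const char *file)
{
    DB *dbp;
    DBC *dbcp;
    DBT key, data;
    int ret, t;

    if ((ret = db_create(&dbp, NULL, 0)) != 0)
        return ret;
    if ((ret = dbp->open(dbp, NULL, file, NULL, DB_BTREE, DB_RDONLY, 0)) != 0)
        goto err;
    if ((ret = dbp->cursor(dbp, NULL, &dbcp, 0)) != 0)
        goto err;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    while ((ret = dbcp->c_get(dbcp, &key, &data, DB_NEXT)) == 0)
        ;                       /* use key.data / data.data here */
    if (ret == DB_NOTFOUND)
        ret = 0;

    (void)dbcp->c_close(dbcp);
err:
    if ((t = dbp->close(dbp, 0)) != 0 && ret == 0)
        ret = t;
    return ret;
}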
I have another idea in mind though. If we are able to reorganize db pages
transparently, we could do something that would greatly speed up the
bootstrapping and usage of large databases. The pages held in the cache
could be moved toward the beginning (or the end) of the file according to
how frequently they are used. This would naturally cluster frequently used
pages in the same portion of the disk and therefore reduce disk head latency
on cache misses. Applications that have a common usage pattern would also
warm their cache faster after a restart, because the frequently used pages
are close to each other.
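
None of what follows exists in Berkeley DB; it is only a sketch of the
reorganization step I have in mind, with hypothetical helpers (swap_pages,
page_stat) just to show the shape of it:

#include <stddef.h>
#include <stdint.h>

typedef uint32_t pgno_t;

struct page_stat {
    pgno_t   pgno;      /* current physical page number */
    uint32_t hits;      /* cache hits since the last reorganization */
};

/* Hypothetical: swap the physical locations of two pages and patch
 * every reference to them (parent internal pages, overflow chains,
 * free list). */
extern int swap_pages(void *dbhandle, pgno_t a, pgno_t b);

/* Move the n hottest pages to the first n slots of the file. */
static int cluster_hot_pages(void *dbhandle, struct page_stat *stats, size_t n)
{
    size_t i, j;
    pgno_t target, old;
    int ret;

    /* stats[] is assumed sorted by descending hit count; page 0 is the
     * metadata page, so hot pages go to slots 1..n. */
    for (i = 0; i < n; i++) {
        target = (pgno_t)(i + 1);
        if (stats[i].pgno == target)
            continue;
        old = stats[i].pgno;
        if ((ret = swap_pages(dbhandle, old, target)) != 0)
            return ret;
        stats[i].pgno = target;
        /* The page that used to live in the target slot moved to 'old';
         * keep the bookkeeping consistent for later iterations. */
        for (j = i + 1; j < n; j++)
            if (stats[j].pgno == target)
                stats[j].pgno = old;
    }
    return 0;
}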
> No, it's reasonable to have a thread that wanders through the
> database packing keys. It wouldn't even be that hard to write
> one, we've just never bothered.
That's a great relief.
> Memory is almost as cheap as disk... ;-)
I'll take that as a half joke. I agree to the extent that, for a machine
with 150 GB of disk, you can't expect more than 4 GB of RAM on a PC, or
64 GB if you buy an expensive machine such as a Sun. And I don't want to
assume anything other than a PC as the standard platform.
> Actually, I would think that adaptive compression will work just
> fine if you're compressing page-size quantities.
Maybe, I'll have to try.
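
Trying it should be easy enough with zlib, which does adaptive (dynamic
Huffman) coding, on a page-sized buffer. A throwaway test along these lines
(the filler data is obviously not representative; real pages from the word
index would be the interesting input):

#include <stddef.h>
#include <stdio.h>
#include <zlib.h>

#define PAGESIZE 8192

int main(void)
{
    unsigned char page[PAGESIZE];
    unsigned char cbuf[PAGESIZE * 2];   /* plenty of room for the output */
    uLongf clen = sizeof(cbuf);
    size_t i;

    /* Fill the page with something vaguely text-like. */
    for (i = 0; i < sizeof(page); i++)
        page[i] = "http://www.senga.org/"[i % 21];

    if (compress2(cbuf, &clen, page, sizeof(page), Z_DEFAULT_COMPRESSION) != Z_OK)
        return 1;

    printf("page: %d -> %lu bytes (%.1f%%)\n",
        PAGESIZE, (unsigned long)clen, 100.0 * clen / PAGESIZE);
    return 0;
}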
> Yuck. This is going to be complex -- do we have any data that
> justifies using anything other than static tables? My bet,
> based on no data at all is that it's not worth the effort, that
> a single, application-specified table gets us most of what can
> be gotten, and the application can always figure out a better
> table and dump/load to use it. I also note that you just stole
> bits from my database page on-disk format, and that's not a nice
> thing to do.
We are using static tables. The question is, as always, how to avoid
db_dump + db_load. Calculating the 'new' table on the fly would be stupid,
though. It should only be calculated after all pages that carry the 'old'
compression have been upgraded to the 'current' compression. I don't
understand what you mean by 'stole bits'?
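
If by 'stole bits' you mean a per-page marker saying which table a page was
compressed with, then yes, that's what the scheme needs, so that old and
current pages can coexist and get upgraded as they are rewritten. A purely
hypothetical layout, not your real on-disk page format:

#include <stdint.h>

struct page_header {
    uint32_t lsn_file, lsn_offset;  /* log sequence number */
    uint32_t pgno;                  /* this page's number */
    uint32_t prev_pgno, next_pgno;  /* sibling leaf pages */
    uint16_t entries;               /* number of items on the page */
    uint8_t  level;                 /* btree level, 1 == leaf */
    uint8_t  comp_table;            /* id of the static table used to
                                     * compress this page; 0 == none */
};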
> Yeah, this is probably possible.
Ok. I'll try to find out how hard it is then.
> Yes, but the cache had better be working. If the data is so
> large that you're doing I/O on every key lookup, you've already
> lost the performance contest. So, the decompression in your
Something else occurred to me. If you compress pages, you compress/uncompress
only when an I/O is involved, so the amount of CPU spent is proportional to
your disk activity. When the disk is under heavy load you spend a lot of time
waiting for I/O anyway, and that time can be used to compress pages. If you
compress entries you pay the cost all the time, even on the most frequently
used pages. A minimal sketch of what I mean is at the end of this message.
> scheme, it isn't. If you can make your scheme work, it's likely
> to outperform mine. :-)
You seem skeptical about that. I hope this won't be a dead end.
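
To be concrete about the "compress only at I/O time" point, here is a
minimal sketch using zlib over a flat file of fixed-size pages. pgread and
pgwrite stand in for whatever the buffer cache calls when a page crosses the
memory/disk boundary; memp_register's pgin/pgout hooks look like the natural
place to plug this in, though I haven't checked whether they can change the
amount of data written.

#define _XOPEN_SOURCE 500
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <zlib.h>

#define PAGESIZE 8192

/* On-disk page: 4-byte compressed length, compressed bytes, padding. */
int pgwrite(int fd, unsigned long pgno, const unsigned char *page)
{
    unsigned char disk[PAGESIZE];
    uLongf clen = PAGESIZE - 4;
    uint32_t n;

    if (compress2(disk + 4, &clen, page, PAGESIZE,
        Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;      /* incompressible page: a real version would store it raw */
    n = (uint32_t)clen;
    memcpy(disk, &n, 4);
    return pwrite(fd, disk, PAGESIZE, (off_t)pgno * PAGESIZE) == PAGESIZE ? 0 : -1;
}

int pgread(int fd, unsigned long pgno, unsigned char *page)
{
    unsigned char disk[PAGESIZE];
    uLongf plen = PAGESIZE;
    uint32_t n;

    if (pread(fd, disk, PAGESIZE, (off_t)pgno * PAGESIZE) != PAGESIZE)
        return -1;
    memcpy(&n, disk, 4);
    return uncompress(page, &plen, disk + 4, n) == Z_OK ? 0 : -1;
}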
Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/