On Apr 24, 2008, at 10:43 AM, Bruce Momjian wrote:

Bruce asked if these should be TODOs...

Index compression is possible in many ways, depending upon the
situation. All of the following sound similar at a high level, but each
covers a different use case.

* For long, similar data, e.g. text, we can use Prefix Compression
We still store one pointer per row, but we reduce the size of the index
by reducing the size of the key values. This requires us to reach inside
datatypes, so it isn't a very general solution, but it is probably an
important one in the future for text.

I think what would be even more useful is doing this within the table itself, and then bubbling that up to the index.
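
To make the idea concrete, here's a rough Python sketch of prefix
compression over sorted keys. It's purely illustrative -- the names are
made up and none of this is actual PostgreSQL code:

    def compress_keys(sorted_keys):
        """Encode each key as (shared_prefix_len, suffix) relative to
        its predecessor; works because index keys are kept sorted."""
        out, prev = [], ""
        for key in sorted_keys:
            n = 0
            while n < min(len(prev), len(key)) and prev[n] == key[n]:
                n += 1
            out.append((n, key[n:]))
            prev = key
        return out

    def decompress_keys(entries):
        """Rebuild the original keys from (prefix_len, suffix) pairs."""
        keys, prev = [], ""
        for n, suffix in entries:
            prev = prev[:n] + suffix
            keys.append(prev)
        return keys

    keys = ["application", "applied", "apply", "appointment"]
    enc = compress_keys(keys)
    # enc == [(0, 'application'), (5, 'ed'), (4, 'y'), (3, 'ointment')]
    assert decompress_keys(enc) == keys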

* For unique/nearly-unique indexes we can use Range Compression
We reduce the size of the index by holding one index pointer per range
of values, thus removing both keys and pointers. It's more efficient
than prefix compression and isn't datatype-dependent.

Definitely.
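
A minimal sketch of what that could look like, assuming the heap is
ordered by the key (again just illustrative Python, not real code):

    import bisect

    def build_range_index(sorted_keys, keys_per_block):
        """One (lowest_key, block) entry per heap block, instead of one
        index entry per row."""
        return [(sorted_keys[start], block)
                for block, start in
                enumerate(range(0, len(sorted_keys), keys_per_block))]

    def probe(index, key):
        """Find the one block that could contain key; the caller then
        scans just that block."""
        pos = bisect.bisect_right(index, (key, float("inf"))) - 1
        return index[max(pos, 0)][1]

    keys = list(range(0, 1000, 7))            # 143 unique, ordered keys
    idx = build_range_index(keys, keys_per_block=20)
    assert len(idx) == 8                      # vs. 143 per-row entries
    block = probe(idx, 343)                   # scan only block 2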

* For highly non-unique data we can use Duplicate Compression
This is the technique used by bitmap indexes. It's efficient, but not
useful for unique/nearly-unique data.

Also definitely. This would be hugely useful for things like "status" or "type" fields.
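
The basic shape, in illustrative Python (a real bitmap index would
store compressed bitmaps rather than plain lists):

    from collections import defaultdict

    def build_duplicate_compressed_index(rows):
        """Store each distinct value once, followed by the list of row
        ids that hold it (a posting list)."""
        index = defaultdict(list)
        for rowid, value in enumerate(rows):
            index[value].append(rowid)
        return index

    status = ["active", "active", "closed", "active", "pending", "closed"]
    idx = build_duplicate_compressed_index(status)
    # idx["active"] == [0, 1, 3] -- "active" is stored once, not thrice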

* Multi-Column Leading Value Compression - if you have a multi-column
index, the leading columns are usually duplicated between rows inserted
at the same time. Using an on-block dictionary we can remove these
duplicates. Only useful for multi-column indexes; possibly an
overlapping/contained subset of the GIT (Grouped Index Tuples) use case.


Also useful, though I generally try to put the most diverse values
first in indexes to increase the odds of them being used. Perhaps if we
had compression this would change.
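
Here's a rough sketch of the on-block dictionary idea (illustrative
Python; the column values are invented):

    def compress_block(entries):
        """entries: (leading_cols, trailing_cols) pairs for one index
        block. Duplicated leading columns become small dictionary ids."""
        dictionary, seen, packed = [], {}, []
        for leading, trailing in entries:
            if leading not in seen:
                seen[leading] = len(dictionary)
                dictionary.append(leading)
            packed.append((seen[leading], trailing))
        return dictionary, packed

    entries = [(("2008-04-24", "web"), 101),
               (("2008-04-24", "web"), 102),
               (("2008-04-24", "api"), 103)]
    dictionary, packed = compress_block(entries)
    # dictionary == [("2008-04-24", "web"), ("2008-04-24", "api")]
    # packed == [(0, 101), (0, 102), (1, 103)]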
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

