On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote: > Isn't this what Grouped Index Tuples is? >
http://community.enterprisedb.com/git/git-readme.txt It looks like GIT is a little different. GIT actually stores a lower-bound key of a contiguous* range of keys that all point to the same page, and for each of those ranges stores a bitmap of page offsets. A search searches first for an exact match in the index, and failing that, looks to see if the previous index tuple happens to be one of these ranges. The algorithm Chris is talking about stores a set of tuple ids (which include page and offset) for each distinct key. Both could be helpful, although I don't think they can work together very well. GIT has the disadvantage that it's "lossy". It doesn't even store every key in the index, so it can't be sure that the match actually is a match. Thus, it returns candidate matches. That also has implications for enforcing UNIQUE (although it's not impossible, according to the readme). However, GIT can be used effectively on an index that happens to be unique. GIT also assumes a tree structure, and makes no sense for a hash index, and makes no sense for a types without ordering. GIT's space savings is dependent on the clustering of the table. Chris's suggestion would work on a UNIQUE index, but would be no help at all, because there would be no duplicate keys to collapse. However, it could be used for non-tree indexes. The space savings for this strategy is dependent on how repetitive the keys are. I guess the ultimate deciding factor is which can save you more space. If you have lots of duplicates, Chris's suggestion might work better, because you don't have to try to maintain cluster order. If you have a wider distribution of data, GIT is probably better, although you have to keep some degree of clustering (HOT may help with that). Heikki is the authority on GIT, so I'm including him in the CC so he can correct me :) Regards, Jeff Davis *: I'm not 100% sure I'm using "contiguous" correctly, but the range of keys can contain gaps or duplicates, so long as every key in the range points to that same page. That is, if keys 1,1,2,3,5 all point to page P, they can be grouped into just "1" so long as there doesn't exist a key 4 that points to a page other than P. ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate