[HACKERS] GiST: PickSplit and multi-attr indexes

Neil Conway Sun, 14 Nov 2004 01:22:08 -0800

Oleg & Teodor,

If I understand the code correctly, GiST will only pass the first attribute of each index tuple to the user-defined PickSplit method when it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first attribute in the index is sufficient to make a reasonable decision about how to divide the node into two halves, but I don't think that is universally true. For example, consider a two column index whose first attribute has a small number of distinct values. It could well be that all the first attribute values in a node-to-be-split would be the same. Only passing the first attribute to PickSplit would result in an essentially random distribution of tuples among the split nodes, rather than allowing the GiST extension to make use of the second attribution to partition the nodes. That's an extreme example, but it is easy to construct more realistic scenarios (basically, any situation in which the cardinality of the first index attribute is low -- a reasonably common occurrence with a multi-column index, I believe).

I'm not sure whether this would be a problem in practice. Speculation: repeated invocations of PickSplit are one of the main factors in deriving the ultimate shape of the GiST tree. Poor distribution of keys resulting from PickSplit would eventually result in unnecessarily loose bounding predicates in internal nodes, which would hurt performance.

Comments welcome,

Neil

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

[HACKERS] GiST: PickSplit and multi-attr indexes

Reply via email to