On Thu, Jan 30, 2014 at 8:51 PM, Peter Geoghegan <p...@heroku.com> wrote: > I've done some more digging. It turns out that the 1977 paper "An > Encoding Method for Multifield Sorting and Indexing" describes a > technique that involves concatenating multiple column values and > comparing them using a simple strcmp().
So I thought about it again, and realized that this probably can't be made to work given our present assumptions. Commit 656beff59 added the varstr_cmp() strcmp() tie-breaker. If we cannot trust strcoll() to indicate equality, why should we be able to trust strxfrm() + strcmp() to do so, without an equivalent tie-breaker on the original string? Now, I guess it might be good enough that the comparison in the inner page indicated "equivalence" provided the comparison on leaf pages indicated "equality proper". That seems like the kind of distinction that I'd rather not have to make, though. You could end up with two non-equal but equivalent tuples in an inner page. Seems pretty woolly to me. However, it also occurs to me that strxfrm() blobs have another useful property: We (as, say, the author of an equality operator on text, an operator intended for a btree operator class) *can* trust a strcmp()'s result on blobs, provided the result isn't 0/equal, *even if the blobs are truncated*. So maybe a better scheme, and certainly a simpler one would be to have a pseudo-attribute in inner pages only with, say, the first 8 bytes of a strxfrm() blob formed from the logically-leading text attribute of the same indexTuple. Because we're only doing this on inner pages, there is a very good chance that that will be good enough most of the time. This also allows us to reduce bloat very effectively. As I touched on before, other schemes for other types do seem to suggest themselves. We could support a pseudo leading attribute for numeric too, for example, where we use a 64-bit integer as a proxy for the whole number part (with the implementation having special handling of "out of range" values by means of an internal magic number - as always with the scheme, equality means "inconclusive, ask logically leading attribute"). That could win just by mostly avoiding the serialization overhead. Now, the obvious counter-argument here is to point out that there are worst cases where none of this helps. I suspect that those are so rare in the real world, and the cost of doing all this is so low, that it's still likely to be well worth it for any given use-case at the macro level. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers