Peter Geoghegan <p...@heroku.com> writes: > On more occasions than I care to recall, someone has suggested that it > would be valuable to do something with strxfrm() blobs in order to > have cheaper locale-aware text comparisons. One obvious place to do so > would be in indexes, but in the past that has been dismissed on the > following grounds:
> * Index-only scans need fully formed datums to work, and strxfrm() is > a one-way function (or so I believe). There is no reason to think that > the original string can be reassembled from the blob, so clearly that > won't fly. > * There is a high cost to be paid in storage overhead. Even for > collations like "en_US.UTF-8", that can mean that the blob is as much > as 3-4 times larger than the original text string. Who is to say that > we'll come out ahead even with the savings of just doing a strcmp() > rather than a strcoll()? Quite aside from the index bloat risk, this effect means a 3-4x reduction in the maximum string length that can be indexed before getting the dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error. Worse, a value insertion might well succeed, with the failure happening only (much?) later when that entry is chosen as a page split boundary. It's possible that TOAST compression of the strings would save you, but I'm not convinced of that; it certainly doesn't seem like we could guarantee no post-insertion failures that way. Also, detoasting of strings that hadn't been long enough to need toasting before could easily eat any savings. > I'm sure anyone that has read this far knows where I'm going with > this: why can't we just have strxfrm() blobs in the inner pages, > implying large savings for a big majority of text comparisons that > service index scans, without bloating the indexes too badly, and > without breaking anything? We only use inner pages to find leaf pages. > They're redundant copies of the data within the index. It's a cute idea though, and perhaps worth pursuing as long as you've got the pitfalls in mind. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers