On Sat, Oct 18, 2014 at 3:10 AM, Peter Geoghegan <p...@heroku.com> wrote: > So the first eight bytes of the first string is 0x131F1F1B2222221E, > and the second 0x131F1F1B2222220C. The last byte is different.
That's interesting but I think it's mostly a quirk of your example. Afaics the difference is only that the en_US locale ignores punctuation like : and / (or at least treats them as less significant than alphabetic characters). If you had strings that had less punctuation or differences that didn't happen to arrive shortly after the 8-byte boundary then it wouldn't make any difference. And we still have to run strfrm at least once, write out the whole binary blob to memory somewhere and if it spills to disk we still have to write and read much more data. I think recognizing cases where equality is the only thing we're interested in and locale-sensitive sorting isn't necessary and using a memcmp would be a clear win. I'm not immediately clear on what the cleanest way to integrate it would be. A btree opclass function like the cmp function but that doesn't need to be consistent with < and >, only = ? Or perhaps a flag on the btree opclass that indicates that the data types can safely be compared with memcmp when equality is all that's needed? The latter is pretty tempting since it would tell code something interesting about the data type's internal storage that may lead to other optimizations. On the other hand the former is nice in that the operator could maybe handle other cases like padding by doing memcmp on only the significant bits. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers