Re: [HACKERS] Optimizer on sort aggregate

Greg Stark Sat, 18 Oct 2014 05:29:11 -0700

On Sat, Oct 18, 2014 at 3:10 AM, Peter Geoghegan wrote:
> So the first eight bytes of the first string is 0x131F1F1B2222221E,
> and the second 0x131F1F1B2222220C. The last byte is different.


That's interesting, but I think it's mostly a quirk of your example.
AFAICS the only difference is that the en_US locale ignores
punctuation like : and / (or at least treats it as less significant
than alphabetic characters). If your strings had less punctuation, or
if their differences didn't happen to fall just past the 8-byte
boundary, then it wouldn't make any difference.
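To make the 8-byte point concrete, here is a minimal C sketch of an
abbreviated key. It assumes, for illustration only, that the key is
the first 8 bytes of the strxfrm() blob packed big-endian into a
uint64_t. abbrev_key() and abbrev_cmp() are hypothetical names, not
PostgreSQL's actual code. A tie on the prefix is inconclusive, so any
difference past byte 8 forces a full comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the first 8 bytes of a strxfrm() blob into
 * a uint64_t, most significant byte first, zero-padding shorter blobs,
 * so that unsigned integer comparison orders keys the same way
 * memcmp() orders their 8-byte prefixes.
 */
static uint64_t
abbrev_key(const unsigned char *blob, size_t len)
{
    uint64_t v = 0;

    assert(blob != NULL || len == 0);
    for (size_t i = 0; i < 8; i++)
    {
        v <<= 8;
        if (i < len)
            v |= blob[i];
    }
    return v;
}

/*
 * Returns -1, 0 or 1. A result of 0 is inconclusive: the prefixes tie,
 * and the caller must fall back to comparing the full blobs.
 */
static int
abbrev_cmp(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

With literal bytes standing in for strxfrm() output, "12345678-alpha"
and "12345678-beta" abbreviate to the same value and tie, while
"1234567A" sorts before "1234567B" on the prefix alone.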

And we still have to run strxfrm at least once, write the whole
binary blob out to memory somewhere, and, if it spills to disk, write
and read much more data. I think recognizing cases where equality is
the only thing we're interested in, so that locale-sensitive sorting
isn't necessary, and using memcmp for them would be a clear win.
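A rough C sketch of the two costs being contrasted, assuming the
type's equality is plain byte-for-byte equality. The function names
are hypothetical. The equality-only check is a length test plus
memcmp(), while the locale-aware route first has to materialize the
entire strxfrm() blob.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Equality-only path: two byte strings are equal iff they have the
 * same length and the same bytes. No locale, no strxfrm(), no extra
 * memory. Assumes the type's "=" is binary equality.
 */
static bool
text_equal_memcmp(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}

/*
 * Locale-aware path: strxfrm() must produce the whole transformed
 * blob before it can be compared with strcmp(). Returns a malloc'd
 * buffer the caller must free(), or NULL on allocation failure.
 */
static char *
xfrm_blob(const char *s)
{
    /* Size query: the destination may be NULL when n is 0. */
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);

    if (buf != NULL)
        strxfrm(buf, s, need + 1);
    return buf;
}
```

The C standard guarantees that strcmp() on two such blobs agrees in
sign with strcoll() on the original strings under the current
LC_COLLATE. That is exactly the extra work, and memory, an
equality-only caller doesn't need.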

I'm not immediately clear on what the cleanest way to integrate this
would be. A btree opclass function like the cmp function, but one that
only needs to be consistent with =, not with < and >? Or perhaps a flag
on the btree opclass indicating that values of the data type can safely
be compared with memcmp when equality is all that's needed? The latter
is pretty tempting, since it would tell code something interesting
about the data type's internal storage that may lead to other
optimizations. On the other hand, the former is nice in that the
function could handle other cases, like padding, by doing memcmp on
only the significant bits.
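The two options could look something like this in C. Everything here
is hypothetical: OpclassInfo, memcmp_equality, equal, fast_equal() and
blank_padded_equal() are illustrative names, not existing PostgreSQL
structures, and trailing blanks are just one example of the padding
case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical equality-only support function signature. */
typedef bool (*EqualityFn) (const char *a, size_t alen,
                            const char *b, size_t blen);

/*
 * Hypothetical opclass metadata, one field per option:
 *   memcmp_equality - flag: equal values always have identical bytes
 *   equal           - function that must agree with =, not with < or >
 */
typedef struct OpclassInfo
{
    bool        memcmp_equality;
    EqualityFn  equal;
} OpclassInfo;

/*
 * Returns whether a and b are equal. Sets *decided to false when
 * neither option applies and the caller must use the cmp function.
 */
static bool
fast_equal(const OpclassInfo *opc, const char *a, size_t alen,
           const char *b, size_t blen, bool *decided)
{
    assert(decided != NULL);
    *decided = true;
    if (opc->equal != NULL)
        return opc->equal(a, alen, b, blen);
    if (opc->memcmp_equality)
        return alen == blen && memcmp(a, b, alen) == 0;
    *decided = false;
    return false;
}

/* Length of s once trailing blanks (treated as padding) are dropped. */
static size_t
significant_len(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Example equality function: memcmp only the significant bytes. */
static bool
blank_padded_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    alen = significant_len(a, alen);
    blen = significant_len(b, blen);
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

The flag alone would call "ab" and "ab  " different, while an opclass
supplying blank_padded_equal() would call them equal. That is the
flexibility the function-based option buys over the flag.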



-- 
greg


-- 
Sent via pgsql-hackers mailing list
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
