On Sun, Mar 19, 2017 at 8:40 AM, Greg Stark <st...@mit.edu> wrote: >> Out of idle curiosity, I decided to generate disassembly of both >> macaddr_cmp_internal(), and the patch's abbreviated comparator. The >> former consists of 49 x86-64 instructions at -02 on my machine, >> totaling 135 bytes of object code. The latter consists of only 10 >> instructions, or 24 bytes of object code. > > I wonder if there's something that could be optimized out of the > normal cmp function but we're defeating some compiler optimizations > with all our casts and aliasing.
There was one shl instruction for every left shift (hibits() or lowbits() call) that appears in macaddr_cmp_internal(). I suppose that it's possible that that could have been better optimized on a big-endian machine, where abbreviated keys do not need to be byteswaped to make the abbreviated comparator work. Perhaps the compiler could have recognized that macaddr is a struct that consists of 6 unsigned bytes as digits. One thing that I've noticed makes a relatively big difference to instruction count in comparators is varlena overhead, which does come up here, since macaddr is a type that doesn't have a varlena header (it was recently suggested by Tom that this is a mistake on practical grounds, though). I've informally considered the possibility of providing alternative versions of comparators that do not detoast or work with anything other than 1-byte header varlenas, because tuplesort has detected that that happens to be generally safe. I doubt that I'll ever get around to posting a patch to do that, since the cost savings are probably still marginal. I could probably find something better to work on. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers