On Sun, Mar 19, 2017 at 8:40 AM, Greg Stark <st...@mit.edu> wrote:
>> Out of idle curiosity, I decided to generate disassembly of both
>> macaddr_cmp_internal(), and the patch's abbreviated comparator. The
>> former consists of 49 x86-64 instructions at -O2 on my machine,
>> totaling 135 bytes of object code. The latter consists of only 10
>> instructions, or 24 bytes of object code.
>
> I wonder if there's something that could be optimized out of the
> normal cmp function but we're defeating some compiler optimizations
> with all our casts and aliasing.

There was one shl instruction for every left shift (hibits() or
lobits() call) that appears in macaddr_cmp_internal(). I suppose that
it's possible that that could have been better optimized on a
big-endian machine, where abbreviated keys do not need to be
byteswapped to make the abbreviated comparator work. Perhaps the
compiler could have recognized that macaddr is a struct that consists
of 6 unsigned bytes as digits.
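
To make the contrast concrete, here is a rough, self-contained sketch of
the two styles of comparison. It is not the code from mac.c or from the
patch: MacAddr, hi24(), lo24(), cmp_shifts(), abbreviate() and
cmp_abbrev() are all hypothetical names made up for illustration. The
first comparator rebuilds two 24-bit halves with shifts on every call;
the second packs the six bytes into one integer up front, most
significant byte first, so that each later comparison is a plain
unsigned integer comparison.

```c
#include <stdint.h>

/* Stand-in for macaddr: six unsigned bytes, no varlena header. */
typedef struct
{
    uint8_t a, b, c, d, e, f;
} MacAddr;

/* Shift-and-or composition of each half, redone on every comparison. */
static uint32_t
hi24(const MacAddr *m)
{
    return ((uint32_t) m->a << 16) | ((uint32_t) m->b << 8) | m->c;
}

static uint32_t
lo24(const MacAddr *m)
{
    return ((uint32_t) m->d << 16) | ((uint32_t) m->e << 8) | m->f;
}

/* Comparator in the shift-heavy style: high half first, then low half. */
static int
cmp_shifts(const MacAddr *x, const MacAddr *y)
{
    if (hi24(x) != hi24(y))
        return hi24(x) < hi24(y) ? -1 : 1;
    if (lo24(x) != lo24(y))
        return lo24(x) < lo24(y) ? -1 : 1;
    return 0;
}

/*
 * Hypothetical abbreviation step, done once per value rather than once
 * per comparison.  Bytes are packed most significant first, so unsigned
 * integer order matches byte-wise order on any endianness.
 */
static uint64_t
abbreviate(const MacAddr *m)
{
    const uint8_t raw[6] = {m->a, m->b, m->c, m->d, m->e, m->f};
    uint64_t    key = 0;

    for (int i = 0; i < 6; i++)
        key = (key << 8) | raw[i];
    return key;
}

/* Comparing abbreviated keys needs no shifts at all. */
static int
cmp_abbrev(uint64_t x, uint64_t y)
{
    return (x > y) - (x < y);
}
```

The shifts do not disappear in the second version; they just move out of
the comparator and into a step that runs once per value, which is the
part that matters when a sort calls the comparator many times.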

One thing that I've noticed makes a relatively big difference to
instruction count in comparators is varlena overhead, which does not
come up here, since macaddr is a type that doesn't have a varlena header
(it was recently suggested by Tom that this is a mistake on practical
grounds, though). I've informally considered the possibility of
providing alternative versions of comparators that do not detoast or
work with anything other than 1-byte header varlenas, because
tuplesort has detected that that happens to be generally safe. I doubt
that I'll ever get around to posting a patch to do that, since the
cost savings are probably still marginal. I could probably find
something better to work on.

-- 
Peter Geoghegan

