At 2015-01-02 16:46:29 +0200, hlinnakan...@vmware.com wrote:
>
> In the slicing-by-8 version, I wonder if it would be better to do
> single-byte loads to c0-c7, instead of two 4-byte loads and shifts.

Nope. I did some tests, and the sb8 code is slightly slower if I remove
the 0-7 byte alignment loop, and significantly slower if I switch to
single-byte loads for the whole thing. So I think we should leave that
part as it is, but:

> Would it even make sense to keep the crc variable in different byte
> order, and only do the byte-swap once in END_CRC32() ?

…this certainly does make a noticeable difference. Will investigate.

> The comments need some work. I note that there is no mention of the
> slicing-by-8 algorithm anywhere in the comments (in the first patch).

Will fix. (Unfortunately the widely cited original Intel paper about
slice-by-8 seems to have gone AWOL, but I'll find something.)
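
For reference, here is a rough sketch of the general shape of
slicing-by-8, with the 0-7 byte alignment loop and the two 4-byte loads
discussed above. It is not the code from the patch: the names
(crc_table, crc_init_tables, crc_bytewise, crc_sb8) are made up for
illustration, the CRC-32C polynomial is an assumption, and the table
indexing as written assumes a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; reflected CRC-32C polynomial assumed. */
static uint32_t crc_table[8][256];

static void
crc_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0x82F63B78u : (c >> 1);
        crc_table[0][i] = c;
    }
    /* Table t advances table t-1 by one more zero byte. */
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 8; t++)
            crc_table[t][i] = (crc_table[t - 1][i] >> 8) ^
                crc_table[0][crc_table[t - 1][i] & 0xFF];
}

/* Plain one-byte-at-a-time update, used for the head and tail. */
static uint32_t
crc_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len-- > 0)
        crc = crc_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc;
}

static uint32_t
crc_sb8(uint32_t crc, const unsigned char *p, size_t len)
{
    /* Alignment loop: consume 0-7 bytes until p is 8-byte aligned. */
    while (len > 0 && ((uintptr_t) p & 7) != 0)
    {
        crc = crc_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
        len--;
    }

    /* Main loop: two 4-byte loads, eight table lookups per iteration. */
    while (len >= 8)
    {
        uint32_t a, b;

        memcpy(&a, p, 4);       /* little-endian byte order assumed */
        memcpy(&b, p + 4, 4);
        a ^= crc;
        crc = crc_table[7][a & 0xFF] ^
              crc_table[6][(a >> 8) & 0xFF] ^
              crc_table[5][(a >> 16) & 0xFF] ^
              crc_table[4][a >> 24] ^
              crc_table[3][b & 0xFF] ^
              crc_table[2][(b >> 8) & 0xFF] ^
              crc_table[1][(b >> 16) & 0xFF] ^
              crc_table[0][b >> 24];
        p += 8;
        len -= 8;
    }

    /* Tail: remaining 0-7 bytes. */
    return crc_bytewise(crc, p, len);
}
```

With the usual convention of starting from 0xFFFFFFFF and inverting the
result at the end, crc_sb8 should agree with crc_bytewise on any input.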

> Instead of checking for "defined(__GNUC__) || defined(__clang__)",
> should add an explicit configure test for __builtin_bswap32().

Will do.
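
Something along these lines is what the code could look like once
configure does the check. This is only a sketch: HAVE__BUILTIN_BSWAP32
stands in for whatever symbol the configure test ends up defining, and
BSWAP32 and bswap32_portable are made-up names.

```c
#include <stdint.h>

/*
 * HAVE__BUILTIN_BSWAP32 is a hypothetical symbol, assumed to be defined
 * by a configure test that tries to compile and link a call to
 * __builtin_bswap32(). Without it we fall back to shifts and masks.
 */
#ifdef HAVE__BUILTIN_BSWAP32
#define BSWAP32(x) __builtin_bswap32(x)
#else
static inline uint32_t
bswap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) |
           (x >> 24);
}
#define BSWAP32(x) bswap32_portable(x)
#endif
```

That way the choice depends on what the compiler actually supports
rather than on which compiler it claims to be.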

Thanks again.

-- Abhijit

