At 2015-01-02 16:46:29 +0200, hlinnakan...@vmware.com wrote:
>
> In the slicing-by-8 version, I wonder if it would be better to do
> single-byte loads to c0-c7, instead of two 4-byte loads and shifts.
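For context on the question quoted above, here is a minimal sketch of a slicing-by-8 CRC loop in the shape being discussed: a 0-7 byte loop until the pointer is 8-byte aligned, then two 4-byte loads per 8 input bytes, then a bytewise tail. This is not the code from the patch. The function and table names are hypothetical, the reflected CRC-32C (Castagnoli) polynomial 0x82F63B78 is an assumption, and the loop assumes a little-endian host because the loaded words are used in native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reflected CRC-32C (Castagnoli) polynomial; assumed for this sketch. */
#define CRC32C_POLY 0x82F63B78u

/* crc32c_table[k][i] = CRC of byte i followed by k zero bytes. */
static uint32_t crc32c_table[8][256];

/* Build the eight lookup tables; call once before crc32c_sb8(). */
static void
crc32c_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
        crc32c_table[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = crc32c_table[0][i];

        for (int t = 1; t < 8; t++)
        {
            /* advance the CRC by one more zero byte */
            c = crc32c_table[0][c & 0xFF] ^ (c >> 8);
            crc32c_table[t][i] = c;
        }
    }
}

/*
 * Update a running CRC with len bytes.  ASSUMES A LITTLE-ENDIAN HOST:
 * the two 4-byte loads below are consumed in native byte order.
 */
static uint32_t
crc32c_sb8(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    /* 0-7 byte loop: one byte at a time until p is 8-byte aligned */
    while (len > 0 && ((uintptr_t) p & 7) != 0)
    {
        crc = crc32c_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
        len--;
    }

    /* main loop: two 4-byte loads per 8 bytes of input */
    while (len >= 8)
    {
        uint32_t a, b;

        memcpy(&a, p, 4);
        memcpy(&b, p + 4, 4);
        a ^= crc;
        /* the first byte has 7 bytes after it, so it uses table 7 */
        crc = crc32c_table[7][a & 0xFF] ^
              crc32c_table[6][(a >> 8) & 0xFF] ^
              crc32c_table[5][(a >> 16) & 0xFF] ^
              crc32c_table[4][a >> 24] ^
              crc32c_table[3][b & 0xFF] ^
              crc32c_table[2][(b >> 8) & 0xFF] ^
              crc32c_table[1][(b >> 16) & 0xFF] ^
              crc32c_table[0][b >> 24];
        p += 8;
        len -= 8;
    }

    /* tail: remaining 0-7 bytes, one at a time */
    while (len-- > 0)
        crc = crc32c_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);

    return crc;
}
```

A caller would start from 0xFFFFFFFF and XOR the result with 0xFFFFFFFF at the end, the role an END_CRC32()-style macro plays. The single-byte-load variant in the question would instead assemble the eight table indexes from p[0]..p[7] directly, which avoids the host byte-order dependency at the cost of eight loads per iteration.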
Nope. I did some tests, and the sb8 code is slightly slower if I remove
the 0-7 byte alignment loop, and significantly slower if I switch to
one-byte loads for the whole thing. So I think we should leave that part
as it is, but:

> Would it even make sense to keep the crc variable in different byte
> order, and only do the byte-swap once in END_CRC32() ?

…this certainly does make a noticeable difference. Will investigate.

> The comments need some work. I note that there is no mention of the
> slicing-by-8 algorithm anywhere in the comments (in the first patch).

Will fix. (Unfortunately the widely cited original Intel paper about
slice-by-8 seems to have gone AWOL, but I'll find something.)

> Instead of checking for "defined(__GNUC__) || defined(__clang__)",
> should add an explicit configure test for __builtin_bswap32().

Will do.

Thanks again.

-- Abhijit
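A rough sketch of the byte-swap and configure-test points above, under stated assumptions. The helper uses __builtin_bswap32() only when a configure test has defined a macro, named HAVE__BUILTIN_BSWAP32 here as an assumption, and otherwise falls back to portable shifts and masks instead of checking compiler identity. The END_CRC32_SWAPPED() macro is hypothetical and shows only the final step of the "swap once" idea: it assumes the loop (and its tables) already kept the running value in the opposite byte order, which is not shown.

```c
#include <stdint.h>

/*
 * HAVE__BUILTIN_BSWAP32 is an assumed macro that a configure test would
 * define after checking that __builtin_bswap32() compiles and links.
 */
static inline uint32_t
crc_bswap32(uint32_t x)
{
#ifdef HAVE__BUILTIN_BSWAP32
    return __builtin_bswap32(x);
#else
    /* portable fallback: move each byte to its mirrored position */
    return ((x << 24) & 0xFF000000u) |
           ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) |
           ((x >> 24) & 0x000000FFu);
#endif
}

/*
 * Hypothetical finalizer: if the running crc was kept byte-swapped inside
 * the loop, swap it back once here and apply the final inversion.  XOR
 * with all-ones commutes with a byte swap, so the order does not matter.
 */
#define END_CRC32_SWAPPED(crc) \
    ((crc) = crc_bswap32(crc) ^ 0xFFFFFFFFu)
```

The point of the design is that the per-iteration work never swaps bytes; the one swap is paid once per buffer, at finalization.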