On 01/01/2015 09:17 AM, Abhijit Menon-Sen wrote:
> Hi.
>
> OK, here are the patches with the various suggestions applied.
>
> I found that the alignment didn't seem to make much difference for the
> CRC32* instructions, so I changed it to process (len/8)*8 bytes followed
> by (len%8)*1 bytes, the way the Linux kernel does.

Ok.
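For reference, a minimal software sketch of that loop shape, i.e. (len/8)*8 bytes in 8-byte chunks followed by the (len%8)-byte tail, with no alignment preamble. A bitwise byte update stands in for the CRC32* instructions here, and the helper names are mine, not from the patch:

```c
#include <stdint.h>
#include <stddef.h>

/* One-byte CRC-32C update (bitwise, reflected polynomial 0x82F63B78).
 * The actual patch would use the hardware CRC32* instructions instead;
 * only the surrounding loop structure is the point of this sketch. */
static uint32_t
crc32c_byte(uint32_t crc, uint8_t b)
{
	crc ^= b;
	for (int i = 0; i < 8; i++)
		crc = (crc >> 1) ^ ((crc & 1) ? 0x82F63B78 : 0);
	return crc;
}

uint32_t
crc32c(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t	crc = 0xFFFFFFFF;

	/* process (len/8)*8 bytes in 8-byte chunks ... */
	while (len >= 8)
	{
		for (int i = 0; i < 8; i++)
			crc = crc32c_byte(crc, p[i]);
		p += 8;
		len -= 8;
	}
	/* ... then the (len%8) remaining bytes one at a time */
	while (len-- > 0)
		crc = crc32c_byte(crc, *p++);

	return crc ^ 0xFFFFFFFF;
}
```

This produces the standard CRC-32C check value 0xE3069283 for the input "123456789".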

In the slicing-by-8 version, I wonder if it would be better to do single-byte loads into c0-c7, instead of two 4-byte loads and shifts. The 4-byte loads are presumably faster than single-byte loads, but then you'd avoid the shifts. And then you could go straight into the 8-bytes-at-a-time loop, without the initial single-byte processing to get the start address aligned. (The Linux implementation doesn't do that, so maybe it's a bad idea, but it might be worth testing.)
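To make the trade-off concrete, here are the two load strategies side by side, in isolation from the CRC tables. The function names are mine, and the shift version assumes a little-endian machine, as the slicing-by-8 code under discussion does:

```c
#include <stdint.h>
#include <string.h>

/* Strategy (a): two 4-byte loads plus shifts to extract c0..c7.
 * Assumes little-endian byte order for the shifts to line up. */
void
bytes_via_word_loads(const unsigned char *p, uint8_t out[8])
{
	uint32_t	lo,
				hi;

	memcpy(&lo, p, 4);
	memcpy(&hi, p + 4, 4);
	for (int i = 0; i < 4; i++)
	{
		out[i] = (uint8_t) (lo >> (8 * i));
		out[i + 4] = (uint8_t) (hi >> (8 * i));
	}
}

/* Strategy (b): eight single-byte loads, no shifts, and no need to
 * align the start address first. */
void
bytes_via_byte_loads(const unsigned char *p, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = p[i];
}
```

On a little-endian machine the two produce identical c0..c7; which one is faster is exactly what would need benchmarking.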

Looking at the Linux implementation, I think it only does the bswap once per call, not inside the hot loop. Would it even make sense to keep the crc variable in the swapped byte order, and only do the byte swap once in END_CRC32()?
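Deferring the swap is safe because the byte swap is its own inverse: a crc kept in swapped order throughout the loop is recovered exactly by one swap at the end. A small illustration (the helper name is mine; the real code would use __builtin_bswap32 where available):

```c
#include <stdint.h>

/* Portable 32-bit byte swap; an involution, so applying it twice
 * returns the original value. */
static uint32_t
bswap32(uint32_t x)
{
	return (x >> 24) |
		((x >> 8) & 0x0000FF00) |
		((x << 8) & 0x00FF0000) |
		(x << 24);
}
```

So a loop that maintains bswap32(crc) internally only needs one bswap32() call in END_CRC32() to produce the conventional value.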

The comments need some work. I note that there is no mention of the slicing-by-8 algorithm anywhere in the comments (in the first patch).

Instead of checking for "defined(__GNUC__) || defined(__clang__)", we should add an explicit configure test for __builtin_bswap32().
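Such a test would look roughly like this, following the pattern of the other compiler-builtin checks in config/c-compiler.m4. This is only a sketch of the shape, with the cache variable and symbol names chosen to match the usual convention:

```m4
# PGAC_C_BUILTIN_BSWAP32
# ----------------------
# Check if the C compiler understands __builtin_bswap32(),
# and define HAVE__BUILTIN_BSWAP32 if so.
AC_DEFUN([PGAC_C_BUILTIN_BSWAP32],
[AC_CACHE_CHECK(for __builtin_bswap32, pgac_cv__builtin_bswap32,
[AC_COMPILE_IFELSE([AC_LANG_SOURCE(
[static unsigned int x = __builtin_bswap32(0xaabbccdd);]
)],
[pgac_cv__builtin_bswap32=yes],
[pgac_cv__builtin_bswap32=no])])
if test x"$pgac_cv__builtin_bswap32" = xyes ; then
AC_DEFINE(HAVE__BUILTIN_BSWAP32, 1,
          [Define to 1 if your compiler understands __builtin_bswap32.])
fi])# PGAC_C_BUILTIN_BSWAP32
```

The code would then test HAVE__BUILTIN_BSWAP32 instead of sniffing compiler macros directly.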

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
