On 3/6/13 6:34 AM, Heikki Linnakangas wrote:
> Another thought is that perhaps something like CRC32C would be faster to calculate on modern hardware, and could be safely truncated to 16-bits using the same technique you're using to truncate the Fletcher's Checksum. Greg's tests showed that the overhead of CRC calculation is significant in some workloads, so it would be good to spend some time to optimize that. It'd be difficult to change the algorithm in a future release without breaking on-disk compatibility, so let's make sure we pick the best one.
Simon sent over his first rev of this using a quick-to-compute 16-bit checksum as a reasonable trade-off, and one that's possible to do right now. It's not optimal in a few ways, but it catches single-bit errors that currently go undetected, and Fletcher-16 computes quickly without a large amount of code. It's worth double-checking that the code uses the best Fletcher-16 approach available. I've started on that, but I'm working through your general performance concerns first, using the implementation that's already there.
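For reference, here is a minimal C sketch of Fletcher-16 as it is usually described (two running sums modulo 255), next to one common speed-up that defers the modulo reductions. The function names are hypothetical and this is not the code from Simon's patch; it only illustrates why the algorithm is cheap and what "the best Fletcher-16 approach" could mean in practice. It also shows why single-bit errors are always caught: flipping one bit changes one byte by ±2^k, which is never a multiple of 255, so sum1 must change.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook Fletcher-16: sum1 is a plain byte sum, sum2 sums the successive
 * sum1 values (which makes the result sensitive to byte order). Both are
 * reduced modulo 255 after every byte. */
uint16_t fletcher16_simple(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t) ((sum1 + data[i]) % 255);
        sum2 = (uint16_t) ((sum2 + sum1) % 255);
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

/* Same result, with the modulo operations deferred: 32-bit accumulators
 * can absorb at most 5802 bytes (even all 0xFF) before c1 could overflow,
 * so the reductions happen once per block instead of once per byte. */
uint16_t fletcher16_deferred(const uint8_t *data, size_t len)
{
    uint32_t c0 = 0, c1 = 0;
    while (len > 0) {
        size_t block = len < 5802 ? len : 5802;
        len -= block;
        while (block-- > 0) {
            c0 += *data++;
            c1 += c0;
        }
        c0 %= 255;
        c1 %= 255;
    }
    return (uint16_t) ((c1 << 8) | c0);
}
```

Both functions agree on every input; the deferred version simply trades two divisions per byte for two per 5802-byte block.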
From what I've read so far, I think picking Fletcher-16 over the main alternative, CRC-16-IBM (AKA CRC-16-ANSI), is a reasonable choice. There's a good table showing the main possibilities at https://en.wikipedia.org/wiki/Cyclic_redundancy_check
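To make the cost comparison concrete, a plain bit-at-a-time CRC-16-IBM looks roughly like this in C. The function name is hypothetical, and the initial value of 0 with no final XOR is an assumption (it matches the common CRC-16/ARC variant). Without a 256-entry lookup table, each byte costs eight shift/XOR steps, versus two additions per byte for Fletcher-16.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-16-IBM (polynomial x^16 + x^15 + x^2 + 1, i.e. 0x8005).
 * Bits are processed LSB-first, so the polynomial appears bit-reversed as
 * 0xA001. Initial value 0, no final XOR (the CRC-16/ARC parameters). */
uint16_t crc16_ibm(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        /* Eight conditional shift/XOR steps per input byte. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t) ((crc >> 1) ^ 0xA001);
            else
                crc >>= 1;
        }
    }
    return crc;
}
```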
One day I hope that in-place upgrade learns how to do page format upgrades, with the sort of background conversion tools and tracking metadata we've discussed for that work. When that day comes, I would expect it to be straightforward to upgrade pages from 16-bit Fletcher checksums to 32-bit CRC-32C ones. Ideally we would jump on the CRC-32C train today, but the page header has nowhere to put all 32 bits. Using a 16-bit Fletcher checksum for 9.3 doesn't prevent the project from going that way later, though, once page header expansion is a solved problem.
The problem with running CRC32C in software is that the standard fast approach uses a "slicing" technique, which requires keeping a chunk of precomputed data around: a moderately large lookup table. I don't see any advantage to carrying all that baggage if you're just going to throw away half of the result anyway. More on CRC32C in my next message.
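A minimal C sketch of the slicing approach, assuming the slicing-by-8 variant (one common choice; the exact variant isn't specified here), shows where the baggage comes from: eight 256-entry tables of 32-bit words, 8 × 256 × 4 = 8 KB of precomputed data. All function names are hypothetical, and `crc32c_fold16` is just one possible way to squeeze the result into 16 bits, not necessarily the truncation technique used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u /* Castagnoli polynomial, LSB-first */

/* crc32c_table[k][b]: CRC contribution of byte b followed by k zero bytes.
 * 8 tables x 256 entries x 4 bytes = 8 KB. */
static uint32_t crc32c_table[8][256];

/* Must be called once before crc32c_slice8(). */
void crc32c_init_tables(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        uint32_t c = b;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
        crc32c_table[0][b] = c;
    }
    for (uint32_t b = 0; b < 256; b++)
        for (int k = 1; k < 8; k++) {
            uint32_t prev = crc32c_table[k - 1][b];
            crc32c_table[k][b] = (prev >> 8) ^ crc32c_table[0][prev & 0xFF];
        }
}

/* Slicing-by-8: consume 8 bytes per iteration via 8 independent lookups,
 * then finish the remaining tail bytes one at a time. */
uint32_t crc32c_slice8(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 8) {
        uint32_t lo = crc ^ ((uint32_t) p[0] | (uint32_t) p[1] << 8 |
                             (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24);
        uint32_t hi = (uint32_t) p[4] | (uint32_t) p[5] << 8 |
                      (uint32_t) p[6] << 16 | (uint32_t) p[7] << 24;
        crc = crc32c_table[7][lo & 0xFF] ^ crc32c_table[6][(lo >> 8) & 0xFF] ^
              crc32c_table[5][(lo >> 16) & 0xFF] ^ crc32c_table[4][lo >> 24] ^
              crc32c_table[3][hi & 0xFF] ^ crc32c_table[2][(hi >> 8) & 0xFF] ^
              crc32c_table[1][(hi >> 16) & 0xFF] ^ crc32c_table[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    while (len-- > 0)
        crc = (crc >> 8) ^ crc32c_table[0][(crc ^ *p++) & 0xFF];
    return ~crc;
}

/* Hypothetical 16-bit reduction: XOR the high half into the low half,
 * discarding the rest of the 32-bit result. */
uint16_t crc32c_fold16(uint32_t crc)
{
    return (uint16_t) (crc ^ (crc >> 16));
}
```

The 8 KB of tables is the cost being questioned here: it buys speed, but half of the 32-bit result is thrown away when only 16 bits fit.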
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com