On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Andres Freund <and...@2ndquadrant.com> writes: >> On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: >>> As I understand it, SIMD is just a CPU-optimized method for producing a >>> CRC checksum. Is that right? Does it produce the same result as a >>> non-CPU-optimized CRC calculation? > >> No we are talking about a different algorithm that results in different >> results, thats why its important to choose now since we can't change it >> later without breaking pg_upgrade in further releases. >> http://en.wikipedia.org/wiki/SIMD_%28hash_function%29 > > [ squint... ] We're talking about a *cryptographic* hash function? > Why in the world was this considered a good idea for page checksums? > > In the first place, it's probably not very fast compared to some > alternatives, and in the second place, the criteria by which people > would consider it a good crypto hash function have approximately nothing > to do with what we need for a checksum function. What we want for a > checksum function is high probability of detection of common hardware > failure modes, such as burst errors and all-zeroes. This is > particularly critical when we're going with only a 16-bit checksum --- > the probabilities need to be skewed in the right direction, or it's not > going to be all that terribly useful. > > CRCs are known to be good for that sort of thing; it's what they were > designed for. I'd like to see some evidence that any substitute > algorithm has similar properties. Without that, I'm going to vote > against this idea.
Sorry for creating confusion here by playing fast and loose with the terminology. We are not talking about that hash function at all. What we are talking about here is Fowler-Noll-Vo-ish (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) hash function that is restructured to be parallelisable with SIMD instructions with the explicit goal of being as fast as possible. The resulting hash function is roughly two orders of magnitude faster than 1-byte-at-a-time CRC32 currently in use. Performance is about comparable with optimized fixed size memcpy running in cache. Based on current analysis, it is particularly good at detecting single bit errors, as good at detecting burst errors as can be expected from 16 bits and not horrible at detecting burst writes of zeroes. It is quite bad at detecting multiple uncorrelated single bit errors and extremely bad at detecting repeating patterns of errors in low order bits. All in all I would say that the performance is worth the loss in detection capability as we are not talking about using the checksum to prove correctness. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers