On 09/10/14 10:13, Andres Freund wrote:
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote:
Hmm. So the simple, non-table driven, calculation gives the same result as
using the lookup table using the reflected lookup code. That's expected; the
lookup method is supposed to be the same, just faster. However, using the
"normal" lookup code, but with a "reflected" lookup table, produces the same
result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But
AFAICS, that's an incorrect combination. You're supposed to the
non-reflected lookup table with the non-reflected lookup code; you can't mix
and match.

As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't
correspond to any bit-by-bit CRC variant and polynomial. My math skills are
not strong enough to determine what the consequences of that are. It might
still be a decent checksum. Or not. I couldn't tell if the good error
detection properties of the normal CRC-32 polynomial apply to our algorithm
or not.
Additional interesting datapoints are that hstore and ltree contain the
same tables - but properly use the reflected computation.

Thoughts?
It clearly seems like a bad idea to continue with this - I don't think
anybody here knows which guarantees this gives us.

The question is how can we move away from this. There's unfortunately
two places that embed PGC32 that are likely to prove problematic when
fixing the algorithm: pg_trgm and tsgist both seem to include crc's in
their logic in a persistent way.  I think we should provide
INIT/COMP/FIN_PG32 using the current algorithm for these.

If we're switching to a saner computation, we should imo also switch to
a better polynom - CRC-32C has better error detection capabilities than
CRC32 and is available in hardware. As we're paying the price pf
breaking compat anyway...

Arguably we could also say that given that there's been little evident
problems with the borked computation we could also switch to a much
faster hash instead of continuing to use crc...

Greetings,

Andres Freund

Could a 64 bit variant of some kind be useful as an option - in addition to a correct 32 bit?

As most people have 64 bit processors and storage is less constrained now-a-days, as well as we tend to store much larger chunks of data.


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to