On 04.03.2013 09:11, Simon Riggs wrote:
On 3 March 2013 18:24, Greg Smith<g...@2ndquadrant.com>  wrote:

The 16-bit checksum feature seems functional, with two sources of overhead.
There's some CPU time burned to compute checksums when pages enter the
system.  And there's extra overhead for WAL logging hint bits.  I'll
quantify both of those better in another message.

It's crunch time. Do you and Jeff believe this patch should be
committed to Postgres core?

Are there objectors?

In addition to my hostility towards this patch in general, there are some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. If we want to enable such per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed.

+ * The checksum algorithm is a modified Fletcher 64-bit (which is
+ * order-sensitive). The modification is because, at the end, we have two
+ * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
+ * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
+ * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
+ * single bytes at a time is slower.

How does the error detection rate of this compare with e.g CRC-16? Is there any ill effect from truncating the Fletcher sums like this?

+       /*
+        * Store the sums as bytes in the checksum. We add one to shift the 
range
+        * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
+        * seems wise).
+        */
+       p8Checksum[0] = (sum1 % 255) + 1;
+       p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall seeing that in other checksum implementations either. 16-bits is not very wide for a checksum, and this eats about 1% of the space of valid values.

I can see that it might be a handy debugging aid to avoid 0. But there's probably no need to avoid 0 in both bytes, it seems enough to avoid a completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to