On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> Are you suggesting we should support running with a master with checksums
> on and a standby with checksums off in the same cluster? That seems.. Very
> fragile.

Well, "supported" is too strong a term for that. What I am saying is
that the problems you are pointing out are not as bad as you make them
out to be, as long as an operator does not copy on-disk pages from one
node to the other. Since checksums apply only to pages flushed to disk
on the local node, everything that goes through WAL for availability
works fine:
- PITR.
- archive recovery.
- streaming replication.

That is my understanding from reading the code. I have also run some
tests with a primary/standby configuration to convince myself, using
pgbench on both nodes (read-write on the primary, read-only on the
standby), with a checkpoint (or restart point) triggered on each node
every 20s. With checksums enabled on one node and disabled on the
other, I am not seeing any inconsistency.

However, anything which does a physical copy of pages could easily mess
things up if one node has checksums disabled and the other enabled. One
such tool is pg_rewind. If the promoted standby has checksums disabled
(becoming the source), and the old master to rewind has checksums
enabled, then the rewind could well copy pages whose checksums are not
set correctly, resulting in incorrect checksums on the old master.

So yes, it is easy to mess things up, but this does not apply to all
configurations. The suggestion from Christoph to enable checksums on
both nodes separately would work. Personally, I find the suggestion to
update the system ID after enabling or disabling checksums
over-engineered, for the reasons given in the first part of this email
(it is technically doable to enable checksums with minimal downtime and
a failover). So my recommendation would be to document that when
checksums are enabled on one instance in a cluster, the same should be
applied to all instances, as a mismatch could cause problems with any
tool performing a physical copy of relation files or blocks.
--
Michael
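
As a rough illustration of the recommendation above, here is a minimal
sketch in Python of a pre-flight check an operator could run before any
physical copy between nodes (pg_rewind, for instance). It assumes the
text output of pg_controldata has already been collected from each node,
and it relies on the "Data page checksum version" line of that output,
where 0 means checksums are disabled. The function names and the node
names are hypothetical, and the sample outputs are shortened excerpts,
not real captures.

```python
import re
from typing import Mapping


def checksum_version(pg_controldata_output: str) -> int:
    """Return the data page checksum version found in pg_controldata output.

    A value of 0 means data checksums are disabled on that instance.
    """
    match = re.search(
        r"^Data page checksum version:\s*(\d+)\s*$",
        pg_controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Data page checksum version' line found")
    return int(match.group(1))


def checksums_consistent(outputs_by_node: Mapping[str, str]) -> bool:
    """Return True if every node reports the same checksum version."""
    versions = {checksum_version(out) for out in outputs_by_node.values()}
    return len(versions) <= 1


# Hypothetical excerpts of pg_controldata output, one per node.
primary_out = "Data page checksum version:           1\n"
standby_out = "Data page checksum version:           0\n"

if not checksums_consistent({"primary": primary_out, "standby": standby_out}):
    print("checksum settings differ: avoid physical copies between nodes")
```

Such a check only compares the setting itself. It says nothing about
whether the pages already on disk carry valid checksums.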