On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +	/*
> +	 * Create a database list.  We don't need to concern ourselves with
> +	 * rebuilding this list during runtime since any database created after
> +	 * this process started will be running with checksums turned on from the
> +	 * start.
> +	 */
>
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set? Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:

./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0


Further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:

# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database-wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set. If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

Greetings,

Andres Freund
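
To illustrate the two-step wait suggested for the tests above (first the
setting reports checksums as on, then no launcher or worker is left), here
is a minimal sketch in Python. It is not taken from the patch or its test
suite: wait_until, wait_for_checksums_done, checksums_on and
workers_running are all hypothetical names, and the two callables stand in
for whatever queries the tests would actually issue.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 60.0,
               interval: float = 0.5) -> bool:
    """Poll predicate until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_checksums_done(checksums_on: Callable[[], bool],
                            workers_running: Callable[[], bool],
                            timeout: float = 60.0,
                            interval: float = 0.5) -> bool:
    """Hypothetical helper: both callables are assumptions supplied by the test."""
    # Step 1: wait for the setting to report that checksums are enabled.
    if not wait_until(checksums_on, timeout, interval):
        return False
    # Step 2: only then wait until no launcher / worker process remains.
    return wait_until(lambda: not workers_running(), timeout, interval)
```

The point of ordering the checks this way is that the first condition
alone says nothing about whether background processes have exited; a test
that proceeds after step 1 only can still race with a worker that is
shutting down.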