Hi, On Thu, 23 Apr 2026 at 15:47, Daniel Gustafsson <[email protected]> wrote:
> While performing extensive post-commit testing using the built-in TAP > tests in > checksum_extended mode, Tomas observed a couple issues. The testing was > done > on multiple machines, ranging from a fast x86 machine to a much slower > rpi4, > but they all performed a very large number of iterations. The combination > of > hardware and the large number of tests executed is likely why some of the > issues went unnoticed. > > Thanks for the patchset. I applied the eight proposed patches on top of current master and did a review/test pass. I do have one concern about the duplicate-launcher race discussed earlier, though. I don't think 0006 fully addresses the duplicate-launcher exit race [1]. I was able to reproduce the issue with a deterministic test using the existing test_checksums delay hooks: delay the initial StartDataChecksumsWorkerLauncher() path so two quick pg_enable_data_checksums() calls can queue two launchers, delay the final transition to on, then fire follow-up pg_enable_data_checksums() calls while the real launcher is still active. The relevant logs from that run: [37940] DEBUG: background worker "datachecksums launcher" started [37941] DEBUG: background worker "datachecksums launcher" started [37940] LOG: enabling data checksums requested, starting data checksum calculation [37941] LOG: background worker "datachecksums launcher" already running, exiting [37940] LOG: initiating data checksum processing in database "postgres" [37952] DEBUG: background worker "datachecksums launcher" started [37952] LOG: enabling data checksums requested, starting data checksum calculation [37952] LOG: initiating data checksum processing in database "postgres" [37952] WARNING: cannot set data checksums to "on", current state is not "inprogress-on", disabling [37989] DEBUG: background worker "datachecksums launcher" started [37989] LOG: enabling data checksums requested, starting data checksum calculation [38006] DEBUG: background worker "datachecksums launcher" started [38006] LOG: enabling data checksums requested, starting data checksum calculation So, at least the duplicate-launcher part is still reproducible after the fixup series. The warning also suggests the state can be perturbed by the duplicate launcher, although in my run the final state still ended up as on. A few other smaller comments/nits: 1. In src/backend/access/transam/xlog.c, the comment in SetDataChecksumsOff() says the branch implies the state is inprogress-on or inprogress-off, but after the patch inprogress-on is handled by the preceding if condition. It looks like this should probably only mention inprogress-off. 2. There is a repeated typo in comments: src/backend/utils/init/postinit.c: "interrups" -> "interrupts" src/backend/postmaster/auxprocess.c: "interrups" -> "interrupts" 3. A few commit messages could use a quick polish pass: 0002: "onine" -> "online" 0004: "Additionelly" -> "Additionally" 0006: "being enabled of disabled" -> "being enabled or disabled" 0006: "change it's operation" -> "change its operation" 0006: "ss now avoided" -> "is now avoided" 4. One minor question on 0006: the runtime cost-parameter update path is only meaningful while checksums are being enabled. That looks fine from the current control flow, but would it be worth adding a short comment or assertion near that update path to make the assumption explicit? Other than the above concern and cleanup items, the approach and behavior looked good to me. Regards, Ayush [1] PostgreSQL: [BUG] Race in online checksums launcher_exit() <https://www.postgresql.org/message-id/CAJTYsWWg6tFrdMhWs5PkwESTNeeUUsMuY17O4UmPPh771c3stA%40mail.gmail.com>
