On 03/04/2026 18:33, Daniel Gustafsson wrote:
The attached rebase with a PG_CONTROL_VERSION bump is what I have staged for
later tonight, submitting here to have the (hopefully) final patch archived as
well as another CFBot run.

A few more small comments, I'm sorry about drip-feeding these:

+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clean up the abort flag to ensure that processing started again if
+ * it was previously aborted (note: started again, *not* restarted from where
+ * it left off).
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+       abort_requested = false;
+
+       if (launcher_running)
+       {
+               LWLockAcquire(DataChecksumsWorkerLock, LW_EXCLUSIVE);
+               launcher_running = false;
+               DataChecksumState->launcher_running = false;
+
+               if (DataChecksumState->worker_running != InvalidPid)
+               {
+                       ereport(LOG,
+                                       errmsg("data checksums launcher exiting 
while worker is still running, signalling worker"));
+                       kill(DataChecksumState->worker_running, SIGTERM);
+               }
+               LWLockRelease(DataChecksumsWorkerLock);
+       }
+
+       /*
+        * If the launcher is exiting before data checksums are enabled then set
+        * the state to off since processing cannot be resumed.
+        */
+       if (DataChecksumsInProgress())
+               SetDataChecksumsOff();
+}

Is there still a race condition if the launcher is killed, it gets here, sends SIGTERM to the worker process, but before the worker process has exited, the user calls pg_enable_data_checksums() again and a new launcher is started? What happens?

+       /*
+        * Is a worker process currently running?  This is set by the worker
+        * launcher when it starts waiting for a worker process to finish.
+        */
+       int                     worker_running;

'worker_running' sounds like a boolean, but it's actually a PID. Especially when 'launcher_running' really is a boolean. Maybe rename to 'worker_pid' or 'worker_running_pid' or 'running_worker_pid' or something.

+bool
+DataChecksumsInProgress(void)
+{
+       return LocalDataChecksumState == PG_DATA_CHECKSUM_INPROGRESS_ON;
+}

Perhaps this should be DataChecksumsInProgressOn()? I saw the caller in launcher_exit() first, and had to look up the implementation to check if it returns true for 'inprogress-off' state.

diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index ff417d5ae3e..fe5d30b4349 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,20 @@
#include "storage/block.h" +/*
+ * Checksum state 0 is used for when data checksums are disabled (OFF).
+ * PG_DATA_CHECKSUM_INPROGRESS_{ON|OFF} defines that data checksums are either
+ * currently being enabled or disabled, and PG_DATA_CHECKSUM_VERSION defines
+ * that data checksums are enabled.
+ */
+typedef enum ChecksumStateType
+{
+       PG_DATA_CHECKSUM_OFF = 0,
+       PG_DATA_CHECKSUM_VERSION,
+       PG_DATA_CHECKSUM_INPROGRESS_OFF,
+       PG_DATA_CHECKSUM_INPROGRESS_ON,
+} ChecksumStateType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.

It'd be good to mention that this value is stored in the control file, so changing it needs a catversion bump. Also it's important that PG_DATA_CHECKSUM_VERSION = 1, for backwards-compatibility in pg_upgrade. I'd suggest assigning explicit values 1, 2, 3, 4 for each of the enum constants, to emphasize that they values are fixed.

There's an #include "storage/bufpage.h" in src/bin/pg_upgrade/controldata.c and src/backend/access/rmgrdesc/xlogdesc.c. I think they should both include "storage/checksum.h" directly instead. And "bufpage.h" probably doesn't need to include "storage/checksum.h".

- Heikki



Reply via email to