It'd be good to print a LOG message when the checksums state changes, to have a trail in the log of when checksums were enabled/disabled. Something like:

LOG:  enabling checksums was requested, starting checksum calculation
...
LOG:  checksums calculation finished, checksums are now enabled


On 02/04/2026 02:01, Daniel Gustafsson wrote:
+               if (result == DATACHECKSUMSWORKER_FAILED)
+               {
+                       /*
+                        * Disable checksums on cluster, because we failed one 
of the
+                        * databases and this is an all or nothing process.
+                        */
+                       SetDataChecksumsOff();
+                       ereport(ERROR,
+                                       errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+                                       errmsg("data checksums failed to get enabled 
in all databases, aborting"),
+                                       errhint("The server log might have more 
information on the cause of the error."));
+               }

This got me thinking, what happens if the the data checksums launcher encounters some other error, for example if you SIGTERM it? The system is left in 'inprogress-on' state, but because the launcher is gone it will never finish and 'pg_stat_progress_data_checksums' will be empty.

Perhaps launcher_exit() should call SetDataChecksumsOff()?

                /*
                 * TODO: how to really handle the worker still running when the
                 * launcher exits?
                 */
                if (DataChecksumsWorkerShmem->worker_running)
                        ereport(LOG,
                                        errmsg("data checksums launcher exiting 
while worker is still running"));

That TODO should be addressed somehow.

+               /*
+                * As of now we only update the block counter for main forks in 
order
+                * to not cause too frequent calls. TODO: investigate whether we
+                * should do it more frequent?
+                */
+               if (forkNum == MAIN_FORKNUM)
+                       
pgstat_progress_update_param(PROGRESS_DATACHECKSUMS_BLOCKS_DONE,
+                                                                               
 (blknum + 1));

We're updating it for every block in the main fork, but not at all for other forks? What a bizarre way of avoiding too frequent calls :-). I think you could just call this on every page, pgstat_progress_update_param() is supposed to be very fast. For comparison, e.g. index build calls it for every tuple.

+/*
+ * Configuration of conditions which must match when absorbing a procsignal
+ * barrier during data checksum enable/disable operations.  A single function
+ * is used for absorbing all barriers, and the set of conditions to use is
+ * looked up in the checksum_barriers struct.  The struct member for the target
+ * state defines which state the backend must currently be in, and which it
+ * must not be in.
+ *
+ * The reason for this explicit checking is to ensure that processing cannot
+ * be started such that it breaks the assumptions of the state machine.
+ *
+ * MAX_BARRIER_CONDITIONS must match largest number of sets in barrier_eq and
+ * barrier_ne in the below checksum_barriers definition.
+ */
+#define MAX_BARRIER_CONDITIONS 2
+typedef struct ChecksumBarrierCondition
+{
+       /* The target state of the barrier */
+       int                     target;
+       /* A set of states in which at least one MUST match the current state */
+       int                     barrier_eq[MAX_BARRIER_CONDITIONS];
+       /* The number of elements in the barrier_eq set */
+       int                     barrier_eq_sz;
+       /* A set of states which all MUST NOT match the current state */
+       int                     barrier_ne[MAX_BARRIER_CONDITIONS];
+       /* The number of elements in the barrier_ne set */
+       int                     barrier_ne_sz;
+} ChecksumBarrierCondition;
+
+static const ChecksumBarrierCondition checksum_barriers[4] =
+{
+       /*
+        * When disabling checksums, either inprogress state is Ok but checksums
+        * must not be in the enabled state.
+        */
+       {
+               .target = PG_DATA_CHECKSUM_OFF,
+               .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON, 
PG_DATA_CHECKSUM_INPROGRESS_OFF},
+               .barrier_eq_sz = 2,
+               .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
+               .barrier_ne_sz = 1
+       },
+       /* When enabling the current state must be inprogress-on */
+       {
+               .target = PG_DATA_CHECKSUM_VERSION,
+               .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON},
+               .barrier_eq_sz = 1,
+               {0}, 0
+       },
+
+       /*
+        * When moving to inprogress-on the current state cannot enabled, but 
when
+        * moving to inprogress-off the current state must be enabled.
+        */
+       {
+               .target = PG_DATA_CHECKSUM_INPROGRESS_ON,
+               {0}, 0,
+               .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
+               .barrier_ne_sz = 1
+       },
+       {
+               .target = PG_DATA_CHECKSUM_INPROGRESS_OFF,
+               .barrier_eq = {PG_DATA_CHECKSUM_VERSION},
+               .barrier_eq_sz = 1,
+               {0}, 0
+       },
+};

I find this to still be a pretty complicated and unclear way of representing the allowed transitions. There are only 16 possible transitions, and only 6 of them are allowed. How about listing the allowed ones directly:

/* Allowed transitions: from, to */
{
    /*
     * Disabling checksums: If checksums are currently enabled,
     * disabling must go through the 'inprogress-off' state.
     */
    {PG_DATA_CHECKSUM_VERSION, PG_DATA_CHECKSUM_INPROGRESS_OFF},
    {PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_OFF},

    /*
     * If checksums are in the process of being enabled, but are
     * not yet being verified, we can abort by going back to 'off'
     * state.
     */
    {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_OFF},

    /*
     * Enabling checksums must normally go through the 'inprogress-on'
     * state.
     */
    {PG_DATA_CHECKSUM_OFF, PG_DATA_CHECKSUM_INPROGRESS_ON},
    {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_VERSION},

    /*
     * If checksums are being disabled but all backends are still
     * computing checksums, we can go straight back to 'on'
     */
    {PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_VERSION},
}

+/*
+ * Signaling between backends calling pg_enable/disable_data_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DataChecksumsWorkerLock
+ */
+typedef struct DataChecksumsWorkerShmemStruct
+{
+       /*
+        * These are set by pg_{enable|disable|verify}_data_checksums, to tell 
the
+        * launcher what the target state is.
+        */
+       DataChecksumsWorkerOperation launch_operation;
+       int                     launch_cost_delay;
+       int                     launch_cost_limit;

The naming feels a little weird with this struct. It's called "DataChecksumsWorkerShmemStruct", but it's also accessed by the backends calling pg_enable/disable_data_checksums(). And "DataChecksumsWorkerOperation" is not accessed by workers at all. Or I guess the "operation" global variable is used in DataChecksumsWorkerMain(), but it's always set to ENABLE_DATACHECKSUMS in the worker. Do you need the "operation" global variable at all?

+void
+SetDataChecksumsOn(void)
+{
+       uint64          barrier;
+
+       Assert(ControlFile != NULL);
+
+       SpinLockAcquire(&XLogCtl->info_lck);
+
+       /*
+        * The only allowed state transition to "on" is from "inprogress-on" 
since
+        * that state ensures that all pages will have data checksums written.
+        */
+       if (XLogCtl->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON)
+       {
+               SpinLockRelease(&XLogCtl->info_lck);
+               elog(PANIC, "checksums not in \"inprogress-on\" mode");
+       }
+
+       SpinLockRelease(&XLogCtl->info_lck);

The PANIC seems a little harsh, you haven't done anything destructive here. It's unexpected for this to be called in any other state, so this is a "can't happen" scenario, but I don't think we usually PANIC on those.

+       <para>
+        If <parameter>cost_delay</parameter> and 
<parameter>cost_limit</parameter> are
+        specified, the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum 
Delay</link>.
+       </para>

Ugh, yet another place where we expose the "cost delay/limit" throttling mechanism. I agree it's good to be consistent here and use the same method we use for vacuum, I just wish we had something more user-friendly..


Grammar / spelling:

+ * state will also be set of "off".

+        * When moving to inprogress-on the current state cannot enabled, but 
when

+        * If a worker process currently running?  This is set by the worker

+        * These are set by pg_{enable|disable|verify}_data_checksums, to tell 
the

there is no "pg_verify_data_checksums" function.

"calcuated" in commit message

- Heikki



Reply via email to