It'd be good to print a LOG message when the checksums state changes, to
have a trail in the log of when checksums were enabled/disabled.
Something like:
LOG: enabling checksums was requested, starting checksum calculation
...
LOG: checksums calculation finished, checksums are now enabled
On 02/04/2026 02:01, Daniel Gustafsson wrote:
+ if (result == DATACHECKSUMSWORKER_FAILED)
+ {
+ /*
+ * Disable checksums on cluster, because we failed one
of the
+ * databases and this is an all or nothing process.
+ */
+ SetDataChecksumsOff();
+ ereport(ERROR,
+ errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+ errmsg("data checksums failed to get enabled
in all databases, aborting"),
+ errhint("The server log might have more
information on the cause of the error."));
+ }
This got me thinking, what happens if the the data checksums launcher
encounters some other error, for example if you SIGTERM it? The system
is left in 'inprogress-on' state, but because the launcher is gone it
will never finish and 'pg_stat_progress_data_checksums' will be empty.
Perhaps launcher_exit() should call SetDataChecksumsOff()?
/*
* TODO: how to really handle the worker still running when the
* launcher exits?
*/
if (DataChecksumsWorkerShmem->worker_running)
ereport(LOG,
errmsg("data checksums launcher exiting
while worker is still running"));
That TODO should be addressed somehow.
+ /*
+ * As of now we only update the block counter for main forks in
order
+ * to not cause too frequent calls. TODO: investigate whether we
+ * should do it more frequent?
+ */
+ if (forkNum == MAIN_FORKNUM)
+
pgstat_progress_update_param(PROGRESS_DATACHECKSUMS_BLOCKS_DONE,
+
(blknum + 1));
We're updating it for every block in the main fork, but not at all for
other forks? What a bizarre way of avoiding too frequent calls :-). I
think you could just call this on every page,
pgstat_progress_update_param() is supposed to be very fast. For
comparison, e.g. index build calls it for every tuple.
+/*
+ * Configuration of conditions which must match when absorbing a procsignal
+ * barrier during data checksum enable/disable operations. A single function
+ * is used for absorbing all barriers, and the set of conditions to use is
+ * looked up in the checksum_barriers struct. The struct member for the target
+ * state defines which state the backend must currently be in, and which it
+ * must not be in.
+ *
+ * The reason for this explicit checking is to ensure that processing cannot
+ * be started such that it breaks the assumptions of the state machine.
+ *
+ * MAX_BARRIER_CONDITIONS must match largest number of sets in barrier_eq and
+ * barrier_ne in the below checksum_barriers definition.
+ */
+#define MAX_BARRIER_CONDITIONS 2
+typedef struct ChecksumBarrierCondition
+{
+ /* The target state of the barrier */
+ int target;
+ /* A set of states in which at least one MUST match the current state */
+ int barrier_eq[MAX_BARRIER_CONDITIONS];
+ /* The number of elements in the barrier_eq set */
+ int barrier_eq_sz;
+ /* A set of states which all MUST NOT match the current state */
+ int barrier_ne[MAX_BARRIER_CONDITIONS];
+ /* The number of elements in the barrier_ne set */
+ int barrier_ne_sz;
+} ChecksumBarrierCondition;
+
+static const ChecksumBarrierCondition checksum_barriers[4] =
+{
+ /*
+ * When disabling checksums, either inprogress state is Ok but checksums
+ * must not be in the enabled state.
+ */
+ {
+ .target = PG_DATA_CHECKSUM_OFF,
+ .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON,
PG_DATA_CHECKSUM_INPROGRESS_OFF},
+ .barrier_eq_sz = 2,
+ .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
+ .barrier_ne_sz = 1
+ },
+ /* When enabling the current state must be inprogress-on */
+ {
+ .target = PG_DATA_CHECKSUM_VERSION,
+ .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON},
+ .barrier_eq_sz = 1,
+ {0}, 0
+ },
+
+ /*
+ * When moving to inprogress-on the current state cannot enabled, but
when
+ * moving to inprogress-off the current state must be enabled.
+ */
+ {
+ .target = PG_DATA_CHECKSUM_INPROGRESS_ON,
+ {0}, 0,
+ .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
+ .barrier_ne_sz = 1
+ },
+ {
+ .target = PG_DATA_CHECKSUM_INPROGRESS_OFF,
+ .barrier_eq = {PG_DATA_CHECKSUM_VERSION},
+ .barrier_eq_sz = 1,
+ {0}, 0
+ },
+};
I find this to still be a pretty complicated and unclear way of
representing the allowed transitions. There are only 16 possible
transitions, and only 6 of them are allowed. How about listing the
allowed ones directly:
/* Allowed transitions: from, to */
{
/*
* Disabling checksums: If checksums are currently enabled,
* disabling must go through the 'inprogress-off' state.
*/
{PG_DATA_CHECKSUM_VERSION, PG_DATA_CHECKSUM_INPROGRESS_OFF},
{PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_OFF},
/*
* If checksums are in the process of being enabled, but are
* not yet being verified, we can abort by going back to 'off'
* state.
*/
{PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_OFF},
/*
* Enabling checksums must normally go through the 'inprogress-on'
* state.
*/
{PG_DATA_CHECKSUM_OFF, PG_DATA_CHECKSUM_INPROGRESS_ON},
{PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_VERSION},
/*
* If checksums are being disabled but all backends are still
* computing checksums, we can go straight back to 'on'
*/
{PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_VERSION},
}
+/*
+ * Signaling between backends calling pg_enable/disable_data_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DataChecksumsWorkerLock
+ */
+typedef struct DataChecksumsWorkerShmemStruct
+{
+ /*
+ * These are set by pg_{enable|disable|verify}_data_checksums, to tell
the
+ * launcher what the target state is.
+ */
+ DataChecksumsWorkerOperation launch_operation;
+ int launch_cost_delay;
+ int launch_cost_limit;
The naming feels a little weird with this struct. It's called
"DataChecksumsWorkerShmemStruct", but it's also accessed by the backends
calling pg_enable/disable_data_checksums(). And
"DataChecksumsWorkerOperation" is not accessed by workers at all. Or I
guess the "operation" global variable is used in
DataChecksumsWorkerMain(), but it's always set to ENABLE_DATACHECKSUMS
in the worker. Do you need the "operation" global variable at all?
+void
+SetDataChecksumsOn(void)
+{
+ uint64 barrier;
+
+ Assert(ControlFile != NULL);
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+
+ /*
+ * The only allowed state transition to "on" is from "inprogress-on"
since
+ * that state ensures that all pages will have data checksums written.
+ */
+ if (XLogCtl->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON)
+ {
+ SpinLockRelease(&XLogCtl->info_lck);
+ elog(PANIC, "checksums not in \"inprogress-on\" mode");
+ }
+
+ SpinLockRelease(&XLogCtl->info_lck);
The PANIC seems a little harsh, you haven't done anything destructive
here. It's unexpected for this to be called in any other state, so this
is a "can't happen" scenario, but I don't think we usually PANIC on those.
+ <para>
+ If <parameter>cost_delay</parameter> and
<parameter>cost_limit</parameter> are
+ specified, the process is throttled using the same principles as
+ <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum
Delay</link>.
+ </para>
Ugh, yet another place where we expose the "cost delay/limit" throttling
mechanism. I agree it's good to be consistent here and use the same
method we use for vacuum, I just wish we had something more user-friendly..
Grammar / spelling:
+ * state will also be set of "off".
+ * When moving to inprogress-on the current state cannot enabled, but
when
+ * If a worker process currently running? This is set by the worker
+ * These are set by pg_{enable|disable|verify}_data_checksums, to tell
the
there is no "pg_verify_data_checksums" function.
"calcuated" in commit message
- Heikki