On 14/10/2025 11:11, Andrei Lepikhov wrote:
For me, the ideal place for such a hook is CheckPointGuts, right between the CheckPointBuffers call and fsyncs. I think that to demonstrate how this hook can work, the pg_stat_statements storage may need to be redesigned slightly.
There are two patches: 0001, which is the checkpoint hook itself, and 0002, which includes an example and a trivial test.

During development, I attempted to apply it in my different modules and realised that the hook is preferred over an RMGR callback - I don't actually want to be forced to register RMGR in each project and have it loadable on an instance startup. In lightweight modules, I want to keep my knowledge base relatively close to the current state of the instance. Nevertheless, the plan freezing extension (for example) needs to ensure that the user's query plan is 'frozen' after the function call. Therefore, we need to store the decision made in the WAL, which requires dumping the state into a file before performing the WAL cut. Additionally, I'd like to experiment with synchronising an extension state between master and replica through WAL records, as most optimisation recommendations are relevant to both instances.

Patch 0001 contains a hook that is called once after all checkpoint preparations have finished. I recall that people mentioned it might be helpful for AMs as well - feel free to propose changes to this patch.

Patch 2 adds an example to the test_dsm_registry module, as it is precisely the way I write the code: named DSM segment -> shared HTAB -> file dump. So, it looks natural and opens a room to extend this example by employing RMGR and xact callbacks to keep the extension state as close to the committed changes as possible.

The test looks pretty trivial so far - feel free to propose ideas on how to extend it.

--
regards, Andrei Lepikhov,
pgEdge
From a0e8d75223fa95dbec1e422eacaef336e45c2008 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Thu, 13 Nov 2025 15:00:43 +0100
Subject: [PATCH 1/2] Add a hook for Checkpoint processing.

There are many situations in which a Postgres plugin may need to maintain its
internal state across restarts or crashes. Sometimes it wants to synchronise
its state on logical replicas or be saved in a backup employing custom RMGR and
WAL records.

For statistical extensions, such as pg_stat_statements, it is okay to save
their state on postmaster shutdown. However, business extensions may want
to maintain more actual state, periodically dumping it to a disk file or using
WAL and xact callbacks to be as close as possible to the current
database state.

Checkpoint is a key moment where the DBMS performs disk synchronisation and
cuts the WAL. It is a good time to do the same thing for a plugin, too.
Moreover, the plugin is sure that nothing important will be lost with
the WAL cut.

Discussion: 
https://www.postgresql.org/message-id/CANbhV-E4pTWeF-DsdaGsOrjJNFWPaR%2BDstjrnkqvf9JFFgOKKQ%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 15 +++++++++++++++
 src/include/access/xlog.h         |  4 ++++
 src/tools/pgindent/typedefs.list  |  1 +
 3 files changed, 20 insertions(+)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 22d0a2e8c3a..c7c0b226724 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -157,6 +157,13 @@ int                        wal_segment_size = 
DEFAULT_XLOG_SEG_SIZE;
  */
 int                    CheckPointSegments;
 
+/*
+ * Hook for plugins to take control during checkpoint processing. All
+ * preparation procedures have already been done, and only the sync needs
+ * to be done.
+ */
+Checkpoint_hook_type Checkpoint_hook = NULL;
+
 /* Estimated distance between checkpoints, in bytes */
 static double CheckPointDistanceEstimate = 0;
 static double PrevCheckPointDistance = 0;
@@ -7594,6 +7601,14 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
        CheckPointPredicate();
        CheckPointBuffers(flags);
 
+       /*
+        * Allow a plugin that depends on a custom RMGR to retain its state 
through
+        * reboots or crashes by following specific steps, ensuring that 
essential
+        * WAL records are not truncated.
+        */
+       if (Checkpoint_hook)
+               Checkpoint_hook(checkPointRedo, flags);
+
        /* Perform all queued up fsyncs */
        TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
        CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 605280ed8fb..5c071974557 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,6 +198,10 @@ typedef enum WALAvailability
 struct XLogRecData;
 struct XLogReaderState;
 
+/* Hook for plugins to get control at the end of a CheckPoint */
+typedef void (*Checkpoint_hook_type)(XLogRecPtr checkPointRedo, int flags);
+extern PGDLLIMPORT Checkpoint_hook_type Checkpoint_hook;
+
 extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
                                                                   XLogRecPtr 
fpw_lsn,
                                                                   uint8 flags,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23bce72ae64..6ca05499081 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -413,6 +413,7 @@ CatalogIndexState
 ChangeVarNodes_callback
 ChangeVarNodes_context
 CheckPoint
+Checkpoint_hook_type
 CheckPointStmt
 CheckpointStatsData
 CheckpointerRequest
-- 
2.51.2

From f92abbcc3667103628608d248870867200087e16 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Fri, 14 Nov 2025 16:35:21 +0100
Subject: [PATCH 2/2] Testing module

---
 src/test/modules/test_dsm_registry/Makefile   |   1 +
 .../test_dsm_registry/t/001_file_storage.pl   |  31 ++++
 .../test_dsm_registry/test_dsm_registry.c     | 163 ++++++++++++++++++
 3 files changed, 195 insertions(+)
 create mode 100644 src/test/modules/test_dsm_registry/t/001_file_storage.pl

diff --git a/src/test/modules/test_dsm_registry/Makefile 
b/src/test/modules/test_dsm_registry/Makefile
index b13e99a354f..9aae8b98aba 100644
--- a/src/test/modules/test_dsm_registry/Makefile
+++ b/src/test/modules/test_dsm_registry/Makefile
@@ -10,6 +10,7 @@ EXTENSION = test_dsm_registry
 DATA = test_dsm_registry--1.0.sql
 
 REGRESS = test_dsm_registry
+TAP_TESTS = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/src/test/modules/test_dsm_registry/t/001_file_storage.pl 
b/src/test/modules/test_dsm_registry/t/001_file_storage.pl
new file mode 100644
index 00000000000..0e82d0adcf7
--- /dev/null
+++ b/src/test/modules/test_dsm_registry/t/001_file_storage.pl
@@ -0,0 +1,31 @@
+# Copyright (c) 2023-2025, PostgreSQL Global Development Group
+use strict;
+use warnings FATAL => 'all';
+use Config;
+use PostgreSQL::Test::Utils;
+use PostgreSQL::Test::Cluster;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+
+$node->init();
+$node->append_conf('postgresql.conf',
+                                                       
"shared_preload_libraries = 'test_dsm_registry'");
+$node->start();
+
+$node->safe_psql('postgres', "CREATE EXTENSION test_dsm_registry");
+
+my $result;
+
+$node->safe_psql('postgres', "SELECT set_val_in_hash('test-1', '1414')");
+$node->safe_psql('postgres', 'CHECKPOINT');
+$node->safe_psql('postgres', "SELECT set_val_in_hash('test-2', '1415')");
+$node->stop('immediate');
+$node->start();
+
+$result = $node->safe_psql('postgres', "SELECT get_val_in_hash('test-1')");
+is($result, '1414', "Value inserted before the checkpoint was restored");
+$result = $node->safe_psql('postgres', "SELECT get_val_in_hash('test-2')");
+is($result, '', "Value inserted after the checkpoint was lost");
+
+done_testing();
diff --git a/src/test/modules/test_dsm_registry/test_dsm_registry.c 
b/src/test/modules/test_dsm_registry/test_dsm_registry.c
index 4cc2ccdac3f..2d7fd35a74d 100644
--- a/src/test/modules/test_dsm_registry/test_dsm_registry.c
+++ b/src/test/modules/test_dsm_registry/test_dsm_registry.c
@@ -12,13 +12,22 @@
  */
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "fmgr.h"
+#include "pgstat.h"
 #include "storage/dsm_registry.h"
+#include "storage/fd.h"
 #include "storage/lwlock.h"
 #include "utils/builtins.h"
+#include "utils/hsearch.h"
 
 PG_MODULE_MAGIC;
 
+/* Location of permanent storage file (valid on checkpoint) */
+#define TDR_DUMP_FILE  PGSTAT_STAT_PERMANENT_DIRECTORY 
"/pg_stat_statements.stat"
+/* Magic number identifying the stats file format */
+static const uint32 TDR_FILE_HEADER = 0x20251114;
+
 typedef struct TestDSMRegistryStruct
 {
        int                     val;
@@ -43,6 +52,11 @@ static const dshash_parameters dsh_params = {
        dshash_strcpy
 };
 
+static Checkpoint_hook_type    prev_Checkpoint_hook = NULL;
+
+static void load_htab(void);
+static void pgss_Checkpoint(XLogRecPtr checkPointRedo, int flags);
+
 static void
 init_tdr_dsm(void *ptr)
 {
@@ -66,7 +80,14 @@ tdr_attach_shmem(void)
                tdr_dsa = GetNamedDSA("test_dsm_registry_dsa", &found);
 
        if (tdr_hash == NULL)
+       {
+               LWLockAcquire(&tdr_dsm->lck, LW_EXCLUSIVE);
                tdr_hash = GetNamedDSHash("test_dsm_registry_hash", 
&dsh_params, &found);
+               if (!found)
+                       load_htab();
+
+               LWLockRelease(&tdr_dsm->lck);
+       }
 }
 
 PG_FUNCTION_INFO_V1(set_val_in_shmem);
@@ -144,3 +165,145 @@ get_val_in_hash(PG_FUNCTION_ARGS)
 
        PG_RETURN_TEXT_P(val);
 }
+
+/*
+ * Load any pre-existing entries from file.
+ */
+static void
+load_htab(void)
+{
+       bool    found;
+       FILE   *file = NULL;
+       uint32  header;
+       char   *val = palloc(1);
+
+       Assert(tdr_dsa != NULL && tdr_hash != NULL);
+
+       /*
+        * Attempt to load old entries from the dump file.
+        */
+       file = AllocateFile(TDR_DUMP_FILE, PG_BINARY_R);
+       if (file == NULL)
+       {
+               if (errno != ENOENT)
+                       goto read_error;
+               /* No existing persisted file, so we're done */
+               return;
+       }
+
+       if (fread(&header, sizeof(uint32), 1, file) != 1 ||
+               header != TDR_FILE_HEADER)
+               goto read_error;
+
+       while (!feof(file))
+       {
+               TestDSMRegistryHashEntry *entry;
+               char    key[64];
+               int             keylen = offsetof(TestDSMRegistryHashEntry, 
val);
+               int32   vlen;
+
+               if (fread(key, keylen, 1, file) != 1 ||
+                       fread(&vlen, sizeof(int32), 1, file) != 1)
+                       goto read_error;
+
+               val = repalloc(val, vlen);
+               if (fread(val, vlen, 1, file) != 1)
+                       goto read_error;
+
+               Assert(val[vlen - 1] == '\0');
+
+               entry = (TestDSMRegistryHashEntry *)
+                                                               
dshash_find_or_insert(tdr_hash, key, &found);
+               Assert(!found);
+
+               entry->val = dsa_allocate(tdr_dsa, strlen(val) + 1);
+               strcpy(dsa_get_address(tdr_dsa, entry->val), val);
+
+               dshash_release_lock(tdr_hash, entry);
+       }
+
+       FreeFile(file);
+       return;
+
+read_error:
+       ereport(LOG,
+                       (errcode_for_file_access(),
+                        errmsg("could not read from file \"%s\": %m", 
TDR_DUMP_FILE)));
+       if (file)
+               FreeFile(file);
+       /* If possible, throw away the bogus file; ignore any error */
+       unlink(TDR_DUMP_FILE);
+}
+
+/*
+ * Dump hash table into file.
+ *
+ */
+static void
+pgss_Checkpoint(XLogRecPtr checkPointRedo, int flags)
+{
+       FILE                                       *file;
+       dshash_seq_status                       hstat;
+       TestDSMRegistryHashEntry   *entry;
+
+       if (flags & CHECKPOINT_END_OF_RECOVERY)
+               return;
+
+       tdr_attach_shmem();
+
+       file = AllocateFile(TDR_DUMP_FILE ".tmp", PG_BINARY_W);
+       if (file == NULL)
+               goto error;
+       if (fwrite(&TDR_FILE_HEADER, sizeof(uint32), 1, file) != 1)
+               goto error;
+
+       dshash_seq_init(&hstat, tdr_hash, false);
+       while ((entry = dshash_seq_next(&hstat)) != NULL)
+       {
+               int             keylen = offsetof(TestDSMRegistryHashEntry, 
val);
+               char   *val;
+               int32   vlen;
+
+               val = (char *) dsa_get_address(tdr_dsa, entry->val);
+               vlen = strlen(val) + 1;
+               if (fwrite(entry->key, keylen, 1, file) != 1 ||
+                       fwrite(&vlen, sizeof(int32), 1, file) != 1 ||
+                       fwrite(val, vlen, 1, file) != 1)
+               {
+                       dshash_seq_term(&hstat);
+                       goto error;
+               }
+       }
+       dshash_seq_term(&hstat);
+
+       if (FreeFile(file))
+       {
+               file = NULL;
+               goto error;
+       }
+
+       /*
+        * Rename file into place, so we atomically replace any old one.
+        */
+       (void) durable_rename(TDR_DUMP_FILE ".tmp", TDR_DUMP_FILE, LOG);
+       return;
+
+error:
+       ereport(LOG,
+                       (errcode_for_file_access(),
+                        errmsg("could not write file \"%s\": %m",
+                                       TDR_DUMP_FILE ".tmp")));
+       if (file)
+               FreeFile(file);
+       unlink(TDR_DUMP_FILE ".tmp");
+}
+
+/*
+ * Entry point for this module.
+ */
+void
+_PG_init(void)
+{
+       prev_Checkpoint_hook = Checkpoint_hook;
+       Checkpoint_hook = pgss_Checkpoint;
+}
-- 
2.51.2

Reply via email to