On Wed, Apr 29, 2026 at 8:11 PM Chao Li <[email protected]> wrote:
>
>
>
> > On Apr 29, 2026, at 09:28, Chao Li <[email protected]> wrote:
> >
> >
> >
> >> On Apr 29, 2026, at 05:15, Masahiko Sawada <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> I found a race condition issue between XLogLogicalInfo and ProcSignal
> >> initialization while reviewing another issue[1]. I'm starting a
> >> separate thread for the subject as it's not related to the issue
> >> reported on that thread.
> >>
> >> The issue is that child processes could miss the
> >> PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO signal during
> >> initialization and end up in an inconsistent state, because
> >> InitializeProcessXLogLogicalInfo() is called (in BaseInit()) before
> >> ProcSignalInit(). If the signal is emitted to a process while it is
> >> between these two steps, that process would not reflect the latest
> >> XLogLogicalInfo state. I think we should move
> >> InitializeProcessXLogLogicalInfo() after ProcSignalInit(), as we
> >> already do for InitLocalDataChecksumState().
> >
> > I think this is correct.
> >
> > After moving InitializeProcessXLogLogicalInfo() out of BaseInit(),
> > background worker processes (BackgroundWorkerMain) will no longer hold a
> > valid value of XLogLogicalInfo, but I guess that is fine as those processes
> > don’t call ProcSignalInit() anyway.
No. Even after moving InitializeProcessXLogLogicalInfo(), bgworkers
that connect to a database call InitPostgres(), which initializes
both the procsignal state and XLogLogicalInfo.
>
> I met Zhijie Hou at HOW 2026 a few days ago. When we talked about a feature
> requirement I recently heard from a DBA, Zhijie pointed me to 67c20979ce
> (Toggle logical decoding dynamically based on logical slot presence).
>
> The requirement is that storage is expensive today, and users are sensitive
> to the total size of WAL. In some deployments, users may only want to
> replicate a small set of tables intermittently, but to enable logical
> replication, they still have to set wal_level to logical, which significantly
> increases the total WAL volume. I believe this feature could help address
> that concern, so I reviewed the code and played a bit with it.
>
> I found an issue related to this patch, so I am sharing my findings here,
> although the problem also existed before this patch.
>
> In InitPostgres(), in the standalone backend path, StartupXLOG() is called,
> but XLogLogicalInfo is not updated afterwards. As a result, if we switch to
> standalone mode for some emergency maintenance, make data changes, and then
> switch back to normal mode, the WAL for changes made in standalone mode would
> not include the logical replication metadata, which could break logical
> replication later.
>
> To verify this, I ran the following test:
>
> * Start a new instance with wal_level = replica
> * Create a table, insert some data, then create a logical replication slot
> ```
> evantest=# CREATE TABLE t1(id int);
> CREATE TABLE
> evantest=# INSERT INTO t1 VALUES (1), (2);
> INSERT 0 2
> evantest=# SELECT * FROM pg_create_logical_replication_slot('s1', 'test_decoding');
>  slot_name |    lsn
> -----------+------------
>  s1        | 0/01D6E6D0
> (1 row)
> ```
>
> * Stop the server, start it in standalone (single-user) mode, and truncate the table:
> ```
> % postgres --single -F -D . evantest
>
> PostgreSQL stand-alone backend 19devel
> backend> show effective_wal_level;
> 1: effective_wal_level (typeid = 25, len = -1, typmod = -1, byval = f)
> ----
> 1: effective_wal_level = "replica" (typeid = 25, len = -1, typmod = -1, byval = f)
> ----
> backend> truncate t1;
> backend> 2026-04-29 21:13:24.625 CST [68316] LOG: checkpoint starting: shutdown fast
> ```
>
> * Start the server normally, and read the changes through the logical slot:
> ```
> evantest=# SELECT data FROM pg_logical_slot_get_changes('s1', NULL, NULL);
>     data
> ------------
>  BEGIN 721
>  COMMIT 721
> (2 rows)
> ```
>
> The TRUNCATE does not appear, which I think is wrong. To fix that, we only
> need to call InitializeProcessXLogLogicalInfo() after StartupXLOG() in the
> standalone path. Since the fix builds on this patch, I added it as 0002 in
> this patch set.
Good catch. I've updated the patch.
>
> One more thought: I think this feature only partially addresses the user
> requirement I described earlier. When wal_level is replica and some logical
> slots have been created, the extra WAL data should be generated only for
> tables included in those slots. That would avoid generating unnecessary WAL
> for tables that are not replication targets, and therefore save storage.
> WDYT? Maybe a candidate for v20?
>
This would require extending logical replication slots so that they
track specific tables, and then, when writing WAL records, each
backend process would need to determine whether the table is included
in any replication slot. While the idea sounds interesting, it also
sounds complex and could introduce overhead.
> BTW, in 0001, I helped fix the typos.
Thank you!
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From 019569149d28f78c90833bcc6164400c9d1edda5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Fri, 24 Apr 2026 10:36:55 -0700
Subject: [PATCH v3] Fix race condition in XLogLogicalInfo and ProcSignal
initialization.
Previously, InitializeProcessXLogLogicalInfo() was called before
ProcSignalInit(). This created a window where a process could miss a
signal barrier if it was issued between these two calls. As a result,
the process could fail to update its local XLogLogicalInfo cache,
leading to an inconsistent logical decoding state.
This commit fixes this by moving InitializeProcessXLogLogicalInfo()
after ProcSignalInit(). This ensures that the process is registered to
participate in signal barriers before its state is initialized,
preventing it from missing any state change propagated during the
startup sequence.
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Matthias van de Meent <[email protected]>
Discussion: https://postgr.es/m/cad21aobzdesylsspm5e6ysn1r8qzp8u_brmnlvuap_s8qxs...@mail.gmail.com
Discussion: https://postgr.es/m/cad21aobj+zkvgw_q8gjr4ybkccw_ume3ofq5+kt246fhuun...@mail.gmail.com
---
src/backend/postmaster/auxprocess.c | 8 ++++++++
src/backend/utils/init/postinit.c | 13 ++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index ba8c9add67a..9803a0ee2a1 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -98,6 +98,14 @@ AuxiliaryProcessMainCommon(void)
RESUME_INTERRUPTS();
+ /*
+ * Initialize the process-local logical info WAL logging state.
+ *
+ * This must be called after ProcSignalInit() so that the process can
+ * participate in procsignal-based barriers that update this state.
+ */
+ InitializeProcessXLogLogicalInfo();
+
/*
* Auxiliary processes don't run transactions, but they may need a
* resource owner anyway to manage buffer pins acquired outside
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ecf78b9a986..2460e550f96 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -662,9 +662,6 @@ BaseInit(void)
/* Initialize lock manager's local structs */
InitLockManagerAccess();
- /* Initialize logical info WAL logging state */
- InitializeProcessXLogLogicalInfo();
-
/*
* Initialize replication slots after pgstat. The exit hook might need to
* drop ephemeral slots, which in turn triggers stats reporting.
@@ -833,6 +830,16 @@ InitPostgres(const char *in_dbname, Oid dboid,
before_shmem_exit(ShutdownXLOG, 0);
}
+ /*
+ * Initialize the process-local logical info WAL logging state.
+ *
+ * This must be called after ProcSignalInit() so that the process can
+ * participate in procsignal-based barriers that update this state.
+ * Furthermore, in !IsUnderPostmaster cases, this must occur after
+ * StartupXLOG() where the shared state is first established.
+ */
+ InitializeProcessXLogLogicalInfo();
+
/*
* Initialize the relation cache and the system catalog caches. Note that
* no catalog access happens here; we only set up the hashtable structure.
--
2.54.0