Hi,

On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <[email protected]> wrote:
> Hi,
>
> While testing the slot release logic, I noticed a bug in
> ReplicationSlotRelease() where it may access a replication slot array
> entry that it has already released.
>
> In detail: when releasing an ephemeral replication slot,
> ReplicationSlotRelease() first drops the slot via
> ReplicationSlotDropAcquired(). After this point, the slot's entry in the
> shared-memory slot array can be immediately reused by another backend
> creating a new slot.
>
> However, ReplicationSlotRelease() then continues into common cleanup code
> that still dereferences the old slot pointer and updates shared-memory
> fields such as effective_xmin. If the slot array entry has already been
> reallocated, these writes can inadvertently affect a different, unrelated
> slot.
>
> I am attaching a patch that avoids touching the slot's shared-memory state
> after dropping an ephemeral slot. It keeps the post-release shared-memory
> updates only for non-ephemeral slots, where the slot remains valid after
> release.
>
> To reproduce, we can use the following steps:
>
> 1. Attach gdb to the backend and set a breakpoint in
>    ReplicationSlotRelease() right after ReplicationSlotDropAcquired() is
>    called.
> 2. Create an ephemeral slot in the above backend with an invalid output
>    plugin:
>    SELECT pg_create_logical_replication_slot('test_slot_dropped',
>    'pgoutput2', false, false, true);
> 3. Once the breakpoint is hit, start another backend and create a new slot
>    named 'test_slot_created'.
> 4. Release the breakpoint and allow the first backend to continue. At this
>    point, you will see it set active_proc of the new slot
>    'test_slot_created' (and effective_xmin, if a snapshot is being
>    exported) to invalid values.
> 5. Start a third backend and attempt to acquire the same slot
>    'test_slot_created'. This should not be possible under normal
>    circumstances, but the bug allows it.
>
The patch LGTM.
>
> I haven't attached a test for this fix, as the change is straightforward
> and the likelihood of encountering this bug is low, so it may not be worth
> adding test cycles for it. However, if others feel differently, I'm OK to
> add one.
>
+1 for a test. The fix is just an else, so a future refactor could change
it and silently reintroduce the corruption. Because the bug scribbles on an
unrelated, reused slot, nothing would catch it. Injection points make the
race deterministic; I've attached a patch that adds a test that fails
without the fix and passes with it.
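
For anyone who wants to see the failure mode without a server, here is a
minimal standalone C sketch of it. This is a toy model under stated
assumptions, not the attached test and not PostgreSQL code: ToySlot,
toy_create(), toy_release(), after_drop_hook and the pid values are all
hypothetical names made up for illustration, and the hook merely stands in
for "another backend creates a slot while the first one is paused".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: hypothetical names, not PostgreSQL's real structures. */
#define TOY_MAX_SLOTS 4

typedef struct ToySlot
{
    bool in_use;        /* entry is allocated */
    bool ephemeral;     /* dropped on release instead of kept */
    int  active_pid;    /* 0 means "nobody holds the slot" */
    char name[64];
} ToySlot;

static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Runs right after an ephemeral slot is dropped; simulates a second backend. */
static void (*after_drop_hook)(void) = NULL;

/* Allocate the first free array entry, as a stand-in for slot creation. */
ToySlot *toy_create(const char *name, bool ephemeral, int pid)
{
    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        ToySlot *s = &toy_slots[i];

        if (!s->in_use)
        {
            s->in_use = true;
            s->ephemeral = ephemeral;
            s->active_pid = pid;
            snprintf(s->name, sizeof(s->name), "%s", name);
            return s;
        }
    }
    return NULL;
}

/* fixed = false models the old flow, fixed = true models the "else" fix. */
void toy_release(ToySlot *slot, bool fixed)
{
    assert(slot != NULL && slot->in_use);

    if (slot->ephemeral)
    {
        slot->in_use = false;       /* drop: entry may be reused from here on */
        if (after_drop_hook)
            after_drop_hook();      /* the "other backend" reuses the entry */
        if (!fixed)
            slot->active_pid = 0;   /* old flow: write through a stale pointer */
    }
    else
        slot->active_pid = 0;       /* slot is still ours, safe to update */
}

/* The simulated second backend: creates a persistent slot owned by pid 4242. */
static void concurrent_create(void)
{
    toy_create("test_slot_created", false, 4242);
}

/* Run the race once; return the owner pid recorded in the reused entry. */
int toy_race_outcome(bool fixed)
{
    memset(toy_slots, 0, sizeof(toy_slots));
    after_drop_hook = concurrent_create;

    ToySlot *victim = toy_create("test_slot_dropped", true, 1111);

    toy_release(victim, fixed);
    return toy_slots[0].active_pid;
}
```

In this model toy_race_outcome(false) returns 0, i.e. the new slot's owner
has been wiped by the stale write, while toy_race_outcome(true) returns 4242
because the post-drop update is skipped for the ephemeral slot.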
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
nocfbot-test.patch
Description: Binary data
