Hi Hackers,

SQL-callable replication slot functions acquire a slot (setting the
process-global MyReplicationSlot) but can then ERROR before reaching
ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
set because there is no subtransaction-level cleanup hook for
replication slots.

Any subsequent slot operation in the same session then hits
Assert(MyReplicationSlot == NULL) and crashes the backend on
assert-enabled builds. In release builds the stale MyReplicationSlot is
silently overwritten, permanently orphaning the old slot as "active".
No other session can acquire the orphaned slot, and it also holds back
vacuum and WAL removal.

Repro:

SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');

DO $$ BEGIN
    PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
EXCEPTION WHEN others THEN
    RAISE NOTICE 'caught: %', SQLERRM;
END $$;

SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);

Server log from an assert-enabled build:

2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805

The attached patch addresses this by wrapping the error-prone paths in
PG_TRY/PG_CATCH blocks and calling ReplicationSlotRelease() on error.
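
For anyone less familiar with the pattern, here is a standalone sketch
of the idea. It is not the code from the patch. It imitates
PG_TRY/PG_CATCH with plain setjmp/longjmp so it builds without the
PostgreSQL headers, and every name in it (DemoSlot, my_slot, error_ctx,
demo_acquire, demo_release, demo_error, demo_advance,
demo_call_catching) is made up for illustration and only loosely
mirrors MyReplicationSlot and ReplicationSlotRelease().

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a replication slot; only "active" matters. */
typedef struct DemoSlot { int active; } DemoSlot;

DemoSlot *my_slot = NULL;    /* plays the role of MyReplicationSlot */
jmp_buf  *error_ctx = NULL;  /* innermost error handler currently in scope */

void demo_acquire(DemoSlot *slot)
{
    assert(my_slot == NULL); /* same kind of check as the Assert in the report */
    slot->active = 1;
    my_slot = slot;
}

void demo_release(void)
{
    if (my_slot != NULL)
    {
        my_slot->active = 0;
        my_slot = NULL;
    }
}

/* Simulated ERROR: jump to the innermost handler (one must be installed). */
void demo_error(void)
{
    longjmp(*error_ctx, 1);
}

/* Simulated slot function: acquires the slot, then optionally fails. */
void demo_advance(DemoSlot *slot, int fail, int release_on_error)
{
    jmp_buf  local;
    jmp_buf *outer = error_ctx;

    demo_acquire(slot);
    if (release_on_error)
    {
        if (setjmp(local) == 0)      /* "try" part */
        {
            error_ctx = &local;
            if (fail)
                demo_error();
            error_ctx = outer;
        }
        else                         /* "catch" part */
        {
            error_ctx = outer;
            demo_release();          /* clean up before propagating */
            demo_error();            /* re-throw to the outer handler */
        }
    }
    else if (fail)
        demo_error();                /* error escapes with the slot still held */

    demo_release();                  /* normal exit path */
}

/* Simulated EXCEPTION block: swallows the error, returns 1 if one was caught. */
int demo_call_catching(DemoSlot *slot, int fail, int release_on_error)
{
    jmp_buf  handler;
    jmp_buf *outer = error_ctx;
    int      caught = 0;

    if (setjmp(handler) == 0)
    {
        error_ctx = &handler;
        demo_advance(slot, fail, release_on_error);
    }
    else
        caught = 1;

    error_ctx = outer;
    return caught;
}
```

With release_on_error set to 0, demo_call_catching() returns with
my_slot still pointing at the slot, which is the stale state described
above. With release_on_error set to 1, the slot is released before the
error reaches the outer handler, so a later demo_acquire() does not
trip the assertion.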
Thanks,
Satya
v1-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch
