Hi Alvaro, Sami, Thank you both for the feedback.
Sami, thanks for identifying the use-after-free with on_proc_exit and providing the v2 patch with before_shmem_exit. That helped narrow down the right approach. On Sun, May 17, 2026 at 3:46 PM Alvaro Herrera <[email protected]> wrote: > I think a better fix for this is to use PG_ENSURE_ERROR_CLEANUP(). That > way we avoid leaving callbacks in place, which would not be great if the > same backend does a lot of REPACKs: after a dozen or so, it dies with > > FATAL: out of before_shmem_exit slots > > Alvaro, you're right that PG_ENSURE_ERROR_CLEANUP is the better approach here. Attached is v3 which uses it instead of before_shmem_exit or on_proc_exit. Summary of the issues with v1 and v2: - v1 (on_proc_exit): Crashes with assertions enabled because memory contexts are already destroyed by the time on_proc_exit callbacks run. The worker handle is clobbered with 0x7f (CLOBBER_FREED_MEMORY). - v2 (before_shmem_exit): Works correctly for a single REPACK, but leaks a callback slot on each successful REPACK (CONCURRENTLY). After ~15 REPACKs in the same session: FATAL: out of before_shmem_exit slots. v3 uses PG_ENSURE_ERROR_CLEANUP which: - Handles both ERROR and FATAL exits - Automatically cancels the callback on normal completion (no slot leak) - Runs before memory contexts are destroyed (no use-after-free) The existing PG_TRY/PG_FINALLY block is replaced with PG_ENSURE_ERROR_CLEANUP for the concurrent path, and a plain rebuild_relation() call for the non-concurrent path. Tested with --enable-cassert --enable-debug --enable-injection-points: - All 245 regression tests pass (including cluster) - All 8 injection point tests pass (including repack and repack_toast) - pg_terminate_backend cleans up worker and slot without crash - 20 REPACK (CONCURRENTLY) in same session completes without slot exhaustion I have not added a dedicated regression test for the pg_terminate_backend scenario yet, but I can write one using injection points if needed. Thanks, Baji Shaik
v3-0001-Fix-REPACK-decoding-worker-not-cleaned-up-on-FATAL-exit.patch
Description: Binary data
