On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <[email protected]> wrote:
>
> Hi Hackers,
>
> pg_sync_replication_slots() now retries inside a single SQL function
> call.  Unlike the slotsync worker, it does not get a transaction boundary
> between retry cycles, so allocations made while fetching and synchronizing
> remote slots can accumulate until the function returns.
>
> The existing list_free_deep(remote_slots) is not enough to bound this.
> It frees the List cells and the RemoteSlot structs stored as list
> elements, but it does not recursively free allocations owned by those
> structs, such as the copied slot name, plugin name, and database name.
> It also does not release unrelated per-cycle allocations made while
> fetching and processing the remote slots.
>
> This is probably modest in normal cases, as the retained memory is roughly
> proportional to the number of retries times the number of remote slots.
> Still, the function can wait for a long time on a lagging or
> misconfigured standby, so the growth is unbounded for that call.
>
> The attached patch runs each manual retry cycle in a short-lived memory
> context and resets it before the next attempt.  The slot name list needed
> across retries is copied outside that context.
>
> It also adds two local cleanups.  Tuple slots created with
> MakeSingleTupleTableSlot() are explicitly released with
> ExecDropSingleTupleTableSlot() before clearing the walreceiver result.
> The memory context would reclaim the storage eventually, but the slot also
> holds a reference to the result tuple descriptor, so releasing it at the
> matching ownership boundary seems clearer and follows nearby code.
>
> drop_local_obsolete_slots() now frees the temporary list container it
> builds.  The retry-cycle context would reclaim this list too, but freeing
> it locally would make the helper self-contained.
>
> Politely CCed the original authors.
>
>

I like the core idea of this patch. It makes the flow cleaner and
protects the flow from memory-leaks when dealing with a high number of
slots. I think during implementation, we considered having a separate
memory context, but since we were freeing the remote_slots then, we
thought allocating a new memory context might not be required. But on
re-thinking and reading the details above, IMO, it is a good
improvement. But let's see what others have to say.

A couple of minor comments:

1. I think we need to delete the new memory context in
slotsync_failure_callback() as well.

2. Also, it will be good to add a comment before switiching to
old-context before extract_slot_names(). Or better we can move the
swicthing inside extract_slot_names() with a comment.

thanks
Shveta


Reply via email to