Hi Hackers, pg_sync_replication_slots() now retries inside a single SQL function call. Unlike the slotsync worker, it does not get a transaction boundary between retry cycles, so allocations made while fetching and synchronizing remote slots can accumulate until the function returns.
The existing list_free_deep(remote_slots) is not enough to bound this. It frees the List cells and the RemoteSlot structs stored as list elements, but it does not recursively free allocations owned by those structs, such as the copied slot name, plugin name, and database name. It also does not release unrelated per-cycle allocations made while fetching and processing the remote slots. This is probably modest in normal cases, as the retained memory is roughly proportional to the number of retries times the number of remote slots. Still, the function can wait for a long time on a lagging or misconfigured standby, so the growth is unbounded for that call. The attached patch runs each manual retry cycle in a short-lived memory context and resets it before the next attempt. The slot name list needed across retries is copied outside that context. It also adds two local cleanups. Tuple slots created with MakeSingleTupleTableSlot() are explicitly released with ExecDropSingleTupleTableSlot() before clearing the walreceiver result. The memory context would reclaim the storage eventually, but the slot also holds a reference to the result tuple descriptor, so releasing it at the matching ownership boundary seems clearer and follows nearby code. drop_local_obsolete_slots() now frees the temporary list container it builds. The retry-cycle context would reclaim this list too, but freeing it locally would make the helper self-contained. Politely CCed the original authors. -- Regards, Xuneng Zhou HighGo Software Co., Ltd.
v1-0001-Bound-memory-usage-during-manual-slot-sync-retrie.patch
Description: Binary data
