On Sat, May 16, 2026 at 7:35 PM Xuneng Zhou <[email protected]> wrote:
>
> On Fri, May 15, 2026 at 9:20 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi Amit,
> >
> > On Fri, May 15, 2026 at 5:23 PM Amit Kapila <[email protected]> wrote:
> > >
> > > On Fri, May 15, 2026 at 11:02 AM Xuneng Zhou <[email protected]> wrote:
> > > >
> > > > pg_sync_replication_slots() now retries inside a single SQL function
> > > > call.  Unlike the slotsync worker, it does not get a transaction 
> > > > boundary
> > > > between retry cycles, so allocations made while fetching and 
> > > > synchronizing
> > > > remote slots can accumulate until the function returns.
> > > >
> > > > The existing list_free_deep(remote_slots) is not enough to bound this.
> > > > It frees the List cells and the RemoteSlot structs stored as list
> > > > elements, but it does not recursively free allocations owned by those
> > > > structs, such as the copied slot name, plugin name, and database name.
> > > >
> > >
> > > Right.
> > >
> > > > It also does not release unrelated per-cycle allocations made while
> > > > fetching and processing the remote slots.
> > > >
> > >
> > > BTW, did you notice via test or otherwise, what and how much other
> > > unrelated memory is being allocated during each sync cycle if we
> > > manually free the allocations you observed? This is mainly to learn
> > > the impact of not doing all these allocations in some short-duration
> > > memory context.
> >
> > I noticed this by reading the feature code while walking through the
> > PG 19 release notes, not by observing actual memory growth in a
> > running system. Besides the RemoteSlot field strings, there seems to
> > be a few smaller per-cycle allocations that also accumulate:
> > quote_literal_cstr() strings used to build the filtered query, a
> > temporary TextDatumGetCString() result for invalidation_reason, the
> > standalone TupleTableSlot in fetch_remote_slots(), and the list
> > container built by get_local_synced_slots(). I chose a per-cycle
> > memory context over manual pfrees to make the retry-cycle lifetime
> > explicit and avoid maintaining a destructor for every current and
> > future allocation in this path. It may be worth measuring how much
> > extra memory actually accumulates during an extended wait to confirm
> > the practical impact.
>
> I did some measurements for the memory growth in the manual
> pg_sync_replication_slots() retry path.
>
> The test used 100 failover logical slots and forced
> pg_sync_replication_slots() to keep retrying on the standby. Memory
> was sampled with pg_log_backend_memory_contexts() on the backend
> running the function.
>
> On HEAD, the short run showed:
> timestamp total_bytes delta
> 2026-05-16T10:47:54Z,63422,1339920,
> 2026-05-16T10:47:56Z,63422,1405456,65536
> 2026-05-16T10:47:59Z,63422,1405456,0
> 2026-05-16T10:48:01Z,63422,1405456,0
> 2026-05-16T10:48:03Z,63422,1536528,131072
> 2026-05-16T10:48:05Z,63422,1536528,0
>
> So the total increase was about 192 KiB.
>
> After adding a targeted cleanup for the copied RemoteSlot strings, the
> same test showed:
>
> timestamp total_bytes delta
> 2026-05-16T11:04:58Z,77000,1339920,
> 2026-05-16T11:05:00Z,77000,1339920,0
> 2026-05-16T11:05:02Z,77000,1405456,65536
> 2026-05-16T11:05:04Z,77000,1405456,0
>
> So the increase dropped to about 64 KiB.
>
> With a per-retry memory context around the fetch/synchronize cycle,
> the same test stayed flat:
>
> 2026-05-16T11:09:10Z,84600,1315344,
> 2026-05-16T11:09:12Z,84600,1315344,0
> 2026-05-16T11:09:14Z,84600,1315344,0
> 2026-05-16T11:09:16Z,84600,1315344,0
> 2026-05-16T11:09:18Z,84600,1315344,0
> 2026-05-16T11:09:20Z,84600,1315344,0
>
> Looking at the memory-context dumps, the growth on HEAD was visible
> under ExprContext. The grand-total increase matched the ExprContex
> increase, which is consistent with retry-cycle allocations surviving
> for the duration of the single SQL function call.
>
> That said, the practical severity looks small. This is mainly because
> the retry loop is not a tight loop. With no slot activity,
> wait_for_slot_activity() doubles the sleep time up to 30 seconds.
>
> So after about 51 seconds it retries only about twice per minute. For
> 100 slots and no slot activity, a rough 1-hour test from the short
> runs is on the order of a few MB on HEAD, and around 1 MB with the
> manual RemoteSlot cleanup. For installations with fewer slots, it
> should be smaller.
>
> So my read is:
> the accumulation is real;
> it is modest because of the retry backoff;
> manually freeing RemoteSlot’s copied strings removes most of the
> observed growth;
> a per-retry memory context fully bounds the retry-cycle lifetime

Expectedly, the memory accumulation is much more pronounced with a
churning slot to disablet the backoff. I'll share the findings later.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Reply via email to