On Fri, May 15, 2026 at 9:20 PM Xuneng Zhou <[email protected]> wrote: > > Hi Amit, > > On Fri, May 15, 2026 at 5:23 PM Amit Kapila <[email protected]> wrote: > > > > On Fri, May 15, 2026 at 11:02 AM Xuneng Zhou <[email protected]> wrote: > > > > > > pg_sync_replication_slots() now retries inside a single SQL function > > > call. Unlike the slotsync worker, it does not get a transaction boundary > > > between retry cycles, so allocations made while fetching and synchronizing > > > remote slots can accumulate until the function returns. > > > > > > The existing list_free_deep(remote_slots) is not enough to bound this. > > > It frees the List cells and the RemoteSlot structs stored as list > > > elements, but it does not recursively free allocations owned by those > > > structs, such as the copied slot name, plugin name, and database name. > > > > > > > Right. > > > > > It also does not release unrelated per-cycle allocations made while > > > fetching and processing the remote slots. > > > > > > > BTW, did you notice via test or otherwise, what and how much other > > unrelated memory is being allocated during each sync cycle if we > > manually free the allocations you observed? This is mainly to learn > > the impact of not doing all these allocations in some short-duration > > memory context. > > I noticed this by reading the feature code while walking through the > PG 19 release notes, not by observing actual memory growth in a > running system. Besides the RemoteSlot field strings, there seems to > be a few smaller per-cycle allocations that also accumulate: > quote_literal_cstr() strings used to build the filtered query, a > temporary TextDatumGetCString() result for invalidation_reason, the > standalone TupleTableSlot in fetch_remote_slots(), and the list > container built by get_local_synced_slots(). I chose a per-cycle > memory context over manual pfrees to make the retry-cycle lifetime > explicit and avoid maintaining a destructor for every current and > future allocation in this path. It may be worth measuring how much > extra memory actually accumulates during an extended wait to confirm > the practical impact.
I did some measurements for the memory growth in the manual pg_sync_replication_slots() retry path. The test used 100 failover logical slots and forced pg_sync_replication_slots() to keep retrying on the standby. Memory was sampled with pg_log_backend_memory_contexts() on the backend running the function. On HEAD, the short run showed: timestamp total_bytes delta 2026-05-16T10:47:54Z,63422,1339920, 2026-05-16T10:47:56Z,63422,1405456,65536 2026-05-16T10:47:59Z,63422,1405456,0 2026-05-16T10:48:01Z,63422,1405456,0 2026-05-16T10:48:03Z,63422,1536528,131072 2026-05-16T10:48:05Z,63422,1536528,0 So the total increase was about 192 KiB. After adding a targeted cleanup for the copied RemoteSlot strings, the same test showed: timestamp total_bytes delta 2026-05-16T11:04:58Z,77000,1339920, 2026-05-16T11:05:00Z,77000,1339920,0 2026-05-16T11:05:02Z,77000,1405456,65536 2026-05-16T11:05:04Z,77000,1405456,0 So the increase dropped to about 64 KiB. With a per-retry memory context around the fetch/synchronize cycle, the same test stayed flat: 2026-05-16T11:09:10Z,84600,1315344, 2026-05-16T11:09:12Z,84600,1315344,0 2026-05-16T11:09:14Z,84600,1315344,0 2026-05-16T11:09:16Z,84600,1315344,0 2026-05-16T11:09:18Z,84600,1315344,0 2026-05-16T11:09:20Z,84600,1315344,0 Looking at the memory-context dumps, the growth on HEAD was visible under ExprContext. The grand-total increase matched the ExprContex increase, which is consistent with retry-cycle allocations surviving for the duration of the single SQL function call. That said, the practical severity looks small. This is mainly because the retry loop is not a tight loop. With no slot activity, wait_for_slot_activity() doubles the sleep time up to 30 seconds. So after about 51 seconds it retries only about twice per minute. For 100 slots and no slot activity, a rough 1-hour test from the short runs is on the order of a few MB on HEAD, and around 1 MB with the manual RemoteSlot cleanup. For installations with fewer slots, it should be smaller. So my read is: the accumulation is real; it is modest because of the retry backoff; manually freeing RemoteSlot’s copied strings removes most of the observed growth; a per-retry memory context fully bounds the retry-cycle lifetime -- Regards, Xuneng Zhou HighGo Software Co., Ltd.
mfree.patch
Description: Binary data
#!/usr/bin/env bash
set -euo pipefail
pg_bin=""
workdir=""
port_base=55432
slots=100
samples=20
interval=30
dump_timeout=45
keep=0
primary_port=""
standby_port=""
primary_dir=""
standby_dir=""
primary_log=""
standby_log=""
sync_psql_pid=""
usage()
{
cat <<EOF
Usage:
$0 --pg-bin DIR [--workdir DIR] [--slots N] [--samples N]
[--interval SEC] [--dump-timeout SEC] [--port-base PORT] [--keep]
Creates a primary and standby, creates failover logical slots on the primary,
forces pg_sync_replication_slots() on the standby to wait/retry, then samples
the waiting backend with pg_log_backend_memory_contexts().
Output is CSV: timestamp,backend_pid,total_bytes,delta_bytes
EOF
}
die()
{
echo "error: $*" >&2
exit 1
}
run()
{
echo "+ $*" >&2
"$@"
}
psql_primary()
{
"$pg_bin/psql" -X -v ON_ERROR_STOP=1 \
"host=$workdir port=$primary_port dbname=postgres" "$@"
}
psql_standby()
{
"$pg_bin/psql" -X -v ON_ERROR_STOP=1 \
"host=$workdir port=$standby_port dbname=postgres" "$@"
}
log_size()
{
wc -c < "$standby_log" | tr -d '[:space:]'
}
extract_grand_total_since()
{
local offset=$1
tail -c +"$((offset + 1))" "$standby_log" 2>/dev/null |
awk '
/Grand total:/ {
for (i = 1; i <= NF; i++)
{
if ($i == "total:")
{
gsub(/[^0-9]/, "", $(i + 1));
total = $(i + 1);
}
}
}
END {
if (total != "")
print total;
}
'
}
wait_sql()
{
local node=$1
local sql=$2
local timeout=${3:-60}
for _ in $(seq 1 "$timeout"); do
if [ "$node" = primary ]; then
if psql_primary -Atqc "$sql" 2>/dev/null | grep -qx t; then
return 0
fi
else
if psql_standby -Atqc "$sql" 2>/dev/null | grep -qx t; then
return 0
fi
fi
sleep 1
done
return 1
}
wait_for_log()
{
local pattern=$1
local offset=$2
local timeout=${3:-60}
for _ in $(seq 1 "$timeout"); do
if tail -c +"$((offset + 1))" "$standby_log" 2>/dev/null |
grep "$pattern" >/dev/null; then
return 0
fi
if [ -n "$sync_psql_pid" ] && ! kill -0 "$sync_psql_pid" 2>/dev/null; then
break
fi
sleep 1
done
return 1
}
wait_for_file()
{
local file=$1
local timeout=${2:-60}
for _ in $(seq 1 "$timeout"); do
[ -s "$file" ] && return 0
sleep 1
done
return 1
}
stop_cluster()
{
local datadir=$1
if [ -n "$datadir" ] && [ -d "$datadir" ]; then
"$pg_bin/pg_ctl" -D "$datadir" -m immediate stop >/dev/null 2>&1 || true
fi
}
prepare_workdir()
{
mkdir -p "$workdir"
# Allow reusing a --keep workdir by removing only files created by this
# harness. Leave unrelated files in the directory alone.
stop_cluster "$standby_dir"
stop_cluster "$primary_dir"
rm -rf "$primary_dir" "$standby_dir"
rm -f "$primary_log" "$standby_log"
rm -f "$workdir/create_slots.sql" "$workdir/run_slotsync.sql"
rm -f "$workdir/slotsync.out" "$workdir/slotsync.err"
rm -f "$workdir/slotsync_backend.pid"
}
cleanup()
{
if [ -n "$sync_psql_pid" ] && kill -0 "$sync_psql_pid" 2>/dev/null; then
kill "$sync_psql_pid" 2>/dev/null || true
wait "$sync_psql_pid" 2>/dev/null || true
fi
stop_cluster "$standby_dir"
stop_cluster "$primary_dir"
if [ "$keep" -eq 0 ] && [ -n "$workdir" ] && [ -d "$workdir" ]; then
rm -rf "$workdir"
else
echo "kept workdir: $workdir" >&2
fi
}
while [ $# -gt 0 ]; do
case "$1" in
--pg-bin)
pg_bin=${2:?}
shift 2
;;
--workdir)
workdir=${2:?}
shift 2
;;
--slots)
slots=${2:?}
shift 2
;;
--samples)
samples=${2:?}
shift 2
;;
--interval)
interval=${2:?}
shift 2
;;
--dump-timeout)
dump_timeout=${2:?}
shift 2
;;
--port-base)
port_base=${2:?}
shift 2
;;
--keep)
keep=1
shift
;;
-h|--help)
usage
exit 0
;;
*)
die "unknown option: $1"
;;
esac
done
[ -n "$pg_bin" ] || die "--pg-bin is required"
[ -x "$pg_bin/postgres" ] || die "postgres not found in --pg-bin: $pg_bin"
[ "$slots" -gt 0 ] || die "--slots must be positive"
if [ -z "$workdir" ]; then
workdir=$(mktemp -d "${TMPDIR:-/tmp}/slotsync-memory.XXXXXX")
fi
primary_port=$port_base
standby_port=$((port_base + 1))
primary_dir="$workdir/primary"
standby_dir="$workdir/standby"
primary_log="$workdir/primary.log"
standby_log="$workdir/standby.log"
trap cleanup EXIT
prepare_workdir
echo "workdir=$workdir" >&2
echo "primary_port=$primary_port standby_port=$standby_port slots=$slots" >&2
run "$pg_bin/initdb" -D "$primary_dir" -A trust >/dev/null
cat >> "$primary_dir/postgresql.conf" <<EOF
listen_addresses = ''
unix_socket_directories = '$workdir'
port = $primary_port
wal_level = logical
max_wal_senders = 20
max_replication_slots = $((slots + 10))
max_connections = 100
log_min_messages = debug1
EOF
cat >> "$primary_dir/pg_hba.conf" <<EOF
local all all trust
local replication all trust
EOF
run "$pg_bin/pg_ctl" -D "$primary_dir" -l "$primary_log" -w start >/dev/null
wait_sql primary "SELECT true" 30 || die "primary did not start"
psql_primary -qAtc "SELECT pg_create_physical_replication_slot('standby_slot');" >/dev/null
run "$pg_bin/pg_basebackup" -D "$standby_dir" -h "$workdir" -p "$primary_port" \
-d "dbname=postgres" -X stream >/dev/null
cat >> "$standby_dir/postgresql.conf" <<EOF
listen_addresses = ''
unix_socket_directories = '$workdir'
port = $standby_port
hot_standby = on
hot_standby_feedback = on
primary_conninfo = 'host=$workdir port=$primary_port dbname=postgres application_name=standby'
primary_slot_name = 'standby_slot'
max_replication_slots = $((slots + 10))
max_connections = 100
log_min_messages = debug1
EOF
touch "$standby_dir/standby.signal"
run "$pg_bin/pg_ctl" -D "$standby_dir" -l "$standby_log" -w start >/dev/null
wait_sql standby "SELECT pg_is_in_recovery()" 30 || die "standby did not enter recovery"
slot_sql="$workdir/create_slots.sql"
for i in $(seq 1 "$slots"); do
printf "SELECT pg_create_logical_replication_slot('memslot_%s', 'pgoutput', false, false, true);\n" "$i"
done > "$slot_sql"
psql_primary -q -f "$slot_sql" >/dev/null
psql_primary -q <<'SQL' >/dev/null
CREATE TABLE slotsync_memory_wait(a int);
INSERT INTO slotsync_memory_wait SELECT generate_series(1, 10000);
CHECKPOINT;
SELECT pg_switch_wal();
SQL
target_lsn=$(psql_primary -Atqc "SELECT pg_current_wal_flush_lsn();")
wait_sql standby "SELECT pg_last_wal_replay_lsn() >= '$target_lsn'::pg_lsn" 60 ||
die "standby did not catch up to $target_lsn"
# Advance the standby restart point so the local slot reservation is ahead of
# the remote slot positions, forcing pg_sync_replication_slots() to retry.
psql_standby -qAtc "CHECKPOINT;" >/dev/null || true
pid_file="$workdir/slotsync_backend.pid"
sync_sql="$workdir/run_slotsync.sql"
cat > "$sync_sql" <<EOF
\\pset tuples_only on
\\pset format unaligned
\\o $pid_file
SELECT pg_backend_pid();
\\o
SELECT pg_sync_replication_slots();
EOF
log_offset=$(log_size)
"$pg_bin/psql" -X -v ON_ERROR_STOP=1 \
"host=$workdir port=$standby_port dbname=postgres" \
-f "$sync_sql" > "$workdir/slotsync.out" 2> "$workdir/slotsync.err" &
sync_psql_pid=$!
wait_for_file "$pid_file" 30 || die "timed out waiting for slotsync backend pid"
backend_pid=$(tr -d '[:space:]' < "$pid_file")
wait_for_log "could not synchronize replication slot" "$log_offset" 60 ||
die "pg_sync_replication_slots() did not enter the expected retry path"
echo "timestamp,backend_pid,total_bytes,delta_bytes"
previous_total=""
offset=$(log_size)
for _ in $(seq 1 "$samples"); do
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
if ! kill -0 "$sync_psql_pid" 2>/dev/null; then
echo "$timestamp,$backend_pid,,"
break
fi
ok=$(psql_standby -Atqc "SELECT pg_log_backend_memory_contexts($backend_pid);" 2>/dev/null || true)
if [ "$ok" != "t" ]; then
echo "warning: pg_log_backend_memory_contexts($backend_pid) returned '$ok'" >&2
echo "$timestamp,$backend_pid,,"
break
fi
total=""
for _ in $(seq 1 "$dump_timeout"); do
total=$(extract_grand_total_since "$offset")
[ -n "$total" ] && break
sleep 1
done
if [ -z "$total" ]; then
echo "warning: no memory dump observed within ${dump_timeout}s for pid $backend_pid" >&2
fi
offset=$(log_size)
if [ -n "$total" ] && [ -n "$previous_total" ]; then
delta=$((total - previous_total))
else
delta=""
fi
echo "$timestamp,$backend_pid,$total,$delta"
if [ -n "$total" ]; then
previous_total=$total
fi
sleep "$interval"
done
