Hi,

On Thu, Jul 17, 2025 at 10:54 PM Xuneng Zhou <xunengz...@gmail.com> wrote:
>
> Hi,
>
> After studying procarray and lmgr more closely, I have found several
> critical issues in the two patches and underestimated their complexity
> and subtlety. Sorry for posting immature patches that may have created
> noise.
>
> I now realize that making the lock acquisition in XactLockTableWait,
>     (void) LockAcquire(&tag, ShareLock, false, false);
> work on a standby, following Andres' first suggestion, may be simpler
> and require fewer new helpers.
>
> XactLockTableWait fails on a standby simply because we never call
> XactLockTableInsert. I am considering invoking XactLockTableInsert
> when an XID is added to KnownAssignedXids, and XactLockTableDelete
> (which today is not used for main XIDs on the primary, so that
> constraint would no longer apply) when it is removed. A preliminary
> implementation and test shows this approach kinda works, but I still
> need to confirm that it is safe on a standby and does not break other
> things.
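To make the idea above concrete, what I experimented with looked roughly
like the following (illustrative only, not the actual patch;
KnownAssignedXidsAddOne/RemoveOne are made-up helper names standing in
for the places in procarray.c where an XID enters and leaves the
KnownAssignedXids array, and the existing bookkeeping is elided):

    #include "postgres.h"

    #include "storage/lmgr.h"

    /* Hypothetical hook: startup process records a new assigned XID. */
    static void
    KnownAssignedXidsAddOne(TransactionId xid)
    {
        /* ... existing KnownAssignedXids array bookkeeping ... */

        /* New: take the xact lock on behalf of the remote transaction. */
        XactLockTableInsert(xid);
    }

    /* Hypothetical hook: startup process removes a finished XID. */
    static void
    KnownAssignedXidsRemoveOne(TransactionId xid)
    {
        /* New: release the xact lock so XactLockTableWait() callers wake. */
        XactLockTableDelete(xid);

        /* ... existing KnownAssignedXids array bookkeeping ... */
    }

Having looked at where such calls would have to sit relative to the
existing locking, though, I have changed my mind about this direction.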
Adding an xact lock for every tracked assigned XID on a standby, by
enabling XactLockTableInsert and XactLockTableDelete in the
KnownAssignedXids infrastructure, appears less attractive than I first
thought. The main concern is that the actual rate of xact-lock usage may
not justify the added overhead, even if that overhead is modest.

1) Usage

On a primary server, an xact lock is taken for every assigned XID, and the
cost is justified because xact locks are referenced in many code paths. On
a standby, however, especially in versions where logical decoding on a
standby is unavailable, xact locks are used far less, if at all. The
current call stacks for XactLockTableWait on a standby (HEAD) are listed
in [1]. The only high-frequency caller is the logical walsender on a
cascading standby; all other callers are infrequent. Does that single use
case warrant creating an xact lock for every known assigned XID? I don't
think so, given that typical configurations have few cascading standbys.
In practice, most of these xact locks may never be touched after they are
created.

2) Cost

Low usage would be fine if the additional cost were negligible, but that
does not appear to be the case AFAICS. Quoting the comment in procarray.c:

 * There are 5 main uses of the KnownAssignedXids data structure:
 *
 *  * backends taking snapshots - all valid XIDs need to be copied out
 *  * backends seeking to determine presence of a specific XID
 *  * startup process adding new known-assigned XIDs
 *  * startup process removing specific XIDs as transactions end
 *  * startup process pruning array when special WAL records arrive
 *
 * This data structure is known to be a hot spot during Hot Standby, so we
 * go to some lengths to make these operations as efficient and as concurrent
 * as possible.

KnownAssignedXids is a hot spot, as the comment says. The existing design
replaces a hash table with a single array and minimizes exclusive locks
and memory barriers to preserve concurrency. To respect that design, the
calls to XactLockTableInsert and XactLockTableDelete would need to happen
outside any exclusive lock. Unfortunately, that re-introduces the polling
we are trying to eliminate in XactLockTableWait: there is a window between
registering the XIDs and adding their xact locks. During that window, a
backend calling XactLockTableWait may still need to poll, because
TransactionIdIsInProgress reports the transaction as running while its
xact lock does not exist yet. If the window were short, this might be
acceptable, but it can be lengthy because XIDs are inserted into and
deleted from the array in bulk. Some unlucky backends will therefore spin
before they can actually wait.

Placing the XactLockTableInsert/Delete calls inside the exclusive lock
removes the race window but hurts concurrency, which is exactly what this
module goes to such lengths to avoid. Operations on the shared-memory lock
hash table in lmgr are slower and hold the lock longer than simple array
updates, so backends invoking GetSnapshotData could see longer wait times.

3) Alternative

Adopt the hash-table + condition-variable design from patch v6, with
additional fixes and polish. Backends that call XactLockTableWait() sleep
on a condition variable, and the startup process broadcasts a wake-up as
the XID is removed from KnownAssignedXids. The broadcast happens after
releasing the exclusive lock. This may be safe: sleeping backends remain
blocked until the CV signal arrives, so no extra polling occurs even
though the XID has already vanished from the array. A rough sketch of the
idea follows below.
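For illustration, here is a much-simplified sketch of the waiter and
wakeup sides (not the actual v6/v7 patch code; the StandbyXidWait* names
are made up, the wait event is a placeholder, and shmem initialization and
locking of the hash table are omitted):

    #include "postgres.h"

    #include "storage/condition_variable.h"
    #include "storage/procarray.h"
    #include "utils/hsearch.h"
    #include "utils/wait_event.h"

    /* Hypothetical shared hash entry: one CV per waited-on XID. */
    typedef struct StandbyXidWaitEntry
    {
        TransactionId   xid;    /* hash key */
        ConditionVariable cv;   /* backends waiting on this XID sleep here */
    } StandbyXidWaitEntry;

    /* Hypothetical shared hash table, sized at about 2 * MaxBackends. */
    static HTAB *StandbyXidWaitHash = NULL;

    /*
     * Waiter side, reached from XactLockTableWait() on a standby:
     * sleep on the CV until the XID is no longer in progress.
     */
    static void
    StandbyXidWaitFor(TransactionId xid)
    {
        /* hypothetical lookup-or-create helper */
        StandbyXidWaitEntry *entry = StandbyXidWaitGetEntry(xid);

        ConditionVariablePrepareToSleep(&entry->cv);
        while (TransactionIdIsInProgress(xid))
            ConditionVariableSleep(&entry->cv, PG_WAIT_EXTENSION);
        ConditionVariableCancelSleep();
    }

    /*
     * Startup-process side: called after the XID has been removed from
     * KnownAssignedXids and the exclusive lock has been released.
     */
    static void
    StandbyXidWaitWakeup(TransactionId xid)
    {
        /* hypothetical lookup helper */
        StandbyXidWaitEntry *entry = StandbyXidWaitFindEntry(xid);

        if (entry != NULL)
            ConditionVariableBroadcast(&entry->cv);
    }

The broadcast is only a wakeup; correctness still comes from re-checking
TransactionIdIsInProgress() in the sleep loop, which is why broadcasting
after releasing the exclusive lock looks acceptable to me.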
For consistency, all removal paths (including those used during shutdown
or promotion, where contention is minimal) follow the same pattern. The
hash table is sized at 2 × MaxBackends, which is already conservative for
today's workload. If more concurrent use cases appear in the future, it
could be enlarged to something like TOTAL_MAX_CACHED_SUBXIDS; even that
would consume only a few megabytes.

Feedback is welcome.

[1] https://www.postgresql.org/message-id/CABPTF7Wbp7MRPGsqd9NA4GbcSzUcNz1ymgWfir=yf+n0odr...@mail.gmail.com

Best,
Xuneng
v7-0001-Replace-polling-with-waiting-in-XactLockTableWait.patch