Hi,

On Thu, Jul 17, 2025 at 10:54 PM Xuneng Zhou <xunengz...@gmail.com> wrote:
>
> Hi,
>
> After studying procarray and lmgr more closely, I have found several
> critical issues in the two patches and underestimated their complexity
> and subtlety. Sorry for posting immature patches that may have created
> noise.
>
> I now realize that making the lock acquisition in XactLockTableWait,
>     (void) LockAcquire(&tag, ShareLock, false, false);
> work on a standby, following Andres' first suggestion, may be simpler
> and require fewer new helpers.
>
> XactLockTableWait fails on a standby simply because we never call
> XactLockTableInsert. I am considering invoking XactLockTableInsert
> when an XID is added to KnownAssignedXids, and XactLockTableDelete
> (which today is not used for main XIDs on the primary, so that
> constraint would no longer apply) when it is removed. A preliminary
> implementation and test shows this approach kinda works, but I still
> need to confirm that it is safe on a standby and does not break other
> things.
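To make the idea above concrete, what I experimented with looked roughly
like the following (illustrative only, not the actual patch;
KnownAssignedXidsAddOne/RemoveOne are made-up helper names standing in
for the places in procarray.c where an XID enters and leaves the
KnownAssignedXids array, and the existing bookkeeping is elided):

    #include "postgres.h"

    #include "storage/lmgr.h"

    /* Hypothetical hook: startup process records a new assigned XID. */
    static void
    KnownAssignedXidsAddOne(TransactionId xid)
    {
        /* ... existing KnownAssignedXids array bookkeeping ... */

        /* New: take the xact lock on behalf of the remote transaction. */
        XactLockTableInsert(xid);
    }

    /* Hypothetical hook: startup process removes a finished XID. */
    static void
    KnownAssignedXidsRemoveOne(TransactionId xid)
    {
        /* New: release the xact lock so XactLockTableWait() callers wake. */
        XactLockTableDelete(xid);

        /* ... existing KnownAssignedXids array bookkeeping ... */
    }

Having looked at where such calls would have to sit relative to the
existing locking, though, I have changed my mind about this direction.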
Adding an xact lock for every tracked assigned XID on a standby, by
enabling XactLockTableInsert and XactLockTableDelete in the
KnownAssignedXids infrastructure, appears less attractive than I first
thought. The main concern is that the actual rate of xact-lock usage may
not justify the added overhead, even if that overhead is modest.

1) Usage

On a primary server, an xact lock is taken for every assigned XID, and the
cost is justified because xact locks are referenced in many code paths. On
a standby, however, especially in versions where logical decoding on a
standby is unavailable, xact locks are used far less, if at all. The
current call stacks for XactLockTableWait on a standby (HEAD) are listed
in [1]. The only high-frequency caller is the logical walsender on a
cascading standby; all other callers are infrequent. Does that single use
case warrant creating an xact lock for every known assigned XID? I don't
think so, given that typical configurations have few cascading standbys.
In practice, most of these xact locks may never be touched after they are
created.

2) Cost

Low usage would be fine if the additional cost were negligible, but that
does not appear to be the case AFAICS. Quoting the comment in procarray.c:

 * There are 5 main uses of the KnownAssignedXids data structure:
 *
 *  * backends taking snapshots - all valid XIDs need to be copied out
 *  * backends seeking to determine presence of a specific XID
 *  * startup process adding new known-assigned XIDs
 *  * startup process removing specific XIDs as transactions end
 *  * startup process pruning array when special WAL records arrive
 *
 * This data structure is known to be a hot spot during Hot Standby, so we
 * go to some lengths to make these operations as efficient and as concurrent
 * as possible.

KnownAssignedXids is a hot spot, as the comment says. The existing design
replaces a hash table with a single array and minimizes exclusive locks
and memory barriers to preserve concurrency. To respect that design, the
calls to XactLockTableInsert and XactLockTableDelete would need to happen
outside any exclusive lock. Unfortunately, that re-introduces the polling
we are trying to eliminate in XactLockTableWait: there is a window between
registering the XIDs and adding their xact locks. During that window, a
backend calling XactLockTableWait may still need to poll, because
TransactionIdIsInProgress reports the transaction as running while its
xact lock does not exist yet. If the window were short, this might be
acceptable, but it can be lengthy because XIDs are inserted into and
deleted from the array in bulk. Some unlucky backends will therefore spin
before they can actually wait.

Placing the XactLockTableInsert/Delete calls inside the exclusive lock
removes the race window but hurts concurrency, which is exactly what this
module goes to such lengths to avoid. Operations on the shared-memory lock
hash table in lmgr are slower and hold the lock longer than simple array
updates, so backends invoking GetSnapshotData could see longer wait times.

3) Alternative

Adopt the hash-table + condition-variable design from patch v6, with
additional fixes and polish. Backends that call XactLockTableWait() sleep
on a condition variable, and the startup process broadcasts a wake-up as
the XID is removed from KnownAssignedXids. The broadcast happens after
releasing the exclusive lock. This may be safe: sleeping backends remain
blocked until the CV signal arrives, so no extra polling occurs even
though the XID has already vanished from the array. A rough sketch of the
idea follows below.
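For illustration, here is a much-simplified sketch of the waiter and
wakeup sides (not the actual v6/v7 patch code; the StandbyXidWait* names
are made up, the wait event is a placeholder, and shmem initialization and
locking of the hash table are omitted):

    #include "postgres.h"

    #include "storage/condition_variable.h"
    #include "storage/procarray.h"
    #include "utils/hsearch.h"
    #include "utils/wait_event.h"

    /* Hypothetical shared hash entry: one CV per waited-on XID. */
    typedef struct StandbyXidWaitEntry
    {
        TransactionId   xid;    /* hash key */
        ConditionVariable cv;   /* backends waiting on this XID sleep here */
    } StandbyXidWaitEntry;

    /* Hypothetical shared hash table, sized at about 2 * MaxBackends. */
    static HTAB *StandbyXidWaitHash = NULL;

    /*
     * Waiter side, reached from XactLockTableWait() on a standby:
     * sleep on the CV until the XID is no longer in progress.
     */
    static void
    StandbyXidWaitFor(TransactionId xid)
    {
        /* hypothetical lookup-or-create helper */
        StandbyXidWaitEntry *entry = StandbyXidWaitGetEntry(xid);

        ConditionVariablePrepareToSleep(&entry->cv);
        while (TransactionIdIsInProgress(xid))
            ConditionVariableSleep(&entry->cv, PG_WAIT_EXTENSION);
        ConditionVariableCancelSleep();
    }

    /*
     * Startup-process side: called after the XID has been removed from
     * KnownAssignedXids and the exclusive lock has been released.
     */
    static void
    StandbyXidWaitWakeup(TransactionId xid)
    {
        /* hypothetical lookup helper */
        StandbyXidWaitEntry *entry = StandbyXidWaitFindEntry(xid);

        if (entry != NULL)
            ConditionVariableBroadcast(&entry->cv);
    }

The broadcast is only a wakeup; correctness still comes from re-checking
TransactionIdIsInProgress() in the sleep loop, which is why broadcasting
after releasing the exclusive lock looks acceptable to me.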
For consistency, all removal paths (including those used during shutdown
or promotion, where contention is minimal) follow the same pattern. The
hash table is sized at 2 × MaxBackends, which is already conservative for
today's workload. If more concurrent use cases appear in the future, it
could be enlarged to something like TOTAL_MAX_CACHED_SUBXIDS; even that
would consume only a few megabytes.

Feedback is welcome.

[1] https://www.postgresql.org/message-id/CABPTF7Wbp7MRPGsqd9NA4GbcSzUcNz1ymgWfir=yf+n0odr...@mail.gmail.com

Best,
Xuneng
v7-0001-Replace-polling-with-waiting-in-XactLockTableWait.patch