On Thu, Nov 24, 2022 at 2:26 AM Andres Freund <and...@anarazel.de> wrote: > > Indeed. This is why I was thinking that just alerting for overflowed xact > isn't particularly helpful. You really want to see how much they overflow and > how often.
I think the way of monitoring the subtransaction count and overflow status is helpful at least for troubleshooting purposes. By regularly monitoring user will know which backend(pid) is particularly using more subtransactions and prone to overflow and which backends are actually frequently causing sub-overflow. > I think they're just not always avoidable, even in a very well operated > system. > > > I wonder if we could lower the impact of suboverflowed snapshots by improving > the representation in PGPROC and SnapshotData. What if we > > a) Recorded the min and max assigned subxid in PGPROC > > b) Instead of giving up in GetSnapshotData() once we see a suboverflowed > PGPROC, store the min/max subxid of the proc in SnapshotData. We could > reliably "steal" space for that from ->subxip, as we won't need to store > subxids for that proc. > > c) When determining visibility with a suboverflowed snapshot we use the > ranges from b) to check whether we need to do a subtrans lookup. I think > that'll often prevent subtrans lookups. > > d) If we encounter a subxid whose parent is in progress and not in ->subxid, > and subxcnt isn't the max, add that subxid to subxip. That's not free > because we'd basically need to do an insertion sort, but likely still a lot > cheaper than doing repeated subtrans lookups. > > I think we'd just need a one or two additional fields in SnapshotData. +1 I think this approach will be helpful in many cases, especially when only some of the backend is creating sub-overflow and impacting overall system performance. Now, most of the xids especially the top xid will not fall in that range (unless that sub-overflowing backend is constantly generating subxids and increasing its range) and the lookups for that xids can be done directly in the snapshot's xip array. On another thought, in XidInMVCCSnapshot() in case of sub-overflow why don't we look into the snapshot's xip array first and see if the xid exists there? if not then we can look into the pg_subtrans SLRU and fetch the top xid and relook again into the xip array. It will be more costly in cases where we do not find xid in the xip array because then we will have to search this array twice but I think looking into this array is much cheaper than directly accessing SLRU. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com