On Thu, Apr 16, 2026 at 7:19 PM Rafael Thofehrn Castro
<[email protected]> wrote:
> Column xact_rollback from pg_stat_database gets inconsistently incremented 
> when logical replication is being used (on publisher side).
...
> This is causing inconsistency in monitoring TPS metric of a database where we 
> eventually see sudden spikes of TPS in the order of millions.

This still reproduces on master.

I agree on the root cause: ReorderBufferProcessTXN() ends each decoded
transaction with AbortCurrentTransaction() for catalog cleanup; in the
walsender that is a top-level abort, so
AtEOXact_PgStat_Database(isCommit=false) increments the backend-local
pgStatXactRollback.
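
To make the mechanism concrete, here is a toy model in plain C. It is
not PostgreSQL source: every toy_* name is made up for illustration,
and only the shape of the call chain follows the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the backend-local counters (hypothetical names). */
static long toy_xact_commit = 0;
static long toy_xact_rollback = 0;

/* Models the end-of-transaction hook: one bump per top-level xact end. */
static void toy_at_eoxact(bool is_commit)
{
    if (is_commit)
        toy_xact_commit++;
    else
        toy_xact_rollback++;
}

/* Models the end of decoding one transaction: the abort is only there
 * for catalog cleanup, but it still goes through the same hook. */
static void toy_decode_one_txn(void)
{
    /* ... decoded changes would be handed to the output plugin here ... */
    toy_at_eoxact(false);
}

/* Decode n transactions and return the resulting rollback count. */
static long toy_decode_many(int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        toy_decode_one_txn();
    return toy_xact_rollback;
}
```

In this model, decoding five transactions that all committed on the
primary leaves the rollback counter at 5 and the commit counter at 0.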

The counts are flushed to shared stats on walsender exit, producing
an acute spike. Result: for production systems with tight alerting on
xact_rollback, this turns routine logical-replication operations
(disabling a subscription, dropping a slot, walsender restart) into
false-positive pages. Also experienced at GitLab [1][2][3].
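
A rough sketch of why the flush shows up as a spike, assuming a monitor
that derives a per-second rate from two samples of the cumulative
counter. The numbers (3,000,000 decoded transactions, 15 s scrape
interval) and all toy_* names are hypothetical:

```c
#include <assert.h>

/* Per-second rate between two samples of a cumulative counter. */
static long toy_sample_rate(long prev, long cur, long interval_s)
{
    assert(interval_s > 0);
    return (cur - prev) / interval_s;
}

static long toy_local_rollbacks = 0;   /* backend-local, invisible to monitoring */
static long toy_shared_rollbacks = 0;  /* what the monitor actually samples */

/* Each decoded transaction bumps only the backend-local counter. */
static void toy_decode_txns(long n)
{
    toy_local_rollbacks += n;
}

/* On exit, everything accumulated so far lands in shared stats at once. */
static void toy_walsender_exit(void)
{
    toy_shared_rollbacks += toy_local_rollbacks;
    toy_local_rollbacks = 0;
}
```

With these assumptions the monitor sees a rate of 0 for as long as the
walsender runs, and then 3,000,000 / 15 = 200,000 rollbacks per second
in the single interval that contains the exit.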

Attaching a simple patch that adds a backend-local flag pgStatXactSkipCounters
in pgstat_database.c that AtEOXact_PgStat_Database() honors to skip
the counter bump.
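
The idea, reduced to a standalone sketch. This is not the patch itself:
the sketch_* names are hypothetical, and where exactly the flag is set
and cleared is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

static long sketch_xact_commit = 0;
static long sketch_xact_rollback = 0;

/* Plays the role of the backend-local skip flag (hypothetical name). */
static bool sketch_skip_counters = false;

/* End-of-transaction hook: honors the flag before touching counters. */
static void sketch_at_eoxact(bool is_commit)
{
    if (sketch_skip_counters)
        return;
    if (is_commit)
        sketch_xact_commit++;
    else
        sketch_xact_rollback++;
}

/* Decoding path: set the flag only around the cleanup abort
 * (assumed placement), then clear it again. */
static void sketch_decode_one_txn(void)
{
    assert(!sketch_skip_counters);  /* no nesting in this sketch */
    sketch_skip_counters = true;
    sketch_at_eoxact(false);        /* cleanup abort, not counted */
    sketch_skip_counters = false;
}
```

Aborts issued outside the decoding path still go through
sketch_at_eoxact(false) with the flag cleared, so genuine rollbacks
keep being counted.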

Included a TAP test that fails on master with 5/0 and passes with the patch.

If there is agreement on this shape, happy to send patches for all
supported branches. Let me know what you think.

[1] https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/8290
[2] https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/work_items/39
[3] https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/406

Nik

Attachment: v1-xact-rollback-decoding.patch
