Hi hackers,

There is a bug on logical-replication publishers where every decoded committed transaction bumps pg_stat_database.xact_rollback.

ReorderBufferProcessTXN() ends each decoded transaction with AbortCurrentTransaction() for catalog cleanup; in the walsender that is a top-level abort, so AtEOXact_PgStat_Database(isCommit=false) increments the backend-local pgStatXactRollback.
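
To make the mechanism concrete, here is a toy model in standalone C. It is not the patch and not PostgreSQL source: every model_* name is hypothetical, the counters are plain ints, and the save/restore of the skip flag around the cleanup abort is an assumption about how a flag like the pgStatXactSkipCounters one proposed further down might be used.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the backend-local counters. */
int model_xact_commit = 0;
int model_xact_rollback = 0;

/* Hypothetical stand-in for a backend-local "skip counters" flag. */
bool model_skip_counters = false;

/*
 * Simplified model of the end-of-transaction stats hook: bump exactly one
 * counter, unless the skip flag is set.
 */
void
model_at_eoxact(bool is_commit)
{
    if (model_skip_counters)
        return;

    if (is_commit)
        model_xact_commit++;
    else
        model_xact_rollback++;
}

/*
 * Model of decoding one committed transaction. Decoding finishes with a
 * cleanup abort, which reaches the hook with is_commit = false. With
 * use_skip_flag set, the flag is raised only around that abort (assumed
 * usage, not taken from the patch).
 */
void
model_decode_committed_txn(bool use_skip_flag)
{
    bool saved = model_skip_counters;

    if (use_skip_flag)
        model_skip_counters = true;

    model_at_eoxact(false);     /* the cleanup abort */

    model_skip_counters = saved;
}

/* Decode n committed transactions; return how far the rollback counter moved. */
int
model_rollbacks_after_decoding(int n, bool use_skip_flag)
{
    int before = model_xact_rollback;

    for (int i = 0; i < n; i++)
        model_decode_committed_txn(use_skip_flag);

    return model_xact_rollback - before;
}
```

Calling model_rollbacks_after_decoding(n, false) moves the rollback counter by n even though every decoded transaction committed; with true it does not move, and the commit counter is untouched either way.
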
The counts are flushed to shared stats on walsender exit, producing an acute spike.

Result: for production systems with SREs on call and tight alerting on xact_rollback, this turns routine logical-replication operations (disabling a subscription, dropping a slot, walsender restart) into false-positive pages.

Reported in [1]; also experienced at GitLab [2][3][4].

Attaching a simple patch that adds a backend-local flag pgStatXactSkipCounters in pgstat_database.c that AtEOXact_PgStat_Database() honors to skip the counter bump. Added TAP test that fails on master with 5/0 and passes with the patch.

If there is agreement on this shape, happy to send patches for all supported branches.

Let me know what you think.

[1] https://postgr.es/m/CAG0ozMo_xWQn%2BAvv8jzbbhePGp5OnhdO%2BYWTkdg4faWSXz0Jzg%40mail.gmail.com
[2] https://gitlab.com/gitlab-com/gl-infra/production/-/work_items/8290
[3] https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/work_items/39
[4] https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/406

Nik
v1-logical-rollback-spike.patch
