/me dons flame-proof suit
My goal with this thread is to produce some incremental autovacuum
scheduling improvements for v19, but realistically speaking, I know that
it's a bit of a long shot. There have been many discussions over the
years, and I've read through a few of them [0] [1] [2] [3] [4], but there
are certainly others I haven't found. Since this seems to be a contentious
topic, I figured I'd start small to see if we can get _something_
committed.
While I am by no means wedded to a specific idea, my current concrete
proposal (proof-of-concept patch attached) is to start by ordering the
tables a worker will process by (M)XID age. Here are the reasons:
* We do some amount of prioritization of databases at risk of wraparound at
the database level, per the following comment from autovacuum.c:
    * Choose a database to connect to. We pick the database that was least
    * recently auto-vacuumed, or one that needs vacuuming to prevent Xid
    * wraparound-related data loss. If any db at risk of Xid wraparound is
    * found, we pick the one with oldest datfrozenxid, independently of
    * autovacuum times; similarly we pick the one with the oldest datminmxid
    * if any is in MultiXactId wraparound. Note that those in Xid wraparound
    * danger are given more priority than those in multi wraparound danger.
However, we do no such prioritization of the tables within a database. In
fact, the ordering of the tables is effectively random: it just follows the
order in which do_autovacuum()'s scans of pg_class return them. IMHO this
gives us quite a bit of wiggle room to experiment; since we are processing
tables in no specific order today, changing the order to something
vacuuming-related seems more likely to help than to harm.
* Prioritizing tables based on their (M)XID age might help avoid more
aggressive vacuums, not to mention wraparound. Of course, there are
scenarios where this doesn't work. For example, the age of a table may
have changed greatly between the time we recorded it and the time we
process it. Or maybe there is another table in a different database that
is more important from a wraparound perspective. We could complicate the
patch to try to handle some of these things, but I maintain that even some
basic, incremental scheduling improvements would be better than the status
quo. And we can always change it further in the future to handle these
problems and to consider other things like bloat.
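For illustration only, here is a heavily simplified, standalone C sketch of the database-level selection order described in the autovacuum.c comment quoted in the first point. It is not the actual autovacuum.c logic, which reads pg_database and the cumulative statistics and also skips recently vacuumed databases; `DbInfo`, `choose_database()`, and the precomputed ages and limits are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified per-database summary.  The real code derives
 * these values from pg_database and pgstat entries instead.
 */
typedef struct
{
    uint32_t    frozenxid_age;  /* age of datfrozenxid */
    uint32_t    minmxid_age;    /* age of datminmxid */
    int64_t     last_autovac;   /* time of last autovacuum; smaller is older */
} DbInfo;

/*
 * Hypothetical helper following the quoted comment's priority order:
 *  1. if any database is past the XID limit, take the oldest datfrozenxid;
 *  2. else if any is past the multixact limit, take the oldest datminmxid;
 *  3. else take the least recently autovacuumed database.
 * Returns NULL only when n == 0.
 */
static const DbInfo *
choose_database(const DbInfo *dbs, size_t n,
                uint32_t xid_limit, uint32_t mxid_limit)
{
    const DbInfo *xid_pick = NULL;
    const DbInfo *mxid_pick = NULL;
    const DbInfo *lru_pick = NULL;

    for (size_t i = 0; i < n; i++)
    {
        const DbInfo *db = &dbs[i];

        /* Track the database in the worst XID wraparound danger. */
        if (db->frozenxid_age > xid_limit &&
            (xid_pick == NULL || db->frozenxid_age > xid_pick->frozenxid_age))
            xid_pick = db;

        /* Track the database in the worst multixact wraparound danger. */
        if (db->minmxid_age > mxid_limit &&
            (mxid_pick == NULL || db->minmxid_age > mxid_pick->minmxid_age))
            mxid_pick = db;

        /* Track the least recently autovacuumed database. */
        if (lru_pick == NULL || db->last_autovac < lru_pick->last_autovac)
            lru_pick = db;
    }

    /* XID danger outranks multixact danger, which outranks recency. */
    if (xid_pick != NULL)
        return xid_pick;
    if (mxid_pick != NULL)
        return mxid_pick;
    return lru_pick;
}
```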
The attached patch works by storing the maximum of each table's XID age and
MXID age in the list alongside its OID, and then sorting the list by that
value, oldest first, before processing.
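A minimal standalone sketch of that ordering, assuming 32-bit unsigned XIDs and multixact IDs as in PostgreSQL. Only the `TableToProcess` layout and the max-of-ages sort key come from the patch; the sketch substitutes `qsort()` on a plain array for `list_sort()` on a `List`, and the helpers `table_age()`, `table_cmp_desc()`, and `sort_tables_by_age()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL's Oid, TransactionId and MultiXactId (32-bit unsigned). */
typedef uint32_t Oid;
typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

/* Same layout as the patch: a table's OID plus its sort key. */
typedef struct
{
    Oid         oid;
    uint32_t    age;
} TableToProcess;

/*
 * Hypothetical helper computing the patch's sort key: the larger of the XID
 * age and the MXID age.  Unsigned subtraction wraps modulo 2^32, so the age
 * is still correct after the counters wrap around.
 */
static uint32_t
table_age(TransactionId recent_xid, TransactionId relfrozenxid,
          MultiXactId recent_multi, MultiXactId relminmxid)
{
    uint32_t    xid_age = recent_xid - relfrozenxid;
    uint32_t    mxid_age = recent_multi - relminmxid;

    return xid_age > mxid_age ? xid_age : mxid_age;
}

/* Hypothetical qsort() comparator: larger age sorts first (descending). */
static int
table_cmp_desc(const void *a, const void *b)
{
    const TableToProcess *t1 = a;
    const TableToProcess *t2 = b;

    /* Sign of the comparison; t2->age - t1->age could overflow an int. */
    return (t2->age > t1->age) - (t2->age < t1->age);
}

/* Hypothetical wrapper: order tables so the oldest is processed first. */
static void
sort_tables_by_age(TableToProcess *tables, size_t n)
{
    qsort(tables, n, sizeof(TableToProcess), table_cmp_desc);
}
```

The comparator returns the sign of the comparison rather than `t2->age - t1->age` because that difference does not fit in an `int` once it exceeds INT_MAX; avoiding that overflow is what `pg_cmp_u32()` handles in the patch.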
Thoughts?
[0] https://postgr.es/m/CA%2BTgmoafJPjB3WVqB3FrGWUU4NLRc3VHx8GXzLL-JM%2B%2BJPwK%2BQ%40mail.gmail.com
[1] https://postgr.es/m/CAEG8a3%2B3fwQbgzak%2Bh3Q7Bp%3DvK_aWhw1X7w7g5RCgEW9ufdvtA%40mail.gmail.com
[2] https://postgr.es/m/CAD21AoBUaSRBypA6pd9ZD%3DU-2TJCHtbyZRmrS91Nq0eVQ0B3BA%40mail.gmail.com
[3] https://postgr.es/m/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com
[4] https://postgr.es/m/20130124215715.GE4528%40alvh.no-ip.org
--
nathan
From 7930f6ac213b9145489c4e9253872c8bb4fb5855 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Wed, 8 Oct 2025 10:12:14 -0500
Subject: [PATCH v1 1/1] autovacuum: order tables by (m)xid age
---
src/backend/postmaster/autovacuum.c | 47 ++++++++++++++++++++++++-----
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 41 insertions(+), 7 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index fb5d3b27224..aeef28abd67 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -309,6 +309,12 @@ static AutoVacuumShmemStruct *AutoVacuumShmem;
static dlist_head DatabaseList = DLIST_STATIC_INIT(DatabaseList);
static MemoryContext DatabaseListCxt = NULL;
+typedef struct
+{
+ Oid oid;
+ uint32 age;
+} TableToProcess;
+
/*
* Dummy pointer to persuade Valgrind that we've not leaked the array of
* avl_dbase structs. Make it global to ensure the compiler doesn't
@@ -1888,6 +1894,15 @@ get_database_list(void)
return dblist;
}
+static int
+TableToProcessComparator(const ListCell *a, const ListCell *b)
+{
+ TableToProcess *t1 = (TableToProcess *) lfirst(a);
+ TableToProcess *t2 = (TableToProcess *) lfirst(b);
+
+ return pg_cmp_u32(t2->age, t1->age);
+}
+
/*
* Process a database table-by-table
*
@@ -1901,7 +1916,7 @@ do_autovacuum(void)
HeapTuple tuple;
TableScanDesc relScan;
Form_pg_database dbForm;
- List *table_oids = NIL;
+ List *tables_to_process = NIL;
List *orphan_oids = NIL;
HASHCTL ctl;
HTAB *table_toast_map;
@@ -2055,9 +2070,17 @@ do_autovacuum(void)
effective_multixact_freeze_max_age,
&dovacuum,
&doanalyze, &wraparound);
- /* Relations that need work are added to table_oids */
+ /* Relations that need work are added to tables_to_process */
if (dovacuum || doanalyze)
- table_oids = lappend_oid(table_oids, relid);
+ {
+ TableToProcess *table = palloc(sizeof(TableToProcess));
+
+ table->oid = relid;
+ table->age = recentXid - classForm->relfrozenxid;
+ table->age = Max(table->age, recentMulti - classForm->relminmxid);
+
+ tables_to_process = lappend(tables_to_process, table);
+ }
/*
* Remember TOAST associations for the second pass. Note: we must do
@@ -2149,7 +2172,15 @@ do_autovacuum(void)
/* ignore analyze for toast tables */
if (dovacuum)
- table_oids = lappend_oid(table_oids, relid);
+ {
+ TableToProcess *table = palloc(sizeof(TableToProcess));
+
+ table->oid = relid;
+ table->age = recentXid - classForm->relfrozenxid;
+ table->age = Max(table->age, recentMulti - classForm->relminmxid);
+
+ tables_to_process = lappend(tables_to_process, table);
+ }
/* Release stuff to avoid leakage */
if (free_relopts)
@@ -2273,6 +2304,8 @@ do_autovacuum(void)
MemoryContextSwitchTo(AutovacMemCxt);
}
+ list_sort(tables_to_process, TableToProcessComparator);
+
/*
* Optionally, create a buffer access strategy object for VACUUM to use.
* We use the same BufferAccessStrategy object for all tables VACUUMed by
@@ -2301,9 +2334,9 @@ do_autovacuum(void)
/*
* Perform operations on collected tables.
*/
- foreach(cell, table_oids)
+ foreach_ptr(TableToProcess, table, tables_to_process)
{
- Oid relid = lfirst_oid(cell);
+ Oid relid = table->oid;
HeapTuple classTup;
autovac_table *tab;
bool isshared;
@@ -2534,7 +2567,7 @@ deleted:
pg_atomic_test_set_flag(&MyWorkerInfo->wi_dobalance);
}
- list_free(table_oids);
+ list_free_deep(tables_to_process);
/*
* Perform additional work items, as requested by backends.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 02b5b041c45..1abfc338760 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3004,6 +3004,7 @@ TableScanDesc
TableScanDescData
TableSpaceCacheEntry
TableSpaceOpts
+TableToProcess
TablespaceList
TablespaceListCell
TapeBlockTrailer
--
2.39.5 (Apple Git-154)