On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote: > On 11/29/19 2:21 PM, David Fetter wrote: > > On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote: > > > Folks, > > > > > > Per a suggestion Christophe made, please find attached a patch to > > > $Subject: > > > > > > Apart from carefully fudging with pg_resetwal, and short of running in > > > production for a few weeks, what would be some good ways to test this? > > > > Per discussion on IRC with Sehrope Sarkuni, please find attached a > > patch with one fewer bug, this one in the repalloc() calls. > > Hello David, > > Here are my initial thoughts. > > Although you appear to be tackling the problem of vacuuming tables > with older Xids first *per database*,
Yes, that's what's come up for me in production, but lately, production has consisted of a single active DB maxing out hardware. I can see how in other situations--multi-tenant, especially--it would make more sense to sort the DBs first. > have you considered changing the logic in building and sorting the > database list in get_database_list and rebuild_database_list? I'm > just curious what your thoughts might be on this subject. I hadn't, but now that you mention it, it seems like a reasonable thing to try. > As far as sorting the list of tables in an array and then copying > that array into a linked list, I think there is no need. The > copying of table_ages into table_oids is followed immediately by > > foreach(cell, table_oids) > > and then table_oids seems not to serve any further purpose. Perhaps > you can just iterate over table_ages directly and avoid the extra > copying. I hadn't looked toward any optimizations in this section, given that the vacuums in question can take hours or days, but I can see how that would make the code cleaner, so please find that change attached. > I have not tested this change, but I may do so later today or perhaps > on Monday. Thanks for looking at this! Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 16aca9efcf1f9402f80acd70701815990327e956 Mon Sep 17 00:00:00 2001 From: David Fetter <da...@fetter.org> Date: Wed, 27 Nov 2019 23:50:48 -0800 Subject: [PATCH v3] Autovacuum tables in descending order of age To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------2.23.0" This is a multi-part message in MIME format. --------------2.23.0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index c1dd8168ca..038b3f1272 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -288,6 +288,15 @@ typedef struct AutoVacuumWorkItem av_workItems[NUM_WORKITEMS]; } AutoVacuumShmemStruct; +/* + * Struct for deciding which tables to vacuum first + */ +typedef struct +{ + uint32 relfrozenxid_age; + Oid table_oid; +} RelFrozenXidAge; + static AutoVacuumShmemStruct *AutoVacuumShmem; /* @@ -319,6 +328,7 @@ static void rebuild_database_list(Oid newdb); static int db_comparator(const void *a, const void *b); static void autovac_balance_cost(void); +static int rfxa_comparator(const void *a, const void *b); static void do_autovacuum(void); static void FreeWorkerInfo(int code, Datum arg); @@ -1936,7 +1946,9 @@ do_autovacuum(void) HeapTuple tuple; TableScanDesc relScan; Form_pg_database dbForm; - List *table_oids = NIL; + RelFrozenXidAge *table_ages; + int table_ages_size = 1024; + int table_ages_count = 0; List *orphan_oids = NIL; HASHCTL ctl; HTAB *table_toast_map; @@ -1951,6 +1963,7 @@ do_autovacuum(void) bool found_concurrent_worker = false; int i; + table_ages = (RelFrozenXidAge *)palloc(table_ages_size * sizeof(RelFrozenXidAge)); /* * StartTransactionCommand and CommitTransactionCommand will automatically * switch to other contexts. We need this one to keep the list of @@ -2102,9 +2115,22 @@ do_autovacuum(void) effective_multixact_freeze_max_age, &dovacuum, &doanalyze, &wraparound); - /* Relations that need work are added to table_oids */ + /* Relations that need work are added to table_ages */ if (dovacuum || doanalyze) - table_oids = lappend_oid(table_oids, relid); + { + RelFrozenXidAge rfxa; + rfxa.relfrozenxid_age = DirectFunctionCall1(xid_age, + classForm->relfrozenxid); + rfxa.table_oid = relid; + if (table_ages_count == table_ages_size) + { + table_ages_size *= 2; + table_ages = (RelFrozenXidAge *)repalloc(table_ages, table_ages_size * sizeof(RelFrozenXidAge)); + } + table_ages[table_ages_count] = rfxa; + table_ages_count++; + } + /* * Remember TOAST associations for the second pass. Note: we must do @@ -2187,7 +2213,20 @@ do_autovacuum(void) /* ignore analyze for toast tables */ if (dovacuum) - table_oids = lappend_oid(table_oids, relid); + { + RelFrozenXidAge rfxa; + + rfxa.relfrozenxid_age = DirectFunctionCall1(xid_age, + classForm->relfrozenxid); + rfxa.table_oid = relid; + + if (table_ages_count == table_ages_size) + { + table_ages_size *= 2; + table_ages = (RelFrozenXidAge *)repalloc(table_ages, table_ages_size * sizeof(RelFrozenXidAge)); + } + table_ages[table_ages_count++] = rfxa; + } } table_endscan(relScan); @@ -2295,12 +2334,15 @@ do_autovacuum(void) "Autovacuum Portal", ALLOCSET_DEFAULT_SIZES); + + qsort(table_ages, table_ages_count, sizeof(table_ages), rfxa_comparator); + /* * Perform operations on collected tables. */ - foreach(cell, table_oids) + for (int i = 0; i < table_ages_count; i++) { - Oid relid = lfirst_oid(cell); + Oid relid = table_ages[i].table_oid; HeapTuple classTup; autovac_table *tab; bool isshared; @@ -2608,6 +2650,22 @@ deleted: CommitTransactionCommand(); } +/* + * qsort comparator for RelFrozenXidAges + * + * N.B. This comparator sorts in descending order. + */ +static int +rfxa_comparator(const void *a, const void *b) +{ + const RelFrozenXidAge *pa = a; + const RelFrozenXidAge *pb = b; + if (pa->relfrozenxid_age == pb->relfrozenxid_age) + return 0; + else + return (pa->relfrozenxid_age > pb->relfrozenxid_age) ? 1 : -1; +} + /* * Execute a previously registered work item. */ --------------2.23.0--