Limit index pages visited in planner's get_actual_variable_range
Hi hackers,

It seems the get_actual_variable_range function has a long history of fixes attempting to improve its worst-case behaviour, most recently in 9c6ad5eaa95, which limited the number of heap page fetches to 100. There's currently no limit on the number of index pages fetched.

We managed to get into trouble after deleting a large number of rows (~8 million) from the start of an index, which caused planning time to blow up on a hot (~250 calls/s) query. During the incident, `explain (analyze, buffers)` looked like this:

 Planning:
   Buffers: shared hit=88311
 Planning Time: 249.902 ms
 Execution Time: 0.066 ms

The planner was burning a huge amount of CPU time looking through index pages for the first visible tuple. The problem eventually resolved when the affected index was vacuumed, but that took several hours to complete. There's a reproduction with a smaller dataset below.

Our current workaround for safely bulk-deleting from these large tables is to delay deletion of the minimum row until after a vacuum has run, so there's always a visible tuple near the start of the index. It's not realistic for us to run vacuums more frequently (i.e. after deleting a smaller number of rows) because they're so time-consuming.

The previous discussion [1] touched on the idea of also limiting the number of index page fetches, but there were doubts about the safety of back-patching and the ugliness of modifying the index AM API to support this. I would like to submit our experience as evidence that the lack of a limit on index page fetches is a real problem. Even if a fix for this doesn't get back-patched, it would be nice to see it in a major version.

As a starting point, I've updated the WIP index page limit patch from Simon Riggs [2] to apply cleanly to master.
Regards,
Rian

[1] https://www.postgresql.org/message-id/flat/CAKZiRmznOwi0oaV%3D4PHOCM4ygcH4MgSvt8%3D5cu_vNCfc8FSUug%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CANbhV-GUAo5cOw6XiqBjsLVBQsg%2B%3DkpcCCWYjdTyWzLP28ZX-Q%40mail.gmail.com


=# create table test (id bigint primary key) with (autovacuum_enabled = 'off');
=# insert into test select generate_series(1,1000);
=# analyze test;

An explain with no dead tuples looks like this:

=# explain (analyze, buffers) select id from test where id in (select id from test order by id desc limit 1);
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.244 ms
 Execution Time: 0.067 ms

But if we delete a large number of rows from the start of the index:

=# delete from test where id <= 400;

the performance doesn't become unreasonable immediately - it's limited to visiting 100 heap pages while it's setting killed bits on the index tuples:

=# explain (analyze, buffers) select id from test where id in (select id from test order by id desc limit 1);
 Planning:
   Buffers: shared hit=1 read=168 dirtied=163
 Planning Time: 5.910 ms
 Execution Time: 0.107 ms

But the number of index buffers visited increases on each query, and eventually all the killed bits are set:

$ for i in {1..500}; do psql test -c 'select id from test where id in (select id from test order by id desc limit 1)' >/dev/null; done

=# explain (analyze, buffers) select id from test where id in (select id from test order by id desc limit 1);
 Planning:
   Buffers: shared hit=11015
 Planning Time: 35.772 ms
 Execution Time: 0.070 ms

With the patch:

=# explain (analyze, buffers) select id from test where id in (select id from test order by id desc limit 1);
 Planning:
   Buffers: shared hit=107
 Planning Time: 0.377 ms
 Execution Time: 0.045 ms


diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index dcd04b813d..8d97a5b0c1 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -333,6 +333,9 @@ index_beginscan_internal(Relation indexRelation,
 	scan->parallel_scan = pscan;
 	scan->xs_temp_snap = temp_snap;
 
+	scan->xs_page_limit = 0;
+	scan->xs_pages_visited = 0;
+
 	return scan;
 }
 
@@ -366,6 +369,9 @@ index_rescan(IndexScanDesc scan,
 	scan->kill_prior_tuple = false; /* for safety */
 	scan->xs_heap_continue = false;
 
+	scan->xs_page_limit = 0;
+	scan->xs_pages_visited = 0;
+
 	scan->indexRelation->rd_indam->amrescan(scan, keys, nkeys,
 											orderbys, norderbys);
 }
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 57bcfc7e4c..4d8b8f1c83 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -2189,6 +2189,11 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno, ScanDirection dir)
 			BTScanPosInvalidate(so->currPos);
 			return false;
 		}
+		if (unlikely(scan->xs_page_limit > 0) && ++scan->xs_pages_visited > scan->xs_page_limit)
+		{
+			BTScanPosInvalidate(so->currPos);
+			return false;
+		}
 		/* check for interrupts while we're not holding any buffer lock */
 		CHECK_FOR_INTERRUPTS();
 		/* step right one page */
diff --git a/src/backend/utils/adt/selfuncs.c
[PATCH] pg_dump: Do not dump statistics for excluded tables
Hi hackers,

I've attached a patch against master that addresses a small bug in pg_dump.

Previously, pg_dump would include CREATE STATISTICS statements for tables that were excluded from the dump, causing the reload to fail if any excluded tables had extended statistics.

The patch skips the creation of the StatsExtInfo if the associated table does not have the DUMP_COMPONENT_DEFINITION flag set. This is similar to how getPublicationTables behaves when a table is excluded.

I've covered this with a regression test by altering one of the CREATE STATISTICS examples to work with the existing 'exclude_test_table' run. Without the fix, that causes the test to fail with:

#   Failed test 'exclude_test_table: should not dump CREATE STATISTICS extended_stats_no_options'
#   at t/002_pg_dump.pl line 4934.

Regards,
Rian

From 0fe06338728981fa727e3e6b99247741deda75fb Mon Sep 17 00:00:00 2001
From: Rian McGuire
Date: Wed, 27 Dec 2023 13:09:31 +1100
Subject: [PATCH] pg_dump: Do not dump statistics for excluded tables

Previously, pg_dump would include CREATE STATISTICS statements for tables
that were excluded from the dump, causing reload to fail.
---
 src/bin/pg_dump/pg_dump.c        | 43 +++++++++++++++++++++++++++++++++++------------
 src/bin/pg_dump/t/002_pg_dump.pl |  5 +++--
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8c0b5486b9..c67ed416e9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7288,11 +7288,13 @@ getExtendedStatistics(Archive *fout)
 	int			ntups;
 	int			i_tableoid;
 	int			i_oid;
+	int			i_stxrelid;
 	int			i_stxname;
 	int			i_stxnamespace;
 	int			i_stxowner;
 	int			i_stattarget;
-	int			i;
+	int			i,
+				j;
 
 	/* Extended statistics were new in v10 */
 	if (fout->remoteVersion < 10)
@@ -7301,11 +7303,11 @@ getExtendedStatistics(Archive *fout)
 	query = createPQExpBuffer();
 
 	if (fout->remoteVersion < 13)
-		appendPQExpBufferStr(query, "SELECT tableoid, oid, stxname, "
+		appendPQExpBufferStr(query, "SELECT tableoid, oid, stxrelid, stxname, "
							 "stxnamespace, stxowner, (-1) AS stxstattarget "
							 "FROM pg_catalog.pg_statistic_ext");
 	else
-		appendPQExpBufferStr(query, "SELECT tableoid, oid, stxname, "
+		appendPQExpBufferStr(query, "SELECT tableoid, oid, stxrelid, stxname, "
							 "stxnamespace, stxowner, stxstattarget "
							 "FROM pg_catalog.pg_statistic_ext");
 
@@ -7315,27 +7317,44 @@ getExtendedStatistics(Archive *fout)
 
 	i_tableoid = PQfnumber(res, "tableoid");
 	i_oid = PQfnumber(res, "oid");
+	i_stxrelid = PQfnumber(res, "stxrelid");
 	i_stxname = PQfnumber(res, "stxname");
 	i_stxnamespace = PQfnumber(res, "stxnamespace");
 	i_stxowner = PQfnumber(res, "stxowner");
 	i_stattarget = PQfnumber(res, "stxstattarget");
 
+	/* this allocation may be more than we need */
 	statsextinfo = (StatsExtInfo *) pg_malloc(ntups * sizeof(StatsExtInfo));
 
+	j = 0;
 	for (i = 0; i < ntups; i++)
 	{
-		statsextinfo[i].dobj.objType = DO_STATSEXT;
-		statsextinfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
-		statsextinfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
-		AssignDumpId(&statsextinfo[i].dobj);
-		statsextinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_stxname));
-		statsextinfo[i].dobj.namespace =
+		Oid			stxrelid;
+		TableInfo  *tbinfo;
+
+		/*
+		 * Only dump extended statistics if we're going to dump the definition
+		 * of the table that they're associated with.
+		 */
+		stxrelid = atoi(PQgetvalue(res, i, i_stxrelid));
+		tbinfo = findTableByOid(stxrelid);
+		if (tbinfo == NULL || !(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
+			continue;
+
+		statsextinfo[j].dobj.objType = DO_STATSEXT;
+		statsextinfo[j].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+		statsextinfo[j].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+		AssignDumpId(&statsextinfo[j].dobj);
+		statsex