Hi Hackers,
$SUBJECT happens if we crash just after extending the index. Noah
Misch looked into it and came up with the steps to simulate this by
patching pgstattuple to extend each rel it visits:
====
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -254,6 +254,8 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary
tables of other sessions")));
+ ReleaseBuffer(ExtendBufferedRel(BMR_REL(rel), MAIN_FORKNUM,
NULL, EB_CLEAR_SIZE_CACHE));
+
if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind) ||
====
Then, in the regression database, this reaches the error for each hash index and
each gist index:
====
CREATE EXTENSION pgstattuple;
CREATE FUNCTION pgstattuple_safe(IN reloid regclass) RETURNS text
LANGUAGE plpgsql AS $$
BEGIN
RETURN pgstattuple($1);
EXCEPTION WHEN OTHERS THEN
RETURN sqlerrm;
END
$$;
SELECT * from (
SELECT pgstattuple_safe(oid), relkind, oid::regclass
FROM pg_class
WHERE pg_relation_filenode(oid) <> 0
)
WHERE pgstattuple_safe NOT LIKE '%supported%'
AND pgstattuple_safe NOT LIKE '(%'
ORDER BY 1;
====
There have been prior reports:
2016-09
https://www.postgresql.org/message-id/CAA4eK1%2BzT16bxVpF3J9UsiZxExCqFa6QxzUfazmJthW6dO78ww%40mail.gmail.com
2016-09
https://www.postgresql.org/message-id/CAE9k0PnCaBkMgsDGuuPnPPTrQUc%3Dy9NiQvvsFFQkDNGcjYSajg%40mail.gmail.com
2021-02 docs
https://www.postgresql.org/message-id/[email protected]
For hash indexes, this error comes from pgstat_hash_page() ->
_hash_getbuf_with_strategy() -> _hash_checkpage() for a zero page.
Similarly gistcheckpage() is the cause for gist indexes.
Since after a crash recovery a zero page is normal (as long as not
referenced from the rest of the index), emitting
ERRCODE_INDEX_CORRUPTED is a false alarm.
To fix this, we propose that we remove the checks from
pgstat_hash_page() and pgstat_gist_page(). This is something which
pgstat_btree_page() already does. It uses a lower-level function
ReadBufferExtended() and avoids using _bt_getbuf() which would check
for the "unexpected zero page".
I'm attaching a patch for the above which does the following :
1. Replaces _hash_getbuf_with_strategy() with ReadBufferExtended() in
pgstat_hash_page(). Then for non-PageIsNew() pages, we explicitly
check the PageGetSpecialSize().
2. Similarly gistcheckpage() is removed in pgstat_gist_page(). And for
non-PageIsNew(), we explicitly check the PageGetSpecialSize() before
checking if it's a leaf page.
I confirmed that this was working by running the above mentioned
simulation steps along with the patch.
Note that _hash_getbuf_with_strategy() also checks if blkno is P_NEW.
But that check seems unnecessary here as P_NEW is the same as
InvalidBlockNo and pgstat_hash_page() is called by pgstat_index() for
the block nos starting from HASH_METAPAGE + 1 and remains <
RelationGetNumberOfBlocks().
An alternative might be that we still call the check functions for the
non-PageIsNew() cases. Following were Noah Misch's comments on these
options :
I mildly lean toward not calling these *check*page functions, for two
reasons. First, pgstattuple() is not a general integrity checker. It
should try its best to count things, and it
shouldn't go out of its way to report corruption. Second, pgstattuple doesn't
depend on access-method-specific index integrity. pgstat_gist_page() is a thin
wrapper around the generic pgstat_index_page(). pgstat_hash_page() does
slightly more, but a page failing the _hash_checkpage() checks still wouldn't
elicit intolerable behavior from pgstat_hash_page().
Let me know what you folks think of this.
Thanks & Regards,
Nitin Motiani
Google
From cdff70751e7b146a73a72842f7be2ada44cfbb73 Mon Sep 17 00:00:00 2001
From: Nitin Motiani <[email protected]>
Date: Fri, 12 Sep 2025 08:49:07 +0000
Subject: [PATCH v1] Fix "unexpected zero page" in pgstattuple for hash and
gist indexes
* pgstattuple calls check functions for all the pages in hash and
gist indexes. These calls are made from pgstat_hash_page() and
pgstat_gist_page(). For zero pages, these call lead to ERRCODE_INDEX_CORRUPTED.
* Since zero pages are normal after a crash, these errors are false alarms.
pgstat_btree_page() avoids these errors by not running these checks.
* In this patch, we do the same for gist and hash indexes. For hash index,
ReadBufferExtended() replaces _hash_getbuf_with_strategy() which was
calling _hash_check_page(). Instead we do the required check for special
size explicitly for the non-PageIsNew() cases.
* Similarly for gist index, we remove gistcheckpage() and explicitly
checks the special size for the non-PageIsNew() cases before checking
if the page is a leaf.
* The comments in the else clauses have been updated accordingly.
Reported-by: Noah Misch <[email protected]>
---
contrib/pgstattuple/pgstattuple.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index b5de68b7232..cb0c6c70758 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -458,10 +458,12 @@ pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
Buffer buf;
Page page;
- buf = _hash_getbuf_with_strategy(rel, blkno, HASH_READ, 0, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ LockBuffer(buf, HASH_READ);
page = BufferGetPage(buf);
- if (PageGetSpecialSize(page) == MAXALIGN(sizeof(HashPageOpaqueData)))
+ if (!PageIsNew(page) &&
+ PageGetSpecialSize(page) == MAXALIGN(sizeof(HashPageOpaqueData)))
{
HashPageOpaque opaque;
@@ -484,7 +486,7 @@ pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
}
else
{
- /* maybe corrupted */
+ /* maybe corrupted or a zero page */
}
_hash_relbuf(rel, buf);
@@ -502,17 +504,18 @@ pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
LockBuffer(buf, GIST_SHARE);
- gistcheckpage(rel, buf);
page = BufferGetPage(buf);
- if (GistPageIsLeaf(page))
+ if (!PageIsNew(page) &&
+ PageGetSpecialSize(page) == MAXALIGN(sizeof(GISTPageOpaqueData)) &&
+ GistPageIsLeaf(page))
{
pgstat_index_page(stat, page, FirstOffsetNumber,
PageGetMaxOffsetNumber(page));
}
else
{
- /* root or node */
+ /* root or node or zero page or corrupted */
}
UnlockReleaseBuffer(buf);
--
2.51.0.384.g4c02a37b29-goog