On Fri, Oct 23, 2020 at 06:06:30PM +0900, Michael Paquier wrote:
> On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote:
>> Mmm, is it really an improvement to report warnings during this
>> function execution?  Note also that PageIsVerified as-is won't report
>> a warning if a page is found as PageIsNew() but isn't actually all
>> zero, while still being reported as corrupted by the SRF.
> 
> Yep, joining the point of above to just have no WARNINGs at all.

Now that we have d401c57, I got to consider more this one, and opted
for not generating a WARNING for now.  Hence, PageisVerifiedExtended()
is disabled regarding that, but we still report a checksum failure in
it.

I have spent some time reviewing the tests, and as I felt this was
bulky.  In the reworked version attached, I have reduced the number of
tests by half, without reducing the coverage, mainly:
- Removed all the stderr and the return code tests, as we always
expected the commands to succeed, and safe_psql() can do the job
already.
- Merged of the queries using pg_relation_check_pages into a single
routine, with the expected output (set of broken pages returned in the
SRF) in the arguments.
- Added some prefixes to the tests, to generate unique test names.
That makes debug easier.
- The query on pg_stat_database is run once at the beginning, once at
the end with the number of checksum failures correctly updated.
- Added comments to document all the routines, and renamed some of
them mostly for consistency.
- Skipped system relations from the scan of pg_class, making the test
more costly for nothing.
- I ran some tests on Windows, just-in-case.

I have also added a SearchSysCacheExists1() to double-check if the
relation is missing before opening it, added a
CHECK_FOR_INTERRUPTS() within the main check loop (where the error
context is really helpful), indented the code, bumped the catalogs
(mostly a self-reminder), etc.

So, what do you think?
--
Michael
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index f44a09b0c2..e522477780 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202010201
+#define CATALOG_VERSION_NO	202010271
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bbcac69d48..a66870bcc0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10958,6 +10958,13 @@
   proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
   proargnames => '{tablespace,name,size,modification}',
   prosrc => 'pg_ls_tmpdir_1arg' },
+{ oid => '9147', descr => 'check pages of a relation',
+  proname => 'pg_relation_check_pages', procost => '10000', prorows => '20',
+  proisstrict => 'f', proretset => 't', provolatile => 'v', proparallel => 'r',
+  prorettype => 'record', proargtypes => 'regclass text',
+  proallargtypes => '{regclass,text,text,int8}', proargmodes => '{i,i,o,o}',
+  proargnames => '{relation,fork,path,failed_block_num}',
+  prosrc => 'pg_relation_check_pages' },
 
 # hash partitioning constraint function
 { oid => '5028', descr => 'hash partition CHECK constraint',
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index ee91b8fa26..a21cab2eaf 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -240,6 +240,9 @@ extern void AtProcExit_LocalBuffers(void);
 
 extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);
 
+extern bool CheckBuffer(struct SMgrRelationData *smgr, ForkNumber forknum,
+						BlockNumber blkno);
+
 /* in freelist.c */
 extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 85cd147e21..c6dd084fbc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1300,6 +1300,14 @@ LANGUAGE INTERNAL
 STRICT VOLATILE
 AS 'pg_create_logical_replication_slot';
 
+CREATE OR REPLACE FUNCTION pg_relation_check_pages(
+    IN relation regclass, IN fork text DEFAULT NULL,
+    OUT path text, OUT failed_block_num bigint)
+RETURNS SETOF record
+LANGUAGE internal
+VOLATILE PARALLEL RESTRICTED
+AS 'pg_relation_check_pages';
+
 CREATE OR REPLACE FUNCTION
   make_interval(years int4 DEFAULT 0, months int4 DEFAULT 0, weeks int4 DEFAULT 0,
                 days int4 DEFAULT 0, hours int4 DEFAULT 0, mins int4 DEFAULT 0,
@@ -1444,6 +1452,7 @@ AS 'unicode_is_normalized';
 -- can later change who can access these functions, or leave them as only
 -- available to superuser / cluster owner, if they choose.
 --
+REVOKE EXECUTE ON FUNCTION pg_relation_check_pages(regclass, text) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3eee86afe5..eb5c917074 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4585,3 +4585,95 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
 				 errmsg("snapshot too old")));
 }
+
+
+/*
+ * CheckBuffer
+ *
+ * Check the state of a buffer without loading it into the shared buffers. To
+ * avoid torn pages and possible false positives when reading data, a shared
+ * LWLock is taken on the target buffer pool partition mapping, and we check
+ * if the page is in shared buffers or not.  An I/O lock is taken on the block
+ * to prevent any concurrent activity from happening.
+ *
+ * If the page is found as dirty in the shared buffers, it is ignored as
+ * it will be flushed to disk either before the end of the next checkpoint
+ * or during recovery in the event of an unsafe shutdown.
+ *
+ * If the page is found in the shared buffers but is not dirty, we still
+ * check the state of its data on disk, as it could be possible that the
+ * page stayed in shared buffers for a rather long time while the on-disk
+ * data got corrupted.
+ *
+ * If the page is not found in shared buffers, the block is read from disk
+ * while holding the buffer pool partition mapping LWLock.
+ *
+ * The page data is stored a private memory area local to this function while
+ * running the checks.
+ */
+bool
+CheckBuffer(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
+{
+	char		buffer[BLCKSZ];
+	BufferTag	buf_tag;		/* identity of requested block */
+	uint32		buf_hash;		/* hash value for buf_tag */
+	LWLock	   *partLock;		/* buffer partition lock for the buffer */
+	BufferDesc *bufdesc;
+	int			buf_id;
+
+	Assert(smgrexists(smgr, forknum));
+
+	/* create a tag so we can look after the buffer */
+	INIT_BUFFERTAG(buf_tag, smgr->smgr_rnode.node, forknum, blkno);
+
+	/* determine its hash code and partition lock ID */
+	buf_hash = BufTableHashCode(&buf_tag);
+	partLock = BufMappingPartitionLock(buf_hash);
+
+	/* see if the block is in the buffer pool or not */
+	LWLockAcquire(partLock, LW_SHARED);
+	buf_id = BufTableLookup(&buf_tag, buf_hash);
+	if (buf_id >= 0)
+	{
+		uint32		buf_state;
+
+		/*
+		 * Found it.  Now, retrieve its state to know what to do with it, and
+		 * release the pin immediately.  We do so to limit overhead as much as
+		 * possible.  We keep the shared lightweight lock on the target buffer
+		 * mapping partition for now, so this buffer cannot be evicted, and we
+		 * acquire an I/O Lock on the buffer as we may need to read its
+		 * contents from disk.
+		 */
+		bufdesc = GetBufferDescriptor(buf_id);
+
+		LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
+		buf_state = LockBufHdr(bufdesc);
+		UnlockBufHdr(bufdesc, buf_state);
+
+		/* If the page is dirty or invalid, skip it */
+		if ((buf_state & BM_DIRTY) || !(buf_state & BM_TAG_VALID))
+		{
+			LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+			LWLockRelease(partLock);
+			return true;
+		}
+
+		/* Read the buffer from disk, with the I/O lock still held */
+		smgrread(smgr, forknum, blkno, buffer);
+		LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+	}
+	else
+	{
+		/*
+		 * Simply read the buffer.  There's no risk of modification on it as
+		 * we are holding the buffer pool partition mapping lock.
+		 */
+		smgrread(smgr, forknum, blkno, buffer);
+	}
+
+	/* buffer lookup done, so now do its check */
+	LWLockRelease(partLock);
+
+	return PageIsVerifiedExtended(buffer, blkno, PIV_REPORT_STAT);
+}
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index b4d55e849b..e2279af1e5 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -69,6 +69,7 @@ OBJS = \
 	oid.o \
 	oracle_compat.o \
 	orderedsetaggs.o \
+	pagefuncs.o \
 	partitionfuncs.o \
 	pg_locale.o \
 	pg_lsn.o \
diff --git a/src/backend/utils/adt/pagefuncs.c b/src/backend/utils/adt/pagefuncs.c
new file mode 100644
index 0000000000..f34d56cf1f
--- /dev/null
+++ b/src/backend/utils/adt/pagefuncs.c
@@ -0,0 +1,230 @@
+/*-------------------------------------------------------------------------
+ *
+ * pagefuncs.c
+ *	  Functions for page related features.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/pagefuncs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/relation.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/syscache.h"
+
+static void check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore,
+							   Oid relid, ForkNumber single_forknum);
+static void check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
+								Relation relation, ForkNumber forknum);
+
+/*
+ * callback arguments for check_pages_error_callback()
+ */
+typedef struct CheckPagesErrorInfo
+{
+	char	   *path;
+	BlockNumber blkno;
+} CheckPagesErrorInfo;
+
+/*
+ * Error callback specific to check_relation_fork().
+ */
+static void
+check_pages_error_callback(void *arg)
+{
+	CheckPagesErrorInfo *errinfo = (CheckPagesErrorInfo *) arg;
+
+	errcontext("while checking page %u of path %s",
+			   errinfo->blkno, errinfo->path);
+}
+
+/*
+ * pg_relation_check_pages
+ *
+ * Check the state of all the pages for one or more fork types in the given
+ * relation.
+ */
+Datum
+pg_relation_check_pages(PG_FUNCTION_ARGS)
+{
+	Oid			relid;
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	ForkNumber	forknum;
+
+	/* Switch into long-lived context to construct returned data structures */
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/* handle arguments */
+	if (PG_ARGISNULL(0))
+	{
+		/* Just leave if nothing is defined */
+		PG_RETURN_VOID();
+	}
+
+	/* By default all the forks of a relation are checked */
+	if (PG_ARGISNULL(1))
+		forknum = InvalidForkNumber;
+	else
+	{
+		const char *forkname = TextDatumGetCString(PG_GETARG_TEXT_PP(1));
+
+		forknum = forkname_to_number(forkname);
+	}
+
+	relid = PG_GETARG_OID(0);
+
+	check_one_relation(tupdesc, tupstore, relid, forknum);
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Perform the check on a single relation, possibly filtered with a single
+ * fork.  This function will check if the given relation exists or not, as
+ * a relation could be dropped after checking for the list of relations and
+ * before getting here, and we don't want to error out in this case.
+ */
+static void
+check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore,
+				   Oid relid, ForkNumber single_forknum)
+{
+	Relation	relation;
+	ForkNumber	forknum;
+
+	/* Check if relation exists. leaving if there is no such relation */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+		return;
+
+	relation = relation_open(relid, AccessShareLock);
+
+	/*
+	 * Sanity checks, returning no results if not support.  Temporary
+	 * relations and relations without storage are out of scope.
+	 */
+	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind) ||
+		relation->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+	{
+		relation_close(relation, AccessShareLock);
+		return;
+	}
+
+	RelationOpenSmgr(relation);
+
+	for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+	{
+		if (single_forknum != InvalidForkNumber && single_forknum != forknum)
+			continue;
+
+		if (smgrexists(relation->rd_smgr, forknum))
+			check_relation_fork(tupdesc, tupstore, relation, forknum);
+	}
+
+	relation_close(relation, AccessShareLock);
+}
+
+/*
+ * For a given relation and fork, Do the real work of iterating over all pages
+ * and doing the check.  Caller must hold an AccessShareLock lock on the given
+ * relation.
+ */
+static void
+check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
+					Relation relation, ForkNumber forknum)
+{
+	BlockNumber blkno,
+				nblocks;
+	SMgrRelation smgr = relation->rd_smgr;
+	char	   *path;
+	CheckPagesErrorInfo errinfo;
+	ErrorContextCallback errcallback;
+
+	/* Number of output arguments in the SRF */
+#define PG_CHECK_RELATION_COLS			2
+
+	Assert(CheckRelationLockedByMe(relation, AccessShareLock, true));
+
+	/*
+	 * We remember the number of blocks here.  Since caller must hold a lock
+	 * on the relation, we know that it won't be truncated while we're
+	 * iterating over the blocks.  Any block added after this function started
+	 * won't be checked, but this is out of scope as such pages will be
+	 * flushed before the next checkpoint's completion.
+	 */
+	nblocks = RelationGetNumberOfBlocksInFork(relation, forknum);
+
+	path = relpathbackend(smgr->smgr_rnode.node,
+						  smgr->smgr_rnode.backend,
+						  forknum);
+
+	/*
+	 * Error context to print some information about blocks and relations
+	 * impacted by corruptions.
+	 */
+	errinfo.path = pstrdup(path);
+	errinfo.blkno = 0;
+	errcallback.callback = check_pages_error_callback;
+	errcallback.arg = (void *) &errinfo;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Datum		values[PG_CHECK_RELATION_COLS];
+		bool		nulls[PG_CHECK_RELATION_COLS];
+		int			i = 0;
+
+		/* Update block number for the error context */
+		errinfo.blkno = blkno;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Check the given buffer */
+		if (CheckBuffer(smgr, forknum, blkno))
+			continue;
+
+		memset(values, 0, sizeof(values));
+		memset(nulls, 0, sizeof(nulls));
+
+		values[i++] = CStringGetTextDatum(path);
+		values[i++] = UInt32GetDatum(blkno);
+
+		Assert(i == PG_CHECK_RELATION_COLS);
+
+		/* Save the corrupted blocks in the tuplestore. */
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+
+		pfree(path);
+	}
+
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+}
diff --git a/src/test/recovery/t/022_page_check.pl b/src/test/recovery/t/022_page_check.pl
new file mode 100644
index 0000000000..7e1f0d1fd8
--- /dev/null
+++ b/src/test/recovery/t/022_page_check.pl
@@ -0,0 +1,234 @@
+# Emulate on-disk corruptions of relation pages and find such corruptions
+# using pg_relation_check_pages().
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 20;
+
+our $CHECKSUM_UINT16_OFFSET = 4;
+our $PD_UPPER_UINT16_OFFSET = 7;
+our $BLOCKSIZE;
+our $TOTAL_NB_ERR = 0;
+
+# Grab a relation page worth a size of BLOCKSIZE from given $filename.
+# $blkno is the same block number as for a relation file.
+sub read_page
+{
+	my ($filename, $blkno) = @_;
+	my $block;
+
+	open(my $infile, '<', $filename) or die;
+	binmode($infile);
+
+	my $success = read($infile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE));
+	die($!) if !defined($success);
+
+	close($infile);
+
+	return ($block);
+}
+
+# Update an existing page of size BLOCKSIZE with new contents in given
+# $filename.  blkno is the block number assigned in the relation file.
+sub write_page
+{
+	my ($filename, $block, $blkno) = @_;
+
+	open(my $outfile, '>', $filename) or die;
+	binmode($outfile);
+
+	my $nb = syswrite($outfile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE));
+
+	die($!) if not defined $nb;
+	die("Write error") if ($nb != $BLOCKSIZE);
+
+	$outfile->flush();
+
+	close($outfile);
+	return;
+}
+
+# Read 2 bytes from relation page at a given offset.
+sub get_uint16_from_page
+{
+	my ($block, $offset) = @_;
+
+	return (unpack("S*", $block))[$offset];
+}
+
+# Write 2 bytes to relation page at a given offset.
+sub set_uint16_to_page
+{
+	my ($block, $data, $offset) = @_;
+
+	my $pack = pack("S", $data);
+
+	# vec with 16B or more won't preserve endianness.
+	vec($block, 2 * $offset, 8) = (unpack('C*', $pack))[0];
+	vec($block, (2 * $offset) + 1, 8) = (unpack('C*', $pack))[1];
+
+	return $block;
+}
+
+# Sanity check on pg_stat_database looking after the number of checksum
+# failures.
+sub check_pg_stat_database
+{
+	my ($node, $test_prefix) = @_;
+
+	my $stdout = $node->safe_psql('postgres',
+		    "SELECT "
+		  . " sum(checksum_failures)"
+		  . " FROM pg_catalog.pg_stat_database");
+	is($stdout, $TOTAL_NB_ERR,
+		"$test_prefix: pg_stat_database should have $TOTAL_NB_ERR error");
+
+	return;
+}
+
+# Run a round of page checks for any relation present in this test run.
+# $expected_broken is the psql output marking all the pages found as
+# corrupted using relname|blkno as format for each tuple returned.  $nb
+# is the new number of added to the global counter matched later with
+# pg_stat_database.
+#
+# Note that this has no need to check system relations as these would have
+# no corruptions: this test does not manipulate them and should by no mean
+# break the cluster.
+sub run_page_checks
+{
+	my ($node, $num_checksum, $expected_broken, $test_prefix) = @_;
+
+	my $stdout = $node->safe_psql('postgres',
+		    "SELECT relname, failed_block_num"
+		  . " FROM (SELECT relname, (pg_catalog.pg_relation_check_pages(oid)).*"
+		  . "   FROM pg_class "
+		  . "   WHERE relkind in ('r','i', 'm') AND oid >= 16384) AS s");
+
+	# Check command result
+	is($stdout, $expected_broken,
+		"$test_prefix: output mismatch with pg_relation_check_pages()");
+
+	$TOTAL_NB_ERR += $num_checksum;
+	return;
+}
+
+# Perform various test that modify a specified block at the specified
+# offset, checking that the page corruption is correctly detected.  The
+# original contents of the page are restored back once done.
+# $broken_pages is the set of pages that are expected to be broken
+# as of the returned result of pg_relation_check_pages().  $num_checksum
+# is the number of checksum failures expected to be added after this
+# function is done.
+sub corrupt_and_test_block
+{
+	my ($node, $filename, $blkno, $offset, $broken_pages, $num_checksum,
+		$test_prefix)
+	  = @_;
+	my $fake_data = hex '0x0000';
+
+	# Stop the server cleanly to flush any pages, and to prevent any
+	# concurrent updates on what is going to be updated.
+	$node->stop;
+	my $original_block = read_page($filename, 0);
+	my $original_data = get_uint16_from_page($original_block, $offset);
+
+	isnt($original_data, $fake_data,
+		"$test_prefix: fake data at offset $offset should be different from the existing one"
+	);
+
+	my $new_block = set_uint16_to_page($original_block, $fake_data, $offset);
+	isnt(
+		$original_data,
+		get_uint16_from_page($new_block, $offset),
+		"$test_prefix: The data at offset $offset should have been changed in memory"
+	);
+
+	write_page($filename, $new_block, 0);
+
+	my $written_data = get_uint16_from_page(read_page($filename, 0), $offset);
+
+	# Some offline checks to validate that the corrupted data is in place.
+	isnt($original_data, $written_data,
+		"$test_prefix: data written at offset $offset should be different from the original one"
+	);
+	is( get_uint16_from_page($new_block, $offset),
+		$written_data,
+		"$test_prefix: data written at offset $offset should be the same as the one in memory"
+	);
+	is($written_data, $fake_data,
+		"$test_prefix: The data written at offset $offset should be the one we wanted to write"
+	);
+
+	# The corruption is in place, start the server to run the checks.
+	$node->start;
+	run_page_checks($node, $num_checksum, $broken_pages, $test_prefix);
+
+	# Stop the server, put the original page back in place.
+	$node->stop;
+
+	$new_block = set_uint16_to_page($original_block, $original_data, $offset);
+	is( $original_data,
+		get_uint16_from_page($new_block, $offset),
+		"$test_prefix: data at offset $offset should have been restored in memory"
+	);
+
+	write_page($filename, $new_block, 0);
+	is( $original_data,
+		get_uint16_from_page(read_page($filename, $blkno), $offset),
+		"$test_prefix: data at offset $offset should have been restored on disk"
+	);
+
+	# There should be no errors now that the contents are back in place.
+	$node->start;
+	run_page_checks($node, 0, '', $test_prefix);
+}
+
+# Data checksums are necessary for this test.
+my $node = get_new_node('main');
+$node->init(extra => ['--data-checksums']);
+$node->start;
+
+my $stdout =
+  $node->safe_psql('postgres', "SELECT" . " current_setting('block_size')");
+
+$BLOCKSIZE = $stdout;
+
+# Basic schema to corrupt and check
+$node->safe_psql(
+	'postgres', q|
+	CREATE TABLE public.t1(id integer);
+	INSERT INTO public.t1 SELECT generate_series(1, 100);
+	CHECKPOINT;
+|);
+
+# Get the path to the relation file that will get manipulated by the
+# follow-up tests with some on-disk corruptions.
+$stdout = $node->safe_psql('postgres',
+	    "SELECT"
+	  . " current_setting('data_directory') || '/' || pg_relation_filepath('t1')"
+);
+
+my $filename = $stdout;
+
+# Normal case without corruptions, this passes, with pg_stat_database
+# reporting no errors.
+check_pg_stat_database($node, 'start');
+
+# Test with a modified checksum.  We use a zero checksum here as it's the only
+# one that cannot exist on a checksummed page.  We also don't have an easy way
+# to compute what the checksum would be after a modification in a random place
+# in the block.
+corrupt_and_test_block($node, $filename, 0, $CHECKSUM_UINT16_OFFSET, 't1|0',
+	1, 'broken checksum');
+
+# Test corruption making the block looks like it's PageIsNew().
+corrupt_and_test_block($node, $filename, 0, $PD_UPPER_UINT16_OFFSET, 't1|0',
+	0, 'new page');
+
+# Check that the number of errors in pg_stat_database match what we
+# expect with the corruptions previously introduced.
+check_pg_stat_database($node, 'end');
diff --git a/src/test/regress/expected/pagefuncs.out b/src/test/regress/expected/pagefuncs.out
new file mode 100644
index 0000000000..38a72b01b3
--- /dev/null
+++ b/src/test/regress/expected/pagefuncs.out
@@ -0,0 +1,72 @@
+--
+-- Tests for functions related to relation pages
+--
+-- Restricted to superusers by default
+CREATE ROLE regress_pgfunc_user;
+SET ROLE regress_pgfunc_user;
+SELECT pg_relation_check_pages('pg_class'); -- error
+ERROR:  permission denied for function pg_relation_check_pages
+SELECT pg_relation_check_pages('pg_class', 'main'); -- error
+ERROR:  permission denied for function pg_relation_check_pages
+RESET ROLE;
+DROP ROLE regress_pgfunc_user;
+-- NULL and simple sanity checks
+SELECT pg_relation_check_pages(NULL); -- empty result
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages(NULL, NULL); -- empty result
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error
+ERROR:  invalid fork name
+HINT:  Valid fork names are "main", "fsm", "vm", and "init".
+-- Relation types that are supported
+CREATE TABLE pgfunc_test_tab (id int);
+CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id);
+INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000));
+SELECT pg_relation_check_pages('pgfunc_test_tab');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages('pgfunc_test_ind');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+DROP TABLE pgfunc_test_tab;
+CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_matview');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+DROP MATERIALIZED VIEW pgfunc_test_matview;
+CREATE SEQUENCE pgfunc_test_seq;
+SELECT pg_relation_check_pages('pgfunc_test_seq');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+DROP SEQUENCE pgfunc_test_seq;
+-- pg_relation_check_pages() returns no results if passed relations that
+-- do not support the operation, like relations without storage or temporary
+-- relations.
+CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a;
+SELECT pg_relation_check_pages('pgfunc_test_temp');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+DROP TABLE pgfunc_test_temp;
+CREATE VIEW pgfunc_test_view AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_view');
+ pg_relation_check_pages 
+-------------------------
+(0 rows)
+
+DROP VIEW pgfunc_test_view;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ae89ed7f0b..7a46a13252 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain pagefuncs
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 525bdc804f..9a80b80f73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -197,6 +197,7 @@ test: hash_part
 test: indexing
 test: partition_aggregate
 test: partition_info
+test: pagefuncs
 test: tuplesort
 test: explain
 test: event_trigger
diff --git a/src/test/regress/sql/pagefuncs.sql b/src/test/regress/sql/pagefuncs.sql
new file mode 100644
index 0000000000..12d32eeae4
--- /dev/null
+++ b/src/test/regress/sql/pagefuncs.sql
@@ -0,0 +1,41 @@
+--
+-- Tests for functions related to relation pages
+--
+
+-- Restricted to superusers by default
+CREATE ROLE regress_pgfunc_user;
+SET ROLE regress_pgfunc_user;
+SELECT pg_relation_check_pages('pg_class'); -- error
+SELECT pg_relation_check_pages('pg_class', 'main'); -- error
+RESET ROLE;
+DROP ROLE regress_pgfunc_user;
+
+-- NULL and simple sanity checks
+SELECT pg_relation_check_pages(NULL); -- empty result
+SELECT pg_relation_check_pages(NULL, NULL); -- empty result
+SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error
+
+-- Relation types that are supported
+CREATE TABLE pgfunc_test_tab (id int);
+CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id);
+INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000));
+SELECT pg_relation_check_pages('pgfunc_test_tab');
+SELECT pg_relation_check_pages('pgfunc_test_ind');
+DROP TABLE pgfunc_test_tab;
+
+CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_matview');
+DROP MATERIALIZED VIEW pgfunc_test_matview;
+CREATE SEQUENCE pgfunc_test_seq;
+SELECT pg_relation_check_pages('pgfunc_test_seq');
+DROP SEQUENCE pgfunc_test_seq;
+
+-- pg_relation_check_pages() returns no results if passed relations that
+-- do not support the operation, like relations without storage or temporary
+-- relations.
+CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a;
+SELECT pg_relation_check_pages('pgfunc_test_temp');
+DROP TABLE pgfunc_test_temp;
+CREATE VIEW pgfunc_test_view AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_view');
+DROP VIEW pgfunc_test_view;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f7f401b534..7ef2ec9972 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26182,6 +26182,56 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_relation_check_pages(<parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> <literal>DEFAULT</literal> <literal>NULL</literal> ])</function></literal>
+       </entry>
+       <entry><type>setof record</type></entry>
+       <entry>Check the pages of a relation.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+    <primary>pg_relation_check_pages</primary>
+   </indexterm>
+   <para id="functions-check-relation-note" xreflabel="pg_relation_check_pages">
+    <function>pg_relation_check_pages</function> iterates over all blocks of a
+    given relation and verifies if they are in a state where they can safely
+    be loaded into the shared buffers. If defined,
+    <replaceable>fork</replaceable> specifies that only the pages of the given
+    fork are to be verified. Fork can be <literal>'main'</literal> for the
+    main data fork, <literal>'fsm'</literal> for the free space map,
+    <literal>'vm'</literal> for the visibility map, or
+    <literal>'init'</literal> for the initialization fork.  The default of
+    <literal>NULL</literal> means that all the forks of the relation are
+    checked. The function returns a list of blocks that are considered as
+    corrupted with the path of the related file. Use of this function is
+    restricted to superusers by default but access may be granted to others
+    using <command>GRANT</command>.
+   </para>
+
+  </sect2>
+
   </sect1>
 
   <sect1 id="functions-trigger">
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff853634bc..b6acade6c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -332,6 +332,7 @@ CatCacheHeader
 CatalogId
 CatalogIndexState
 ChangeVarNodes_context
+CheckPagesErrorInfo
 CheckPoint
 CheckPointStmt
 CheckpointStatsData

Attachment: signature.asc
Description: PGP signature

Reply via email to