On Fri, Oct 23, 2020 at 06:06:30PM +0900, Michael Paquier wrote:
> On Fri, Oct 23, 2020 at 04:31:56PM +0800, Julien Rouhaud wrote:
>> Mmm, is it really an improvement to report warnings during this
>> function execution? Note also that PageIsVerified as-is won't report
>> a warning if a page is found as PageIsNew() but isn't actually all
>> zero, while still being reported as corrupted by the SRF.
>
> Yep, joining the point of above to just have no WARNINGs at all.

Now that we have d401c57, I considered this one some more, and opted
for not generating a WARNING for now. Hence, the WARNING is disabled
in the call to PageIsVerifiedExtended(), but we still report a
checksum failure in it.

I have spent some time reviewing the tests, as I felt this was bulky.
In the reworked version attached, I have reduced the number of tests
by half, without reducing the coverage, mainly:
- Removed all the stderr and the return code tests, as we always
expected the commands to succeed, and safe_psql() can do the job
already.
- Merged the queries using pg_relation_check_pages into a single
routine, with the expected output (set of broken pages returned in the
SRF) in the arguments.
- Added some prefixes to the tests, to generate unique test names.
That makes debugging easier.
- The query on pg_stat_database is run once at the beginning, once at
the end with the number of checksum failures correctly updated.
- Added comments to document all the routines, and renamed some of
them mostly for consistency.
- Skipped system relations in the scan of pg_class, as checking them
made the test more costly for nothing.
- I ran some tests on Windows, just in case.

I have also added a SearchSysCacheExists1() to double-check if the
relation is missing before opening it, added a CHECK_FOR_INTERRUPTS()
within the main check loop (where the error context is really
helpful), indented the code, bumped the catalogs (mostly a
self-reminder), etc.

So, what do you think?
--
Michael

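To illustrate why the error context is helpful in the main check loop, here is a minimal standalone sketch of the callback-stack pattern in plain C. It is only an approximation under stated assumptions: it does not use the backend's ereport() machinery, every name prefixed with sketch_ or Sketch is a hypothetical stand-in, and the message is formatted into a caller-supplied buffer so that it can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the backend's error context callback frame. */
typedef struct SketchErrorContext
{
	struct SketchErrorContext *previous;
	void		(*callback) (void *arg, char *out, size_t outlen);
	void	   *arg;
} SketchErrorContext;

/* Innermost frame first, in the spirit of error_context_stack. */
static SketchErrorContext *sketch_context_stack = NULL;

/* Same two fields as CheckPagesErrorInfo in the patch. */
typedef struct SketchCheckInfo
{
	const char *path;
	unsigned int blkno;
} SketchCheckInfo;

/* Append one context line to the message being built. */
static void
sketch_check_callback(void *arg, char *out, size_t outlen)
{
	SketchCheckInfo *info = (SketchCheckInfo *) arg;
	size_t		used = strlen(out);

	assert(used < outlen);
	snprintf(out + used, outlen - used,
			 "CONTEXT: while checking page %u of path %s\n",
			 info->blkno, info->path);
}

/* Build an error message, then let every active frame add its context. */
static void
sketch_report_error(const char *msg, char *out, size_t outlen)
{
	SketchErrorContext *frame;

	snprintf(out, outlen, "ERROR: %s\n", msg);
	for (frame = sketch_context_stack; frame != NULL; frame = frame->previous)
		frame->callback(frame->arg, out, outlen);
}

/*
 * Walk nblocks blocks and simulate a failure at block fail_at. Returns the
 * number of blocks visited before stopping.
 */
static unsigned int
sketch_check_blocks(const char *path, unsigned int nblocks,
					unsigned int fail_at, char *out, size_t outlen)
{
	SketchCheckInfo info = {path, 0};
	SketchErrorContext frame;
	unsigned int blkno;

	/* Push our frame on the stack. */
	frame.callback = sketch_check_callback;
	frame.arg = &info;
	frame.previous = sketch_context_stack;
	sketch_context_stack = &frame;

	out[0] = '\0';
	for (blkno = 0; blkno < nblocks; blkno++)
	{
		/* Keep the context in sync with the block being processed. */
		info.blkno = blkno;

		if (blkno == fail_at)
		{
			sketch_report_error("simulated read failure", out, outlen);
			break;
		}
	}

	/* Pop the frame. */
	sketch_context_stack = frame.previous;
	return blkno;
}
```

Calling sketch_check_blocks("base/1/16384", 10, 3, buf, sizeof(buf)) leaves a message in buf whose context line names block 3 and the file path, which is the kind of detail one wants when a read fails or a query is cancelled partway through a large relation.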
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index f44a09b0c2..e522477780 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202010201
+#define CATALOG_VERSION_NO	202010271
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bbcac69d48..a66870bcc0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10958,6 +10958,13 @@
   proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
   proargnames => '{tablespace,name,size,modification}',
   prosrc => 'pg_ls_tmpdir_1arg' },
+{ oid => '9147', descr => 'check pages of a relation',
+  proname => 'pg_relation_check_pages', procost => '10000', prorows => '20',
+  proisstrict => 'f', proretset => 't', provolatile => 'v', proparallel => 'r',
+  prorettype => 'record', proargtypes => 'regclass text',
+  proallargtypes => '{regclass,text,text,int8}', proargmodes => '{i,i,o,o}',
+  proargnames => '{relation,fork,path,failed_block_num}',
+  prosrc => 'pg_relation_check_pages' },
 
 # hash partitioning constraint function
 { oid => '5028', descr => 'hash partition CHECK constraint',
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index ee91b8fa26..a21cab2eaf 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -240,6 +240,9 @@ extern void AtProcExit_LocalBuffers(void);
 
 extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);
 
+extern bool CheckBuffer(struct SMgrRelationData *smgr, ForkNumber forknum,
+						BlockNumber blkno);
+
 /* in freelist.c */
 extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 85cd147e21..c6dd084fbc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1300,6 +1300,14 @@ LANGUAGE INTERNAL
 STRICT VOLATILE
 AS 'pg_create_logical_replication_slot';
 
+CREATE OR REPLACE FUNCTION pg_relation_check_pages(
+    IN relation regclass, IN fork text DEFAULT NULL,
+    OUT path text, OUT failed_block_num bigint)
+RETURNS SETOF record
+LANGUAGE internal
+VOLATILE PARALLEL RESTRICTED
+AS 'pg_relation_check_pages';
+
 CREATE OR REPLACE FUNCTION
   make_interval(years int4 DEFAULT 0, months int4 DEFAULT 0, weeks int4 DEFAULT 0,
                 days int4 DEFAULT 0, hours int4 DEFAULT 0, mins int4 DEFAULT 0,
@@ -1444,6 +1452,7 @@ AS 'unicode_is_normalized';
 -- can later change who can access these functions, or leave them as only
 -- available to superuser / cluster owner, if they choose.
 --
+REVOKE EXECUTE ON FUNCTION pg_relation_check_pages(regclass, text) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3eee86afe5..eb5c917074 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4585,3 +4585,95 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
 				 errmsg("snapshot too old")));
 }
+
+
+/*
+ * CheckBuffer
+ *
+ * Check the state of a buffer without loading it into the shared buffers. To
+ * avoid torn pages and possible false positives when reading data, a shared
+ * LWLock is taken on the target buffer pool partition mapping, and we check
+ * if the page is in shared buffers or not. An I/O lock is taken on the block
+ * to prevent any concurrent activity from happening.
+ *
+ * If the page is found as dirty in the shared buffers, it is ignored as
+ * it will be flushed to disk either before the end of the next checkpoint
+ * or during recovery in the event of an unsafe shutdown.
+ *
+ * If the page is found in the shared buffers but is not dirty, we still
+ * check the state of its data on disk, as it could be possible that the
+ * page stayed in shared buffers for a rather long time while the on-disk
+ * data got corrupted.
+ *
+ * If the page is not found in shared buffers, the block is read from disk
+ * while holding the buffer pool partition mapping LWLock.
+ *
+ * The page data is stored a private memory area local to this function while
+ * running the checks.
+ */
+bool
+CheckBuffer(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
+{
+	char		buffer[BLCKSZ];
+	BufferTag	buf_tag;		/* identity of requested block */
+	uint32		buf_hash;		/* hash value for buf_tag */
+	LWLock	   *partLock;		/* buffer partition lock for the buffer */
+	BufferDesc *bufdesc;
+	int			buf_id;
+
+	Assert(smgrexists(smgr, forknum));
+
+	/* create a tag so we can look after the buffer */
+	INIT_BUFFERTAG(buf_tag, smgr->smgr_rnode.node, forknum, blkno);
+
+	/* determine its hash code and partition lock ID */
+	buf_hash = BufTableHashCode(&buf_tag);
+	partLock = BufMappingPartitionLock(buf_hash);
+
+	/* see if the block is in the buffer pool or not */
+	LWLockAcquire(partLock, LW_SHARED);
+	buf_id = BufTableLookup(&buf_tag, buf_hash);
+	if (buf_id >= 0)
+	{
+		uint32		buf_state;
+
+		/*
+		 * Found it. Now, retrieve its state to know what to do with it, and
+		 * release the pin immediately. We do so to limit overhead as much as
+		 * possible. We keep the shared lightweight lock on the target buffer
+		 * mapping partition for now, so this buffer cannot be evicted, and we
+		 * acquire an I/O Lock on the buffer as we may need to read its
+		 * contents from disk.
+		 */
+		bufdesc = GetBufferDescriptor(buf_id);
+
+		LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
+		buf_state = LockBufHdr(bufdesc);
+		UnlockBufHdr(bufdesc, buf_state);
+
+		/* If the page is dirty or invalid, skip it */
+		if ((buf_state & BM_DIRTY) || !(buf_state & BM_TAG_VALID))
+		{
+			LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+			LWLockRelease(partLock);
+			return true;
+		}
+
+		/* Read the buffer from disk, with the I/O lock still held */
+		smgrread(smgr, forknum, blkno, buffer);
+		LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+	}
+	else
+	{
+		/*
+		 * Simply read the buffer. There's no risk of modification on it as
+		 * we are holding the buffer pool partition mapping lock.
+		 */
+		smgrread(smgr, forknum, blkno, buffer);
+	}
+
+	/* buffer lookup done, so now do its check */
+	LWLockRelease(partLock);
+
+	return PageIsVerifiedExtended(buffer, blkno, PIV_REPORT_STAT);
+}
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index b4d55e849b..e2279af1e5 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -69,6 +69,7 @@ OBJS = \
 	oid.o \
 	oracle_compat.o \
 	orderedsetaggs.o \
+	pagefuncs.o \
 	partitionfuncs.o \
 	pg_locale.o \
 	pg_lsn.o \
diff --git a/src/backend/utils/adt/pagefuncs.c b/src/backend/utils/adt/pagefuncs.c
new file mode 100644
index 0000000000..f34d56cf1f
--- /dev/null
+++ b/src/backend/utils/adt/pagefuncs.c
@@ -0,0 +1,230 @@
+/*-------------------------------------------------------------------------
+ *
+ * pagefuncs.c
+ *	  Functions for page related features.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/pagefuncs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/relation.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/syscache.h"
+
+static void check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore,
+							   Oid relid, ForkNumber single_forknum);
+static void check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
+								Relation relation, ForkNumber forknum);
+
+/*
+ * callback arguments for check_pages_error_callback()
+ */
+typedef struct CheckPagesErrorInfo
+{
+	char	   *path;
+	BlockNumber blkno;
+} CheckPagesErrorInfo;
+
+/*
+ * Error callback specific to check_relation_fork().
+ */
+static void
+check_pages_error_callback(void *arg)
+{
+	CheckPagesErrorInfo *errinfo = (CheckPagesErrorInfo *) arg;
+
+	errcontext("while checking page %u of path %s",
+			   errinfo->blkno, errinfo->path);
+}
+
+/*
+ * pg_relation_check_pages
+ *
+ * Check the state of all the pages for one or more fork types in the given
+ * relation.
+ */
+Datum
+pg_relation_check_pages(PG_FUNCTION_ARGS)
+{
+	Oid			relid;
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	MemoryContext per_query_ctx;
+	MemoryContext oldcontext;
+	ForkNumber	forknum;
+
+	/* Switch into long-lived context to construct returned data structures */
+	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	rsinfo->returnMode = SFRM_Materialize;
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	/* handle arguments */
+	if (PG_ARGISNULL(0))
+	{
+		/* Just leave if nothing is defined */
+		PG_RETURN_VOID();
+	}
+
+	/* By default all the forks of a relation are checked */
+	if (PG_ARGISNULL(1))
+		forknum = InvalidForkNumber;
+	else
+	{
+		const char *forkname = TextDatumGetCString(PG_GETARG_TEXT_PP(1));
+
+		forknum = forkname_to_number(forkname);
+	}
+
+	relid = PG_GETARG_OID(0);
+
+	check_one_relation(tupdesc, tupstore, relid, forknum);
+	tuplestore_donestoring(tupstore);
+
+	return (Datum) 0;
+}
+
+/*
+ * Perform the check on a single relation, possibly filtered with a single
+ * fork. This function will check if the given relation exists or not, as
+ * a relation could be dropped after checking for the list of relations and
+ * before getting here, and we don't want to error out in this case.
+ */
+static void
+check_one_relation(TupleDesc tupdesc, Tuplestorestate *tupstore,
+				   Oid relid, ForkNumber single_forknum)
+{
+	Relation	relation;
+	ForkNumber	forknum;
+
+	/* Check if relation exists. leaving if there is no such relation */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+		return;
+
+	relation = relation_open(relid, AccessShareLock);
+
+	/*
+	 * Sanity checks, returning no results if not support. Temporary
+	 * relations and relations without storage are out of scope.
+	 */
+	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind) ||
+		relation->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
+	{
+		relation_close(relation, AccessShareLock);
+		return;
+	}
+
+	RelationOpenSmgr(relation);
+
+	for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+	{
+		if (single_forknum != InvalidForkNumber && single_forknum != forknum)
+			continue;
+
+		if (smgrexists(relation->rd_smgr, forknum))
+			check_relation_fork(tupdesc, tupstore, relation, forknum);
+	}
+
+	relation_close(relation, AccessShareLock);
+}
+
+/*
+ * For a given relation and fork, Do the real work of iterating over all pages
+ * and doing the check. Caller must hold an AccessShareLock lock on the given
+ * relation.
+ */
+static void
+check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
+					Relation relation, ForkNumber forknum)
+{
+	BlockNumber blkno,
+				nblocks;
+	SMgrRelation smgr = relation->rd_smgr;
+	char	   *path;
+	CheckPagesErrorInfo errinfo;
+	ErrorContextCallback errcallback;
+
+	/* Number of output arguments in the SRF */
+#define PG_CHECK_RELATION_COLS			2
+
+	Assert(CheckRelationLockedByMe(relation, AccessShareLock, true));
+
+	/*
+	 * We remember the number of blocks here. Since caller must hold a lock
+	 * on the relation, we know that it won't be truncated while we're
+	 * iterating over the blocks. Any block added after this function started
+	 * won't be checked, but this is out of scope as such pages will be
+	 * flushed before the next checkpoint's completion.
+	 */
+	nblocks = RelationGetNumberOfBlocksInFork(relation, forknum);
+
+	path = relpathbackend(smgr->smgr_rnode.node,
+						  smgr->smgr_rnode.backend,
+						  forknum);
+
+	/*
+	 * Error context to print some information about blocks and relations
+	 * impacted by corruptions.
+	 */
+	errinfo.path = pstrdup(path);
+	errinfo.blkno = 0;
+	errcallback.callback = check_pages_error_callback;
+	errcallback.arg = (void *) &errinfo;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Datum		values[PG_CHECK_RELATION_COLS];
+		bool		nulls[PG_CHECK_RELATION_COLS];
+		int			i = 0;
+
+		/* Update block number for the error context */
+		errinfo.blkno = blkno;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Check the given buffer */
+		if (CheckBuffer(smgr, forknum, blkno))
+			continue;
+
+		memset(values, 0, sizeof(values));
+		memset(nulls, 0, sizeof(nulls));
+
+		values[i++] = CStringGetTextDatum(path);
+		values[i++] = UInt32GetDatum(blkno);
+
+		Assert(i == PG_CHECK_RELATION_COLS);
+
+		/* Save the corrupted blocks in the tuplestore. */
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+
+		pfree(path);
+	}
+
+	/* Pop the error context stack */
+	error_context_stack = errcallback.previous;
+}
diff --git a/src/test/recovery/t/022_page_check.pl b/src/test/recovery/t/022_page_check.pl
new file mode 100644
index 0000000000..7e1f0d1fd8
--- /dev/null
+++ b/src/test/recovery/t/022_page_check.pl
@@ -0,0 +1,234 @@
+# Emulate on-disk corruptions of relation pages and find such corruptions
+# using pg_relation_check_pages().
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 20;
+
+our $CHECKSUM_UINT16_OFFSET = 4;
+our $PD_UPPER_UINT16_OFFSET = 7;
+our $BLOCKSIZE;
+our $TOTAL_NB_ERR = 0;
+
+# Grab a relation page worth a size of BLOCKSIZE from given $filename.
+# $blkno is the same block number as for a relation file.
+sub read_page
+{
+	my ($filename, $blkno) = @_;
+	my $block;
+
+	open(my $infile, '<', $filename) or die;
+	binmode($infile);
+
+	my $success = read($infile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE));
+	die($!) if !defined($success);
+
+	close($infile);
+
+	return ($block);
+}
+
+# Update an existing page of size BLOCKSIZE with new contents in given
+# $filename. blkno is the block number assigned in the relation file.
+sub write_page
+{
+	my ($filename, $block, $blkno) = @_;
+
+	open(my $outfile, '>', $filename) or die;
+	binmode($outfile);
+
+	my $nb = syswrite($outfile, $block, $BLOCKSIZE, ($blkno * $BLOCKSIZE));
+
+	die($!) if not defined $nb;
+	die("Write error") if ($nb != $BLOCKSIZE);
+
+	$outfile->flush();
+
+	close($outfile);
+	return;
+}
+
+# Read 2 bytes from relation page at a given offset.
+sub get_uint16_from_page
+{
+	my ($block, $offset) = @_;
+
+	return (unpack("S*", $block))[$offset];
+}
+
+# Write 2 bytes to relation page at a given offset.
+sub set_uint16_to_page
+{
+	my ($block, $data, $offset) = @_;
+
+	my $pack = pack("S", $data);
+
+	# vec with 16B or more won't preserve endianness.
+	vec($block, 2 * $offset, 8) = (unpack('C*', $pack))[0];
+	vec($block, (2 * $offset) + 1, 8) = (unpack('C*', $pack))[1];
+
+	return $block;
+}
+
+# Sanity check on pg_stat_database looking after the number of checksum
+# failures.
+sub check_pg_stat_database
+{
+	my ($node, $test_prefix) = @_;
+
+	my $stdout = $node->safe_psql('postgres',
+		"SELECT "
+		  . " sum(checksum_failures)"
+		  . " FROM pg_catalog.pg_stat_database");
+	is($stdout, $TOTAL_NB_ERR,
+		"$test_prefix: pg_stat_database should have $TOTAL_NB_ERR error");
+
+	return;
+}
+
+# Run a round of page checks for any relation present in this test run.
+# $expected_broken is the psql output marking all the pages found as
+# corrupted using relname|blkno as format for each tuple returned. $nb
+# is the new number of added to the global counter matched later with
+# pg_stat_database.
+#
+# Note that this has no need to check system relations as these would have
+# no corruptions: this test does not manipulate them and should by no mean
+# break the cluster.
+sub run_page_checks
+{
+	my ($node, $num_checksum, $expected_broken, $test_prefix) = @_;
+
+	my $stdout = $node->safe_psql('postgres',
+		"SELECT relname, failed_block_num"
+		  . " FROM (SELECT relname, (pg_catalog.pg_relation_check_pages(oid)).*"
+		  . " FROM pg_class "
+		  . " WHERE relkind in ('r','i', 'm') AND oid >= 16384) AS s");
+
+	# Check command result
+	is($stdout, $expected_broken,
+		"$test_prefix: output mismatch with pg_relation_check_pages()");
+
+	$TOTAL_NB_ERR += $num_checksum;
+	return;
+}
+
+# Perform various test that modify a specified block at the specified
+# offset, checking that the page corruption is correctly detected. The
+# original contents of the page are restored back once done.
+# $broken_pages is the set of pages that are expected to be broken
+# as of the returned result of pg_relation_check_pages(). $num_checksum
+# is the number of checksum failures expected to be added after this
+# function is done.
+sub corrupt_and_test_block
+{
+	my ($node, $filename, $blkno, $offset, $broken_pages, $num_checksum,
+		$test_prefix)
+	  = @_;
+	my $fake_data = hex '0x0000';
+
+	# Stop the server cleanly to flush any pages, and to prevent any
+	# concurrent updates on what is going to be updated.
+	$node->stop;
+	my $original_block = read_page($filename, 0);
+	my $original_data = get_uint16_from_page($original_block, $offset);
+
+	isnt($original_data, $fake_data,
+		"$test_prefix: fake data at offset $offset should be different from the existing one"
+	);
+
+	my $new_block = set_uint16_to_page($original_block, $fake_data, $offset);
+	isnt(
+		$original_data,
+		get_uint16_from_page($new_block, $offset),
+		"$test_prefix: The data at offset $offset should have been changed in memory"
+	);
+
+	write_page($filename, $new_block, 0);
+
+	my $written_data = get_uint16_from_page(read_page($filename, 0), $offset);
+
+	# Some offline checks to validate that the corrupted data is in place.
+	isnt($original_data, $written_data,
+		"$test_prefix: data written at offset $offset should be different from the original one"
+	);
+	is( get_uint16_from_page($new_block, $offset),
+		$written_data,
+		"$test_prefix: data written at offset $offset should be the same as the one in memory"
+	);
+	is($written_data, $fake_data,
+		"$test_prefix: The data written at offset $offset should be the one we wanted to write"
+	);
+
+	# The corruption is in place, start the server to run the checks.
+	$node->start;
+	run_page_checks($node, $num_checksum, $broken_pages, $test_prefix);
+
+	# Stop the server, put the original page back in place.
+	$node->stop;
+
+	$new_block = set_uint16_to_page($original_block, $original_data, $offset);
+	is( $original_data,
+		get_uint16_from_page($new_block, $offset),
+		"$test_prefix: data at offset $offset should have been restored in memory"
+	);
+
+	write_page($filename, $new_block, 0);
+	is( $original_data,
+		get_uint16_from_page(read_page($filename, $blkno), $offset),
+		"$test_prefix: data at offset $offset should have been restored on disk"
+	);
+
+	# There should be no errors now that the contents are back in place.
+	$node->start;
+	run_page_checks($node, 0, '', $test_prefix);
+}
+
+# Data checksums are necessary for this test.
+my $node = get_new_node('main');
+$node->init(extra => ['--data-checksums']);
+$node->start;
+
+my $stdout =
+  $node->safe_psql('postgres', "SELECT" . " current_setting('block_size')");
+
+$BLOCKSIZE = $stdout;
+
+# Basic schema to corrupt and check
+$node->safe_psql(
+	'postgres', q|
+	CREATE TABLE public.t1(id integer);
+	INSERT INTO public.t1 SELECT generate_series(1, 100);
+	CHECKPOINT;
+|);
+
+# Get the path to the relation file that will get manipulated by the
+# follow-up tests with some on-disk corruptions.
+$stdout = $node->safe_psql('postgres',
+	"SELECT"
+	  . " current_setting('data_directory') || '/' || pg_relation_filepath('t1')"
+);
+
+my $filename = $stdout;
+
+# Normal case without corruptions, this passes, with pg_stat_database
+# reporting no errors.
+check_pg_stat_database($node, 'start');
+
+# Test with a modified checksum. We use a zero checksum here as it's the only
+# one that cannot exist on a checksummed page. We also don't have an easy way
+# to compute what the checksum would be after a modification in a random place
+# in the block.
+corrupt_and_test_block($node, $filename, 0, $CHECKSUM_UINT16_OFFSET, 't1|0',
+	1, 'broken checksum');
+
+# Test corruption making the block looks like it's PageIsNew().
+corrupt_and_test_block($node, $filename, 0, $PD_UPPER_UINT16_OFFSET, 't1|0',
+	0, 'new page');
+
+# Check that the number of errors in pg_stat_database match what we
+# expect with the corruptions previously introduced.
+check_pg_stat_database($node, 'end');
diff --git a/src/test/regress/expected/pagefuncs.out b/src/test/regress/expected/pagefuncs.out
new file mode 100644
index 0000000000..38a72b01b3
--- /dev/null
+++ b/src/test/regress/expected/pagefuncs.out
@@ -0,0 +1,72 @@
+--
+-- Tests for functions related to relation pages
+--
+-- Restricted to superusers by default
+CREATE ROLE regress_pgfunc_user;
+SET ROLE regress_pgfunc_user;
+SELECT pg_relation_check_pages('pg_class'); -- error
+ERROR: permission denied for function pg_relation_check_pages
+SELECT pg_relation_check_pages('pg_class', 'main'); -- error
+ERROR: permission denied for function pg_relation_check_pages
+RESET ROLE;
+DROP ROLE regress_pgfunc_user;
+-- NULL and simple sanity checks
+SELECT pg_relation_check_pages(NULL); -- empty result
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages(NULL, NULL); -- empty result
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error
+ERROR: invalid fork name
+HINT: Valid fork names are "main", "fsm", "vm", and "init".
+-- Relation types that are supported
+CREATE TABLE pgfunc_test_tab (id int);
+CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id);
+INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000));
+SELECT pg_relation_check_pages('pgfunc_test_tab');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+SELECT pg_relation_check_pages('pgfunc_test_ind');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+DROP TABLE pgfunc_test_tab;
+CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_matview');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+DROP MATERIALIZED VIEW pgfunc_test_matview;
+CREATE SEQUENCE pgfunc_test_seq;
+SELECT pg_relation_check_pages('pgfunc_test_seq');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+DROP SEQUENCE pgfunc_test_seq;
+-- pg_relation_check_pages() returns no results if passed relations that
+-- do not support the operation, like relations without storage or temporary
+-- relations.
+CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a;
+SELECT pg_relation_check_pages('pgfunc_test_temp');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+DROP TABLE pgfunc_test_temp;
+CREATE VIEW pgfunc_test_view AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_view');
+ pg_relation_check_pages
+-------------------------
+(0 rows)
+
+DROP VIEW pgfunc_test_view;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ae89ed7f0b..7a46a13252 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -112,7 +112,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # ----------
 # Another group of parallel tests
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain pagefuncs
 
 # event triggers cannot run concurrently with any test that runs DDL
 test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 525bdc804f..9a80b80f73 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -197,6 +197,7 @@ test: hash_part
 test: indexing
 test: partition_aggregate
 test: partition_info
+test: pagefuncs
 test: tuplesort
 test: explain
 test: event_trigger
diff --git a/src/test/regress/sql/pagefuncs.sql b/src/test/regress/sql/pagefuncs.sql
new file mode 100644
index 0000000000..12d32eeae4
--- /dev/null
+++ b/src/test/regress/sql/pagefuncs.sql
@@ -0,0 +1,41 @@
+--
+-- Tests for functions related to relation pages
+--
+
+-- Restricted to superusers by default
+CREATE ROLE regress_pgfunc_user;
+SET ROLE regress_pgfunc_user;
+SELECT pg_relation_check_pages('pg_class'); -- error
+SELECT pg_relation_check_pages('pg_class', 'main'); -- error
+RESET ROLE;
+DROP ROLE regress_pgfunc_user;
+
+-- NULL and simple sanity checks
+SELECT pg_relation_check_pages(NULL); -- empty result
+SELECT pg_relation_check_pages(NULL, NULL); -- empty result
+SELECT pg_relation_check_pages('pg_class', 'invalid_fork'); -- error
+
+-- Relation types that are supported
+CREATE TABLE pgfunc_test_tab (id int);
+CREATE INDEX pgfunc_test_ind ON pgfunc_test_tab(id);
+INSERT INTO pgfunc_test_tab VALUES (generate_series(1,1000));
+SELECT pg_relation_check_pages('pgfunc_test_tab');
+SELECT pg_relation_check_pages('pgfunc_test_ind');
+DROP TABLE pgfunc_test_tab;
+
+CREATE MATERIALIZED VIEW pgfunc_test_matview AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_matview');
+DROP MATERIALIZED VIEW pgfunc_test_matview;
+CREATE SEQUENCE pgfunc_test_seq;
+SELECT pg_relation_check_pages('pgfunc_test_seq');
+DROP SEQUENCE pgfunc_test_seq;
+
+-- pg_relation_check_pages() returns no results if passed relations that
+-- do not support the operation, like relations without storage or temporary
+-- relations.
+CREATE TEMPORARY TABLE pgfunc_test_temp AS SELECT generate_series(1,10) AS a;
+SELECT pg_relation_check_pages('pgfunc_test_temp');
+DROP TABLE pgfunc_test_temp;
+CREATE VIEW pgfunc_test_view AS SELECT 1;
+SELECT pg_relation_check_pages('pgfunc_test_view');
+DROP VIEW pgfunc_test_view;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f7f401b534..7ef2ec9972 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26182,6 +26182,56 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
   </sect2>
 
+  <sect2 id="functions-data-sanity">
+   <title>Data Sanity Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-data-sanity-table"/>
+    provide ways to check the sanity of data files in the cluster.
+   </para>
+
+   <table id="functions-data-sanity-table">
+    <title>Data Sanity Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_relation_check_pages(<parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type> <literal>DEFAULT</literal> <literal>NULL</literal> ])</function></literal>
+       </entry>
+       <entry><type>setof record</type></entry>
+       <entry>Check the pages of a relation.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+    <primary>pg_relation_check_pages</primary>
+   </indexterm>
+   <para id="functions-check-relation-note" xreflabel="pg_relation_check_pages">
+    <function>pg_relation_check_pages</function> iterates over all blocks of a
+    given relation and verifies if they are in a state where they can safely
+    be loaded into the shared buffers. If defined,
+    <replaceable>fork</replaceable> specifies that only the pages of the given
+    fork are to be verified. Fork can be <literal>'main'</literal> for the
+    main data fork, <literal>'fsm'</literal> for the free space map,
+    <literal>'vm'</literal> for the visibility map, or
+    <literal>'init'</literal> for the initialization fork. The default of
+    <literal>NULL</literal> means that all the forks of the relation are
+    checked. The function returns a list of blocks that are considered as
+    corrupted with the path of the related file. Use of this function is
+    restricted to superusers by default but access may be granted to others
+    using <command>GRANT</command>.
+   </para>
+
+  </sect2>
+
  </sect1>
 
  <sect1 id="functions-trigger">
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff853634bc..b6acade6c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -332,6 +332,7 @@ CatCacheHeader
 CatalogId
 CatalogIndexState
 ChangeVarNodes_context
+CheckPagesErrorInfo
 CheckPoint
 CheckPointStmt
 CheckpointStatsData
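
As a side note on the two offsets used by the TAP test in the patch above: $CHECKSUM_UINT16_OFFSET = 4 and $PD_UPPER_UINT16_OFFSET = 7 count 16-bit slots from the start of the page, so they land on bytes 8-9 and 14-15. Here is a minimal standalone C sketch of that arithmetic, assuming the default 8kB block size and the usual page header layout (pd_lsn in the first 8 bytes, followed by pd_checksum, pd_flags, pd_lower and pd_upper as 16-bit fields). The helper names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192		/* assumed default block size */

/* 16-bit slot numbers, matching the two offsets used by the TAP test. */
enum
{
	SKETCH_SLOT_PD_CHECKSUM = 4,	/* bytes 8-9 of the page */
	SKETCH_SLOT_PD_UPPER = 7		/* bytes 14-15 of the page */
};

/* Read the 16-bit value stored in the given slot, in native byte order. */
static uint16_t
sketch_get_uint16(const unsigned char *page, size_t slot)
{
	uint16_t	value;

	assert(slot * 2 + sizeof(value) <= SKETCH_BLCKSZ);
	memcpy(&value, page + slot * 2, sizeof(value));
	return value;
}

/* Overwrite the 16-bit value stored in the given slot. */
static void
sketch_set_uint16(unsigned char *page, size_t slot, uint16_t value)
{
	assert(slot * 2 + sizeof(value) <= SKETCH_BLCKSZ);
	memcpy(page + slot * 2, &value, sizeof(value));
}

/* Stand-in for PageIsNew(): a page counts as new when pd_upper is zero. */
static bool
sketch_page_is_new(const unsigned char *page)
{
	return sketch_get_uint16(page, SKETCH_SLOT_PD_UPPER) == 0;
}
```

Zeroing slot 7 on an otherwise populated page makes sketch_page_is_new() return true, which is the "new page" case of the test: the page looks new without being all zeroes, so it is expected to show up in the SRF output without a checksum failure being counted.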