Hi Robert,

Thanks for the review. Please find my comments inline:

On Sat, Aug 1, 2020 at 12:18 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> > Attached is the new version of patch that addresses the comments from 
> > Andrey and Beena.
>
> +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table"
>
> the -> a
>
> I also suggest: heap table -> relation, because we might want to
> extend this to other cases later.
>

Corrected.

> +-- toast table.
> +begin;
> +create table ttab(a text);
> +insert into ttab select string_agg(chr(floor(random() * 26)::int +
> 65), '') from generate_series(1,10000);
> +select * from ttab where xmin = 2;
> + a
> +---
> +(0 rows)
> +
> +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]);
> + heap_force_freeze
> +-------------------
> +
> +(1 row)
> +
>
> I don't understand the point of this. You're not testing the function
> on the TOAST table; you're testing it on the main table when there
> happens to be a TOAST table that is probably getting used for
> something. But that's not really relevant to what is being tested
> here, so as written this seems redundant with the previous cases.
>

Yeah, it's being tested on the main table, not on a toast table. I've
removed this test-case and also restricted direct access to the toast
table using heap_force_kill/freeze functions. I think we shouldn't be
using these functions to do any changes in the toast table. We will
only use these functions with the main table and let VACUUM remove the
corresponding data chunks (pointed by the tuple that got removed from
the main table).

Another option would be to identify all the data chunks corresponding
to the tuple (ctid) being killed from the main table and remove them
one by one. We will only do this if the tuple from the main table that
has been marked as killed has an external storage. We will have to add
a bunch of code for this otherwise we can let VACUUM do this for us.
Let me know your thoughts on this.

> +-- test pg_surgery functions with the unsupported relations. Should fail.
>
> Please name the specific functions being tested here in case we add
> more in the future that are tested separately.
>

Done.

> +++ b/contrib/pg_surgery/heap_surgery_funcs.c
>
> I think we could drop _funcs from the file name.
>

Done.

> +#ifdef PG_MODULE_MAGIC
> +PG_MODULE_MAGIC;
> +#endif
>
> The #ifdef here is not required, and if you look at other contrib
> modules you'll see that they don't have it.
>

Okay, done.

> I don't like all the macros at the top of the file much. CHECKARRVALID
> is only used in one place, so it seems to me that you might as well
> just inline it and lose the macro. Likewise for SORT and ARRISEMPTY.
>

Done.

> Once you do that, heap_force_common() can just compute the number of
> array elements once, instead of doing it once inside ARRISEMPTY, then
> again inside SORT, and then a third time to initialize ntids. You can
> just have a local variable in that function that is set once and then
> used as needed. Then you are only doing ARRNELEMS once, so you can get
> rid of that macro too. The same technique can be used to get rid of
> ARRPTR. So then all the macros go away without introducing any code
> duplication.
>

Done.

> +/* Options to forcefully change the state of a heap tuple. */
> +typedef enum HTupleForceOption
> +{
> + FORCE_KILL,
> + FORCE_FREEZE
> +} HTupleForceOption;
>
> I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the
> enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE.

Done.

Also, how about
> option -> operation?
>

I think both look okay to me.

> + return heap_force_common(fcinfo, FORCE_KILL);
>
> I think it might be more idiomatic to use PG_RETURN_DATUM here.  I
> know it's the same thing, though, and perhaps I'm even wrong about the
> prevailing style.
>

Done.

> + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE);
>
> I think this is unnecessary. It's an enum with 2 values.
>

Removed.

> + if (ARRISEMPTY(ta))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("empty tid array")));
>
> I don't see why this should be an error. Why can't we just continue
> normally and if it does nothing, it does nothing? You'd need to change
> the do..while loop to a while loop so that the end condition is tested
> at the top, but that seems fine.
>

I think it's okay to have this check. So, just left it as-is. We do
have such checks in other contrib modules as well wherever the array
is being passed as an input to the function.

> + rel = relation_open(relid, AccessShareLock);
>
> Maybe we should take RowExclusiveLock, since we are going to modify
> rows. Not sure how much it matters, though.
>

I don't know how it would make a difference, but still as you said
replaced AccessShare with RowExclusive.

> + if (!superuser() && GetUserId() != rel->rd_rel->relowner)
> + ereport(ERROR,
> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> + errmsg("must be superuser or object owner to use %s.",
> + force_opt == FORCE_KILL ? "heap_force_kill()" :
> + "heap_force_freeze()")));
>
> This is the wrong way to do a permissions check, and it's also the
> wrong way to write an error message about having failed one. To see
> the correct method, grep for pg_class_aclcheck. However, I think that
> we shouldn't in general trust the object owner to do this, unless the
> super-user gave permission. This is a data-corrupting operation, and
> only the boss is allowed to authorize it. So I think you should also
> add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then
> have this check as a backup. Then, the superuser is always allowed,
> and if they choose to GRANT EXECUTE on this function to some users,
> those users can do it for their own relations, but not others.
>

Done.

> + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("only heap AM is supported")));
> +
> + check_relation_relkind(rel);
>
> Seems like these checks are in the wrong order.

I don't think there is anything wrong with the order. I can see the
same order in other contrib modules as well.

Also, maybe you could
> call the function something like check_relation_ok() and put the
> permissions test, the relkind test, and the relam test all inside of
> it, just to tighten up the code in this main function a bit.
>

Yeah, I've added a couple of functions named sanity_check_relation and
sanity_check_tid_array and shifted all the sanity checks there.

> + if (noffs > maxoffset)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("number of offsets specified for block %u exceeds the max
> offset number %u",
> + blkno, maxoffset)));
>
> Hmm, this doesn't seem quite right. The actual problem is if an
> individual item pointer's offset number is greater than maxoffset,
> which can be true even if the total number of offsets is less than
> maxoffset. So I think you need to remove this check and add a check
> inside the loop which follows that offnos[i] is in range.
>

Agreed and done.

> The way you've structured that loop is actually problematic -- I don't
> think we want to be calling elog() or ereport() inside a critical
> section. You could fix the case that checks for an invalid force_opt
> by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op ==
> HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The
> NOTICE case you have here is a bigger problem.

Done.

You really cannot
> modify the buffer like this and then decide, oops, never mind, I think
> I won't mark it dirty or write WAL for the changes. If you do that,
> the buffer is still in memory, but it's now been modified. A
> subsequent operation that modifies it will start with the altered
> state you created here, quite possibly leading to WAL that cannot be
> correctly replayed on the standby. In other words, you've got to
> decide for certain whether you want to proceed with the operation
> *before* you enter the critical section. You also need to emit any
> messages before or after the critical section.  So you could:
>

This is still not clear. I think Robert needs to respond to my earlier comment.

> I believe this violates our guidelines on message construction. Have
> two completely separate messages -- and maybe lose the word "already":
>
> "skipping tid (%u, %u) because it is dead"
> "skipping tid (%u, %u) because it is unused"
>
> The point of this is that it makes it easier for translators.
>

Done.

> I see very little point in what verify_tid() is doing. Before using
> each block number, we should check that it's less than or equal to a
> cached value of RelationGetNumberOfBlocks(rel). That's necessary in
> any case to avoid funny errors; and then the check here against
> specifically InvalidBlockNumber is redundant. For the offset number,
> same thing: we need to check each offset against the page's
> PageGetMaxOffsetNumber(page); and if we do that then we don't need
> these checks.
>

Done.

Please check the attached patch for the changes.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 76ea0329aed2afd2fdfaaaaf38ee3d825de4795e Mon Sep 17 00:00:00 2001
From: ashu <ashutosh.sha...@enterprisedb.com>
Date: Wed, 5 Aug 2020 15:39:51 +0530
Subject: [PATCH] Add contrib/pg_surgery to perform surgery on a damaged heap
 table.

This contrib module basically adds a couple of functions named
heap_force_kill and heap_force_freeze that can be used in the scripts
to perform surgery on the damaged heap tables.

Ashutosh Sharma
---
 contrib/Makefile                           |   1 +
 contrib/pg_surgery/Makefile                |  23 ++
 contrib/pg_surgery/expected/pg_surgery.out |  78 +++++++
 contrib/pg_surgery/heap_surgery.c          | 355 +++++++++++++++++++++++++++++
 contrib/pg_surgery/pg_surgery--1.0.sql     |  18 ++
 contrib/pg_surgery/pg_surgery.control      |   5 +
 contrib/pg_surgery/sql/pg_surgery.sql      |  61 +++++
 7 files changed, 541 insertions(+)
 create mode 100644 contrib/pg_surgery/Makefile
 create mode 100644 contrib/pg_surgery/expected/pg_surgery.out
 create mode 100644 contrib/pg_surgery/heap_surgery.c
 create mode 100644 contrib/pg_surgery/pg_surgery--1.0.sql
 create mode 100644 contrib/pg_surgery/pg_surgery.control
 create mode 100644 contrib/pg_surgery/sql/pg_surgery.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d41..07d5734 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -35,6 +35,7 @@ SUBDIRS = \
 		pg_standby	\
 		pg_stat_statements \
 		pg_trgm		\
+		pg_surgery	\
 		pgcrypto	\
 		pgrowlocks	\
 		pgstattuple	\
diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile
new file mode 100644
index 0000000..ecf2e20
--- /dev/null
+++ b/contrib/pg_surgery/Makefile
@@ -0,0 +1,23 @@
+# contrib/pg_surgery/Makefile
+
+MODULE_big = pg_surgery
+OBJS = \
+	$(WIN32RES) \
+	heap_surgery.o
+
+EXTENSION = pg_surgery
+DATA = pg_surgery--1.0.sql
+PGFILEDESC = "pg_surgery - perform surgery on a damaged relation"
+
+REGRESS = pg_surgery
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_surgery
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_surgery/expected/pg_surgery.out b/contrib/pg_surgery/expected/pg_surgery.out
new file mode 100644
index 0000000..8391e88
--- /dev/null
+++ b/contrib/pg_surgery/expected/pg_surgery.out
@@ -0,0 +1,78 @@
+create extension pg_surgery;
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- normal heap table passes.
+--
+-- normal heap table.
+begin;
+create table htab(a int);
+insert into htab values (100), (200), (300), (400), (500);
+select * from htab where xmin = 2;
+ a 
+---
+(0 rows)
+
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+ heap_force_freeze 
+-------------------
+ 
+(1 row)
+
+select ctid, xmax from htab where xmin = 2;
+ ctid  | xmax 
+-------+------
+ (0,4) |    0
+(1 row)
+
+select heap_force_kill('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+ heap_force_kill 
+-----------------
+ 
+(1 row)
+
+select * from htab where ctid = '(0, 4)';
+ a 
+---
+(0 rows)
+
+rollback;
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- unsupported relations fails.
+--
+-- partitioned tables (the parent table) doesn't contain any tuple.
+create table ptab (a int) partition by list (a);
+select heap_force_kill('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+select heap_force_freeze('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+create index ptab_idx on ptab (a);
+-- indexes are not supported, should fail.
+select heap_force_kill('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+select heap_force_freeze('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+create view vw as select 1;
+-- views are not supported as well. so, all these should fail.
+select heap_force_kill('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+select heap_force_freeze('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+create materialized view mvw as select 10;
+-- materialized views are not supported as well. so, all these should fail.
+select heap_force_kill('mvw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  "mvw" is not a table
+select heap_force_freeze('mvw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  "mvw" is not a table
+create sequence seq;
+-- sequences are not supported as well. so, all these functions should fail.
+select heap_force_kill('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+select heap_force_freeze('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR:  only heap AM is supported
+-- cleanup.
+drop materialized view mvw;
+drop table ptab;
+drop view vw;
+drop sequence seq;
+drop extension pg_surgery;
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
new file mode 100644
index 0000000..b72417b
--- /dev/null
+++ b/contrib/pg_surgery/heap_surgery.c
@@ -0,0 +1,355 @@
+/*-------------------------------------------------------------------------
+ *
+ * heap_surgery.c
+ *	  Functions to perform surgery on the damaged heap table.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/pg_surgery/heap_surgery.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "catalog/pg_am_d.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/acl.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+/* Options to forcefully change the state of a heap tuple. */
+typedef enum HeapTupleForceOption
+{
+	HEAP_FORCE_KILL,
+	HEAP_FORCE_FREEZE
+} HeapTupleForceOption;
+
+PG_FUNCTION_INFO_V1(heap_force_kill);
+PG_FUNCTION_INFO_V1(heap_force_freeze);
+
+static int32 tidcmp(const void *a, const void *b);
+static Datum heap_force_common(FunctionCallInfo fcinfo,
+							   HeapTupleForceOption heap_force_opt);
+static void sanity_check_tid_array(ArrayType *ta, int *ntids);
+static void sanity_check_relation(Relation rel);
+static BlockNumber tids_same_page_fetch_offnums(ItemPointer tids, int ntids,
+												OffsetNumber *next_start_ptr,
+												OffsetNumber *offnos);
+
+/*-------------------------------------------------------------------------
+ * heap_force_kill()
+ *
+ * Force kill the tuple(s) pointed to by the item pointer(s) stored in the
+ * given tid array.
+ *
+ * Usage: SELECT heap_force_kill(regclass, tid[]);
+ *-------------------------------------------------------------------------
+ */
+Datum
+heap_force_kill(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(heap_force_common(fcinfo, HEAP_FORCE_KILL));
+}
+
+/*-------------------------------------------------------------------------
+ * heap_force_freeze()
+ *
+ * Force freeze the tuple(s) pointed to by the item pointer(s) stored in the
+ * given tid array.
+ *
+ * Usage: SELECT heap_force_freeze(regclass, tid[]);
+ *-------------------------------------------------------------------------
+ */
+Datum
+heap_force_freeze(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(heap_force_common(fcinfo, HEAP_FORCE_FREEZE));
+}
+
+/*-------------------------------------------------------------------------
+ * heap_force_common()
+ *
+ * Common code for heap_force_kill and heap_force_freeze
+ *-------------------------------------------------------------------------
+ */
+static Datum
+heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt)
+{
+	Oid				relid = PG_GETARG_OID(0);
+	ArrayType	   *ta = PG_GETARG_ARRAYTYPE_P_COPY(1);
+	ItemPointer		tids;
+	int				ntids;
+	Relation		rel;
+	Buffer			buf;
+	Page			page;
+	ItemId			itemid;
+	BlockNumber		blkno;
+	OffsetNumber   *offnos;
+	OffsetNumber	offno,
+					noffs,
+					curr_start_ptr,
+					next_start_ptr,
+					maxoffset;
+	int				i,
+					nskippedItems,
+					nblocks;
+
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is in progress"),
+				 errhint("heap surgery functions cannot be executed during recovery.")));
+
+	/* Basic sanity checking. */
+	sanity_check_tid_array(ta, &ntids);
+
+	rel = relation_open(relid, RowExclusiveLock);
+
+	sanity_check_relation(rel);
+
+	tids = ((ItemPointer) ARR_DATA_PTR(ta));
+
+	/*
+	 * If there is more than one tid in the array, sort it so that we can
+	 * easily fetch all the tids belonging to one particular page from the
+	 * array.
+	 */
+	if (ntids > 1)
+		qsort((void*) tids, ntids, sizeof(ItemPointerData), tidcmp);
+
+	offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
+	noffs = curr_start_ptr = next_start_ptr = 0;
+	nblocks = RelationGetNumberOfBlocks(rel);
+
+	do
+	{
+		/*
+		 * Get the offset numbers from the tids belonging to one particular page
+		 * and process them one by one.
+		 */
+		blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
+											 offnos);
+
+		if (blkno < 0 || blkno >= nblocks)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("block number %u is out of range for relation \"%s\"",
+							blkno, RelationGetRelationName(rel))));
+
+		CHECK_FOR_INTERRUPTS();
+
+		buf = ReadBuffer(rel, blkno);
+		LockBufferForCleanup(buf);
+
+		page = BufferGetPage(buf);
+
+		/* Calculate the number of offsets stored in offnos array. */
+		noffs = next_start_ptr - curr_start_ptr;
+
+		maxoffset = PageGetMaxOffsetNumber(page);
+
+		nskippedItems = 0;
+
+		/* No ereport(ERROR) from here until all the changes are logged. */
+		START_CRIT_SECTION();
+
+		for (i = 0; i < noffs; i++)
+		{
+			if (offnos[i] > maxoffset)
+			{
+				ereport(NOTICE,
+						 errmsg("skipping tid (%u, %u) because it contains an invalid offset",
+								blkno, offnos[i]));
+				continue;
+			}
+
+			itemid = PageGetItemId(page, offnos[i]);
+
+			/* Follow any redirections until we find something useful. */
+			while (ItemIdIsRedirected(itemid))
+			{
+				offno = ItemIdGetRedirect(itemid);
+				itemid = PageGetItemId(page, offno);
+				CHECK_FOR_INTERRUPTS();
+			}
+
+			/* Nothing to do if the itemid is unused or already dead. */
+			if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+			{
+				nskippedItems++;
+
+				if (!ItemIdIsUsed(itemid))
+					ereport(NOTICE,
+							(errmsg("skipping tid (%u, %u) because it is marked unused",
+									blkno, offnos[i])));
+				else
+					ereport(NOTICE,
+							(errmsg("skipping tid (%u, %u) because it is marked dead",
+									blkno, offnos[i])));
+				continue;
+			}
+
+			Assert(ItemIdIsNormal(itemid));
+
+			if (heap_force_opt == HEAP_FORCE_KILL)
+				ItemIdSetDead(itemid);
+			else
+			{
+				HeapTupleHeader htup;
+
+				Assert(heap_force_opt == HEAP_FORCE_FREEZE);
+
+				htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+				HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
+				HeapTupleHeaderSetXmax(htup, InvalidTransactionId);
+
+				/*
+				 * Clear all the visibility-related bits of this tuple and mark
+				 * it as frozen.
+				 */
+				htup->t_infomask &= ~HEAP_XACT_MASK;
+				htup->t_infomask |= (HEAP_XMIN_FROZEN | HEAP_XMAX_INVALID);
+			}
+		}
+
+		if (nskippedItems == noffs)
+			goto skip_wal;
+
+		/* Mark buffer dirty before we write WAL. */
+		MarkBufferDirty(buf);
+
+		/* XLOG stuff */
+		if (RelationNeedsWAL(rel))
+			log_newpage_buffer(buf, true);
+
+skip_wal:
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * Update the current start pointer before fetching next set of offset
+		 * numbers from the tids belonging to the same page.
+		 */
+		curr_start_ptr = next_start_ptr;
+	} while (next_start_ptr != ntids);
+
+	relation_close(rel, RowExclusiveLock);
+
+	pfree(ta);
+	pfree(offnos);
+
+	PG_RETURN_VOID();
+}
+
+/*-------------------------------------------------------------------------
+ * tidcmp()
+ *
+ * Compare two item pointers, return -1, 0, or +1.
+ *
+ * See ItemPointerCompare for details.
+ * ------------------------------------------------------------------------
+ */
+static int32
+tidcmp(const void *a, const void *b)
+{
+	ItemPointer iptr1 = ((const ItemPointer) a);
+	ItemPointer iptr2 = ((const ItemPointer) b);
+
+	return ItemPointerCompare(iptr1, iptr2);
+}
+
+/*-------------------------------------------------------------------------
+ * sanity_check_tid_array()
+ *
+ * Perform sanity check on the given tid array.
+ * ------------------------------------------------------------------------
+ */
+static void
+sanity_check_tid_array(ArrayType *ta, int *ntids)
+{
+	if (ARR_HASNULL(ta) && array_contains_nulls(ta))
+		ereport(ERROR,
+				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+				 errmsg("array must not contain nulls")));
+
+	*ntids = ArrayGetNItems(ARR_NDIM(ta), ARR_DIMS(ta));
+
+	if (*ntids == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("empty tid array")));
+}
+
+/*-------------------------------------------------------------------------
+ * sanity_check_relation()
+ *
+ * Perform sanity check on the given relation.
+ * ------------------------------------------------------------------------
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("only heap AM is supported")));
+
+	if (rel->rd_rel->relkind != RELKIND_RELATION)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table",
+						RelationGetRelationName(rel))));
+
+	if (pg_class_aclcheck(RelationGetRelid(rel), GetUserId(), ACL_ALL_RIGHTS_RELATION) != ACLCHECK_OK)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("permission denied for relation \"%s\"",
+						RelationGetRelationName(rel))));
+}
+
+/*-------------------------------------------------------------------------
+ * tids_same_page_fetch_offnums()
+ *
+ * Find out all the tids residing in the same page as tids[next_start_ptr] and
+ * fetch the offset number stored in each of them into a caller-allocated offset
+ * number array.
+ * ------------------------------------------------------------------------
+ */
+static BlockNumber
+tids_same_page_fetch_offnums(ItemPointer tids, int ntids,
+							 OffsetNumber *next_start_ptr, OffsetNumber *offnos)
+{
+	int				i;
+	BlockNumber		prev_blkno,
+					blkno;
+	OffsetNumber	offno;
+
+	for (i = *next_start_ptr; i < ntids; i++)
+	{
+		ItemPointerData tid = tids[i];
+
+		if (!ItemPointerIsValid(&tid))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid item pointer")));
+
+		blkno = ItemPointerGetBlockNumberNoCheck(&tid);
+		offno = ItemPointerGetOffsetNumberNoCheck(&tid);
+
+		if (i == *next_start_ptr || (prev_blkno == blkno))
+			offnos[i - *next_start_ptr] = offno;
+		else
+			break;
+
+		prev_blkno = blkno;
+	}
+
+	*next_start_ptr = i;
+	return prev_blkno;
+}
\ No newline at end of file
diff --git a/contrib/pg_surgery/pg_surgery--1.0.sql b/contrib/pg_surgery/pg_surgery--1.0.sql
new file mode 100644
index 0000000..2ae7f22
--- /dev/null
+++ b/contrib/pg_surgery/pg_surgery--1.0.sql
@@ -0,0 +1,18 @@
+/* contrib/pg_surgery/pg_surgery--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_surgery" to load this file. \quit
+
+CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_kill'
+LANGUAGE C STRICT;
+
+REVOKE EXECUTE ON FUNCTION heap_force_kill(regclass, tid[]) FROM PUBLIC;
+
+CREATE FUNCTION heap_force_freeze(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_freeze'
+LANGUAGE C STRICT;
+
+REVOKE EXECUTE ON FUNCTION heap_force_freeze(regclass, tid[]) FROM PUBLIC;
\ No newline at end of file
diff --git a/contrib/pg_surgery/pg_surgery.control b/contrib/pg_surgery/pg_surgery.control
new file mode 100644
index 0000000..2bcdad1
--- /dev/null
+++ b/contrib/pg_surgery/pg_surgery.control
@@ -0,0 +1,5 @@
+# pg_surgery extension
+comment = 'extension to perform surgery on a damaged relation'
+default_version = '1.0'
+module_pathname = '$libdir/pg_surgery'
+relocatable = true
diff --git a/contrib/pg_surgery/sql/pg_surgery.sql b/contrib/pg_surgery/sql/pg_surgery.sql
new file mode 100644
index 0000000..bcd1878
--- /dev/null
+++ b/contrib/pg_surgery/sql/pg_surgery.sql
@@ -0,0 +1,61 @@
+create extension pg_surgery;
+
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- normal heap table passes.
+--
+
+-- normal heap table.
+begin;
+create table htab(a int);
+insert into htab values (100), (200), (300), (400), (500);
+
+select * from htab where xmin = 2;
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+select ctid, xmax from htab where xmin = 2;
+
+select heap_force_kill('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+select * from htab where ctid = '(0, 4)';
+rollback;
+
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- unsupported relations fails.
+--
+
+-- partitioned tables (the parent table) doesn't contain any tuple.
+create table ptab (a int) partition by list (a);
+
+select heap_force_kill('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create index ptab_idx on ptab (a);
+
+-- indexes are not supported, should fail.
+select heap_force_kill('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create view vw as select 1;
+
+-- views are not supported as well. so, all these should fail.
+select heap_force_kill('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create materialized view mvw as select 10;
+
+-- materialized views are not supported as well. so, all these should fail.
+select heap_force_kill('mvw'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('mvw'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create sequence seq;
+
+-- sequences are not supported as well. so, all these functions should fail.
+select heap_force_kill('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+
+-- cleanup.
+drop materialized view mvw;
+drop table ptab;
+drop view vw;
+drop sequence seq;
+drop extension pg_surgery;
-- 
1.8.3.1

Reply via email to