Hello all,

I think the discussion went a little sideways, so let me recap what I'm
suggesting:

   1. I mentioned that there is a missing callback when the filenode is
   unlinked and this is particularly evident when dropping a table.
   2. It was correctly pointed out to me that an implementor need to ensure
   that dropping a table is transactional.
   3. I argued that the callback is still correct and outlined how this can
   be handled by a table access method using xact callbacks, if necessary.

I see huge potential in the table access method and would like to do my
part in helping it in succeeding. I noted that the API is biased in how the
existing heap implementation works and is also very focused on
implementations of "local" storage engines. For more novel architectures
(for example, various sorts of distributed architectures) and to be easier
to work with, I think that the API can be improved in a few places. This is
a first step in the direction of making the API both easier to use as well
as enabling more novel use-cases.

Writing implementations with ease is more about having the right callbacks
in the right places and providing the right information, but in some cases
it is just not possible to implement efficient functionality with the
current interface. I think it can be useful to separate these two kinds of
enhancements when discussing the API, but I think both are important for
the table access methods to be practically usable and to leverage the full
power of this concept.

On Tue, Aug 2, 2022 at 1:44 AM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2021-04-05 21:57:12 +0200, Mats Kindahl wrote:
> >    2. In the storage layer, the function RelationDropStorage is called,
> >    which will record the table to be dropped in the pendingDeletes
> >
> > When committing (or aborting) the transaction, there are two calls that
> are
> > interesting, in this order:
> >
> >    1. CallXactCallbacks which calls registered callbacks
> >    2. smgrDoPendingDeletes, which calls the storage layer directly to
> >    perform the actual deletion, if necessary.
> >
> > Now, suppose that we want to replace the storage layer with a different
> > one. It is straightforward to replace it by implementing the Table AM
> > methods above, but we are missing a callback on dropping the table. If we
> > have that, we can record the table-to-be-dropped in a similar manner to
> how
> > the heap AM does it and register a transaction callback using
> > RegisterXactCallback.
>
>   don't think implementing dropping relation data at-commit/rollback using
> xact callbacks can be correct. The dropping needs to be integrated with the
> commit / abort records, so it is redone during crash recovery - that's not
> possible with xact callbacks.
>

Yes, but this patch is about making the extension aware that a file node is
being unlinked.


> To me it still seems fundamentally the wrong direction to implement a "drop
> relation callback" tableam callback.
>

This is not really a "drop table" callback, it is just the most obvious
case where this is missing. So, just to recap the situation as it looks
right now.

Here is (transactional) truncate table:

   1. Allocate a new file node in the same tablespace as the table
   2. Add the file node to the list of pending node to delete
   3. Overwrite the existing file node in the relation with the new one
   4. Call table_relation_set_new_filenode to tell extension that there is
   a new filenode for the relation

Here is drop table:

   1. Add the existing file node to the list of pending deletes
   2. Remove the table from the catalogs

For an extension writer, the disappearance of the old file node is
"invisible" since there is no callback about this, but it is very clear
when a new file node is allocated. In addition to being inconsistent, it
adds an extra burden on the extension writer. To notice that a file node
has been unlinked you can register a transaction handler and investigate
the pending list at commit or abort time. Even though possible, there are
two problems with this: 1) the table access method is notified "late" in
the transaction that the file node is going away, and 2) it is
unnecessarily complicated to register a transaction handler only for
inspecting this.

Telling the access method that the filenode is unlinked by adding a
callback is by far the best solution since it does not affect existing
extensions and will give the table access methods opportunities to act on
it immediately.

I have attached an updated patch that changes the names of the callbacks
since there was a name change. I had also missed the case of unlinking a
file node when tables were truncated, so I have added a callback for this
as well.

Best wishes,
Mats Kindahl


> Greetings,
>
> Andres Freund
>
From 5023c5d9deea7008e4b74f02564262714fb8c6b6 Mon Sep 17 00:00:00 2001
From: Mats Kindahl <m...@timescale.com>
Date: Mon, 31 Oct 2022 14:27:03 +0100
Subject: [PATCH] Add callback for unlinking file node of table

When dropping or truncating a table, the old file locator is unlinked
and a new file locator created for the relation. This is, however, done
without notifying the table access method that the unlinking is taking
place. This becomes a problem if the table access method has an
internal resource associated with the file locator that also needs to
be unlinked or cleaned up.

This commit add a callback to the table access method for resetting the
file locator when it is dropped and call it to unlink the file locator
so that the table access method and unlink the resource internally, if
necessary. The callback can be undefined (that is, NULL) if the table
access method do not require notification that a resource is unlinked.
---
 src/backend/access/heap/heapam_handler.c |  7 +++++++
 src/backend/catalog/heap.c               | 22 ++++++++++++++++++++-
 src/backend/utils/cache/relcache.c       |  7 ++++++-
 src/include/access/tableam.h             | 25 ++++++++++++++++++++++++
 4 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 41f1ca65d0..d67b1dcfbf 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -615,6 +615,12 @@ heapam_relation_set_new_filelocator(Relation rel,
 	smgrclose(srel);
 }
 
+static void
+heapam_relation_reset_filelocator(Relation rel)
+{
+	RelationDropStorage(rel);
+}
+
 static void
 heapam_relation_nontransactional_truncate(Relation rel)
 {
@@ -2566,6 +2572,7 @@ static const TableAmRoutine heapam_methods = {
 	.index_delete_tuples = heap_index_delete_tuples,
 
 	.relation_set_new_filelocator = heapam_relation_set_new_filelocator,
+	.relation_reset_filelocator = heapam_relation_reset_filelocator,
 	.relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
 	.relation_copy_data = heapam_relation_copy_data,
 	.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 5b49cc5a09..1d1379424f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1850,7 +1850,27 @@ heap_drop_with_catalog(Oid relid)
 	 * Schedule unlinking of the relation's physical files at commit.
 	 */
 	if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
-		RelationDropStorage(rel);
+		switch (rel->rd_rel->relkind)
+		{
+			case RELKIND_VIEW:
+			case RELKIND_COMPOSITE_TYPE:
+			case RELKIND_FOREIGN_TABLE:
+			case RELKIND_PARTITIONED_TABLE:
+			case RELKIND_PARTITIONED_INDEX:
+				Assert(false);
+				break;
+
+			case RELKIND_INDEX:
+			case RELKIND_SEQUENCE:
+				RelationDropStorage(rel);
+				break;
+
+			case RELKIND_RELATION:
+			case RELKIND_TOASTVALUE:
+			case RELKIND_MATVIEW:
+				table_relation_reset_filelocator(rel);
+				break;
+		}
 
 	/* ensure that stats are dropped if transaction commits */
 	pgstat_drop_relation(rel);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index bd6cd4e47b..d39c57b855 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3778,9 +3778,14 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 		smgrdounlinkall(&srel, 1, false);
 		smgrclose(srel);
 	}
+	if (RELKIND_HAS_TABLE_AM(relation->rd_rel->relkind))
+	{
+		table_relation_reset_filelocator(relation);
+	}
 	else
 	{
-		/* Not a binary upgrade, so just schedule it to happen later. */
+		/* Not a binary upgrade and not a relation with a table access method,
+		 * so just schedule it to happen later. */
 		RelationDropStorage(relation);
 	}
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e45d73eae3..5141f9b3b7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -581,6 +581,15 @@ typedef struct TableAmRoutine
 												 TransactionId *freezeXid,
 												 MultiXactId *minmulti);
 
+	/*
+	 * This callback needs to schedule the removal of all associations with
+	 * the relation `rel` since the relation is about to be dropped or the
+	 * storage recycled for other reasons.
+	 *
+	 * See also table_relation_reset_filelocator().
+	 */
+	void        (*relation_reset_filelocator) (Relation rel);
+
 	/*
 	 * This callback needs to remove all contents from `rel`'s current
 	 * relfilelocator. No provisions for transactional behaviour need to be
@@ -1600,6 +1609,22 @@ table_relation_set_new_filelocator(Relation rel,
 												  minmulti);
 }
 
+/*
+ * Schedule the removal of all association with storage for the relation.
+ *
+ * This is used when a relation is about to be dropped and removed from the
+ * catalog.
+ *
+ * Since not all access methods require this callback, we do not require the
+ * reset callback to be defined and only call it if it is actually available.
+ */
+static inline void
+table_relation_reset_filelocator(Relation rel)
+{
+	if (rel->rd_tableam)
+		rel->rd_tableam->relation_reset_filelocator(rel);
+}
+
 /*
  * Remove all table contents from `rel`, in a non-transactional manner.
  * Non-transactional meaning that there's no need to support rollbacks. This
-- 
2.34.1

Reply via email to