Hi,

On 2021-02-22 08:33:21 +0100, Mats Kindahl wrote:
> I started to experiment with the table access method interface to see if it
> can be used for some ideas I have.
Cool.

> The `relation_set_new_filenode` is indirectly called from
> `heap_create_with_catalog`, but there is no corresponding callback from
> `heap_drop_with_catalog`. It also seems like the intention is that the
> callback should call `RelationCreateStorage` itself (makes sense, since the
> access method knows about how to use the storage), so it seems natural to
> add a `relation_reset_filenode` to the table AM that is called from
> `heap_drop_with_catalog` for tables and add that to the heap implementation
> (see the attached patch).

I don't think that's quite right. It's not exactly obvious from the name,
but RelationDropStorage() does not actually drop storage. Instead it
*schedules* the storage to be dropped upon commit. The reason for deferring
the drop is that DDL in postgres is transactional: we cannot remove the
storage at the moment the DROP TABLE is executed, only when the transaction
that performed the DDL commits.

Therefore just providing you with a callback that runs in
heap_drop_with_catalog() doesn't really achieve much - you'd have no way to
execute the "actual" dropping of the relation at that later stage.

> Creating new blocks for a table is straightforward to implement by using
> the `relation_set_new_filenode` callback where you can create new memory
> blocks for a relation, but I cannot find a way to clean up those blocks
> when the table is dropped nor a way to handle a change of the schema for a
> table.

What precisely do you mean by "handle a change of the schema"? I.e. what
would you like to do, and what do you think is preventing you from doing
it? But before you answer, see my next point below.

> Altering the schema does not seem to be covered at all, but this is
> something that table access methods need to know about since it might want
> to optimize the internal storage when the schema changes.
> I have not been
> able to find any discussions around this, but it seems like a natural thing
> to do with a table. Have I misunderstood how this works?

Due to postgres' transactional DDL you cannot really change the storage
layout of *existing data* when the DDL command is executed - the data still
needs to be interpretable in case the DDL is rolled back (including after a
crash).

Before I explain some more: could you describe in a bit more detail what
kind of optimization you'd like to make?

Back to schema change handling: for some schema changes postgres assumes
that they can be done "in place", e.g. adding a column to a table. Other
changes, e.g. changing the type of a column "sufficiently", will cause a
so-called table rewrite. That means a new relation will be created
(including a call to relation_set_new_filenode()), all the data will be
inserted into that new relation, and then pg_class->relfilenode for the
"original" relation will be changed to the "rewritten" table (there are two
variants of this: one for rewrites due to ALTER TABLE and a separate one
for VACUUM FULL/CLUSTER). When the transaction containing such a rewrite
commits, that ->relfilenode change becomes visible to everyone, and the old
relfilenode will be deleted.

This means that right now there's no easy way to store the data anywhere
but in the file referenced by pg_class.relfilenode. I don't think anybody
would object on principle to making the necessary infrastructure changes to
support storing data elsewhere - but I think it'll also not be quite as
simple as the change you suggested :(.

Greetings,

Andres Freund