(This is mostly directed at Vadim, but kibitzing is welcome.)

Here's what I plan to do to make DROP TABLE rollbackable and clean up
the handling of CREATE TABLE rollback.  Comments?


Overview:

1. smgrcreate() will create the file for the relation the same as it does now,
and will add the rel's RelFileNode information to an smgr-private list of
rels created in the current transaction.

2. smgrunlink() normally will NOT immediately delete the file; instead it
will perform smgrclose() and then add the rel's RelFileNode information to
an smgr-private list of rels to be deleted at commit.  However, if the
file appears in the list created by smgrcreate() --- i.e., the rel is being
created and deleted in the same xact --- then we can delete it
immediately.  In this case we remove the file from the smgrcreate list
and do not put it on the unlink list.

3. smgrcommit() will delete all the files mentioned in the list created
by smgrunlink, then discard both lists.

4. smgrabort() will delete all the files mentioned in the list created
by smgrcreate, then discard both lists.

Points 1 and 4 will replace the existing relcache-based mechanism for
deleting files created in the current xact when the xact aborts.
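
To make the bookkeeping in points 1-4 concrete, here is a rough sketch of
the two lists. It is only an illustration, not proposed code: the
RelFileNode struct is a simplified stand-in, the "disk" array is a
pretend file system, and every function name (sketch_create,
sketch_unlink, sketch_commit, sketch_abort, file_exists) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for RelFileNode; the real struct may differ. */
typedef struct { unsigned db; unsigned rel; } RelFileNode;

#define MAX_NODES 64
typedef struct { RelFileNode items[MAX_NODES]; size_t n; } NodeList;

static NodeList disk;            /* pretend file system: files that exist */
static NodeList pending_create;  /* rels created in the current xact */
static NodeList pending_unlink;  /* rels to be deleted at commit */

static bool same(RelFileNode a, RelFileNode b)
{
    return a.db == b.db && a.rel == b.rel;
}

static bool list_has(const NodeList *l, RelFileNode x)
{
    for (size_t i = 0; i < l->n; i++)
        if (same(l->items[i], x))
            return true;
    return false;
}

static void list_add(NodeList *l, RelFileNode x)
{
    assert(l->n < MAX_NODES);
    l->items[l->n++] = x;
}

/* Remove x if present (order is not preserved); report whether it was there. */
static bool list_remove(NodeList *l, RelFileNode x)
{
    for (size_t i = 0; i < l->n; i++)
        if (same(l->items[i], x)) {
            l->items[i] = l->items[--l->n];
            return true;
        }
    return false;
}

bool file_exists(RelFileNode x) { return list_has(&disk, x); }

/* Point 1: create the file now, remember it in case of abort. */
void sketch_create(RelFileNode x)
{
    list_add(&disk, x);
    list_add(&pending_create, x);
}

/* Point 2: defer the delete, unless the rel was created in this xact. */
void sketch_unlink(RelFileNode x)
{
    if (list_remove(&pending_create, x))
        list_remove(&disk, x);          /* same-xact create+drop: delete now */
    else
        list_add(&pending_unlink, x);   /* otherwise wait for commit */
}

/* Point 3: commit deletes the unlinked files, then discards both lists. */
void sketch_commit(void)
{
    for (size_t i = 0; i < pending_unlink.n; i++)
        list_remove(&disk, pending_unlink.items[i]);
    pending_unlink.n = 0;
    pending_create.n = 0;
}

/* Point 4: abort deletes the created files, then discards both lists. */
void sketch_abort(void)
{
    for (size_t i = 0; i < pending_create.n; i++)
        list_remove(&disk, pending_create.items[i]);
    pending_unlink.n = 0;
    pending_create.n = 0;
}
```

In this sketch a DROP followed by abort leaves the file in place, and a
CREATE followed by abort removes it, which is the behavior the four
points are meant to produce.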


Various details:

To support deleting files at xact commit/abort, we will need something
like an "mdblindunlink" entrypoint to md.c.  I am inclined to simply
redefine mdunlink to take a RelFileNode instead of a complete Relation,
rather than supporting two entrypoints --- I don't think there'll be any
future use for the existing mdunlink.  Objections?

bufmgr.c's ReleaseRelationBuffers discards any dirty buffers for the target
rel without writing them, and therefore it must NOT be called inside the
transaction (else, rollback would mean we'd have lost data).  I am inclined
to let it continue
to behave that way, but to call it from smgrcommit/smgrabort, not from
anywhere else.  This would mean changing its API to take a RelFileNode,
but that seems no big problem.  This way, dirty buffers for a doomed
relation will be allowed to live until transaction commit, in the hopes
that we will be able to discard them unwritten.
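
As a toy illustration of why the drop has to wait until commit/abort:
the sketch below throws away every buffer of a rel, dirty or not, without
writing it, which is only safe once the rel's fate is decided. The
ToyBuffer struct, the pool array, and the function names are all made up
for this example and do not reflect bufmgr.c's real data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for RelFileNode; the real struct may differ. */
typedef struct { unsigned db; unsigned rel; } RelFileNode;

/* Hypothetical buffer descriptor, much simpler than the real one. */
typedef struct { RelFileNode node; bool valid; bool dirty; } ToyBuffer;

#define NBUFFERS 8
static ToyBuffer pool[NBUFFERS];

/* Pretend a page of the rel was read in and modified; returns its slot. */
int toy_dirty_page(RelFileNode node)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (!pool[i].valid) {
            pool[i].node = node;
            pool[i].valid = true;
            pool[i].dirty = true;
            return i;
        }
    assert(!"toy pool is full");
    return -1;
}

/*
 * Discard all buffers belonging to the rel WITHOUT writing them out.
 * Takes a RelFileNode rather than a Relation, as proposed above.
 * Returns the number of buffers dropped.
 */
size_t toy_release_relation_buffers(RelFileNode node)
{
    size_t dropped = 0;

    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].valid &&
            pool[i].node.db == node.db && pool[i].node.rel == node.rel) {
            pool[i].valid = false;   /* dirty data is simply forgotten */
            pool[i].dirty = false;
            dropped++;
        }
    return dropped;
}
```

The idea is that only the commit/abort paths would call this, once per
rel on whichever list is being deleted.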

I will remove the notices in DROP TABLE etc. warning that these operations
are not rollbackable.  Note that CREATE/DROP DATABASE is still not
rollbackable, and so those two ops will continue to elog(ERROR) when
called in a transaction block.  Ditto for VACUUM; probably also ditto
for REINDEX, though I haven't looked closely at that yet.

The temp table name mapper will need to be modified so that it can
undo all current-xact changes to its name mapping list at xact abort.
Currently I think it only handles undoing additions, not
deletions/renames.  This does not need to be WAL-aware, does it?


WAL:

AFAICS, things will behave properly if calls to smgrcreate/smgrunlink
are logged as WAL events.  For redo, they are executed just the same
as normal, except they shouldn't complain if the target file already
exists (or already doesn't exist, for unlink).  Undo of smgrcreate
is just immediate mdunlink; undo of smgrunlink is a no-op.
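
A minimal sketch of what "shouldn't complain" could look like at the file
level, using plain POSIX calls rather than md.c. The function names
(redo_create, redo_unlink, undo_create, undo_unlink) are hypothetical, and
taking a path instead of a RelFileNode is a simplification for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Redo of a create: succeed whether or not the file is already there. */
int redo_create(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);   /* no O_EXCL on purpose */
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Redo of an unlink: a file that is already gone is not an error. */
int redo_unlink(const char *path)
{
    assert(path != NULL);
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;
    return -1;
}

/* Undo of a create is just an immediate unlink. */
int undo_create(const char *path)
{
    return redo_unlink(path);
}

/* Undo of an unlink is a no-op: the file was never actually removed. */
int undo_unlink(const char *path)
{
    (void) path;
    return 0;
}
```

Each of these can be replayed any number of times with the same end
state, which is what makes it safe to run them again after a crash.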

I have not studied the WAL code enough to be prepared to add the
logging/undo/redo code, and it looks like you haven't implemented that
for smgr.c yet anyway, so I will leave that part to you, OK?

                        regards, tom lane
