Simon Riggs wrote:
On Wed, 2007-10-17 at 17:36 +0100, Heikki Linnakangas wrote:
Simon Riggs wrote:
On Wed, 2007-10-17 at 15:02 +0100, Heikki Linnakangas wrote:
Simon Riggs wrote:
If you've got a better problem statement it would be good to get that
right first before we discuss solutions.
Reusing a relfilenode of a deleted relation, before next checkpoint
following the commit of the deleting transaction, for an operation that
doesn't WAL log the contents of the new relation, leads to data loss on
recovery.
OK, thanks.
I wasn't aware we reused refilenode ids. The code in GetNewOid() doesn't
look deterministic to me, or at least isn't meant to be.
GetNewObjectId() should be cycling around, so although the oid index
scan using SnapshotDirty won't see committed deleted rows that shouldn't
matter for 2^32 oids. So what gives?
I don't think you still quite understand what's happening.
Clearly. It's not a problem to admit that.
GetNewOid()
is not interesting here, look at GetNewRelFileNode() instead. And
neither are snapshots or MVCC visibility rules.
Which calls GetNewOid() in all cases, AFAICS.
How does the reuse you say is happening come about? Seems like the bug
is in the reuse, not in how we cope with potential reuse.
After a table is dropped, the dropping transaction has been committed,
and the relation file has been deleted, there's nothing preventing the
reuse. There's no trace of that relfilenode in the system (except in the
WAL, which we never look into except on WAL replay). There's a dead row
in pg_class with that relfilenode, but even that could be vacuumed away
(not that it matters because we don't examine that).
Now the problem is that there's a record in the WAL to delete a relation
file with that relfilenode. If that relfilenode was reused, we delete
the contents of the new relation file when we replay that WAL record.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings