Tom Lane wrote:
ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests
causes PendingUnlinkEntrys for the doomed DB to be thrown away too.
This should prevent the unlink-live-data scenario, I think.
Even then, concurrent deletion attempts are probably possible (since
ForgetDatabaseFsyncRequests is asynchronous) and rmtree() is being far
too fragile about dealing with them.  I think that it should be coded
to ignore ENOENT the same as the bgwriter does, and that it should press
on and keep trying to delete things even if it gets a failure.

Yep. I can write a patch for that, unless you're onto it already?

However, this makes me reconsider Florian's suggestion to just make
relfilenode larger and avoid reusing them altogether. It would simplify
the code quite a bit, and make it more robust. That is good because even if we fix these problems per your suggestion, I'm left wondering if we've missed some even weirder corner-cases.

Florian suggested a scheme where the xid and epoch are embedded in the filename, but that's unnecessarily complex. We could just make relfilenode a 64-bit integer. 2^64 should be enough for everyone.
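
To put a number on "enough for everyone", here is a back-of-the-envelope sketch. The type name RelFileNumber64, the counter and both functions are hypothetical and exist only for this illustration; the allocation rate passed in is an assumed figure, not a measurement.

```c
#include <stdint.h>

/* Hypothetical: a relfilenode that is simply a 64-bit counter. */
typedef uint64_t RelFileNumber64;

static RelFileNumber64 next_relfilenode = 1;

/* Hand out a fresh value; nothing is ever reused. */
static RelFileNumber64
allocate_relfilenode(void)
{
    return next_relfilenode++;
}

/* Years until 2^64 values are used up at "per_second" allocations/sec. */
static double
years_to_exhaust(double per_second)
{
    const double total = 18446744073709551616.0;    /* 2^64 */
    const double seconds_per_year = 365.25 * 24 * 60 * 60;

    return total / per_second / seconds_per_year;
}
```

Even at an assumed one million new relfilenodes per second, years_to_exhaust(1e6) comes out at more than half a million years, so wraparound and reuse stop being a practical concern.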

You listed these problems with Florian's suggestion back then:

1. Zero chance of ever backpatching.  (I know I said I wasn't excited
   about that, but it's still a strike against a proposed fix.)

Still true. We would need to do what you suggested for 8.3, but simplifying the code would be a good thing in the long run.

2. Adds new fields to RelFileNode, which will be a major code change,
   and possibly a noticeable performance hit (bigger hashtable keys).

We talked about this wrt. map forks, and concluded that it's not an issue. If we add the map forks as well, BufferTag struct would grow from 16 bytes to 24 bytes. It's worth doing some more micro-benchmarking with that, but it's probably acceptable. Or we could allocate a few bits of the 64-bit relfilenode field in RelFileNode to indicate the map fork.
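
The size arithmetic can be checked with a small sketch. The struct layouts below are simplified stand-ins, not the real headers: the "32" and "64" type names, the forkNum field and the choice of two fork bits are all assumptions made for illustration.

```c
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

/* Simplified current layout: three 4-byte Oids plus a block number. */
typedef struct { Oid spcNode; Oid dbNode; Oid relNode; } RelFileNode32;
typedef struct { RelFileNode32 rnode; BlockNumber blockNum; } BufferTag32;

/* Hypothetical layout: 64-bit relfilenode plus a separate fork field. */
typedef struct { Oid spcNode; Oid dbNode; uint64_t relNode; } RelFileNode64;
typedef struct
{
    RelFileNode64 rnode;
    uint32_t    forkNum;
    BlockNumber blockNum;
} BufferTag64;

_Static_assert(sizeof(BufferTag32) == 16, "current tag is 16 bytes");
_Static_assert(sizeof(BufferTag64) == 24, "wide tag with fork is 24 bytes");

/* Alternative: steal the top bits of the 64-bit field for the fork. */
#define FORK_BITS   2           /* assumed: room for up to 4 forks */
#define FORK_SHIFT  (64 - FORK_BITS)
#define RELNUM_MASK ((UINT64_C(1) << FORK_SHIFT) - 1)

static uint64_t
pack_relnode(uint64_t relnum, unsigned fork)
{
    return ((uint64_t) fork << FORK_SHIFT) | (relnum & RELNUM_MASK);
}

static unsigned
relnode_fork(uint64_t packed)
{
    return (unsigned) (packed >> FORK_SHIFT);
}

static uint64_t
relnode_number(uint64_t packed)
{
    return packed & RELNUM_MASK;
}
```

With the packed variant the fork no longer needs its own field in the tag, at the cost of two bits of relfilenode space, which still leaves 2^62 values.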

3. Adds new columns to pg_class, which is a real PITA ...

We would only have to change relfilenode from oid to int64.

4. Breaks oid2name and all similar code that knows about relfilenode.

True, but they're not hard to fix.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
