On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan <p...@bowt.ie> wrote: > There is still an open item here, though: The leader-as-worker > Tuplesortstate, a special case, can still leak files.
I phrased this badly. What I mean is that there can be instances where temp files are left on disk following a failure such as palloc() OOM; no backend ends up doing an unlink() iff a leader-as-worker Tuplesortstate was used and we get unlucky. I did not mean a leak of virtual or real file descriptors, which would see Postgres print a refcount leak warning from resowner.c. Naturally, these "leaked" files will eventually be deleted by the next restart of the server at the latest, within RemovePgTempFiles(). Note also that a duplicate unlink() (with annoying LOG message) is impossible under any circumstances with V9, regardless of whether or not a leader-as-worker Tuplesort state is involved. Anyway, I was sure that I needed to completely nail this down in order to be consistent with existing guarantees, but another look at OpenTemporaryFile() makes me doubt that. ResourceOwnerEnlargeFiles() is called, which itself uses palloc(), which can of course fail. There are remarks over that function within resowner.c about OOM: /* * Make sure there is room for at least one more entry in a ResourceOwner's * files reference array. * * This is separate from actually inserting an entry because if we run out * of memory, it's critical to do so *before* acquiring the resource. */ void ResourceOwnerEnlargeFiles(ResourceOwner owner) { ... } But this happens after OpenTemporaryFileInTablespace() has already returned. Taking care to allocate memory up-front here is motivated by keeping the vFD cache entry and current resource owner in perfect agreement about the FD_XACT_TEMPORARY-ness of a file, and that's it. It's *not* true that there is a broader sense in which OpenTemporaryFile() is atomic, which for some reason I previously believed to be the case. So, I haven't failed to prevent an outcome that wasn't already possible. It doesn't seem like it would be that hard to fix this, and then have the parallel tuplesort patch live up to that new higher standard. But, it's possible that Tom or maybe someone else would consider that a bad idea, for roughly the same reason that we don't call RemovePgTempFiles() for *crash* induced restarts, as mentioned by Thomas up-thead: * NOTE: we could, but don't, call this during a post-backend-crash restart * cycle. The argument for not doing it is that someone might want to examine * the temp files for debugging purposes. This does however mean that * OpenTemporaryFile had better allow for collision with an existing temp * file name. */ void RemovePgTempFiles(void) { ... } Note that I did put some thought into making sure OpenTemporaryFile() does the right thing with collisions with existing temp files. So, maybe the right thing is to do nothing at all. I don't have strong feelings either way on this question. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers