In the release team's discussion leading up to commit 0e758ae89, Andres opined that what commit 4ab5dae94 had done to mdunlinkfork was a mess, and I concur. It invented an entirely new code path through that function, and required two different behaviors from the segment-deletion loop. I think a very straight line can be drawn between that extra complexity and the introduction of a nasty bug. It's all unnecessary too, because AFAICS all we really need is to apply the pre-existing behavior for temp tables and REDO mode to binary-upgrade mode as well.
Hence, the attached reverts everything 4ab5dae94 did to this function, and most of 0e758ae89 too, and instead makes IsBinaryUpgrade an additional reason to take the immediate-unlink path. Barring objections, I'll push this after the release freeze lifts. regards, tom lane
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index b1a2cb575d..0f642df3c1 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -263,6 +263,12 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo) * The fact that temp rels and regular rels have different file naming * patterns provides additional safety. * + * We also don't do it while performing a binary upgrade. There is no reuse + * hazard in that case, since after a crash or even a simple ERROR, the + * upgrade fails and the whole cluster must be recreated from scratch. + * Furthermore, it is important to remove the files from disk immediately, + * because we may be about to reuse the same relfilenumber. + * * All the above applies only to the relation's main fork; other forks can * just be removed immediately, since they are not needed to prevent the * relfilenumber from being recycled. Also, we do not carefully @@ -319,14 +325,15 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) { char *path; int ret; - BlockNumber segno = 0; path = relpath(rlocator, forknum); /* - * Delete or truncate the first segment. + * Truncate and then unlink the first segment, or just register a request + * to unlink it later, as described in the comments for mdunlink(). */ - if (isRedo || forknum != MAIN_FORKNUM || RelFileLocatorBackendIsTemp(rlocator)) + if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM || + RelFileLocatorBackendIsTemp(rlocator)) { if (!RelFileLocatorBackendIsTemp(rlocator)) { @@ -351,7 +358,6 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) ereport(WARNING, (errcode_for_file_access(), errmsg("could not remove file \"%s\": %m", path))); - segno++; } } else @@ -359,42 +365,25 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) /* Prevent other backends' fds from holding on to the disk space */ ret = do_truncate(path); - /* - * Except during a binary upgrade, register request to unlink first - * segment later, rather than now. - * - * If we're performing a binary upgrade, the dangers described in the - * header comments for mdunlink() do not exist, since after a crash or - * even a simple ERROR, the upgrade fails and the whole new cluster - * must be recreated from scratch. And, on the other hand, it is - * important to remove the files from disk immediately, because we may - * be about to reuse the same relfilenumber. - */ - if (!IsBinaryUpgrade) - { - register_unlink_segment(rlocator, forknum, 0 /* first seg */ ); - segno++; - } + /* Register request to unlink first segment later */ + register_unlink_segment(rlocator, forknum, 0 /* first seg */ ); } /* - * Delete any remaining segments (we might or might not have dealt with - * the first one above). + * Delete any additional segments. */ if (ret >= 0) { char *segpath = (char *) palloc(strlen(path) + 12); + BlockNumber segno; /* * Note that because we loop until getting ENOENT, we will correctly * remove all inactive segments as well as active ones. */ - for (;; segno++) + for (segno = 1;; segno++) { - if (segno == 0) - strcpy(segpath, path); - else - sprintf(segpath, "%s.%u", path, segno); + sprintf(segpath, "%s.%u", path, segno); if (!RelFileLocatorBackendIsTemp(rlocator)) {