Hi,

Hit a case where the server can't checkpoint anymore, and it comes down to the
targeted-drop optimization in DropRelationsAllBuffers().

The scenario is disaster recovery: a relation's data file has gone missing
on disk (failed restore, lost or detached storage, a half-finished manual
cleanup, ...), and the administrator does the natural thing -- DROP the
broken relation to get the system going again. The catch is that a dirty
buffer for that relation can still be resident in shared buffers, and every
checkpoint after that fails trying to write it back:

    could not open file "..." while writing block N of relation ...
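Here is a deliberately tiny toy model in C that makes the failure mode concrete. It is not bufmgr.c code. ToyPool, ToyBuffer, toy_drop_targeted_old(), toy_drop_full_scan() and toy_checkpoint() are hypothetical names. The model has a single relation and assumes fork sizes are known, so it captures only the "no file, therefore no buffers" assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one relation, four forks, a four-slot buffer pool. */
#define TOY_NFORKS   4
#define TOY_MAIN     0
#define TOY_NBUFFERS 4

typedef struct
{
    bool valid;   /* slot holds a page of the relation */
    int  fork;    /* fork the page belongs to */
    bool dirty;   /* page must be written back at checkpoint */
} ToyBuffer;

typedef struct
{
    ToyBuffer buffers[TOY_NBUFFERS];
    bool      file_exists[TOY_NFORKS];
} ToyPool;

/* Pre-patch targeted drop: a fork without a file is assumed to have no buffers. */
static void
toy_drop_targeted_old(ToyPool *pool)
{
    for (int fork = 0; fork < TOY_NFORKS; fork++)
    {
        if (!pool->file_exists[fork])
            continue;           /* faulty assumption: this fork's buffers stay behind */
        for (int b = 0; b < TOY_NBUFFERS; b++)
            if (pool->buffers[b].valid && pool->buffers[b].fork == fork)
                pool->buffers[b].valid = false;
    }
}

/* Full scan: every slot belongs to the one relation here, so drop them all. */
static void
toy_drop_full_scan(ToyPool *pool)
{
    for (int b = 0; b < TOY_NBUFFERS; b++)
        pool->buffers[b].valid = false;
}

/* Checkpoint: fails if any valid dirty buffer belongs to a fork whose file is gone. */
static bool
toy_checkpoint(const ToyPool *pool)
{
    for (int b = 0; b < TOY_NBUFFERS; b++)
    {
        const ToyBuffer *buf = &pool->buffers[b];

        if (!buf->valid || !buf->dirty)
            continue;
        assert(buf->fork >= 0 && buf->fork < TOY_NFORKS);
        if (!pool->file_exists[buf->fork])
            return false;       /* "could not open file ... while writing block N" */
    }
    return true;
}
```

In this model, if the main fork's file is gone, toy_drop_targeted_old() leaves the dirty main-fork buffer valid and toy_checkpoint() keeps returning false. toy_drop_full_scan() clears the buffer, and the checkpoint succeeds.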

Before bea449c635c the full scan always ran, so dropping the relation cleaned
the buffer up regardless of the file. The attached patch restores that for the
main fork only -- fsm/vm/init are routinely absent (a permanent rel never has
an init fork), so forcing a full scan on their absence would kill the
optimization for almost every drop.

No in-core reproducer since it needs the file to vanish underneath us, but the
path is clear once it does. Patch attached.

-- 
Adam
From 0148552645fb463713eb26604930ae4c65385c99 Mon Sep 17 00:00:00 2001
From: Adam Lee <[email protected]>
Date: Tue, 9 Jun 2026 16:22:35 +0800
Subject: [PATCH] Avoid orphaning buffers when a relation's file is missing

When relations are dropped, DropRelationsAllBuffers() avoids scanning the
whole buffer pool if it can read the size of every fork from the cache,
locating the buffers to invalidate directly.  When a fork's size is not
cached it calls smgrexists(), and if the fork's file does not exist it skips
the fork, treating it as having no buffers.

In a disaster-recovery situation, though, a relation's data file can be
missing on disk while a dirty buffer for it is still resident.  Skipping the
fork then leaves that buffer orphaned: the relation is dropped, but the
buffer remains in shared buffers and every later checkpoint fails trying to
write it back to the missing file ("could not open file ... while writing
block N of relation ..."), so the server can no longer checkpoint.

Before the targeted-drop optimization (added in v14, commit bea449c635c) the
buffer pool was always scanned in full, so the buffer was invalidated whether
or not its file still existed, and dropping the relation cleaned it up.
Restore that behavior for the main fork: when its size is uncached and its
file is missing, fall back to the full scan, which invalidates the relation's
buffers across all forks.  The main fork is the only fork a relation with
storage always has; the fsm, vm and init forks are routinely absent on
healthy relations (small tables have no fsm/vm; permanent relations have no
init fork), so triggering a full scan whenever any fork is absent would
disable the optimization for nearly every drop.
---
 src/backend/storage/buffer/bufmgr.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index cc398db124d..1f1fece4a3a 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4951,7 +4951,32 @@ DropRelationsAllBuffers(SMgrRelation *smgr_reln, int nlocators)
                        if (block[i][j] == InvalidBlockNumber)
                        {
                                if (!smgrexists(rels[i], j))
+                               {
+                                       /*
+                                        * In a disaster-recovery situation a relation's data file
+                                        * may be missing on disk while a dirty buffer for the fork
+                                        * is still resident.  Skipping the fork (because it has no
+                                        * file) would leave that buffer orphaned, after which the
+                                        * checkpointer fails on every run trying to write it to the
+                                        * missing file, so the server can no longer checkpoint.
+                                        * Fall back to the full buffer-pool scan, which invalidates
+                                        * the relation's buffers across all forks regardless of the
+                                        * missing file, as was done unconditionally before this
+                                        * optimization, so dropping the relation can still clean it
+                                        * up.  The main fork is the sentinel: it is the only fork a
+                                        * relation with storage always has, whereas the fsm, vm and
+                                        * init forks are routinely absent on healthy relations
+                                        * (small tables have no fsm/vm; permanent relations have no
+                                        * init fork), so triggering on their absence would force a
+                                        * full scan on nearly every drop.
+                                        */
+                                       if (j == MAIN_FORKNUM)
+                                       {
+                                               cached = false;
+                                               break;
+                                       }
                                        continue;
+                               }
                                cached = false;
                                break;
                        }
-- 
2.52.0
