On Sun, Apr 06, 2025 at 11:00:54AM -0700, Noah Misch wrote:
> I pushed that as commit 8e7e672 (2024-10-25). I now think DELAY_CHKPT_START
> is superfluous here, per this proc.h comment:
>
> * (In the
> * extremely common case where the data being modified is in shared buffers
> * and we acquire an exclusive content lock on the relevant buffers before
> * writing WAL, this mechanism is not needed, because phase 2 will block
> * until we release the content lock and then flush the modified data to
> * disk.)
>
> heap_inplace_update_and_unlock() meets those conditions. Its closest
> precedent is XLogSaveBufferForHint(), which does need DELAY_CHKPT_START due to
> having only BUFFER_LOCK_SHARED.
True so far, but ...
> heap_inplace_update_and_unlock() has
> BUFFER_LOCK_EXCLUSIVE, so it doesn't need DELAY_CHKPT_START.
... not so, because we've not yet done MarkBufferDirty(). transam/README
cites SyncOneBuffer(), which says:
 * Check whether buffer needs writing.
 *
 * We can make this check without taking the buffer content lock so long
 * as we mark pages dirty in access methods *before* logging changes with
 * XLogInsert(): if someone marks the buffer dirty just after our check we
 * don't worry because our checkpoint.redo points before log record for
 * upcoming changes and so we are not required to write such dirty buffer.
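For illustration only (not part of the attached patch), here's a minimal sketch of the ordering that lets the exclusive content lock alone block phase 2.  The "buffer", "recptr", "rmid", and "info" variables and the record payload are assumed context, not real heapam.c code:

    /*
     * Sketch, assuming "buffer" is pinned and held with
     * BUFFER_LOCK_EXCLUSIVE, and the page change happens inside the
     * critical section.
     */
    START_CRIT_SECTION();

    /* ... modify the page contents here ... */

    /*
     * Dirty the buffer BEFORE XLogInsert().  Per SyncOneBuffer(), any
     * checkpoint whose redo point falls after the record we are about to
     * insert will then see the buffer as dirty during its phase-2 scan,
     * and FlushBuffer() will block on our content lock until we release
     * it; hence no DELAY_CHKPT_START is needed.
     */
    MarkBufferDirty(buffer);

    XLogBeginInsert();
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
    /* ... XLogRegisterData() for the record payload ... */
    recptr = XLogInsert(rmid, info);
    PageSetLSN(BufferGetPage(buffer), recptr);

    END_CRIT_SECTION();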
The attached patch updates the aforementioned proc.h comment and the
heap_inplace_update_and_unlock() comment that my last message proposed.
From: Noah Misch <[email protected]>
Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START.
Blocking checkpoint phase 2 requires MarkBufferDirty() and
BUFFER_LOCK_EXCLUSIVE; neither suffices by itself. transam/README documents
this, citing SyncOneBuffer(). Update the DELAY_CHKPT_START documentation to
say this. Expand the heap_inplace_update_and_unlock() comment that cites
XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock()
could have opted not to use DELAY_CHKPT_START.
Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to
heap_inplace_update_and_unlock(). Since commit
bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches,
no back-patch.
Discussion: https://postgr.es/m/[email protected]
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed2e302..c1a4de1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6507,9 +6507,17 @@ heap_inplace_update_and_unlock(Relation relation,
* [crash]
* [recovery restores datfrozenxid w/o relfrozenxid]
*
- * Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(), copy
- * the buffer to the stack before logging. Here, that facilitates a FPI
- * of the post-mutation block before we accept other sessions seeing it.
+ * Mimic MarkBufferDirtyHint() subroutine XLogSaveBufferForHint().
+ * Specifically, use DELAY_CHKPT_START, and copy the buffer to the stack.
+ * The stack copy facilitates a FPI of the post-mutation block before we
+ * accept other sessions seeing it. DELAY_CHKPT_START allows us to
+ * XLogInsert() before MarkBufferDirty(). Since XLogSaveBufferForHint()
+ * can operate under BUFFER_LOCK_SHARED, it can't avoid DELAY_CHKPT_START.
+ * This function, however, likely could avoid it with the following order
+ * of operations: MarkBufferDirty(), XLogInsert(), memcpy(). Opt to use
+ * DELAY_CHKPT_START here, too, as a way to have fewer distinct code
+ * patterns to analyze. Inplace update isn't so frequent that it should
+ * pursue the small optimization of skipping DELAY_CHKPT_START.
*/
Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
START_CRIT_SECTION();
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f51b03d..86c5f99 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -110,10 +110,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
* is inserted prior to the new redo point, the corresponding data changes will
* also be flushed to disk before the checkpoint can complete. (In the
* extremely common case where the data being modified is in shared buffers
- * and we acquire an exclusive content lock on the relevant buffers before
- * writing WAL, this mechanism is not needed, because phase 2 will block
- * until we release the content lock and then flush the modified data to
- * disk.)
+ * and we acquire an exclusive content lock and MarkBufferDirty() on the
+ * relevant buffers before writing WAL, this mechanism is not needed, because
+ * phase 2 will block until we release the content lock and then flush the
+ * modified data to disk. See transam/README and SyncOneBuffer().)
*
* Setting DELAY_CHKPT_COMPLETE prevents the system from moving from phase 2
* to phase 3. This is useful if we are performing a WAL-logged operation that