On Thu, Oct 31, 2019 at 09:43:47AM +0100, Antonin Houska wrote:
> Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>> Isn't this prevented by locking of the buffer header? Both FlushBuffer
>> and MarkBufferDirtyHint do obtain that lock. I see MarkBufferDirtyHint
>> does a bit of work before, but that's related to BM_PERMANENT.

In FlushBuffer() you have a window after fetching the buffer state and
the header is once unlocked until TerminateBufferIO() gets called
(this would take again a lock on the buffer header), so it is
logically possible for the buffer to be marked as dirty once again
causing the update of the LSN on the page to be missed even if a
backup block has been written in WAL.

> Yes, I had verified it using gdb: inserted a row into a table, then attached
> gdb to checkpointer and stopped it when FlushBuffer() was about to call
> TerminateBufferIO().  Then, when scanning the table, I saw that
> MarkBufferDirtyHint() skipped the call of PageSetLSN(). Finally, before
> checkpointer unlocked the buffer header in TerminateBufferIO(), buf_state was
> 3553624066 ~ 0b11010011110100000000000000000010.

Small typo here: 11010011110100000000000000000010...

> With BM_DIRTY defined as
> 
>       #define BM_DIRTY                                (1U << 23)
> 
> the flag appeared to be set.

...  Still, the bit is set here.

Does something like the attached patch make sense?  Reviews are
welcome.
--
Michael
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7ad10736d5..eba9a4716d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3467,7 +3467,7 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 		(BM_DIRTY | BM_JUST_DIRTIED))
 	{
 		XLogRecPtr	lsn = InvalidXLogRecPtr;
-		bool		dirtied = false;
+		bool		update_stats = false;
 		bool		delayChkpt = false;
 		uint32		buf_state;
 
@@ -3524,26 +3524,28 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) > 0);
 
+		/*
+		 * This allows to update cost accounting below without keeping
+		 * the buffer header lock around.
+		 */
 		if (!(buf_state & BM_DIRTY))
-		{
-			dirtied = true;		/* Means "will be dirtied by this action" */
+			update_stats = true;
 
-			/*
-			 * Set the page LSN if we wrote a backup block. We aren't supposed
-			 * to set this when only holding a share lock but as long as we
-			 * serialise it somehow we're OK. We choose to set LSN while
-			 * holding the buffer header lock, which causes any reader of an
-			 * LSN who holds only a share lock to also obtain a buffer header
-			 * lock before using PageGetLSN(), which is enforced in
-			 * BufferGetLSNAtomic().
-			 *
-			 * If checksums are enabled, you might think we should reset the
-			 * checksum here. That will happen when the page is written
-			 * sometime later in this checkpoint cycle.
-			 */
-			if (!XLogRecPtrIsInvalid(lsn))
-				PageSetLSN(page, lsn);
-		}
+		/*
+		 * Set the page LSN if we wrote a backup block. We aren't supposed
+		 * to set this when only holding a share lock but as long as we
+		 * serialise it somehow we're OK. We choose to set LSN while holding
+		 * the buffer header lock, which causes any reader of an * LSN who
+		 * holds only a share lock to also obtain a buffer header * lock
+		 * before using PageGetLSN(), which is enforced in
+		 * BufferGetLSNAtomic().
+		 *
+		 * If checksums are enabled, you might think we should reset the
+		 * checksum here. That will happen when the page is written sometime
+		 * later in this checkpoint cycle.
+		 */
+		if (!XLogRecPtrIsInvalid(lsn))
+			PageSetLSN(page, lsn);
 
 		buf_state |= BM_DIRTY | BM_JUST_DIRTIED;
 		UnlockBufHdr(bufHdr, buf_state);
@@ -3551,8 +3553,9 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 		if (delayChkpt)
 			MyPgXact->delayChkpt = false;
 
-		if (dirtied)
+		if (update_stats)
 		{
+			/* Do vacuum cost accounting */
 			VacuumPageDirty++;
 			pgBufferUsage.shared_blks_dirtied++;
 			if (VacuumCostActive)

Attachment: signature.asc
Description: PGP signature

Reply via email to