On 3/10/26 21:41, Andres Freund wrote:
> Hi,
>
> On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:
>> On 2/5/26 16:38, Peter Geoghegan wrote:
>>> On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <[email protected]> wrote:
>>>> Yeah, that was a quite big thinko. I have a attached a patch with the
>>>> thinko fixed but I am still not happy with it. I think I will try to use
>>>> atomics.h and see if that makes the code nicer to read.
>>>>
>>>> Also will after that see what I can do about pageinspect.
>>>
>>> I believe that pageinspect's heap_page_items function needs to use
>>> get_page_from_raw -- see commit 14e9b18fed.
>
> Maybe I'm slow today, but how's that related to the patch at hand?
>
> Regardless of that, it seems a bit confusing to fold the pageinspect changes
> into the main commit.
>
That's because of alignment requirements, per this comment:
* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.
AFAIK proper alignment is required to make the load atomic.
>> XLogRecPtr
>> BufferGetLSNAtomic(Buffer buffer)
>> {
>> +#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
>> + Assert(BufferIsValid(buffer));
>> + Assert(BufferIsPinned(buffer));
>> +
>> + return PageGetLSN(BufferGetPage(buffer));
>> +#else
>> char *page = BufferGetPage(buffer);
>> BufferDesc *bufHdr;
>> XLogRecPtr lsn;
>>
>> + /* Make sure we've got a real buffer, and that we hold a pin on it. */
>> + Assert(BufferIsValid(buffer));
>> + Assert(BufferIsPinned(buffer));
>> +
>
> Seems a bit silly to have these asserts duplicated. I'd probably just put the
> body of the #else into an {} and then have the asserts before the ifdef.
>
OK, WFM.
>
>> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
>> index 92a6bb9b0c0..e994526ca52 100644
>> --- a/src/include/storage/bufpage.h
>> +++ b/src/include/storage/bufpage.h
>> @@ -91,24 +91,27 @@ typedef uint16 LocationIndex;
>>
>>
>> /*
>> - * For historical reasons, the 64-bit LSN value is stored as two 32-bit
>> - * values.
>> + * For historical reasons, the storage of 64-bit LSN values depends on CPU
>> + * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
>> + * values. When reading (and writing) the pd_lsn field from page headers,
>> the
>> + * caller must convert from (and convert to) the platform's native
>> endianness.
>> */
>> -typedef struct
>> -{
>> - uint32 xlogid; /* high bits */
>> - uint32 xrecoff; /* low bits */
>> -} PageXLogRecPtr;
>> +typedef uint64 PageXLogRecPtr;
>
> I think this should explain why we are storing it as a 64bit value (i.e. that
> we need to be able to read it without tearing).
>
>
> I suspect this should continue to be a struct (just containing a 64bit uint),
> otherwise it'll be way way too easy to omit the conversion, due to C's weak
> typedefs.
>
I agree. It'd be trivial to call it with plain XLogRecPtr (especially
now, with a function handling both directions).
>
>> -static inline XLogRecPtr
>> -PageXLogRecPtrGet(PageXLogRecPtr val)
>> +/*
>> + * Convert a pd_lsn taken from a page header into its native
>> + * uint64/PageXLogRecPtr representation
>> + */
>
> Odd double space before pd_lsn.
>
>
>> +static inline PageXLogRecPtr
>> +PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
>> {
>> - return (uint64) val.xlogid << 32 | val.xrecoff;
>> +#ifdef WORDS_BIGENDIAN
>> + return pd_lsn;
>> +#else
>> + return (pd_lsn << 32) | (pd_lsn >> 32);
>> +#endif
>> }
>>
>> -#define PageXLogRecPtrSet(ptr, lsn) \
>> - ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
>> -
>> /*
>> * disk page organization
>> *
>> @@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
>> {
>> return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
>> }
>
> I think this may need to actually use a volatile to force the read to happen
> as the 64bit value, otherwise the compiler would be entirely free to implement
> it as two 32bit reads.
>
I did that in v5 (I think). But at the same time I'm now rather confused
about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
well-aligned load/store of 8B can be implemented as two 32-bit reads
(with a "sequence point" in between), then how is that atomic?
I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?
>
>> static inline void
>> PageSetLSN(Page page, XLogRecPtr lsn)
>> {
>> - PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
>> + ((PageHeader) page)->pd_lsn = PageXLogRecPtrGet(lsn);
>> }
>
> Dito.
>
> Seems also a bit misleading to document PageXLogRecPtrSet as "Convert a pd_lsn
> taken from a page header into its native ..." when we use it for both
> directions.
>
I agree, I found that confusing too. In the attached v5 I went back to
the separate get/set functions from v3. I think that's better/clearer.
regards
--
Tomas Vondra
From 794fb844266dcd8d97217da09b61c5c9cafc01d7 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <[email protected]>
Date: Fri, 21 Nov 2025 10:15:06 +0100
Subject: [PATCH v5] Do not lock in BufferGetLSNAtomic() on archs with 8 byte
atomic reads
On platforms where we can read or write the whole LSN atomically, we do
not need to lock the buffer header to prevent torn LSNs. We can do this
only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
pd_lsn field is properly aligned.
For historical reasons the PageXLogRecPtr was defined as a struct with
two uint32 fields. This replaces it with a single uint64 value, to make
the intent clearer.
Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
Peter Geoghegan. Minor tweaks by me.
Author: Andreas Karlsson <[email protected]>
Author: Peter Geoghegan <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
contrib/pageinspect/heapfuncs.c | 13 ++------
contrib/pageinspect/rawpage.c | 8 ++---
src/backend/access/common/bufmask.c | 2 +-
src/backend/storage/buffer/bufmgr.c | 44 ++++++++++++++++-----------
src/include/access/gist.h | 4 +--
src/include/storage/bufpage.h | 47 +++++++++++++++++++++++------
6 files changed, 72 insertions(+), 46 deletions(-)
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 8e31632ce0e..3a61954e1d9 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -32,6 +32,7 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "pageinspect.h"
#include "port/pg_bitutils.h"
#include "utils/array.h"
#include "utils/builtins.h"
@@ -132,25 +133,17 @@ heap_page_items(PG_FUNCTION_ARGS)
bytea *raw_page = PG_GETARG_BYTEA_P(0);
heap_page_items_state *inter_call_data = NULL;
FuncCallContext *fctx;
- int raw_page_size;
if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to use raw page functions")));
- raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
-
if (SRF_IS_FIRSTCALL())
{
TupleDesc tupdesc;
MemoryContext mctx;
- if (raw_page_size < SizeOfPageHeaderData)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("input page too small (%d bytes)", raw_page_size)));
-
fctx = SRF_FIRSTCALL_INIT();
mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
@@ -163,7 +156,7 @@ heap_page_items(PG_FUNCTION_ARGS)
inter_call_data->tupd = tupdesc;
inter_call_data->offset = FirstOffsetNumber;
- inter_call_data->page = VARDATA(raw_page);
+ inter_call_data->page = get_page_from_raw(raw_page);
fctx->max_calls = PageGetMaxOffsetNumber(inter_call_data->page);
fctx->user_fctx = inter_call_data;
@@ -209,7 +202,7 @@ heap_page_items(PG_FUNCTION_ARGS)
if (ItemIdHasStorage(id) &&
lp_len >= MinHeapTupleSize &&
lp_offset == MAXALIGN(lp_offset) &&
- lp_offset + lp_len <= raw_page_size)
+ lp_offset + lp_len <= BLCKSZ)
{
HeapTupleHeader tuphdr;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index af54afe5b09..86fe245cac5 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -208,11 +208,9 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
* Get a palloc'd, maxalign'ed page image from the result of get_raw_page()
*
* On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
- * since it will start 4 bytes into a palloc'd value. On alignment-picky
- * machines, this will cause failures in accesses to 8-byte-wide values
- * within the page. We don't need to worry if accessing only 4-byte or
- * smaller fields, but when examining a struct that contains 8-byte fields,
- * use this function for safety.
+ * since it will start 4 bytes into a palloc'd value. PageHeaderData requires
+ * 8 byte alignment, so always use this function when accessing page header
+ * fields from a raw page bytea.
*/
Page
get_page_from_raw(bytea *raw_page)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..8a67bfa1aff 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -32,7 +32,7 @@ mask_page_lsn_and_checksum(Page page)
{
PageHeader phdr = (PageHeader) page;
- PageXLogRecPtrSet(phdr->pd_lsn, (uint64) MASK_MARKER);
+ PageXLogRecPtrSet(&phdr->pd_lsn, (uint64) MASK_MARKER);
phdr->pd_checksum = MASK_MARKER;
}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 5f3d083e938..4b80d43e69c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4628,33 +4628,41 @@ BufferIsPermanent(Buffer buffer)
/*
* BufferGetLSNAtomic
- * Retrieves the LSN of the buffer atomically using a buffer header lock.
- * This is necessary for some callers who may not have an exclusive lock
- * on the buffer.
+ * Retrieves the LSN of the buffer atomically.
+ *
+ * On platforms without 8 byte atomic reads/writes we need to take a
+ * header lock. This is necessary for some callers who may not have an
+ * exclusive lock on the buffer.
*/
XLogRecPtr
BufferGetLSNAtomic(Buffer buffer)
{
- char *page = BufferGetPage(buffer);
- BufferDesc *bufHdr;
- XLogRecPtr lsn;
-
- /*
- * If we don't need locking for correctness, fastpath out.
- */
- if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
- return PageGetLSN(page);
-
/* Make sure we've got a real buffer, and that we hold a pin on it. */
Assert(BufferIsValid(buffer));
Assert(BufferIsPinned(buffer));
- bufHdr = GetBufferDescriptor(buffer - 1);
- LockBufHdr(bufHdr);
- lsn = PageGetLSN(page);
- UnlockBufHdr(bufHdr);
+#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
+ return PageGetLSN(BufferGetPage(buffer));
+#else
+ {
+ char *page = BufferGetPage(buffer);
+ BufferDesc *bufHdr;
+ XLogRecPtr lsn;
- return lsn;
+ /*
+ * If we don't need locking for correctness, fastpath out.
+ */
+ if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+ return PageGetLSN(page);
+
+ bufHdr = GetBufferDescriptor(buffer - 1);
+ LockBufHdr(bufHdr);
+ lsn = PageGetLSN(page);
+ UnlockBufHdr(bufHdr);
+
+ return lsn;
+ }
+#endif
}
/* ---------------------------------------------------------------------
diff --git a/src/include/access/gist.h b/src/include/access/gist.h
index 8fa30126f8f..9b385b13a88 100644
--- a/src/include/access/gist.h
+++ b/src/include/access/gist.h
@@ -186,8 +186,8 @@ typedef struct GISTENTRY
#define GistMarkFollowRight(page) ( GistPageGetOpaque(page)->flags |= F_FOLLOW_RIGHT)
#define GistClearFollowRight(page) ( GistPageGetOpaque(page)->flags &= ~F_FOLLOW_RIGHT)
-#define GistPageGetNSN(page) ( PageXLogRecPtrGet(GistPageGetOpaque(page)->nsn))
-#define GistPageSetNSN(page, val) ( PageXLogRecPtrSet(GistPageGetOpaque(page)->nsn, val))
+#define GistPageGetNSN(page) ( PageXLogRecPtrGet(&GistPageGetOpaque(page)->nsn))
+#define GistPageSetNSN(page, val) ( PageXLogRecPtrSet(&GistPageGetOpaque(page)->nsn, val))
/*
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 92a6bb9b0c0..2c1394478a8 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -91,23 +91,49 @@ typedef uint16 LocationIndex;
/*
- * For historical reasons, the 64-bit LSN value is stored as two 32-bit
- * values.
+ * Store the LSN as a single 64-bit value, to allow atomic loads/stores.
+ *
+ * For historical reasons, the storage of 64-bit LSN values depends on CPU
+ * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
+ * values. When reading (and writing) the pd_lsn field from page headers, the
+ * caller must convert from (and convert to) the platform's native endianness.
*/
typedef struct
{
- uint32 xlogid; /* high bits */
- uint32 xrecoff; /* low bits */
+ uint64 lsn;
} PageXLogRecPtr;
+#ifdef WORDS_BIGENDIAN
+
static inline XLogRecPtr
-PageXLogRecPtrGet(PageXLogRecPtr val)
+PageXLogRecPtrGet(const volatile PageXLogRecPtr *val)
{
- return (uint64) val.xlogid << 32 | val.xrecoff;
+ return val->lsn;
}
-#define PageXLogRecPtrSet(ptr, lsn) \
- ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
+static inline void
+PageXLogRecPtrSet(volatile PageXLogRecPtr *ptr, XLogRecPtr lsn)
+{
+ ptr->lsn = lsn;
+}
+
+#else
+
+static inline XLogRecPtr
+PageXLogRecPtrGet(const volatile PageXLogRecPtr *val)
+{
+ PageXLogRecPtr tmp = {val->lsn};
+
+ return (tmp.lsn << 32) | (tmp.lsn >> 32);
+}
+
+static inline void
+PageXLogRecPtrSet(volatile PageXLogRecPtr *ptr, XLogRecPtr lsn)
+{
+ ptr->lsn = (lsn << 32) | (lsn >> 32);
+}
+
+#endif
/*
* disk page organization
@@ -385,12 +411,13 @@ PageGetMaxOffsetNumber(const PageData *page)
static inline XLogRecPtr
PageGetLSN(const PageData *page)
{
- return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
+ return PageXLogRecPtrGet(&((const volatile PageHeaderData *) page)->pd_lsn);
}
+
static inline void
PageSetLSN(Page page, XLogRecPtr lsn)
{
- PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
+ PageXLogRecPtrSet(&((PageHeader) page)->pd_lsn, lsn);
}
static inline bool
--
2.53.0