On 07/02/2023 16:00, Bharath Rupireddy wrote:
Hi,

While working on [1], I was looking for a quick way to tell if a WAL
record is present in the WAL buffers array without scanning but I
couldn't find one.

/* The end-ptr of the page that contains the record */
expectedEndPtr = recptr + (XLOG_BLCKSZ - recptr % XLOG_BLCKSZ);

/* get the buffer where the record is, if it's in WAL buffers at all */
idx = XLogRecPtrToBufIdx(recptr);

/* prevent the WAL buffer from being evicted while we look at it */
LWLockAcquire(WALBufMappingLock, LW_SHARED);

/* Check if the page we're interested in is in the buffer */
found = XLogCtl->xlblocks[idx] == expectedEndPtr;

LWLockRelease(WALBufMappingLock);

Hence, I put up a patch that basically tracks the
oldest initialized WAL buffer page, named OldestInitializedPage, in
XLogCtl. With OldestInitializedPage, we can easily illustrate WAL
buffers array properties:

1) At any given point of time, pages in the WAL buffers array are
sorted in an ascending order from OldestInitializedPage till
InitializedUpTo. Note that we verify this property for assert-only
builds, see IsXLogBuffersArraySorted() in the patch for more details.

2) OldestInitializedPage is monotonically increasing (by virtue of how
postgres generates WAL records), that is, its value never decreases.
This property lets someone read its value without a lock. There's no
problem even if its value is slightly stale i.e. concurrently being
updated. One can still use it for finding if a given WAL record is
available in WAL buffers. At worst, one might get false positives
(i.e. OldestInitializedPage may tell that the WAL record is available
in WAL buffers, but when one actually looks at it, it isn't really
available). This is more efficient and performant than acquiring a
lock for reading. Note that we may not need a lock to read
OldestInitializedPage but we need to update it holding
WALBufMappingLock.

You actually hint at the above solution here, so I'm confused. If you're OK with slightly stale results, you can skip the WALBufMappingLock above too, and perform an atomic read of xlblocks[idx] instead.
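
To make the lock-free variant concrete, here is a minimal standalone sketch. It is not PostgreSQL code: all sketch_* names and SKETCH_* constants are made up for illustration, C11 atomics stand in for whatever atomic primitive the server would use, and a zero entry is assumed to mean "slot not initialized".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ   8192u   /* hypothetical stand-in for XLOG_BLCKSZ */
#define SKETCH_NBUFFERS 4u      /* hypothetical number of WAL buffers */

typedef uint64_t SketchRecPtr;  /* hypothetical stand-in for XLogRecPtr */

/* End pointer of the page held in each slot; 0 means "not initialized". */
static _Atomic SketchRecPtr sketch_xlblocks[SKETCH_NBUFFERS];

/* Slot that a WAL position maps to (same idea as XLogRecPtrToBufIdx). */
unsigned
sketch_buf_idx(SketchRecPtr recptr)
{
    return (unsigned) ((recptr / SKETCH_BLCKSZ) % SKETCH_NBUFFERS);
}

/* Publish a page into its slot, overwriting whatever page was there. */
void
sketch_init_page(SketchRecPtr page_start)
{
    assert(page_start % SKETCH_BLCKSZ == 0);
    atomic_store(&sketch_xlblocks[sketch_buf_idx(page_start)],
                 page_start + SKETCH_BLCKSZ);
}

/*
 * Lock-free check: is the page containing recptr in the buffers right now?
 * The answer can be stale by the time the caller acts on it.
 */
bool
sketch_page_in_buffers(SketchRecPtr recptr)
{
    SketchRecPtr expected_end =
        recptr + (SKETCH_BLCKSZ - recptr % SKETCH_BLCKSZ);

    return atomic_load(&sketch_xlblocks[sketch_buf_idx(recptr)]) == expected_end;
}
```

Because the result is only a hint, a caller that goes on to copy data out of the page would still have to re-check the slot afterwards.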

3) One can start traversing WAL buffers from OldestInitializedPage
till InitializedUpTo to list out all valid WAL records and stats, and
expose them via SQL-callable functions to users, for instance, as
pg_walinspect functions.

4) WAL buffers array is inherently organized as a circular, sorted and
rotated array with OldestInitializedPage as pivot/first element of the
array with the property where LSN of previous buffer page (if valid)
is greater than OldestInitializedPage and LSN of the next buffer page
(if valid) is greater than OldestInitializedPage.

These properties are true; maybe we should document them explicitly in a comment. But I don't see the point of tracking OldestInitializedPage. It seems cheap enough that we could do it if there were a need for it, but I don't see the need.
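
For reference, the "circular, sorted and rotated" property could be stated as a check roughly like the one below. This is a standalone illustration, not the IsXLogBuffersArraySorted() function from the patch; the function name, the plain uint64_t array, and the convention that 0 marks an invalid page are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if the valid (non-zero) entries of pages[], visited
 * circularly starting at index pivot, are strictly ascending.
 */
bool
sketch_is_rotated_sorted(const uint64_t *pages, size_t n, size_t pivot)
{
    uint64_t prev = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t cur = pages[(pivot + i) % n];

        if (cur == 0)
            continue;       /* skip slots that hold no valid page */
        if (cur <= prev)
            return false;   /* order broken somewhere after the pivot */
        prev = cur;
    }
    return true;
}
```

With the pivot at the oldest page the check passes; starting from any other slot it fails as soon as the walk wraps from the newest page back to the oldest.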

--
Heikki Linnakangas
Neon (https://neon.tech)


