> On Feb 25, 2026, at 14:52, Chao Li <[email protected]> wrote:
>
>
>
>> On Feb 20, 2026, at 07:41, Melanie Plageman <[email protected]>
>> wrote:
>>
>> On Wed, Jan 14, 2026 at 6:35 PM Melanie Plageman
>> <[email protected]> wrote:
>>>
>>> Thanks! v13 attached.
>>
>> I've added background writer write combining and then spent some time
>> refactoring and benchmarking this over the last few weeks, and I've
>> realized I won't be able to get it in shape for Postgres 19. I've
>> attached version 14 here with my WIP code so I can come back to it in
>> the next release.
>>
>> 0012 is an invasive refactor of GetVictimBuffer() and all the
>> functions it calls that needs to be split apart into smaller commits
>> and redistributed throughout the other patches in the set. Other than
>> that, there are a number of todos both in the code and from a
>> benchmarking perspective:
>>
>> - Probe for preceding as well as following blocks when looking for
>> blocks to combine
>> - Experiment with different batch sizes
>> - Enable and test bulkread and vacuum write combining
>>
>> I also periodically get this troubling warning: "WARNING: invalid
>> page in block 2 of relation "base/16384/2608_fsm"; zeroing out page".
>> So clearly there's at least one bug somewhere.
>>
>> FWIW benchmarks showed a large improvement from backend and bgwriter
>> write combining, but I'll provide more detailed output from benchmarks
>> when I post a more reviewable patch set.
>>
>> 0001 and 0002 are completely independent cleanup that should be pushed
>> anyway, so I might do that, but I didn't want to do so without having
>> posted on the list at least once.
>>
>> - Melanie
>> <v14-0001-Make-FlushUnlockedBuffer-use-its-parameters.patch><v14-0002-Make-ScheduleBufferTagForWriteback-static.patch><v14-0003-Streamline-buffer-rejection-for-bulkreads-of-unl.patch><v14-0004-Allow-PinBuffer-to-skip-increasing-usage-count.patch><v14-0005-Eagerly-flush-bulkwrite-strategy-ring.patch><v14-0006-Write-combining-for-BAS_BULKWRITE.patch><v14-0007-Add-database-Oid-to-CkptSortItem.patch><v14-0008-Implement-checkpointer-data-write-combining.patch><v14-0009-Remove-SyncOneBuffer-and-refactor-BgBufferSync.patch><v14-0010-Normal-backend-write-combining.patch><v14-0011-Add-write-combining-for-background-writer.patch><v14-0012-WIP-refactor.patch>
>
> A few comments for v14:
>
> 1 - 0001
> ```
> - FlushBuffer(buf, reln, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> + FlushBuffer(buf, reln, io_object, io_context);
> ```
>
> This changes the hardcoded IOOBJECT_RELATION to io_object when calling
> FlushBuffer(). But FlushBuffer() itself still uses IOOBJECT_RELATION instead
> of io_object, so the change will not actually take effect:
> ```
> pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
> IOOP_WRITE, io_start, 1, BLCKSZ);
> ```
>
> So, an update in FlushBuffer() is also needed.
>
> 2 - 0003 - freelist.c
> ```
> bool
> StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool
> from_ring)
> {
> + Assert(strategy);
> +
> /* We only do this in bulkread mode */
> if (strategy->btype != BAS_BULKREAD)
> return false;
> @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
> strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
> return false;
>
> + Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
> + Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
> ```
>
> I don’t quite understand Assert(!(pg_atomic_read_u64(&buf->state) &
> BM_LOCKED));
>
> The comment says “the buffer header spinlock must not be held,” but I doubt
> this Assert actually ensures that. Since BM_LOCKED is just a bit in a shared
> state, isn't it possible for a concurrent backend to grab the lock right when
> we’re checking it?
>
> If a concurrent process locks the header a split-second before the Assert,
> we’ll get a false-positive crash in a dev environment even though the code is
> fine. If they lock it a split-second after, then the Assert didn't really
> catch anything. It feels like a race condition either way.
>
> I think the only way to truly ensure a spinlock is "not held" is to actually
> acquire it. If we’re just checking the bit like this, it’s just a snapshot
> that might be stale by the time the instruction finishes. What was the
> specific worry here—a self-deadlock, or something else?
>
> 3 - 0004
> ```
> +typedef enum BufferUsageCountChange
> +{
> + BUC_ZERO,
> + BUC_MAX_ONE,
> + BUC_ONE,
> +} BufferUsageCountChange;
> ```
>
> Can we add brief comments explaining each enum value? I find it hard to
> guess the meanings from the names alone without reading the code.
>
> 4 - 0005 - bufmgr.c
> ```
> +/*
> + * Prepare bufdesc for eager flushing.
> + *
> + * Given bufnum, return the buffer descriptor of the buffer to eagerly flush,
> + * pinned and locked and with BM_IO_IN_PROGRESS set, or NULL if this buffer
> + * does not contain a block that should be flushed.
> + *
> + * If returning a buffer, also return its LSN.
> + */
> +static BufferDesc *
> +PrepareOrRejectEagerFlushBuffer(Buffer bufnum)
> ```
>
> This header comment looks stale.
>
> * It says “and with BM_IO_IN_PROGRESS set”, but I don’t see where
> BM_IO_IN_PROGRESS is set in this function.
> * It says “also return its LSN”, but I don’t see LSN is returned.
>
> 5 - 0006
> ```
> + /* Start IO on the first buffer */
> + if (!StartBufferIO(buf_hdr, false, false))
> + goto clean;
> ```
>
> This failure branch appears to leak the content lock. The buffer was
> already share-locked via BufferLockConditional() earlier, and I don't see
> the "clean" path releasing that content lock before returning.
>
> 6 - 0009
> ```
> + FlushUnlockedBuffer(bufHdr, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
> + UnpinBuffer(bufHdr);
> +
> + ScheduleBufferTagForWriteback(wb_context, IOCONTEXT_NORMAL, &bufHdr->tag);
> ```
>
> bufHdr is unlocked and unpinned here, so it may be unsafe to keep using
> bufHdr->tag. I think we should copy bufHdr->tag into a local variable
> before unlocking the buffer.
>
> ~~ My brain is stuck, I will continue 0010 tomorrow ~~
>
7 - 0010
```
- * batch_limit is the largest batch we are allowed to construct given the
- * remaining blocks in the table, the number of available pins, and the
- * current configuration.
+ * max_batch_size is the maximum number of blocks that can be combined into a
+ * single write in general. This function, based on the block number of start,
+ * will determine the maximum IO size for this particular write given how much
+ * of the file remains. max_batch_size is provided by the caller so it doesn't
+ * have to be recalculated for each write.
```
I don’t see a parameter named max_batch_size on FindStrategyFlushAdjacents();
the parameter is still called batch_limit, so the new comment doesn’t match
the signature.
8 - 0010
```
static void
+FindFlushAdjacents(BufferDesc *batch_start,
+ uint32 batch_limit,
+ BufferWriteBatch *batch)
+{
+	BufferTag	require;		/* requested block's buffer tag */
+ uint32 newHash; /* hash value for require */
+ LWLock *newPartitionLock; /* buffer partition lock for it */
+ int buf_id;
+
+ /* create a tag so we can lookup the buffers */
+ InitBufferTag(&require, &batch->reln->smgr_rlocator.locator,
+ batch->forkno, InvalidBlockNumber);
+
+ for (; batch->n < batch_limit; batch->n++)
+ {
+ XLogRecPtr lsn;
+
+ require.blockNum = batch->start + batch->n;
+
+ Assert(BlockNumberIsValid(require.blockNum));
```
When I first read this Assert, I was confused about how it could assume the
block number is valid; then I realized that WriteBatchInit() ensures a safe
“limit”. Maybe add a brief comment above the Assert.
9 - 0010
```
+		batch->bufdescs[batch->n] =
+			PrepareOrRejectEagerFlushBuffer(buf_id + 1,
+											&require,
+											newPartitionLock,
+											&lsn);
+		if (lsn > batch->max_lsn)
+			batch->max_lsn = lsn;
+
+		if (batch->bufdescs[batch->n] == NULL)
+			break;
```
I think we can move the if (batch->bufdescs[batch->n] == NULL) check to before
the if (lsn > batch->max_lsn) check.

This isn’t a correctness issue, because PrepareOrRejectEagerFlushBuffer() sets
lsn to InvalidXLogRecPtr when it returns NULL, but doing the NULL check first
just reads more naturally.
10 - 0010
```
+ * max_lsn may be updated if the provided buffer LSN exceeds the current max
+ * LSN.
*/
static BufferDesc *
PrepareOrRejectEagerFlushBuffer(Buffer bufnum,
                                BufferTag *require,
+                               LWLock *buftable_lock,
                                XLogRecPtr *lsn)
```
The comment mentions max_lsn, but the function doesn’t have a parameter with
that name. Did you mean “lsn”?
11 - 0010
```
+FindFlushAdjacents(BufferDesc *batch_start,
+ uint32 batch_limit,
+ BufferWriteBatch *batch)
```
Looks like batch_start is not used at all in this function.
12 - 0011
```
+ * Callers specify if and by how much they want to bump the buffer's usage
+ * count.
```
I don’t get what this comment means.
~~ As 0012 is WIP, I didn't review it. ~~
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/