On Tue, Feb 17, 2026 at 7:31 PM Andres Freund <[email protected]> wrote:
> Interestingly, I can't reproduce a regression compared to index prefetching
> being disabled.
>
> If I evict just prefetch_customers between runs, I see prefetch + no-yield
> being the fastest by a good amount.
>
> If I evict prefetch_customers as well as prefetch_customers_pkey, yielding
> wins, but only just about. Which I guess makes sense, the index reads are
> synchronous random reads, and we do more of those if we prefetch too
> aggressively.
That can't have been a factor when I ran the query (which is pretty
obvious from the EXPLAIN ANALYZE output).
> I ran the queries both with pgbench (in a script that evicts the buffers, but
> then just looks at the per-statement time for the SELECT, 30 iterations) and
> separately interactively with EXPLAIN ANALYZE to get IO stats.
>
>
> This is with debug_io_direct=data, were you measuring this without DIO? If so,
> was the data in the page cache or did you evict it from there?
I rarely use debug_io_direct=data for any of my testing. My standard setup
is buffered I/O + io_uring, with all indexes prewarmed, all heap relations
evicted, and the OS filesystem cache dropped before each run.
FWIW, the version of the patch you're using is slightly different from the
one I have here, since I've been working on unrelated issues such as the
cost of rescans with nestloop joins. Another difference is that the memory
allocation for the VM cache is now combined with the main batch allocation,
which seems to be more cache efficient, and also saves memory for plain
index scans.
> We really should add a function to pg_prewarm (or pg_buffercache, or ...) that
> evicts pages in a targeted way from the kernel page cache... Flushing the
> entire kernel pagecache leads to undesirable noise, because it also evicts
> filesystem metadata (on some filesystems at least) etc.
Yeah, having to flush the kernel page cache has been really inconvenient.
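I'd guess that such a function would mostly be a thin wrapper around
posix_fadvise(POSIX_FADV_DONTNEED) on the relation's segment files. A
rough, untested sketch (the function name and the lack of error handling
are mine, not anything from the patch):

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Untested sketch: drop one relation segment file's pages from the
     * kernel page cache, without disturbing the rest of the page cache.
     */
    static void
    evict_segment_from_kernel_cache(const char *path)
    {
        int         fd = open(path, O_RDWR);

        if (fd < 0)
            return;             /* real code would report an error here */

        /* POSIX_FADV_DONTNEED won't drop dirty pages, so write them back first */
        if (fdatasync(fd) == 0)
            (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

        close(fd);
    }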
> FWIW, I got a crash in a mark-restore query. I think I see the problem:
>
> /*
>  * Release all currently loaded batches, being sure to avoid freeing
>  * markBatch (unless called with complete, where we're supposed to)
>  */
> for (uint8 i = batchringbuf->headBatch; i != batchringbuf->nextBatch; i++)
> {
>     IndexScanBatch batch = index_scan_batch(scan, i);
>
>     if (complete || batch != markBatch)
>     {
>         markBatchFreed = (batch == markBatch);
>         tableam_util_free_batch(scan, batch);
>     }
> }
>
> if (complete && markBatch != NULL && !markBatchFreed)
> {
>     /*
>      * We didn't free markBatch because it was no longer loaded in ring
>      * buffer. Do so now instead.
>      */
>     tableam_util_free_batch(scan, markBatch);
> }
>
> If, in the loop, there's a batch after the markBatch in the ring, it'll reset
> markBatchFreed to false. Which then leads to the batch being freed a second
> time.
Oops.
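I think the fix is just to stop clobbering the flag on later iterations,
i.e. only ever set it to true. Untested, against your excerpt:

    if (complete || batch != markBatch)
    {
        /* remember whether this batch was markBatch; never reset the flag */
        if (batch == markBatch)
            markBatchFreed = true;
        tableam_util_free_batch(scan, batch);
    }

That way markBatchFreed stays set once markBatch has actually been freed,
so the block after the loop can't free it a second time.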
--
Peter Geoghegan