Re: index prefetching

Andres Freund Tue, 10 Mar 2026 15:48:09 -0700

Hi,

On 2026-03-10 16:57:35 -0400, Peter Geoghegan wrote:
> On Fri, Feb 27, 2026 at 6:52 PM Andres Freund <[email protected]> wrote:
> > This is a huge change. Is there a chance we can break it up into more
> > manageable chunks?
> 
> Attached is v12, which has revisions that address most of your
> feedback items. It also includes items that address problems that I
> noticed during performance validation work.
> 
> Highlights:
> 
> * Substantial revisions that give table AMs and index AMs direct
> control over batch layout -- without giving up on batch
> recycling/caching. This is essentially what you (Andres) requested
> because the design from v11 was not sufficiently AM agnostic. In
> particular:
> 
> - Table AMs now control the size and layout of visibility information
> (in practice heapam uses this to store per-item visibility state from
> the visibility map).
> 
> - Index AMs have their own opaque state for things like sibling link
> block numbers, avoiding the assumption that other index AMs supporting
> amgetbatch will need to work like nbtree and hash as regards how they
> navigate to the next index page/index keyspace associated with each
> batch.


Nice!
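
For readers following along, here is a minimal sketch of one way a batch could give each AM its own variable-size area inside a single allocation. It is an illustration only, not the v12 layout: `SketchBatch`, `sketch_batch_alloc` and the accessor names are hypothetical, and it assumes the index AM reports one opaque size per batch while the table AM reports a per-item visibility size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round up to 8 bytes so each area starts suitably aligned (assumption). */
#define SKETCH_ALIGN(n) (((n) + (size_t) 7) & ~(size_t) 7)

/* Hypothetical batch header; the AM-private areas follow it in memory. */
typedef struct SketchBatch
{
    int     nitems;
    size_t  index_opaque_off;   /* offset of the index AM's opaque area */
    size_t  table_vis_off;      /* offset of the table AM's visibility area */
} SketchBatch;

/*
 * Allocate one zeroed chunk holding the header, the index AM's opaque state,
 * and nitems visibility entries whose size is chosen by the table AM.
 */
static SketchBatch *
sketch_batch_alloc(int nitems, size_t index_opaque_size,
                   size_t table_vis_item_size)
{
    size_t      index_off = SKETCH_ALIGN(sizeof(SketchBatch));
    size_t      vis_off = index_off + SKETCH_ALIGN(index_opaque_size);
    size_t      total = vis_off +
        SKETCH_ALIGN((size_t) nitems * table_vis_item_size);
    SketchBatch *b = calloc(1, total);

    if (b == NULL)
        return NULL;
    b->nitems = nitems;
    b->index_opaque_off = index_off;
    b->table_vis_off = vis_off;
    return b;
}

/* Only the index AM interprets this area (e.g. sibling block numbers). */
static void *
sketch_batch_index_opaque(SketchBatch *b)
{
    return (char *) b + b->index_opaque_off;
}

/* Only the table AM interprets this area (e.g. per-item visibility). */
static void *
sketch_batch_table_vis(SketchBatch *b)
{
    return (char *) b + b->table_vis_off;
}
```

Because the sizes are fixed for the duration of a scan in this sketch, a freed batch could be recycled for the next index page without reallocating, which is the property the quoted text wants to keep.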


> * No more read stream yielding. Numerous new patches from Andres are
> now included, which helps with this. In particular, "WIP: read_stream:
> Only increase distance when waiting for IO" fixes the problematic
> regression in an adversarial query -- the one that prompted me to
> invent yielding in the first place. As a result of all this, the read
> stream callback added by the prefetching commit itself is now
> substantially simpler than it was in v11.

Yay.
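
As a rough illustration of the idea in that patch title, the look-ahead distance grows only when the consumer actually had to wait for an IO, so IOs that had already completed by the time they were needed do not push the distance further. This is a simplified sketch under assumptions, not the code in read_stream.c: `SketchStream`, `sketch_io_consumed`, and the doubling policy are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified look-ahead state. */
typedef struct SketchStream
{
    int     distance;       /* current look-ahead distance, in blocks */
    int     max_distance;   /* upper bound on the distance */
} SketchStream;

/*
 * Called when the consumer picks up a block whose IO was started earlier.
 * had_to_wait says whether the consumer blocked on that IO.
 */
static void
sketch_io_consumed(SketchStream *s, bool had_to_wait)
{
    if (!had_to_wait)
        return;             /* IO finished in time: distance is sufficient */

    /* We stalled, so look further ahead next time (assumed policy). */
    s->distance *= 2;
    if (s->distance > s->max_distance)
        s->distance = s->max_distance;
}
```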


> * There are now a couple of extra patches created by breaking things
> into more distinct commits. Namely, there's a new "heapam: Track heap
> block in IndexFetchHeapData using xs_blk" commit, as well as a new
> "Make IndexScanInstrumentation a pointer in executor scan nodes"
> commit.

Yay^2.


> * Moreover, some commits now appear in a slightly different order,
> prioritizing work closer to being committable; those commits now come
> first.

Yay^3.


> * New commit "Use simple hash for PrivateRefCount" addresses some of
> the problems we were seeing with PrivateRefCount performance. This
> generic optimization addresses an existing problem that would
> otherwise be much worse with the index prefetching work in place.

Let's get that in soon.

Alexandre Felipe posted an implementation of this in
https://postgr.es/m/CAE8JnxNTETEUiAOF31%3D_yo%3DpvyAi9npOeJfcTvEJJbi4vomtYA%40mail.gmail.com

I don't agree with many of the other changes, but the simplehash conversion
contains an interesting piece - the ability to avoid the status field.  I'd
encourage Alexandre to upstream that separately from this thread (and also
separately from the rest of the patches in the above thread).
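
To illustrate what avoiding the status field can mean: simplehash.h requires each element to carry a `status` member, but when the key type has a value that can never be a valid key, that value can mark empty slots instead. The sketch below assumes buffer number 0 is such a sentinel (as with InvalidBuffer). It is not Alexandre's patch and not the simplehash API; all `Sketch*` names are hypothetical, and deletion is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NSLOTS 16            /* must be a power of two */
#define SKETCH_INVALID_BUFFER 0     /* sentinel key: the slot is empty */

/* No separate status field: buffer == 0 already means "unused". */
typedef struct SketchRefCountEntry
{
    int32_t     buffer;
    int32_t     refcount;
} SketchRefCountEntry;

typedef struct SketchRefCountTable
{
    SketchRefCountEntry slots[SKETCH_NSLOTS];
    uint32_t    nused;
} SketchRefCountTable;

static void
sketch_init(SketchRefCountTable *t)
{
    memset(t, 0, sizeof(*t));       /* all-zero keys == all slots empty */
}

static uint32_t
sketch_hash(int32_t buffer)
{
    return ((uint32_t) buffer * 2654435761u) & (SKETCH_NSLOTS - 1);
}

/* Linear probing; stops at the first empty slot. */
static SketchRefCountEntry *
sketch_lookup(SketchRefCountTable *t, int32_t buffer)
{
    uint32_t    start = sketch_hash(buffer);

    assert(buffer != SKETCH_INVALID_BUFFER);
    for (uint32_t n = 0; n < SKETCH_NSLOTS; n++)
    {
        SketchRefCountEntry *e = &t->slots[(start + n) & (SKETCH_NSLOTS - 1)];

        if (e->buffer == buffer)
            return e;
        if (e->buffer == SKETCH_INVALID_BUFFER)
            return NULL;
    }
    return NULL;
}

/* Find or create the entry and bump its refcount; NULL if the table is full. */
static SketchRefCountEntry *
sketch_pin(SketchRefCountTable *t, int32_t buffer)
{
    uint32_t    start = sketch_hash(buffer);

    assert(buffer != SKETCH_INVALID_BUFFER);
    for (uint32_t n = 0; n < SKETCH_NSLOTS; n++)
    {
        SketchRefCountEntry *e = &t->slots[(start + n) & (SKETCH_NSLOTS - 1)];

        if (e->buffer == SKETCH_INVALID_BUFFER)
        {
            e->buffer = buffer;     /* claim the empty slot */
            e->refcount = 0;
            t->nused++;
        }
        if (e->buffer == buffer)
        {
            e->refcount++;
            return e;
        }
    }
    return NULL;
}
```

With two 32-bit members and no status byte, the entry stays at 8 bytes in this sketch, so more entries fit in each cache line.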



> However, I have NOT yet acted on a few feedback items from Andres:
> 
> * I still don't know what Andres meant about requiring table AMs to
> free batch index page buffer pins representing a modularity violation.
> I don't see how we can reasonably avoid it while still preserving the
> guarantees needed to safely drop buffer pins eagerly during index-only
> scans that require prefetching.
> 
> * I'm also not at all sure what Andres meant about index AMs like hash
> not holding onto their own buffer pins, given that prefetching uses a
> read stream sensitive to the number of buffer pins the backend holds.

I tried to respond in
https://postgr.es/m/vbb4naf2tvm2tm7yoml54pzvrmn77p4nvq4awfa4wufc3hn7qx%40mof5q6li3xzv
to explain my concerns / what I think needs to happen.



Greetings,

Andres Freund
