Hi,

On 2026-03-10 16:57:35 -0400, Peter Geoghegan wrote:
> On Fri, Feb 27, 2026 at 6:52 PM Andres Freund <[email protected]> wrote:
> > This is a huge change. Is there a chance we can break it up into more
> > manageable chunks?
>
> Attached is v12, which has revisions that address most of your
> feedback items. It also includes items that address problems that I
> noticed during performance validation work.
>
> Highlights:
>
> * Substantial revisions that give table AMs and index AMs direct
> control over batch layout -- without giving up on batch
> recycling/caching. This is essentially what you (Andres) requested
> because the design from v11 was not sufficiently AM agnostic. In
> particular:
>
> - Table AMs now control the size and layout of visibility information
> (in practice heapam uses this to store per-item visibility state from
> the visibility map).
>
> - Index AMs have their own opaque state for things like sibling link
> block numbers, avoiding the assumption that other index AMs supporting
> amgetbatch will need to work like nbtree and hash as regards how they
> navigate to the next index page/index keyspace associated with each
> batch.

Nice!

> * No more read stream yielding. Numerous new patches from Andres are
> now included, which helps with this. In particular, "WIP: read_stream:
> Only increase distance when waiting for IO" fixes the problematic
> regression in an adversarial query -- the one that prompted me to
> invent yielding in the first place. As a result of all this, the read
> stream callback added by the prefetching commit itself is now
> substantially simpler than it was in v11.

Yay.

> * There are now a couple of extra patches created by breaking things
> into more distinct commits. Namely, there's a new "heapam: Track heap
> block in IndexFetchHeapData using xs_blk" commit, as well as a new
> "Make IndexScanInstrumentation a pointer in executor scan nodes"
> commit.

Yay^2.

> * Moreover, some commits now appear in a slightly different order,
> prioritizing work closer to being committable; those commits now come
> first.

Yay^3.

> * New commit "Use simple hash for PrivateRefCount" addresses some of
> the problems we were seeing with PrivateRefCount performance. This
> generic optimization addresses an existing problem that would
> otherwise be much worse with the index prefetching work in place.

Let's get that in soon.

Alexandre Felipe posted an implementation of this in
https://postgr.es/m/CAE8JnxNTETEUiAOF31%3D_yo%3DpvyAi9npOeJfcTvEJJbi4vomtYA%40mail.gmail.com

I don't agree with many of the other changes, but the simplehash conversion
contains an interesting piece - the ability to avoid the status field. I'd
encourage Alexandre to upstream that separately from this thread (and also
separately from the rest of the patches in the above thread).

> However, I have NOT yet acted on a few feedback items from Andres:
>
> * I still don't know what Andres meant about requiring table AMs to
> free batch index page buffer pins representing a modularity violation.
> I don't see how we can reasonably avoid it while still preserving the
> guarantees needed to safely drop buffer pins eagerly during index-only
> scans that require prefetching.
>
> * I'm also not at all sure what Andres meant about index AMs like hash
> not holding onto their own buffer pins, given that prefetching uses a
> read stream sensitive to the number of buffer pins the backend holds.

I tried to respond in
https://postgr.es/m/vbb4naf2tvm2tm7yoml54pzvrmn77p4nvq4awfa4wufc3hn7qx%40mof5q6li3xzv
to explain my concerns / what I think needs to happen.

Greetings,

Andres Freund
