Hi Peter,
On Fri, Mar 27, 2026 at 5:36 PM Peter Geoghegan <[email protected]> wrote:
> On Sun, Mar 22, 2026 at 9:14 PM Peter Geoghegan <[email protected]> wrote:
> > V17 is attached. This addresses most of your feedback, but defers
> > dealing with the trickier parts for now.
>
> Attached is v18, which addresses the tricky issues I skipped in v17.
>
> Highlights:
>
> * New commit (the first) that adds a heapam_indexscan.c file, and
> moves all of the relevant existing heapam_handler.c functions over to
> this new file. This commit is strictly mechanical, and has no
> functional impact.
>
> * Another new commit (the second) adds the new slot-based index scan
> interface, and pushes the VM lookups that currently take place in
> nodeIndexonlyScan.c into heapam. This includes the instrumentation
> changes that were previously in their own small patch. It also
> includes the VISITED_PAGES_LIMIT stuff -- there's no longer any commit
> that breaks VISITED_PAGES_LIMIT, nor is there a later commit that
> subsequently restores the functionality.
>
> * We now extract one key piece of infrastructure that other table AMs
> can now reuse to manage their own index scan batches: a new inline
> function called tableam_util_fetch_next_batch now appears as an
> indexbatch.h inline function.
>
> This function (which appeared under a different name in v17) manages
> deciding whether to call amgetbatch (performing the amgetbatch call
> when needed) or to simply return the next batch in line (because such
> a batch already exists/is already loaded in the ring buffer from
> earlier). It also detects and handles cross-batch changes in scan
> direction, and sets/reads the fields that track when a batch is
> definitely the last one in the given scan direction.
>
> I tried to go further by also placing the bulk of the code in
> heapam_index_getnext_scanbatch_pos in indexbatch.h, but ultimately
> concluded that it wasn't worth it.
> There isn't much code in
> heapam_index_getnext_scanbatch_pos to generalize. Some of that code
> coordinates quite directly with the heapam read stream callback.
>
> * Added an "isGuarded" field to IndexScanBatchData. That way we can
> assert that any batch loaded within the read stream callback is
> already unguarded/has no batch index page buffer pin still held. The
> field isn't just for assertions, though. We also rely on it in one or
> two places during release builds, which seemed clearer to me. For
> example, we no longer require that amunguardbatch routines be
> idempotent, which seems like a minor improvement.
>
> * Other work is broken out into its own commit, to better highlight
> issues that aren't strictly related to the amgetbatch changes.
> Specifically, a new commit adds the "don't unpin on rescan" behavior.
> These changes, which first appeared in the main amgetbatch commit, aim
> to minimize regressions with cached workloads (this was discussed
> briefly upthread, plus Andres and I discussed it over IM some weeks
> back).
>
> * Replaced what was previously an assert with an equivalent
> pg_assume(), within heapam_index_getnext_scanbatch_pos -- which is a
> very hot function. We now "pg_assume(batchringbuf->headBatch ==
> scanPos->batch)", allowing the compiler to exploit this existing
> invariant. This seems to help the compiler generate better code,
> presumably by allowing more efficient register allocation. I haven't
> looked at the disassembly just yet, but it helps with certain
> microbenchmarks enough to matter.
>
> * The "Add UnlockBufferGetLSN utility function" commit now uses an
> atomic operation (in the common case where it actually drops the index
> page's pin), just like UnlockReleaseBuffer itself following Andres'
> commit f39cb8c011062d65e146c1e9d1aae221e96d8320 from today.
>
> I completed this item quickly, after pulling from master a little
> while ago.
> The expectation for UnlockBufferGetLSN is that it be at
> least as good as a BufferGetLSNAtomic followed by a
> UnlockReleaseBuffer, so IMV it made sense to include this last minute
> addition in v18.
>
> * The "don't wait" patches are no longer included, since Andres
> committed them to the master branch just now. Thanks to both you
> (Andres) and Melanie for taking care of this very thorny issue!
>
> I believe I have now acted on all of Andres' feedback, with one minor
> exception: Andres disliked how we use 'if (scan->xs_heapfetch)' to
> determine whether to use the scan's batch cache (we don't want to use
> it at all if the scan is ending). I just ran out of steam, so this
> version doesn't address the problem. But I'm still tracking it as an
> open item (I don't disagree with Andres, but this problem is slightly
> awkward). Note that I *did* create a helper function here, so we at
> least avoid having 2 copies of the loop that looks for a free cache
> batch slot.

I am still reviewing / understanding the patch; here are a couple of
quick comments based on my review so far.

1. There looks to be an off-by-one in the for loop in the patch
v18-0015-WIP-aio-io_uring-Use-IO-size-not-IO-queue-to-tri.patch -- the
condition should presumably be '<' rather than '<=':

@@ -730,7 +711,39 @@ pgaio_uring_sq_from_io(PgAioHandle *ioh, struct io_uring_sqe *sqe)
                                 ioh->op_data.read.iov_length,
                                 ioh->op_data.read.offset);

+       for (int i = 0; i <= ioh->op_data.read.iov_length; i++, iov++)
+               io_size += iov->iov_len;

2. Nit: _bt_endpoint's return type changed from bool to IndexScanBatch
in the patch v18-0005-Add-interfaces-that-enable-index-prefetching.patch,
but it would be good to replace the places where it returns false with
NULL. Though false is treated as 0/NULL and there are no compiler
errors, using NULL improves the readability of the code:

        if (!BufferIsValid(btfirstbatch->buf))
        {
            /*
             * Empty index. Lock the whole relation, as nothing finer to lock
             * exists.
             */
            PredicateLockRelation(rel, scan->xs_snapshot);
            _bt_parallel_done(scan);
            return false;
        }

Thanks,
Satya
