On Tue, Mar 31, 2026 at 4:22 PM Andres Freund <[email protected]> wrote:
> > From 742dcbea7d184443d2c4ef3c095b60f47f870f72 Mon Sep 17 00:00:00 2001
> > From: Peter Geoghegan <[email protected]>
> > Date: Wed, 25 Mar 2026 00:22:17 -0400
> > Subject: [PATCH v19 01/18] Move heapam_handler.c index scan code to new
> > file.
> >
> > Move the heapam index fetch callbacks (index_fetch_begin,
> > index_fetch_reset, index_fetch_end, and index_fetch_tuple) into a new
> > dedicated file. Also move heap_hot_search_buffer over. This is a
> > purely mechanical move with no functional impact.
>
> I don't love that this renames arguments (s/call_again/heap_continue/) while
> moving. Makes it harder to verify this is just the mechanical change it claims
> to be.
Will fix.
> > + * IDENTIFICATION
> > + * src/backend/access/heap/heapam_indexscan.c
> Should probably include access/relscan.h for +IndexFetchTableData, rather than
> relying on the indirect include.
Will fix.
> > Subject: [PATCH v19 02/18] Add slot-based table AM index scan interface.
> > This moves VM checks for index-only scans out of the executor and into
> > heapam, enabling batching of visibility map lookups (though for now we
> > continue to just perform retail lookups).
>
> I'd also add that it's architecturally much cleaner this way.
Agreed, will fix the commit message.
> > A small minority of callers (2 callers in total) continue to use the old
> > table_index_fetch_tuple interface. This is still necessary with callers
> > that fundamentally need to pass a TID to the table AM. Both callers
> > perform some kind of constraint enforcement.
>
> I think we may need to do something about this before all of this is over. It's
> just too confusing to have the generic sounding index_fetch_tuple() interfaces
> that are barely ever used and *shouldn't* be used when, uhm, fetching a tuple
> pointed to by an index.
Agreed.
> At the very least it seems we should rename the ->index_fetch_tuple() to
> ->index_fetch_tid() or something and remove the call_again, all_dead
> arguments, since they are never used in this context. I suspect it should
> *not* use the table_index_fetch_begin()/table_index_fetch_end() interface
> either.
I'm working on a version of this that turns index_fetch_tid into a
more specialized thing, purely designed for the requirements of the 2
remaining callers (which just need to do constraint enforcement) that
absolutely have to pass a TID and so fundamentally cannot use a
slot-based interface. This approach will avoid the need to call
table_index_fetch_begin()/table_index_fetch_end() -- it won't even
have an IndexFetchTableData to pass.
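To make that concrete, here's a toy sketch of the shape I have in
mind (all names here are made up for illustration -- this is not the
actual tableam API, just the idea of a TID-in, tuple-out call with no
scan state and no begin/end pairing):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for ItemPointerData and a fetched tuple. */
typedef struct DemoTid
{
	uint32_t	block;
	uint16_t	offset;
} DemoTid;

typedef struct DemoTuple
{
	DemoTid		tid;
	bool		live;
	int			payload;
} DemoTuple;

/* A toy "table" of tuples, standing in for a heap relation. */
typedef struct DemoTable
{
	DemoTuple  *tuples;
	int			ntuples;
} DemoTable;

/*
 * TID-based fetch: the caller just hands over a TID, mirroring how the
 * two remaining constraint-enforcement callers fundamentally work.
 * There is no IndexFetchTableData anywhere in sight.
 */
static bool
demo_fetch_tid(DemoTable *table, DemoTid tid, DemoTuple *out)
{
	for (int i = 0; i < table->ntuples; i++)
	{
		DemoTuple  *tup = &table->tuples[i];

		if (tup->tid.block == tid.block &&
			tup->tid.offset == tid.offset && tup->live)
		{
			*out = *tup;
			return true;
		}
	}
	return false;
}
```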
Something that has long bothered me about the current structure of the
patch is that indexam.c sometimes resets the batch structure (e.g., on
rescans), while heapam sometimes does very similar things (e.g., when
the scan changes direction). This means heapam relies on indexam.c
having already reset the batch ring buffer when
heapam_index_fetch_reset is called on a rescan; it only needs to reset
the read stream state (which is fairly closely related to the batch
ring buffer and has invariants involving the ring buffer). In short,
the ownership of the batch ring buffer is muddled.
I plan on clarifying things in this area for v20. It will fully
embrace the idea that IndexFetchTableData is just the table AM
specific part of IndexScanDescData. This will allow me to move most of
the batch-related calls currently in indexam.c over to heapam/the
table AM. That will make it very clear that the table AM fully owns
the batch ring buffer (there'll continue to be indexam.c calls to do
things like take a mark, since that is implemented in a way that's
already table AM agnostic).
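A toy sketch of the ownership idea (again, made-up names, not the
real patch code): the ring and its associated read-stream-like state
share one reset routine, so no caller can reset one half without the
other -- which is exactly the hazard with the current split between
indexam.c and heapam.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_RING_SIZE 8

typedef struct DemoBatchRing
{
	int			head;		/* next slot to fill */
	int			tail;		/* next slot to consume */
	int			stream_pos;	/* stream state; invariant: tail <= stream_pos */
	int			items[DEMO_RING_SIZE];
} DemoBatchRing;

/* The one and only reset: ring and stream state go together */
static void
demo_ring_reset(DemoBatchRing *ring)
{
	ring->head = ring->tail = ring->stream_pos = 0;
}

static bool
demo_ring_push(DemoBatchRing *ring, int item)
{
	if (ring->head - ring->tail == DEMO_RING_SIZE)
		return false;			/* full */
	ring->items[ring->head % DEMO_RING_SIZE] = item;
	ring->head++;
	return true;
}

static bool
demo_ring_pop(DemoBatchRing *ring, int *item)
{
	if (ring->tail == ring->head)
		return false;			/* empty */
	*item = ring->items[ring->tail % DEMO_RING_SIZE];
	ring->tail++;
	if (ring->stream_pos < ring->tail)
		ring->stream_pos = ring->tail;	/* maintain the invariant */
	return true;
}
```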
One problem with the current design is that it still treats
IndexFetchTableData as something more than just the table AM piece of
IndexScanDescData. I now realize that it works that way purely so we
can avoid changing anything about index_fetch_tuple. But, as you say,
we really *should* change that anyway -- index_fetch_tuple shouldn't
need its own IndexFetchTableData (or its own IndexScanDescData), since
it literally doesn't have anything to do with index scans.
> > This convention lets both plain and index-only scans use the same
> > table_index_getnext_slot interface.
>
> Not convinced that's really a useful goal.
I don't think so either. That was just pro forma. The next version
won't say this.
> Hm. Why is it ok that we now don't have an equivalent of xs_heap_continue on
> the scan level anymore? I see there's a local heap_continue in
> heapam_index_getnext_slot(), but isn't that insufficient e.g. for a
> SNAPSHOT_ANY snapshot, where all the row versions would be visible?
>
> Am I misunderstanding how this works?
No, I just screwed this up -- not sure how I missed it. It's a pity
the tests didn't fail as a result.
Will fix.
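For anybody following along, the bug boils down to this (toy types,
not the real heapam code): under a SNAPSHOT_ANY-like snapshot, one TID
can yield several visible versions across *successive* getnext calls,
so the "keep returning more versions" position has to persist in the
scan, the way xs_heap_continue did -- a local variable inside a single
call can't remember where the previous call left off.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct DemoScan
{
	const int  *versions;	/* row versions reachable from one TID */
	int			nversions;
	int			next;		/* persists across calls, like xs_heap_continue */
} DemoScan;

static bool
demo_getnext_version(DemoScan *scan, int *out)
{
	if (scan->next >= scan->nversions)
		return false;		/* chain exhausted; move on to the next TID */
	*out = scan->versions[scan->next++];
	return true;
}
```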
> > +static inline bool
> > +table_index_getnext_slot(IndexScanDesc iscan, ScanDirection direction,
> > + TupleTableSlot *slot)
> > +{
> > + return iscan->xs_getnext_slot(iscan, direction, slot);
> > +}
>
> Hm. Does this actually belong in tableam.h? I'm a bit conflicted.
I can see arguments for both.
> > + .index_plain_amgettuple_getnext_slot =
> > heapam_index_plain_amgettuple_getnext_slot,
> > + .index_only_amgettuple_getnext_slot =
> > heapam_index_only_amgettuple_getnext_slot,
> > .index_fetch_tuple = heapam_index_fetch_tuple,
> >
> > .tuple_insert = heapam_tuple_insert,
>
> Wonder if it's perhaps worth deleting _slot and shortening getnext to
> next. These are quite long names...
I will shorten these for the next version.
> > +static pg_attribute_always_inline bool
> > +heapam_index_getnext_slot(IndexScanDesc scan, ScanDirection direction,
> > + TupleTableSlot *slot, bool
> > index_only)
>
> For heapam_index_fetch_tuple_impl() you added _impl, but here you didn't.
> Mild preference for also adding _impl here.
That was when there was a separate heapam_index_fetch_tuple for the
callback. But that's going away anyway now.
> > + bool all_visible = false;
> > + BlockNumber last_visited_block = InvalidBlockNumber;
> > + uint8 n_visited_pages = 0,
> > + xs_visited_pages_limit = 0;
> > +
> > + if (index_only)
> > + xs_visited_pages_limit = scan->xs_visited_pages_limit;
>
> Did you check that it's useful to have xs_visited_pages_limit as a longer
> lived variable? I suspect it's just going to add register pressure and will
> need to be saved/restored across other calls, making it just as cheap to
> always fetch it from scan.
Will fix.
> Still find it somewhat weird that we have local 'tid' variable, that we do not
> use pass to heapam_index_fetch_heap(), because heapam_index_fetch_heap()
> relies on index_getnext_tid() having stored the to-be-fetched tid in
> xs_heaptid.
>
> But I guess that's something to change at a later date.
Agreed, it is weird.
> Previously the ExecClearTuple() here had at least this comment:
> /* We don't actually need the heap tuple for anything */
>
> Now it looks even more confusing...
I will add that comment back.
> Orthogonal: I suspect eventually we should pass the table to
> RelationGetIndexScan(), have the tableam return how much space it needs, and
> allocate it one chunk.
Agreed -- these two structs (IndexScanDescData and
IndexFetchTableData) are essentially two parts of the same thing. That
might have been debatable before now, but everything we're doing here
is moving in that direction.
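Roughly this shape, I assume (toy names, not the actual
RelationGetIndexScan signature): the AM reports how many bytes it
needs, and the caller sizes a single allocation using a flexible array
member, so the AM-private state sits at the tail of the descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoIndexScan
{
	int			nkeys;
	/* table-AM private state lives in the same allocation */
	char		am_private[];	/* C99 flexible array member */
} DemoIndexScan;

/* Stand-in for asking the table AM how much space it wants */
static size_t
demo_am_state_size(void)
{
	return 64;
}

static DemoIndexScan *
demo_scan_begin(int nkeys)
{
	size_t		amsz = demo_am_state_size();
	DemoIndexScan *scan;

	/* one chunk for the descriptor plus the AM's private state */
	scan = calloc(1, offsetof(DemoIndexScan, am_private) + amsz);
	if (scan != NULL)
		scan->nkeys = nkeys;
	return scan;
}
```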
> > From 85be7bf520fc2746f4ee73a0744c7aa04da55e52 Mon Sep 17 00:00:00 2001
> > From: Peter Geoghegan <[email protected]>
> > Date: Tue, 10 Mar 2026 14:40:35 -0400
> > Subject: [PATCH v19 03/18] heapam: Track heap block in IndexFetchHeapData.
>
> LGTM.
Cool.
> > --- a/src/backend/access/heap/heapam_indexscan.c
> > +++ b/src/backend/access/heap/heapam_indexscan.c
> > @@ -59,18 +59,15 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
> > + /* Resets are a no-op (XXX amgetbatch commit resets xs_vm_items here)
> > */
> > + (void) hscan;
>
> I assume that's not a line you intend to commit?
Will fix.
> LGTM otherwise.
Cool.
> Need to do something else for a while. More another time.
Thanks for the review!
--
Peter Geoghegan