On Sat, Apr 06, 2024 at 04:57:51PM +0200, Tomas Vondra wrote:
> On 4/6/24 15:40, Melanie Plageman wrote:
> > On Sat, Apr 06, 2024 at 02:51:45AM +0200, Tomas Vondra wrote:
> >>
> >>
> >> On 4/6/24 01:53, Melanie Plageman wrote:
> >>> On Fri, Apr 05, 2024 at 04:06:34AM -0400, Melanie Plageman wrote:
> >>>> On Thu, Apr 04, 2024 at 04:35:45PM +0200, Tomas Vondra wrote:
> >>>>> On 4/4/24 00:57, Melanie Plageman wrote:
> >>>>>> On Sun, Mar 31, 2024 at 11:45:51AM -0400, Melanie Plageman wrote:
> >>>>> I'd focus on the first ~8-9 commits or so for now, we can commit more if
> >>>>> things go reasonably well.
> >>>>
> >>>> Sounds good. I will spend cleanup time on 0010-0013 tomorrow but would
> >>>> love to know if you agree with the direction before I spend more time.
> >>>
> >>> In the attached v16, I've split 0010-0013 into 0011-0017. I think it
> >>> is much easier to understand.
> >>>
> >>
> >> Anyway, I've attached it as .tgz in order to not confuse cfbot. All the
> >> review comments are marked with XXX, so grep for that in the patches.
> >> There's two separate patches - the first one suggests a code change, so
> >> it was better to not merge that with your code. The second has just a
> >> couple XXX comments, I'm not sure why I kept it separate.
> >>
> >> A couple review comments:
> >>
> >> * I think 0001-0009 are 99% ready to go. I reworded some of the commit
> >> messages a bit - I realize it's a bit bold, considering you're a native
> >> speaker and I'm not. If you could check I didn't make it worse, that
> >> would be great.
> > 
> > Attached v17 has *only* patches 0001-0009 with these changes. I will
> > work on applying the remaining patches, addressing feedback, and adding
> > comments next.
> > 
> > I have reviewed and incorporated all of your feedback on these patches.
> > Attached v17 is your exact patches with 1 or 2 *very* slight tweaks to
> > commit messages (comma splice removal and single word adjustments) as
> > well as the changes listed below:
> > 
> > I have open questions on the following:
> > 
> > - 0003: should it be SO_NEED_TUPLES and need_tuples (instead of
> >     SO_NEED_TUPLE and need_tuple)?
> > 
> 
> I think SO_NEED_TUPLES is more accurate, as we need all tuples from the
> block. But either would work.

Attached v18 changes it to SO_NEED_TUPLES/need_tuples.
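
For anyone skimming, 0003 derives the flag at the call site roughly like
this (excerpted from the patch below) -- tuples are needed whenever the
node has a qual or tlist:

    need_tuples = (node->ss.ps.plan->qual != NIL ||
                   node->ss.ps.plan->targetlist != NIL);

    scan = table_beginscan_bm(node->ss.ss_currentRelation,
                              node->ss.ps.state->es_snapshot,
                              0, NULL,
                              need_tuples);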

> 
> > - 0009 (your 0010)
> >     - Should I mention in the commit message that we added blockno and
> >     - Should I mention in the commit message that we added blockno and
> >             pfblockno in the BitmapHeapScanState only for validation or
> >             is that too specific?
> > 
> 
> For the commit message I'd say it's too specific; I'd put it in the
> comment before the struct.

It is in the comment for the struct.
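
For reference, besides documenting the fields, they let a serial scan
sanity-check that the two iterators stay in sync; 0009 does roughly:

    if (node->pstate == NULL &&
        node->prefetch_iterator &&
        node->pfblockno < node->blockno)
        elog(ERROR, "prefetch and main iterators are out of sync");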

> >     - I'm worried this comment is vague and/or potentially not totally
> >             correct. Should we remove it? I don't think we have
> >             conclusive proof that this is true.
> >   /*
> >    * Adjusting the prefetch iterator before invoking
> >    * table_scan_bitmap_next_block() keeps prefetch distance higher across
> >    * the parallel workers.
> >    */
> > 
> 
> TBH it's not clear to me what "higher across parallel workers" means.
> But it sure shouldn't claim things that we think may not be correct. I
> don't have a good idea how to reword it, though.

I realized it makes more sense to add a FIXME (I used XXX; I'm not sure
when to use which) with a link to the message where Andres describes why
he thinks it is a bug. If we plan on fixing it, it is good to have a
record of that, and it makes it easier to write a clear and accurate
comment. Done in 0009.
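
For reference, the comment added in 0009 reads:

    /*
     * XXX: There is a known issue with keeping the prefetch and current
     * block iterators in sync for parallel bitmapheapscans. This can
     * lead to prefetching blocks that have already been read. See the
     * discussion here:
     * https://postgr.es/m/20240315211449.en2jcmdqxv5o6tlz%40alap3.anarazel.de
     * Note that moving the call site of BitmapAdjustPrefetchIterator()
     * exacerbates the effects of this bug.
     */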

> OK, thanks. I think 0001-0008 are ready to go, with some minor tweaks
> per above (tuple vs. tuples etc.), and the question about the recheck
> flag. If you can do these tweaks, I'll get that committed today and we
> can try to get a couple more patches in tomorrow.

Sounds good.

- Melanie
>From 3e80ba8914dd5876ab921ca39540be3b639236f7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 12 Feb 2024 18:50:29 -0500
Subject: [PATCH v18 01/10] BitmapHeapScan: begin scan after bitmap creation

It makes more sense for BitmapHeapScan to scan the index, build the
bitmap, and only then begin the scan on the underlying table.

This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/executor/nodeBitmapHeapscan.c | 27 +++++++++++++++++------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index cee7f45aabe..c8c466e3c5c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -178,6 +178,21 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 #endif							/* USE_PREFETCH */
 		}
+
+		/*
+		 * If this is the first scan of the underlying table, create the table
+		 * scan descriptor and begin the scan.
+		 */
+		if (!scan)
+		{
+			scan = table_beginscan_bm(node->ss.ss_currentRelation,
+									  node->ss.ps.state->es_snapshot,
+									  0,
+									  NULL);
+
+			node->ss.ss_currentScanDesc = scan;
+		}
+
 		node->initialized = true;
 	}
 
@@ -601,7 +616,8 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 	PlanState  *outerPlan = outerPlanState(node);
 
 	/* rescan to release any page pin */
-	table_rescan(node->ss.ss_currentScanDesc, NULL);
+	if (node->ss.ss_currentScanDesc)
+		table_rescan(node->ss.ss_currentScanDesc, NULL);
 
 	/* release bitmaps and buffers if any */
 	if (node->tbmiterator)
@@ -678,7 +694,9 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 	/*
 	 * close heap scan
 	 */
-	table_endscan(scanDesc);
+	if (scanDesc)
+		table_endscan(scanDesc);
+
 }
 
 /* ----------------------------------------------------------------
@@ -783,11 +801,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 
 	scanstate->ss.ss_currentRelation = currentRelation;
 
-	scanstate->ss.ss_currentScanDesc = table_beginscan_bm(currentRelation,
-														  estate->es_snapshot,
-														  0,
-														  NULL);
-
 	/*
 	 * all done.
 	 */
-- 
2.40.1

>From 60e75bbf91bf06832402daa5d3c30bc99401c2f2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 13 Feb 2024 14:38:41 -0500
Subject: [PATCH v18 02/10] BitmapHeapScan: postpone setting can_skip_fetch

Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/executor/nodeBitmapHeapscan.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c8c466e3c5c..2148a21531a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,6 +105,16 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	 */
 	if (!node->initialized)
 	{
+		/*
+		 * We can potentially skip fetching heap pages if we do not need any
+		 * columns of the table, either for checking non-indexable quals or
+		 * for returning data.  This test is a bit simplistic, as it checks
+		 * the stronger condition that there's no qual or return tlist at all.
+		 * But in most cases it's probably not worth working harder than that.
+		 */
+		node->can_skip_fetch = (node->ss.ps.plan->qual == NIL &&
+								node->ss.ps.plan->targetlist == NIL);
+
 		if (!pstate)
 		{
 			tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));
@@ -743,16 +753,7 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->shared_tbmiterator = NULL;
 	scanstate->shared_prefetch_iterator = NULL;
 	scanstate->pstate = NULL;
-
-	/*
-	 * We can potentially skip fetching heap pages if we do not need any
-	 * columns of the table, either for checking non-indexable quals or for
-	 * returning data.  This test is a bit simplistic, as it checks the
-	 * stronger condition that there's no qual or return tlist at all.  But in
-	 * most cases it's probably not worth working harder than that.
-	 */
-	scanstate->can_skip_fetch = (node->scan.plan.qual == NIL &&
-								 node->scan.plan.targetlist == NIL);
+	scanstate->can_skip_fetch = false;
 
 	/*
 	 * Miscellaneous initialization
-- 
2.40.1

>From 6af2d6cbbbe53e94f09fdb78fdfaf1f410359114 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Thu, 4 Apr 2024 15:34:25 +0200
Subject: [PATCH v18 03/10] Push BitmapHeapScan skip fetch optimization into
 table AM

7c70996ebf0949b142 introduced an optimization to allow bitmap table
scans to skip fetching a block from the heap if none of the underlying
data is needed and the block is marked all-visible in the visibility
map. With the addition of table AMs, a FIXME was added to this code
indicating that it should be pushed into table AM-specific code, as not
all table AMs may use a visibility map in the same way.

Resolve this FIXME for the current block. The layering violation is
still present in BitmapHeapScan's prefetching code, which uses the
visibility map to decide whether or not to prefetch a block. However,
this can be addressed independently.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/access/heap/heapam.c          |  15 +++
 src/backend/access/heap/heapam_handler.c  |  29 +++++
 src/backend/executor/nodeBitmapHeapscan.c | 124 +++++++---------------
 src/include/access/heapam.h               |  10 ++
 src/include/access/tableam.h              |  12 ++-
 src/include/nodes/execnodes.h             |   8 +-
 6 files changed, 105 insertions(+), 93 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dada2ecd1e3..01bb2f4cc16 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -967,6 +967,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
 	scan->rs_strategy = NULL;	/* set in initscan */
+	scan->rs_vmbuffer = InvalidBuffer;
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
@@ -1055,6 +1057,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
+	Assert(scan->rs_empty_tuples_pending == 0);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1074,6 +1084,11 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
+	Assert(scan->rs_empty_tuples_pending == 0);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3e7a6b5548b..58de2c82a70 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,6 +27,7 @@
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/tsmapi.h"
+#include "access/visibilitymap.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/index.h"
@@ -2198,6 +2199,24 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	hscan->rs_cindex = 0;
 	hscan->rs_ntuples = 0;
 
+	/*
+	 * We can skip fetching the heap page if we don't need any fields from the
+	 * heap, the bitmap entries don't need rechecking, and all tuples on the
+	 * page are visible to our transaction.
+	 */
+	if (!(scan->rs_flags & SO_NEED_TUPLES) &&
+		!tbmres->recheck &&
+		VM_ALL_VISIBLE(scan->rs_rd, tbmres->blockno, &hscan->rs_vmbuffer))
+	{
+		/* can't be lossy in the skip_fetch case */
+		Assert(tbmres->ntuples >= 0);
+		Assert(hscan->rs_empty_tuples_pending >= 0);
+
+		hscan->rs_empty_tuples_pending += tbmres->ntuples;
+
+		return true;
+	}
+
 	/*
 	 * Ignore any claimed entries past what we think is the end of the
 	 * relation. It may have been extended after the start of our scan (we
@@ -2310,6 +2329,16 @@ heapam_scan_bitmap_next_tuple(TableScanDesc scan,
 	Page		page;
 	ItemId		lp;
 
+	if (hscan->rs_empty_tuples_pending > 0)
+	{
+		/*
+		 * If we don't have to fetch the tuple, just return nulls.
+		 */
+		ExecStoreAllNullTuple(slot);
+		hscan->rs_empty_tuples_pending--;
+		return true;
+	}
+
 	/*
 	 * Out of range?  If so, nothing more to look at on this page
 	 */
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2148a21531a..5f768701a5e 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,16 +105,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	 */
 	if (!node->initialized)
 	{
-		/*
-		 * We can potentially skip fetching heap pages if we do not need any
-		 * columns of the table, either for checking non-indexable quals or
-		 * for returning data.  This test is a bit simplistic, as it checks
-		 * the stronger condition that there's no qual or return tlist at all.
-		 * But in most cases it's probably not worth working harder than that.
-		 */
-		node->can_skip_fetch = (node->ss.ps.plan->qual == NIL &&
-								node->ss.ps.plan->targetlist == NIL);
-
 		if (!pstate)
 		{
 			tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));
@@ -195,10 +185,24 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		if (!scan)
 		{
+			bool		need_tuples = false;
+
+			/*
+			 * We can potentially skip fetching heap pages if we do not need
+			 * any columns of the table, either for checking non-indexable
+			 * quals or for returning data.  This test is a bit simplistic, as
+			 * it checks the stronger condition that there's no qual or return
+			 * tlist at all. But in most cases it's probably not worth working
+			 * harder than that.
+			 */
+			need_tuples = (node->ss.ps.plan->qual != NIL ||
+						   node->ss.ps.plan->targetlist != NIL);
+
 			scan = table_beginscan_bm(node->ss.ss_currentRelation,
 									  node->ss.ps.state->es_snapshot,
 									  0,
-									  NULL);
+									  NULL,
+									  need_tuples);
 
 			node->ss.ss_currentScanDesc = scan;
 		}
@@ -208,7 +212,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 	for (;;)
 	{
-		bool		skip_fetch;
+		bool		valid_block;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -229,37 +233,14 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres);
 
+			valid_block = table_scan_bitmap_next_block(scan, tbmres);
+
 			if (tbmres->ntuples >= 0)
 				node->exact_pages++;
 			else
 				node->lossy_pages++;
 
-			/*
-			 * We can skip fetching the heap page if we don't need any fields
-			 * from the heap, and the bitmap entries don't need rechecking,
-			 * and all tuples on the page are visible to our transaction.
-			 *
-			 * XXX: It's a layering violation that we do these checks above
-			 * tableam, they should probably moved below it at some point.
-			 */
-			skip_fetch = (node->can_skip_fetch &&
-						  !tbmres->recheck &&
-						  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
-										 tbmres->blockno,
-										 &node->vmbuffer));
-
-			if (skip_fetch)
-			{
-				/* can't be lossy in the skip_fetch case */
-				Assert(tbmres->ntuples >= 0);
-
-				/*
-				 * The number of tuples on this page is put into
-				 * node->return_empty_tuples.
-				 */
-				node->return_empty_tuples = tbmres->ntuples;
-			}
-			else if (!table_scan_bitmap_next_block(scan, tbmres))
+			if (!valid_block)
 			{
 				/* AM doesn't think this block is valid, skip */
 				continue;
@@ -302,52 +283,33 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 * should happen only when we have determined there is still something
 		 * to do on the current page, else we may uselessly prefetch the same
 		 * page we are just about to request for real.
-		 *
-		 * XXX: It's a layering violation that we do these checks above
-		 * tableam, they should probably moved below it at some point.
 		 */
 		BitmapPrefetch(node, scan);
 
-		if (node->return_empty_tuples > 0)
+		/*
+		 * Attempt to fetch tuple from AM.
+		 */
+		if (!table_scan_bitmap_next_tuple(scan, tbmres, slot))
 		{
-			/*
-			 * If we don't have to fetch the tuple, just return nulls.
-			 */
-			ExecStoreAllNullTuple(slot);
-
-			if (--node->return_empty_tuples == 0)
-			{
-				/* no more tuples to return in the next round */
-				node->tbmres = tbmres = NULL;
-			}
+			/* nothing more to look at on this page */
+			node->tbmres = tbmres = NULL;
+			continue;
 		}
-		else
+
+		/*
+		 * If we are using lossy info, we have to recheck the qual conditions
+		 * at every tuple.
+		 */
+		if (tbmres->recheck)
 		{
-			/*
-			 * Attempt to fetch tuple from AM.
-			 */
-			if (!table_scan_bitmap_next_tuple(scan, tbmres, slot))
+			econtext->ecxt_scantuple = slot;
+			if (!ExecQualAndReset(node->bitmapqualorig, econtext))
 			{
-				/* nothing more to look at on this page */
-				node->tbmres = tbmres = NULL;
+				/* Fails recheck, so drop it and loop back for another */
+				InstrCountFiltered2(node, 1);
+				ExecClearTuple(slot);
 				continue;
 			}
-
-			/*
-			 * If we are using lossy info, we have to recheck the qual
-			 * conditions at every tuple.
-			 */
-			if (tbmres->recheck)
-			{
-				econtext->ecxt_scantuple = slot;
-				if (!ExecQualAndReset(node->bitmapqualorig, econtext))
-				{
-					/* Fails recheck, so drop it and loop back for another */
-					InstrCountFiltered2(node, 1);
-					ExecClearTuple(slot);
-					continue;
-				}
-			}
 		}
 
 		/* OK to return this tuple */
@@ -519,7 +481,7 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 				 * it did for the current heap page; which is not a certainty
 				 * but is true in many cases.
 				 */
-				skip_fetch = (node->can_skip_fetch &&
+				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
 							  (node->tbmres ? !node->tbmres->recheck : false) &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
 											 tbmpre->blockno,
@@ -570,7 +532,7 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 				}
 
 				/* As above, skip prefetch if we expect not to need page */
-				skip_fetch = (node->can_skip_fetch &&
+				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
 							  (node->tbmres ? !node->tbmres->recheck : false) &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
 											 tbmpre->blockno,
@@ -640,8 +602,6 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 		tbm_end_shared_iterate(node->shared_prefetch_iterator);
 	if (node->tbm)
 		tbm_free(node->tbm);
-	if (node->vmbuffer != InvalidBuffer)
-		ReleaseBuffer(node->vmbuffer);
 	if (node->pvmbuffer != InvalidBuffer)
 		ReleaseBuffer(node->pvmbuffer);
 	node->tbm = NULL;
@@ -651,7 +611,6 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 	node->initialized = false;
 	node->shared_tbmiterator = NULL;
 	node->shared_prefetch_iterator = NULL;
-	node->vmbuffer = InvalidBuffer;
 	node->pvmbuffer = InvalidBuffer;
 
 	ExecScanReScan(&node->ss);
@@ -696,8 +655,6 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 		tbm_end_shared_iterate(node->shared_tbmiterator);
 	if (node->shared_prefetch_iterator)
 		tbm_end_shared_iterate(node->shared_prefetch_iterator);
-	if (node->vmbuffer != InvalidBuffer)
-		ReleaseBuffer(node->vmbuffer);
 	if (node->pvmbuffer != InvalidBuffer)
 		ReleaseBuffer(node->pvmbuffer);
 
@@ -741,8 +698,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->tbm = NULL;
 	scanstate->tbmiterator = NULL;
 	scanstate->tbmres = NULL;
-	scanstate->return_empty_tuples = 0;
-	scanstate->vmbuffer = InvalidBuffer;
 	scanstate->pvmbuffer = InvalidBuffer;
 	scanstate->exact_pages = 0;
 	scanstate->lossy_pages = 0;
@@ -753,7 +708,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->shared_tbmiterator = NULL;
 	scanstate->shared_prefetch_iterator = NULL;
 	scanstate->pstate = NULL;
-	scanstate->can_skip_fetch = false;
 
 	/*
 	 * Miscellaneous initialization
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2765efc4e5e..750ea30852e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -76,6 +76,16 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * These fields are only used for bitmap scans for the "skip fetch"
+	 * optimization. Bitmap scans needing no fields from the heap may skip
+	 * fetching an all visible block, instead using the number of tuples per
+	 * block reported by the bitmap to determine how many NULL-filled tuples
+	 * to return.
+	 */
+	Buffer		rs_vmbuffer;
+	int			rs_empty_tuples_pending;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e7eeb754098..be198fa3158 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,13 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/*
+	 * At the discretion of the table AM, bitmap table scans may be able to
+	 * skip fetching a block from the table if none of the table data is
+	 * needed. If table data may be needed, set SO_NEED_TUPLES.
+	 */
+	SO_NEED_TUPLES = 1 << 10,
 }			ScanOptions;
 
 /*
@@ -937,10 +944,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, struct ScanKeyData *key)
+				   int nkeys, struct ScanKeyData *key, bool need_tuple)
 {
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
+	if (need_tuple)
+		flags |= SO_NEED_TUPLES;
+
 	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
 }
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e57ebd72887..fa2f70b7a48 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1794,10 +1794,7 @@ typedef struct ParallelBitmapHeapState
  *		tbm				   bitmap obtained from child index scan(s)
  *		tbmiterator		   iterator for scanning current pages
  *		tbmres			   current-page data
- *		can_skip_fetch	   can we potentially skip tuple fetches in this scan?
- *		return_empty_tuples number of empty tuples to return
- *		vmbuffer		   buffer for visibility-map lookups
- *		pvmbuffer		   ditto, for prefetched pages
+ *		pvmbuffer		   buffer for visibility-map lookups of prefetched pages
  *		exact_pages		   total number of exact pages retrieved
  *		lossy_pages		   total number of lossy pages retrieved
  *		prefetch_iterator  iterator for prefetching ahead of current page
@@ -1817,9 +1814,6 @@ typedef struct BitmapHeapScanState
 	TIDBitmap  *tbm;
 	TBMIterator *tbmiterator;
 	TBMIterateResult *tbmres;
-	bool		can_skip_fetch;
-	int			return_empty_tuples;
-	Buffer		vmbuffer;
 	Buffer		pvmbuffer;
 	long		exact_pages;
 	long		lossy_pages;
-- 
2.40.1

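As an aside, to illustrate what the new SO_NEED_TUPLES contract enables:
a hypothetical non-heap table AM that tracks visibility in its own
metadata could make the same skip-fetch decision without a visibility
map. This is only a sketch; FooScanDesc, fooam_all_visible, and
fooam_read_block are made-up names, not part of the patch:

    static bool
    fooam_scan_bitmap_next_block(TableScanDesc scan,
                                 TBMIterateResult *tbmres)
    {
        FooScanDesc fscan = (FooScanDesc) scan;

        /*
         * Skip reading the block if the caller needs no columns, the
         * bitmap entry needs no recheck, and this AM's own (hypothetical)
         * metadata says every tuple on the block is visible. Remember how
         * many NULL-filled tuples to return, as heapam does.
         */
        if (!(scan->rs_flags & SO_NEED_TUPLES) &&
            !tbmres->recheck &&
            fooam_all_visible(scan->rs_rd, tbmres->blockno))
        {
            /* can't be lossy in the skip-fetch case */
            Assert(tbmres->ntuples >= 0);
            fscan->empty_tuples_pending += tbmres->ntuples;
            return true;
        }

        return fooam_read_block(fscan, tbmres);
    }
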
>From 08af70f3becef4bd1de9aac3943b0b2a8b15e788 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 12 Feb 2024 19:03:24 -0500
Subject: [PATCH v18 04/10] Use prefetch block recheck value to determine
 skip_fetch

As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.

This only affects prefetching - we may prefetch blocks unnecessarily and
fail to prefetch blocks that will be needed. But we don't need to rely
on that assumption - we already know the recheck flag for the block
we're considering prefetching.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
---
 src/backend/executor/nodeBitmapHeapscan.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 5f768701a5e..6f843908032 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -475,14 +475,9 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 				 * skip this prefetch call, but continue to run the prefetch
 				 * logic normally.  (Would it be better not to increment
 				 * prefetch_pages?)
-				 *
-				 * This depends on the assumption that the index AM will
-				 * report the same recheck flag for this future heap page as
-				 * it did for the current heap page; which is not a certainty
-				 * but is true in many cases.
 				 */
 				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  (node->tbmres ? !node->tbmres->recheck : false) &&
+							  !tbmpre->recheck &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
 											 tbmpre->blockno,
 											 &node->pvmbuffer));
@@ -533,7 +528,7 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 
 				/* As above, skip prefetch if we expect not to need page */
 				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
-							  (node->tbmres ? !node->tbmres->recheck : false) &&
+							  !tbmpre->recheck &&
 							  VM_ALL_VISIBLE(node->ss.ss_currentRelation,
 											 tbmpre->blockno,
 											 &node->pvmbuffer));
-- 
2.40.1

>From 391747a1235b1a29bd8beed127418bbd584ba10d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 12 Feb 2024 19:04:48 -0500
Subject: [PATCH v18 05/10] Change BitmapAdjustPrefetchIterator to accept
 BlockNumber

BitmapAdjustPrefetchIterator() only used the blockno member of the
passed-in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table
AM-specific code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/executor/nodeBitmapHeapscan.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 6f843908032..6b48a6d8350 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -52,7 +52,7 @@
 static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
 static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
 static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
-												TBMIterateResult *tbmres);
+												BlockNumber blockno);
 static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node);
 static inline void BitmapPrefetch(BitmapHeapScanState *node,
 								  TableScanDesc scan);
@@ -231,7 +231,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 				break;
 			}
 
-			BitmapAdjustPrefetchIterator(node, tbmres);
+			BitmapAdjustPrefetchIterator(node, tbmres->blockno);
 
 			valid_block = table_scan_bitmap_next_block(scan, tbmres);
 
@@ -342,7 +342,7 @@ BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate)
  */
 static inline void
 BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
-							 TBMIterateResult *tbmres)
+							 BlockNumber blockno)
 {
 #ifdef USE_PREFETCH
 	ParallelBitmapHeapState *pstate = node->pstate;
@@ -361,7 +361,7 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
 			/* Do not let the prefetch iterator get behind the main one */
 			TBMIterateResult *tbmpre = tbm_iterate(prefetch_iterator);
 
-			if (tbmpre == NULL || tbmpre->blockno != tbmres->blockno)
+			if (tbmpre == NULL || tbmpre->blockno != blockno)
 				elog(ERROR, "prefetch and main iterators are out of sync");
 		}
 		return;
-- 
2.40.1

>From 8bd31f41e00ec958cfc8c309509cdd2e93020f0b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 13 Feb 2024 10:17:47 -0500
Subject: [PATCH v18 06/10] BitmapHeapScan: Reduce scope of tbmiterator local
 variables

A future patch will change where tbmiterators are initialized and saved.
To make that diff easier to understand, move them into a tighter scope.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/executor/nodeBitmapHeapscan.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 6b48a6d8350..fcbc8664508 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -71,8 +71,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	ExprContext *econtext;
 	TableScanDesc scan;
 	TIDBitmap  *tbm;
-	TBMIterator *tbmiterator = NULL;
-	TBMSharedIterator *shared_tbmiterator = NULL;
 	TBMIterateResult *tbmres;
 	TupleTableSlot *slot;
 	ParallelBitmapHeapState *pstate = node->pstate;
@@ -85,10 +83,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	slot = node->ss.ss_ScanTupleSlot;
 	scan = node->ss.ss_currentScanDesc;
 	tbm = node->tbm;
-	if (pstate == NULL)
-		tbmiterator = node->tbmiterator;
-	else
-		shared_tbmiterator = node->shared_tbmiterator;
 	tbmres = node->tbmres;
 
 	/*
@@ -105,6 +99,9 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	 */
 	if (!node->initialized)
 	{
+		TBMIterator *tbmiterator = NULL;
+		TBMSharedIterator *shared_tbmiterator = NULL;
+
 		if (!pstate)
 		{
 			tbm = (TIDBitmap *) MultiExecProcNode(outerPlanState(node));
@@ -113,7 +110,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 				elog(ERROR, "unrecognized result from subplan");
 
 			node->tbm = tbm;
-			node->tbmiterator = tbmiterator = tbm_begin_iterate(tbm);
+			tbmiterator = tbm_begin_iterate(tbm);
 			node->tbmres = tbmres = NULL;
 
 #ifdef USE_PREFETCH
@@ -166,8 +163,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 
 			/* Allocate a private iterator and attach the shared state to it */
-			node->shared_tbmiterator = shared_tbmiterator =
-				tbm_attach_shared_iterate(dsa, pstate->tbmiterator);
+			shared_tbmiterator = tbm_attach_shared_iterate(dsa, pstate->tbmiterator);
 			node->tbmres = tbmres = NULL;
 
 #ifdef USE_PREFETCH
@@ -207,6 +203,8 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			node->ss.ss_currentScanDesc = scan;
 		}
 
+		node->tbmiterator = tbmiterator;
+		node->shared_tbmiterator = shared_tbmiterator;
 		node->initialized = true;
 	}
 
@@ -222,9 +220,9 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		if (tbmres == NULL)
 		{
 			if (!pstate)
-				node->tbmres = tbmres = tbm_iterate(tbmiterator);
+				node->tbmres = tbmres = tbm_iterate(node->tbmiterator);
 			else
-				node->tbmres = tbmres = tbm_shared_iterate(shared_tbmiterator);
+				node->tbmres = tbmres = tbm_shared_iterate(node->shared_tbmiterator);
 			if (tbmres == NULL)
 			{
 				/* no more entries in the bitmap */
-- 
2.40.1

>From e35e6e050e79028f978f3e59c5728c6d85c0becb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Sat, 6 Apr 2024 08:51:40 -0400
Subject: [PATCH v18 07/10] table_scan_bitmap_next_block counts lossy and exact
 pages

Future commits will remove the TBMIterateResult from BitmapHeapNext(),
pushing it into table AM-specific code. So the table AM must keep track
of whether individual TBMIterateResult blocks were represented lossily
in the bitmap for the purposes of the EXPLAIN counters.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/access/heap/heapam_handler.c  |  8 +++++++-
 src/backend/executor/nodeBitmapHeapscan.c | 12 ++----------
 src/include/access/tableam.h              | 22 ++++++++++++++++------
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 58de2c82a70..eb56dfe4e93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2188,7 +2188,8 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
 
 static bool
 heapam_scan_bitmap_next_block(TableScanDesc scan,
-							  TBMIterateResult *tbmres)
+							  TBMIterateResult *tbmres,
+							  long *lossy_pages, long *exact_pages)
 {
 	HeapScanDesc hscan = (HeapScanDesc) scan;
 	BlockNumber block = tbmres->blockno;
@@ -2316,6 +2317,11 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Assert(ntup <= MaxHeapTuplesPerPage);
 	hscan->rs_ntuples = ntup;
 
+	if (tbmres->ntuples < 0)
+		(*lossy_pages)++;
+	else
+		(*exact_pages)++;
+
 	return ntup > 0;
 }
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index fcbc8664508..da201a14385 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -210,8 +210,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 	for (;;)
 	{
-		bool		valid_block;
-
 		CHECK_FOR_INTERRUPTS();
 
 		/*
@@ -231,14 +229,8 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			BitmapAdjustPrefetchIterator(node, tbmres->blockno);
 
-			valid_block = table_scan_bitmap_next_block(scan, tbmres);
-
-			if (tbmres->ntuples >= 0)
-				node->exact_pages++;
-			else
-				node->lossy_pages++;
-
-			if (!valid_block)
+			if (!table_scan_bitmap_next_block(scan, tbmres,
+											  &node->lossy_pages, &node->exact_pages))
 			{
 				/* AM doesn't think this block is valid, skip */
 				continue;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index be198fa3158..8c75a4c038c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -789,6 +789,9 @@ typedef struct TableAmRoutine
 	 * on the page have to be returned, otherwise the tuples at offsets in
 	 * `tbmres->offsets` need to be returned.
 	 *
+	 * lossy_pages is incremented if the bitmap is lossy for the selected
+	 * block; otherwise, exact_pages is incremented.
+	 *
 	 * XXX: Currently this may only be implemented if the AM uses md.c as its
 	 * storage manager, and uses ItemPointer->ip_blkid in a manner that maps
 	 * blockids directly to the underlying storage. nodeBitmapHeapscan.c
@@ -804,7 +807,8 @@ typedef struct TableAmRoutine
 	 * scan_bitmap_next_tuple need to exist, or neither.
 	 */
 	bool		(*scan_bitmap_next_block) (TableScanDesc scan,
-										   struct TBMIterateResult *tbmres);
+										   struct TBMIterateResult *tbmres,
+										   long *lossy_pages, long *exact_pages);
 
 	/*
 	 * Fetch the next tuple of a bitmap table scan into `slot` and return true
@@ -1971,17 +1975,21 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths,
  */
 
 /*
- * Prepare to fetch / check / return tuples from `tbmres->blockno` as part of
- * a bitmap table scan. `scan` needs to have been started via
+ * Prepare to fetch / check / return tuples from `tbmres->blockno` as part of a
+ * bitmap table scan. `scan` needs to have been started via
  * table_beginscan_bm(). Returns false if there are no tuples to be found on
- * the page, true otherwise.
+ * the page, true otherwise. lossy_pages is incremented if the block's
+ * representation in the bitmap is lossy; otherwise, exact_pages is
+ * incremented.
  *
  * Note, this is an optionally implemented function, therefore should only be
  * used after verifying the presence (at plan time or such).
  */
 static inline bool
 table_scan_bitmap_next_block(TableScanDesc scan,
-							 struct TBMIterateResult *tbmres)
+							 struct TBMIterateResult *tbmres,
+							 long *lossy_pages,
+							 long *exact_pages)
 {
 	/*
 	 * We don't expect direct calls to table_scan_bitmap_next_block with valid
@@ -1992,7 +2000,9 @@ table_scan_bitmap_next_block(TableScanDesc scan,
 		elog(ERROR, "unexpected table_scan_bitmap_next_block call during logical decoding");
 
 	return scan->rs_rd->rd_tableam->scan_bitmap_next_block(scan,
-														   tbmres);
+														   tbmres,
+														   lossy_pages,
+														   exact_pages);
 }
 
 /*
-- 
2.40.1

>From 426c2e04782242589d67f6a87fb9327c0fc1eaa9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 12 Feb 2024 18:13:41 -0500
Subject: [PATCH v18 08/10] Remove table_scan_bitmap_next_tuple parameter
 tbmres

Future commits will push all of the logic for choosing the next block
into table AM-specific code. The table AM will be responsible for
iterating and reading in the right blocks.

Thus, it no longer makes sense to use the TBMIterateResult (which
contains a block number) as a means of communication between
table_scan_bitmap_next_tuple() and table_scan_bitmap_next_block().

Note that this parameter was unused by heap AM's implementation of
table_scan_bitmap_next_tuple().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas, Mark Dilger
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/access/heap/heapam_handler.c  |  1 -
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/include/access/tableam.h              | 12 +-----------
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index eb56dfe4e93..f99e6efd757 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2327,7 +2327,6 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 
 static bool
 heapam_scan_bitmap_next_tuple(TableScanDesc scan,
-							  TBMIterateResult *tbmres,
 							  TupleTableSlot *slot)
 {
 	HeapScanDesc hscan = (HeapScanDesc) scan;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index da201a14385..306bca801d7 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -279,7 +279,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		/*
 		 * Attempt to fetch tuple from AM.
 		 */
-		if (!table_scan_bitmap_next_tuple(scan, tbmres, slot))
+		if (!table_scan_bitmap_next_tuple(scan, slot))
 		{
 			/* nothing more to look at on this page */
 			node->tbmres = tbmres = NULL;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8c75a4c038c..697b5488b10 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -780,10 +780,7 @@ typedef struct TableAmRoutine
 	 *
 	 * This will typically read and pin the target block, and do the necessary
 	 * work to allow scan_bitmap_next_tuple() to return tuples (e.g. it might
-	 * make sense to perform tuple visibility checks at this time). For some
-	 * AMs it will make more sense to do all the work referencing `tbmres`
-	 * contents here, for others it might be better to defer more work to
-	 * scan_bitmap_next_tuple.
+	 * make sense to perform tuple visibility checks at this time).
 	 *
 	 * If `tbmres->blockno` is -1, this is a lossy scan and all visible tuples
 	 * on the page have to be returned, otherwise the tuples at offsets in
@@ -814,15 +811,10 @@ typedef struct TableAmRoutine
 	 * Fetch the next tuple of a bitmap table scan into `slot` and return true
 	 * if a visible tuple was found, false otherwise.
 	 *
-	 * For some AMs it will make more sense to do all the work referencing
-	 * `tbmres` contents in scan_bitmap_next_block, for others it might be
-	 * better to defer more work to this callback.
-	 *
 	 * Optional callback, but either both scan_bitmap_next_block and
 	 * scan_bitmap_next_tuple need to exist, or neither.
 	 */
 	bool		(*scan_bitmap_next_tuple) (TableScanDesc scan,
-										   struct TBMIterateResult *tbmres,
 										   TupleTableSlot *slot);
 
 	/*
@@ -2015,7 +2007,6 @@ table_scan_bitmap_next_block(TableScanDesc scan,
  */
 static inline bool
 table_scan_bitmap_next_tuple(TableScanDesc scan,
-							 struct TBMIterateResult *tbmres,
 							 TupleTableSlot *slot)
 {
 	/*
@@ -2027,7 +2018,6 @@ table_scan_bitmap_next_tuple(TableScanDesc scan,
 		elog(ERROR, "unexpected table_scan_bitmap_next_tuple call during logical decoding");
 
 	return scan->rs_rd->rd_tableam->scan_bitmap_next_tuple(scan,
-														   tbmres,
 														   slot);
 }
 
-- 
2.40.1

>From d19f6290f0cc783979c0c2556f0ac908b9c0adb2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Sat, 6 Apr 2024 09:04:01 -0400
Subject: [PATCH v18 09/10] Make table_scan_bitmap_next_block() async-friendly

table_scan_bitmap_next_block() previously returned false if we did not
wish to call table_scan_bitmap_next_tuple() on tuples on the page. This
could happen when there were no visible tuples on the page or when, due
to concurrent activity on the table, the block returned by the iterator
was past the end of the table recorded when the scan started.

That forced the caller to be responsible for determining whether
additional blocks should be fetched and for invoking
table_scan_bitmap_next_block() for these blocks.

It makes more sense for table_scan_bitmap_next_block() to be responsible
for finding a block that is not past the end of the table (as of the
beginning of the scan) and for table_scan_bitmap_next_tuple() to return
false if there are no visible tuples on the page.

This also allows us to move responsibility for the iterator to table
AM-specific code. This means handling invalid blocks is entirely up to
the table AM.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
---
 src/backend/access/heap/heapam_handler.c  |  58 +++++--
 src/backend/executor/nodeBitmapHeapscan.c | 186 ++++++++++------------
 src/include/access/relscan.h              |   7 +
 src/include/access/tableam.h              |  68 +++++---
 src/include/nodes/execnodes.h             |  12 +-
 5 files changed, 194 insertions(+), 137 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f99e6efd757..903cb80157e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2188,18 +2188,52 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
 
 static bool
 heapam_scan_bitmap_next_block(TableScanDesc scan,
-							  TBMIterateResult *tbmres,
+							  BlockNumber *blockno, bool *recheck,
 							  long *lossy_pages, long *exact_pages)
 {
 	HeapScanDesc hscan = (HeapScanDesc) scan;
-	BlockNumber block = tbmres->blockno;
+	BlockNumber block;
 	Buffer		buffer;
 	Snapshot	snapshot;
 	int			ntup;
+	TBMIterateResult *tbmres;
 
 	hscan->rs_cindex = 0;
 	hscan->rs_ntuples = 0;
 
+	*blockno = InvalidBlockNumber;
+	*recheck = true;
+
+	do
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		if (scan->shared_tbmiterator)
+			tbmres = tbm_shared_iterate(scan->shared_tbmiterator);
+		else
+			tbmres = tbm_iterate(scan->tbmiterator);
+
+		if (tbmres == NULL)
+		{
+			/* no more entries in the bitmap */
+			Assert(hscan->rs_empty_tuples_pending == 0);
+			return false;
+		}
+
+		/*
+		 * Ignore any claimed entries past what we think is the end of the
+		 * relation. It may have been extended after the start of our scan (we
+		 * only hold an AccessShareLock, and it could be inserts from this
+		 * backend).  We don't take this optimization in SERIALIZABLE
+		 * isolation though, as we need to examine all invisible tuples
+		 * reachable by the index.
+		 */
+	} while (!IsolationIsSerializable() && tbmres->blockno >= hscan->rs_nblocks);
+
+	/* Got a valid block */
+	*blockno = tbmres->blockno;
+	*recheck = tbmres->recheck;
+
 	/*
 	 * We can skip fetching the heap page if we don't need any fields from the
 	 * heap, the bitmap entries don't need rechecking, and all tuples on the
@@ -2218,16 +2252,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 		return true;
 	}
 
-	/*
-	 * Ignore any claimed entries past what we think is the end of the
-	 * relation. It may have been extended after the start of our scan (we
-	 * only hold an AccessShareLock, and it could be inserts from this
-	 * backend).  We don't take this optimization in SERIALIZABLE isolation
-	 * though, as we need to examine all invisible tuples reachable by the
-	 * index.
-	 */
-	if (!IsolationIsSerializable() && block >= hscan->rs_nblocks)
-		return false;
+	block = tbmres->blockno;
 
 	/*
 	 * Acquire pin on the target heap page, trading in any pin we held before.
@@ -2322,7 +2347,14 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	else
 		(*exact_pages)++;
 
-	return ntup > 0;
+	/*
+	 * Return true to indicate that a valid block was found and the bitmap is
+	 * not exhausted. If there are no visible tuples on this page,
+	 * return false, returning control to this function to advance to the next
+	 * return false returning control to this function to advance to the next
+	 * block in the bitmap.
+	 */
+	return true;
 }
 
 static bool
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 306bca801d7..01703bd4865 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -51,8 +51,7 @@
 
 static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
 static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
-static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
-												BlockNumber blockno);
+static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node);
 static inline void BitmapAdjustPrefetchTarget(BitmapHeapScanState *node);
 static inline void BitmapPrefetch(BitmapHeapScanState *node,
 								  TableScanDesc scan);
@@ -71,7 +70,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	ExprContext *econtext;
 	TableScanDesc scan;
 	TIDBitmap  *tbm;
-	TBMIterateResult *tbmres;
 	TupleTableSlot *slot;
 	ParallelBitmapHeapState *pstate = node->pstate;
 	dsa_area   *dsa = node->ss.ps.state->es_query_dsa;
@@ -83,7 +81,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 	slot = node->ss.ss_ScanTupleSlot;
 	scan = node->ss.ss_currentScanDesc;
 	tbm = node->tbm;
-	tbmres = node->tbmres;
 
 	/*
 	 * If we haven't yet performed the underlying index scan, do it, and begin
@@ -111,7 +108,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			node->tbm = tbm;
 			tbmiterator = tbm_begin_iterate(tbm);
-			node->tbmres = tbmres = NULL;
 
 #ifdef USE_PREFETCH
 			if (node->prefetch_maximum > 0)
@@ -164,7 +160,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 
 			/* Allocate a private iterator and attach the shared state to it */
 			shared_tbmiterator = tbm_attach_shared_iterate(dsa, pstate->tbmiterator);
-			node->tbmres = tbmres = NULL;
 
 #ifdef USE_PREFETCH
 			if (node->prefetch_maximum > 0)
@@ -203,48 +198,23 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			node->ss.ss_currentScanDesc = scan;
 		}
 
-		node->tbmiterator = tbmiterator;
-		node->shared_tbmiterator = shared_tbmiterator;
+		scan->tbmiterator = tbmiterator;
+		scan->shared_tbmiterator = shared_tbmiterator;
 		node->initialized = true;
+
+		goto new_page;
 	}
 
 	for (;;)
 	{
-		CHECK_FOR_INTERRUPTS();
-
-		/*
-		 * Get next page of results if needed
-		 */
-		if (tbmres == NULL)
-		{
-			if (!pstate)
-				node->tbmres = tbmres = tbm_iterate(node->tbmiterator);
-			else
-				node->tbmres = tbmres = tbm_shared_iterate(node->shared_tbmiterator);
-			if (tbmres == NULL)
-			{
-				/* no more entries in the bitmap */
-				break;
-			}
-
-			BitmapAdjustPrefetchIterator(node, tbmres->blockno);
-
-			if (!table_scan_bitmap_next_block(scan, tbmres,
-											  &node->lossy_pages, &node->exact_pages))
-			{
-				/* AM doesn't think this block is valid, skip */
-				continue;
-			}
-
-			/* Adjust the prefetch target */
-			BitmapAdjustPrefetchTarget(node);
-		}
-		else
+		while (table_scan_bitmap_next_tuple(scan, slot))
 		{
 			/*
 			 * Continuing in previously obtained page.
 			 */
 
+			CHECK_FOR_INTERRUPTS();
+
 #ifdef USE_PREFETCH
 
 			/*
@@ -265,45 +235,56 @@ BitmapHeapNext(BitmapHeapScanState *node)
 				SpinLockRelease(&pstate->mutex);
 			}
 #endif							/* USE_PREFETCH */
-		}
 
-		/*
-		 * We issue prefetch requests *after* fetching the current page to try
-		 * to avoid having prefetching interfere with the main I/O. Also, this
-		 * should happen only when we have determined there is still something
-		 * to do on the current page, else we may uselessly prefetch the same
-		 * page we are just about to request for real.
-		 */
-		BitmapPrefetch(node, scan);
+			/*
+			 * We issue prefetch requests *after* fetching the current page to
+			 * try to avoid having prefetching interfere with the main I/O.
+			 * Also, this should happen only when we have determined there is
+			 * still something to do on the current page, else we may
+			 * uselessly prefetch the same page we are just about to request
+			 * for real.
+			 */
+			BitmapPrefetch(node, scan);
 
-		/*
-		 * Attempt to fetch tuple from AM.
-		 */
-		if (!table_scan_bitmap_next_tuple(scan, slot))
-		{
-			/* nothing more to look at on this page */
-			node->tbmres = tbmres = NULL;
-			continue;
+			/*
+			 * If we are using lossy info, we have to recheck the qual
+			 * conditions at every tuple.
+			 */
+			if (node->recheck)
+			{
+				econtext->ecxt_scantuple = slot;
+				if (!ExecQualAndReset(node->bitmapqualorig, econtext))
+				{
+					/* Fails recheck, so drop it and loop back for another */
+					InstrCountFiltered2(node, 1);
+					ExecClearTuple(slot);
+					continue;
+				}
+			}
+
+			/* OK to return this tuple */
+			return slot;
 		}
 
+new_page:
+
+		BitmapAdjustPrefetchIterator(node);
+
+		if (!table_scan_bitmap_next_block(scan, &node->blockno, &node->recheck,
+										  &node->lossy_pages, &node->exact_pages))
+			break;
+
 		/*
-		 * If we are using lossy info, we have to recheck the qual conditions
-		 * at every tuple.
+		 * If serial, we can error out if the prefetch block doesn't stay
+		 * ahead of the current block.
 		 */
-		if (tbmres->recheck)
-		{
-			econtext->ecxt_scantuple = slot;
-			if (!ExecQualAndReset(node->bitmapqualorig, econtext))
-			{
-				/* Fails recheck, so drop it and loop back for another */
-				InstrCountFiltered2(node, 1);
-				ExecClearTuple(slot);
-				continue;
-			}
-		}
+		if (node->pstate == NULL &&
+			node->prefetch_iterator &&
+			node->pfblockno < node->blockno)
+			elog(ERROR, "prefetch and main iterators are out of sync");
 
-		/* OK to return this tuple */
-		return slot;
+		/* Adjust the prefetch target */
+		BitmapAdjustPrefetchTarget(node);
 	}
 
 	/*
@@ -329,13 +310,17 @@ BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate)
 
 /*
  *	BitmapAdjustPrefetchIterator - Adjust the prefetch iterator
+ *
+ *	We keep track of how far the prefetch iterator is ahead of the main
+ *	iterator in prefetch_pages. For each block the main iterator returns, we
+ *	decrement prefetch_pages.
  */
 static inline void
-BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
-							 BlockNumber blockno)
+BitmapAdjustPrefetchIterator(BitmapHeapScanState *node)
 {
 #ifdef USE_PREFETCH
 	ParallelBitmapHeapState *pstate = node->pstate;
+	TBMIterateResult *tbmpre;
 
 	if (pstate == NULL)
 	{
@@ -349,14 +334,21 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
 		else if (prefetch_iterator)
 		{
 			/* Do not let the prefetch iterator get behind the main one */
-			TBMIterateResult *tbmpre = tbm_iterate(prefetch_iterator);
-
-			if (tbmpre == NULL || tbmpre->blockno != blockno)
-				elog(ERROR, "prefetch and main iterators are out of sync");
+			tbmpre = tbm_iterate(prefetch_iterator);
+			node->pfblockno = tbmpre ? tbmpre->blockno : InvalidBlockNumber;
 		}
 		return;
 	}
 
+	/*
+	 * XXX: There is a known issue with keeping the prefetch and current block
+	 * iterators in sync for parallel bitmapheapscans. This can lead to
+	 * prefetching blocks that have already been read. See the discussion
+	 * here:
+	 * https://postgr.es/m/20240315211449.en2jcmdqxv5o6tlz%40alap3.anarazel.de
+	 * Note that moving the call site of BitmapAdjustPrefetchIterator()
+	 * exacerbates the effects of this bug.
+	 */
 	if (node->prefetch_maximum > 0)
 	{
 		TBMSharedIterator *prefetch_iterator = node->shared_prefetch_iterator;
@@ -381,7 +373,10 @@ BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
 			 * case.
 			 */
 			if (prefetch_iterator)
-				tbm_shared_iterate(prefetch_iterator);
+			{
+				tbmpre = tbm_shared_iterate(prefetch_iterator);
+				node->pfblockno = tbmpre ? tbmpre->blockno : InvalidBlockNumber;
+			}
 		}
 	}
 #endif							/* USE_PREFETCH */
@@ -459,6 +454,7 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 					break;
 				}
 				node->prefetch_pages++;
+				node->pfblockno = tbmpre->blockno;
 
 				/*
 				 * If we expect not to have to actually read this heap page,
@@ -516,6 +512,8 @@ BitmapPrefetch(BitmapHeapScanState *node, TableScanDesc scan)
 					break;
 				}
 
+				node->pfblockno = tbmpre->blockno;
+
 				/* As above, skip prefetch if we expect not to need page */
 				skip_fetch = (!(scan->rs_flags & SO_NEED_TUPLES) &&
 							  !tbmpre->recheck &&
@@ -577,12 +575,8 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 		table_rescan(node->ss.ss_currentScanDesc, NULL);
 
 	/* release bitmaps and buffers if any */
-	if (node->tbmiterator)
-		tbm_end_iterate(node->tbmiterator);
 	if (node->prefetch_iterator)
 		tbm_end_iterate(node->prefetch_iterator);
-	if (node->shared_tbmiterator)
-		tbm_end_shared_iterate(node->shared_tbmiterator);
 	if (node->shared_prefetch_iterator)
 		tbm_end_shared_iterate(node->shared_prefetch_iterator);
 	if (node->tbm)
@@ -590,13 +584,13 @@ ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
 	if (node->pvmbuffer != InvalidBuffer)
 		ReleaseBuffer(node->pvmbuffer);
 	node->tbm = NULL;
-	node->tbmiterator = NULL;
-	node->tbmres = NULL;
 	node->prefetch_iterator = NULL;
 	node->initialized = false;
-	node->shared_tbmiterator = NULL;
 	node->shared_prefetch_iterator = NULL;
 	node->pvmbuffer = InvalidBuffer;
+	node->recheck = true;
+	node->blockno = InvalidBlockNumber;
+	node->pfblockno = InvalidBlockNumber;
 
 	ExecScanReScan(&node->ss);
 
@@ -627,28 +621,24 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 	 */
 	ExecEndNode(outerPlanState(node));
 
+
+	/*
+	 * close heap scan
+	 */
+	if (scanDesc)
+		table_endscan(scanDesc);
+
 	/*
 	 * release bitmaps and buffers if any
 	 */
-	if (node->tbmiterator)
-		tbm_end_iterate(node->tbmiterator);
 	if (node->prefetch_iterator)
 		tbm_end_iterate(node->prefetch_iterator);
 	if (node->tbm)
 		tbm_free(node->tbm);
-	if (node->shared_tbmiterator)
-		tbm_end_shared_iterate(node->shared_tbmiterator);
 	if (node->shared_prefetch_iterator)
 		tbm_end_shared_iterate(node->shared_prefetch_iterator);
 	if (node->pvmbuffer != InvalidBuffer)
 		ReleaseBuffer(node->pvmbuffer);
-
-	/*
-	 * close heap scan
-	 */
-	if (scanDesc)
-		table_endscan(scanDesc);
-
 }
 
 /* ----------------------------------------------------------------
@@ -681,8 +671,6 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->ss.ps.ExecProcNode = ExecBitmapHeapScan;
 
 	scanstate->tbm = NULL;
-	scanstate->tbmiterator = NULL;
-	scanstate->tbmres = NULL;
 	scanstate->pvmbuffer = InvalidBuffer;
 	scanstate->exact_pages = 0;
 	scanstate->lossy_pages = 0;
@@ -690,9 +678,11 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->prefetch_pages = 0;
 	scanstate->prefetch_target = 0;
 	scanstate->initialized = false;
-	scanstate->shared_tbmiterator = NULL;
 	scanstate->shared_prefetch_iterator = NULL;
 	scanstate->pstate = NULL;
+	scanstate->recheck = true;
+	scanstate->blockno = InvalidBlockNumber;
+	scanstate->pfblockno = InvalidBlockNumber;
 
 	/*
 	 * Miscellaneous initialization
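
[Aside on the teardown hunks above, as I read them: the calls below are the
patch's, but the condensed flow is my own summary, so treat it as an
illustration rather than the literal code. table_endscan() now has to run
before tbm_free(), since the main iterators it ends still reference the
bitmap; the node remains responsible only for prefetch-side state.]

    /* table AM layer ends tbmiterator / shared_tbmiterator */
    if (scanDesc)
        table_endscan(scanDesc);

    /* the node still owns prefetch iterators, the bitmap, and pvmbuffer */
    if (node->prefetch_iterator)
        tbm_end_iterate(node->prefetch_iterator);
    if (node->shared_prefetch_iterator)
        tbm_end_shared_iterate(node->shared_prefetch_iterator);
    if (node->tbm)
        tbm_free(node->tbm);
    if (node->pvmbuffer != InvalidBuffer)
        ReleaseBuffer(node->pvmbuffer);
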
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 521043304ab..92b829cebc7 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -24,6 +24,9 @@
 
 struct ParallelTableScanDescData;
 
+struct TBMIterator;
+struct TBMSharedIterator;
+
 /*
  * Generic descriptor for table scans. This is the base-class for table scans,
  * which needs to be embedded in the scans of individual AMs.
@@ -40,6 +43,10 @@ typedef struct TableScanDescData
 	ItemPointerData rs_mintid;
 	ItemPointerData rs_maxtid;
 
+	/* Only used for Bitmap table scans */
+	struct TBMIterator *tbmiterator;
+	struct TBMSharedIterator *shared_tbmiterator;
+
 	/*
 	 * Information about type and behaviour of the scan, a bitmask of members
 	 * of the ScanOptions enum (see tableam.h).
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 697b5488b10..68a479496d7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
 #include "access/xact.h"
 #include "commands/vacuum.h"
 #include "executor/tuptable.h"
+#include "nodes/tidbitmap.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -773,19 +774,14 @@ typedef struct TableAmRoutine
 	 */
 
 	/*
-	 * Prepare to fetch / check / return tuples from `tbmres->blockno` as part
-	 * of a bitmap table scan. `scan` was started via table_beginscan_bm().
-	 * Return false if there are no tuples to be found on the page, true
-	 * otherwise.
+	 * Prepare to fetch / check / return tuples from `blockno` as part of a
+	 * bitmap table scan. `scan` was started via table_beginscan_bm(). Return
+	 * false if the bitmap is exhausted and true otherwise.
 	 *
 	 * This will typically read and pin the target block, and do the necessary
 	 * work to allow scan_bitmap_next_tuple() to return tuples (e.g. it might
 	 * make sense to perform tuple visibility checks at this time).
 	 *
-	 * If `tbmres->blockno` is -1, this is a lossy scan and all visible tuples
-	 * on the page have to be returned, otherwise the tuples at offsets in
-	 * `tbmres->offsets` need to be returned.
-	 *
 	 * lossy_pages is incremented if the bitmap is lossy for the selected
 	 * block; otherwise, exact_pages is incremented.
 	 *
@@ -804,7 +800,7 @@ typedef struct TableAmRoutine
 	 * scan_bitmap_next_tuple need to exist, or neither.
 	 */
 	bool		(*scan_bitmap_next_block) (TableScanDesc scan,
-										   struct TBMIterateResult *tbmres,
+										   BlockNumber *blockno, bool *recheck,
 										   long *lossy_pages, long *exact_pages);
 
 	/*
@@ -942,12 +938,16 @@ static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
 				   int nkeys, struct ScanKeyData *key, bool need_tuple)
 {
+	TableScanDesc result;
 	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	if (need_tuple)
 		flags |= SO_NEED_TUPLES;
 
-	return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+	result = rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+	result->shared_tbmiterator = NULL;
+	result->tbmiterator = NULL;
+	return result;
 }
 
 /*
@@ -994,6 +994,21 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 static inline void
 table_endscan(TableScanDesc scan)
 {
+	if (scan->rs_flags & SO_TYPE_BITMAPSCAN)
+	{
+		if (scan->shared_tbmiterator)
+		{
+			tbm_end_shared_iterate(scan->shared_tbmiterator);
+			scan->shared_tbmiterator = NULL;
+		}
+
+		if (scan->tbmiterator)
+		{
+			tbm_end_iterate(scan->tbmiterator);
+			scan->tbmiterator = NULL;
+		}
+	}
+
 	scan->rs_rd->rd_tableam->scan_end(scan);
 }
 
@@ -1004,6 +1019,21 @@ static inline void
 table_rescan(TableScanDesc scan,
 			 struct ScanKeyData *key)
 {
+	if (scan->rs_flags & SO_TYPE_BITMAPSCAN)
+	{
+		if (scan->shared_tbmiterator)
+		{
+			tbm_end_shared_iterate(scan->shared_tbmiterator);
+			scan->shared_tbmiterator = NULL;
+		}
+
+		if (scan->tbmiterator)
+		{
+			tbm_end_iterate(scan->tbmiterator);
+			scan->tbmiterator = NULL;
+		}
+	}
+
 	scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
 }
 
@@ -1967,19 +1997,18 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths,
  */
 
 /*
- * Prepare to fetch / check / return tuples from `tbmres->blockno` as part of a
- * bitmap table scan. `scan` needs to have been started via
- * table_beginscan_bm(). Returns false if there are no tuples to be found on
- * the page, true otherwise. lossy_pages is incremented is the block's
- * representation in the bitmap is lossy; otherwise, exact_pages is
- * incremented.
+ * Prepare to fetch / check / return tuples as part of a bitmap table scan.
+ * `scan` needs to have been started via table_beginscan_bm(). Returns false if
+ * there are no more blocks in the bitmap, true otherwise. lossy_pages is
+ * incremented if the block's representation in the bitmap is lossy; otherwise,
+ * exact_pages is incremented.
  *
  * Note, this is an optionally implemented function, therefore should only be
  * used after verifying the presence (at plan time or such).
  */
 static inline bool
 table_scan_bitmap_next_block(TableScanDesc scan,
-							 struct TBMIterateResult *tbmres,
+							 BlockNumber *blockno, bool *recheck,
 							 long *lossy_pages,
 							 long *exact_pages)
 {
@@ -1992,9 +2021,8 @@ table_scan_bitmap_next_block(TableScanDesc scan,
 		elog(ERROR, "unexpected table_scan_bitmap_next_block call during logical decoding");
 
 	return scan->rs_rd->rd_tableam->scan_bitmap_next_block(scan,
-														   tbmres,
-														   lossy_pages,
-														   exact_pages);
+														   blockno, recheck,
+														   lossy_pages, exact_pages);
 }
 
 /*
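
[To make the revised calling convention concrete: a minimal sketch of a
driving loop on top of the new signature. This is an illustration only, not
a hunk from the patch; it assumes the tbmres-free scan_bitmap_next_tuple
variant from earlier in the series, and it omits prefetching, EPQ, and
error handling.]

    BlockNumber blockno = InvalidBlockNumber;
    bool        recheck = true;
    long        lossy_pages = 0;
    long        exact_pages = 0;

    /* The AM advances its own iterator and reports which block it chose. */
    while (table_scan_bitmap_next_block(scan, &blockno, &recheck,
                                        &lossy_pages, &exact_pages))
    {
        while (table_scan_bitmap_next_tuple(scan, slot))
        {
            /* If recheck is set, re-evaluate bitmapqualorig on the tuple. */
        }
    }
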
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fa2f70b7a48..d96703b04d4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1792,8 +1792,6 @@ typedef struct ParallelBitmapHeapState
  *
  *		bitmapqualorig	   execution state for bitmapqualorig expressions
  *		tbm				   bitmap obtained from child index scan(s)
- *		tbmiterator		   iterator for scanning current pages
- *		tbmres			   current-page data
  *		pvmbuffer		   buffer for visibility-map lookups of prefetched pages
  *		exact_pages		   total number of exact pages retrieved
  *		lossy_pages		   total number of lossy pages retrieved
@@ -1802,9 +1800,11 @@ typedef struct ParallelBitmapHeapState
  *		prefetch_target    current target prefetch distance
  *		prefetch_maximum   maximum value for prefetch_target
  *		initialized		   is node is ready to iterate
- *		shared_tbmiterator	   shared iterator
  *		shared_prefetch_iterator shared iterator for prefetching
  *		pstate			   shared state for parallel bitmap scan
+ *		recheck			   do current page's tuples need recheck
+ *		blockno			   used to validate that prefetch and current block stay in sync
+ *		pfblockno		   used to validate that prefetch stays ahead of current block
  * ----------------
  */
 typedef struct BitmapHeapScanState
@@ -1812,8 +1812,6 @@ typedef struct BitmapHeapScanState
 	ScanState	ss;				/* its first field is NodeTag */
 	ExprState  *bitmapqualorig;
 	TIDBitmap  *tbm;
-	TBMIterator *tbmiterator;
-	TBMIterateResult *tbmres;
 	Buffer		pvmbuffer;
 	long		exact_pages;
 	long		lossy_pages;
@@ -1822,9 +1820,11 @@ typedef struct BitmapHeapScanState
 	int			prefetch_target;
 	int			prefetch_maximum;
 	bool		initialized;
-	TBMSharedIterator *shared_tbmiterator;
 	TBMSharedIterator *shared_prefetch_iterator;
 	ParallelBitmapHeapState *pstate;
+	bool		recheck;
+	BlockNumber blockno;
+	BlockNumber pfblockno;
 } BitmapHeapScanState;
 
 /* ----------------
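
[blockno and pfblockno are validation-only, per the struct comment above.
The kind of cross-check they enable looks roughly like this -- a
hypothetical illustration, not a hunk from this patch:]

    /*
     * The prefetch iterator must never fall behind the main iterator,
     * i.e. pfblockno should stay >= blockno while prefetching is active.
     */
    if (node->prefetch_maximum > 0 &&
        node->pfblockno != InvalidBlockNumber &&
        node->pfblockno < node->blockno)
        elog(ERROR, "bitmap prefetch iterator fell behind main iterator");
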
-- 
2.40.1
