On Mon, Apr 6, 2026 at 4:39 PM Tomas Vondra <[email protected]> wrote:
>
> On 4/6/26 18:50, Melanie Plageman wrote:
>
> > I cleaned up the first patch in the set that refactors index-only and
> > index scan to use this pattern and realized that I wasn't sure what to
> > do about the duplication between index-only and index scans for these
> > functions.
>
<--snip-->
>
> I think this is ready to go, with a tiny amount of polishing.
>
> 1) "amount of DSM" sounds a bit strange to me. The wording "amount of
> space in ..." from the other nodes seems better to me. Or maybe I'm just
> used to it, not sure.

Fixed that before pushing.

> 2) I wonder if maybe PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET should be
> placed in plannodes.h, because that's where Plan->plan_node_id is
> defined. instrument_node.h works too, but the places accessing the
> plan_node_id already have to include plannodes.h.

I don't think it really belongs there. It is very specific to
execution. We use the plan node id as the key for convenience, but it
isn't the purpose of plan_node_id.
PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET's only purpose is to be used
during execution for instrumentation, so it feels like it belongs in
node_instrument.h. And we don't need a separate include for it either.
It goes with the other stuff being defined in instrument_node.h (like
SharedIndexScanInstrumentation) and being used by those callers. I
admit the comment above it is a bit odd, but I think it is ultimately
okay.

> We need to decide whether to push this into PG19. This was primarily
> motivated by the index prefetching work, but we now know that won't
> happen in PG19 :-( But the instrumentation is useful even for the three
> scans using read streams, so I think it's a meaningful improvement.

I think it is a meaningful improvement too. I think we should do it.

> If you think you can get this pushed, I'll do my best to finalize the
> instrumentation and SeqScan/TidRangeScan parts.

I've pushed the first patch for index/index-only scans. Attached is
the BHS fix that uses the new pattern. It needs at least one review
before pushing. While I was polishing it, I realized I neglected to
use add_size()/mul_size() in the index-only/index scan patches. So,
0002 is just a fix commit to do that. Feel free to push these if you
think they're ready. Otherwise, I'll do so pending your review in my
morning.

- Melanie
From 4a792a3d864761741c7ce5fd0e10a9f1c9539094 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 5 Apr 2026 16:27:22 -0400
Subject: [PATCH v11 1/2] Fix BitmapHeapScan non-parallel-aware EXPLAIN ANALYZE

Allocate shared bitmap table scan instrumentation in its own DSM chunk,
separate from ParallelBitmapHeapState. Previously, shared
instrumentation was only allocated for parallel-aware nodes, so a
non-parallel-aware bitmap table scan in a parallel query had no shared
instrumentation and EXPLAIN ANALYZE didn't report exact/lossy pages.
This affected queries like bitmap table scans on the outside of a
parallel join or those run with debug_parallel_query=regress. Fix by
allocating a separate DSM chunk for shared instrumentation and doing so
regardless of parallel-awareness.
---
 src/backend/commands/explain.c            |   2 +-
 src/backend/executor/execParallel.c       |   9 ++
 src/backend/executor/nodeBitmapHeapscan.c | 111 +++++++++++++---------
 src/include/executor/nodeBitmapHeapscan.h |   6 ++
 4 files changed, 83 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 73eaaf176ac..f151f21f9b3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3948,7 +3948,7 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 	}
 
 	/* Display stats for each parallel worker */
-	if (planstate->pstate != NULL)
+	if (planstate->sinstrument != NULL)
 	{
 		for (int n = 0; n < planstate->sinstrument->num_workers; n++)
 		{
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 726aca230a4..1a5ec0c305f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -303,6 +303,9 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e)
 			if (planstate->plan->parallel_aware)
 				ExecBitmapHeapEstimate((BitmapHeapScanState *) planstate,
 									   e->pcxt);
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecBitmapHeapInstrumentEstimate((BitmapHeapScanState *) planstate,
+											 e->pcxt);
 			break;
 		case T_HashJoinState:
 			if (planstate->plan->parallel_aware)
@@ -542,6 +545,9 @@ ExecParallelInitializeDSM(PlanState *planstate,
 			if (planstate->plan->parallel_aware)
 				ExecBitmapHeapInitializeDSM((BitmapHeapScanState *) planstate,
 											d->pcxt);
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecBitmapHeapInstrumentInitDSM((BitmapHeapScanState *) planstate,
+											d->pcxt);
 			break;
 		case T_HashJoinState:
 			if (planstate->plan->parallel_aware)
@@ -1427,6 +1433,9 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt)
 			if (planstate->plan->parallel_aware)
 				ExecBitmapHeapInitializeWorker((BitmapHeapScanState *) planstate,
 											   pwcxt);
+			/* even when not parallel-aware, for EXPLAIN ANALYZE */
+			ExecBitmapHeapInstrumentInitWorker((BitmapHeapScanState *) planstate,
+											   pwcxt);
 			break;
 		case T_HashJoinState:
 			if (planstate->plan->parallel_aware)
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 73831aed451..d65e2a87b42 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -495,18 +495,8 @@ void
 ExecBitmapHeapEstimate(BitmapHeapScanState *node,
 					   ParallelContext *pcxt)
 {
-	Size		size;
-
-	size = MAXALIGN(sizeof(ParallelBitmapHeapState));
-
-	/* account for instrumentation, if required */
-	if (node->ss.ps.instrument && pcxt->nworkers > 0)
-	{
-		size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
-		size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
-	}
-
-	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_chunk(&pcxt->estimator,
+						   MAXALIGN(sizeof(ParallelBitmapHeapState)));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
 
@@ -521,27 +511,15 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 							ParallelContext *pcxt)
 {
 	ParallelBitmapHeapState *pstate;
-	SharedBitmapHeapInstrumentation *sinstrument = NULL;
 	dsa_area   *dsa = node->ss.ps.state->es_query_dsa;
-	char	   *ptr;
-	Size		size;
 
 	/* If there's no DSA, there are no workers; initialize nothing. */
 	if (dsa == NULL)
 		return;
 
-	size = MAXALIGN(sizeof(ParallelBitmapHeapState));
-	if (node->ss.ps.instrument && pcxt->nworkers > 0)
-	{
-		size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
-		size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
-	}
-
-	ptr = shm_toc_allocate(pcxt->toc, size);
-	pstate = (ParallelBitmapHeapState *) ptr;
-	ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
-	if (node->ss.ps.instrument && pcxt->nworkers > 0)
-		sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
+	pstate = (ParallelBitmapHeapState *)
+		shm_toc_allocate(pcxt->toc,
+						 MAXALIGN(sizeof(ParallelBitmapHeapState)));
 
 	pstate->tbmiterator = 0;
 
@@ -551,18 +529,8 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 
 	ConditionVariableInit(&pstate->cv);
 
-	if (sinstrument)
-	{
-		sinstrument->num_workers = pcxt->nworkers;
-
-		/* ensure any unfilled slots will contain zeroes */
-		memset(sinstrument->sinstrument, 0,
-			   pcxt->nworkers * sizeof(BitmapHeapScanInstrumentation));
-	}
-
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate);
 	node->pstate = pstate;
-	node->sinstrument = sinstrument;
 }
 
 /* ----------------------------------------------------------------
@@ -600,17 +568,72 @@ void
 ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 							   ParallelWorkerContext *pwcxt)
 {
-	char	   *ptr;
-
 	Assert(node->ss.ps.state->es_query_dsa != NULL);
 
-	ptr = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+	node->pstate = (ParallelBitmapHeapState *)
+		shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+}
+
+/*
+ * Compute the amount of space we'll need for the shared instrumentation and
+ * inform pcxt->estimator.
+ */
+void
+ExecBitmapHeapInstrumentEstimate(BitmapHeapScanState *node,
+								 ParallelContext *pcxt)
+{
+	Size		size;
+
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = add_size(offsetof(SharedBitmapHeapInstrumentation, sinstrument),
+					mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up parallel bitmap heap scan instrumentation.
+ */
+void
+ExecBitmapHeapInstrumentInitDSM(BitmapHeapScanState *node,
+								ParallelContext *pcxt)
+{
+	Size		size;
+
+	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
+		return;
+
+	size = add_size(offsetof(SharedBitmapHeapInstrumentation, sinstrument),
+					mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+	node->sinstrument =
+		(SharedBitmapHeapInstrumentation *) shm_toc_allocate(pcxt->toc, size);
+
+	/* Each per-worker area must start out as zeroes */
+	memset(node->sinstrument, 0, size);
+	node->sinstrument->num_workers = pcxt->nworkers;
+	shm_toc_insert(pcxt->toc,
+				   node->ss.ps.plan->plan_node_id +
+				   PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET,
+				   node->sinstrument);
+}
 
-	node->pstate = (ParallelBitmapHeapState *) ptr;
-	ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+/*
+ * Look up and save the location of the shared instrumentation.
+ */
+void
+ExecBitmapHeapInstrumentInitWorker(BitmapHeapScanState *node,
+								   ParallelWorkerContext *pwcxt)
+{
+	if (!node->ss.ps.instrument)
+		return;
 
-	if (node->ss.ps.instrument)
-		node->sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
+	node->sinstrument = (SharedBitmapHeapInstrumentation *)
+		shm_toc_lookup(pwcxt->toc,
+					   node->ss.ps.plan->plan_node_id +
+					   PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET,
+					   false);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/include/executor/nodeBitmapHeapscan.h b/src/include/executor/nodeBitmapHeapscan.h
index 5d82f71abff..5c0b6592a1f 100644
--- a/src/include/executor/nodeBitmapHeapscan.h
+++ b/src/include/executor/nodeBitmapHeapscan.h
@@ -28,6 +28,12 @@ extern void ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node,
 										  ParallelContext *pcxt);
 extern void ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 										   ParallelWorkerContext *pwcxt);
+extern void ExecBitmapHeapInstrumentEstimate(BitmapHeapScanState *node,
+											 ParallelContext *pcxt);
+extern void ExecBitmapHeapInstrumentInitDSM(BitmapHeapScanState *node,
+											ParallelContext *pcxt);
+extern void ExecBitmapHeapInstrumentInitWorker(BitmapHeapScanState *node,
+											   ParallelWorkerContext *pwcxt);
 extern void ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node);
 
 #endif							/* NODEBITMAPHEAPSCAN_H */
-- 
2.43.0

From 9ab70837dac963dc1ae6721d0f45dc2c9d3f398d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 6 Apr 2026 20:06:49 -0400
Subject: [PATCH v11 2/2] Use add_size/mul_size for index instrumentation size
 calculations

Use overflow-safe size arithmetic in the Index[Only]Scan and parallel
instrumentation functions, consistent with other executor nodes (Hash,
Sort, Agg, Memoize). This was an oversight in dd78e69cfc3.
---
 src/backend/executor/nodeIndexonlyscan.c | 8 ++++----
 src/backend/executor/nodeIndexscan.c     | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c4ff422756e..d52012e8a69 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -854,8 +854,8 @@ ExecIndexOnlyScanInstrumentEstimate(IndexOnlyScanState *node,
 	 * in the IndexOnlyScanState. We'll recalculate the needed size in
 	 * ExecIndexOnlyScanInstrumentInitDSM().
 	 */
-	size = offsetof(SharedIndexScanInstrumentation, winstrument) +
-		pcxt->nworkers * sizeof(IndexScanInstrumentation);
+	size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument),
+					mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation)));
 	shm_toc_estimate_chunk(&pcxt->estimator, size);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
@@ -872,8 +872,8 @@ ExecIndexOnlyScanInstrumentInitDSM(IndexOnlyScanState *node,
 	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
 		return;
 
-	size = offsetof(SharedIndexScanInstrumentation, winstrument) +
-		pcxt->nworkers * sizeof(IndexScanInstrumentation);
+	size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument),
+					mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation)));
 	node->ioss_SharedInfo =
 		(SharedIndexScanInstrumentation *) shm_toc_allocate(pcxt->toc, size);
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 6cc927f2454..39f6691ee35 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -1789,8 +1789,8 @@ ExecIndexScanInstrumentEstimate(IndexScanState *node,
 	 * in the IndexScanState. We'll recalculate the needed size in
 	 * ExecIndexScanInstrumentInitDSM().
 	 */
-	size = offsetof(SharedIndexScanInstrumentation, winstrument) +
-		pcxt->nworkers * sizeof(IndexScanInstrumentation);
+	size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument),
+					mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation)));
 	shm_toc_estimate_chunk(&pcxt->estimator, size);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
@@ -1807,8 +1807,8 @@ ExecIndexScanInstrumentInitDSM(IndexScanState *node,
 	if (!node->ss.ps.instrument || pcxt->nworkers == 0)
 		return;
 
-	size = offsetof(SharedIndexScanInstrumentation, winstrument) +
-		pcxt->nworkers * sizeof(IndexScanInstrumentation);
+	size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument),
+					mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation)));
 	node->iss_SharedInfo =
 		(SharedIndexScanInstrumentation *) shm_toc_allocate(pcxt->toc, size);
 
-- 
2.43.0

Reply via email to