On Mon, Apr 6, 2026 at 4:39 PM Tomas Vondra <[email protected]> wrote: > > On 4/6/26 18:50, Melanie Plageman wrote: > > > I cleaned up the first patch in the set that refactors index-only and > > index scan to use this pattern and realized that I wasn't sure what to > > do about the duplication between index-only and index scans for these > > functions. > <--snip--> > > I think this is ready to go, with a tiny amount of polishing. > > 1) "amount of DSM" sounds a bit strange to me. The wording "amount of > space in ..." from the other nodes seems better to me. Or maybe I'm just > used to it, not sure.
Fixed that before pushing. > 2) I wonder if maybe PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET should be > placed in plannodes.h, because that's where Plan->plan_node_id is > defined. instrument_node.h works too, but the places accessing the > plan_node_id already have to include plannodes.h. I don't think it really belongs there. It is very specific to execution. We use the plan node id as the key for convenience, but it isn't the purpose of plan_node_id. PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET's only purpose is to be used during execution for instrumentation, so it feels like it belongs in node_instrument.h. And we don't need a separate include for it either. It goes with the other stuff being defined in instrument_node.h (like SharedIndexScanInstrumentation) and being used by those callers. I admit the comment above it is a bit odd, but I think it is ultimately okay. > We need to decide whether to push this into PG19. This was primarily > motivated by the index prefetching work, but we now know that won't > happen in PG19 :-( But the instrumentation is useful even for the three > scans using read streams, so I think it's a meaningful improvement. I think it is a meaningful improvement too. I think we should do it. > If you think you can get this pushed, I'll do my best to finalize the > instrumentation and SeqScan/TidRangeScan parts. I've pushed the first patch for index/index-only scans. Attached is the BHS fix that uses the new pattern. It needs at least one review before pushing. While I was polishing it, I realized I neglected to use add_size()/mul_size() in the index-only/index scan patches. So, 0002 is just a fix commit to do that. Feel free to push these if you think they're ready. Otherwise, I'll do so pending your review in my morning. - Melanie
From 4a792a3d864761741c7ce5fd0e10a9f1c9539094 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Sun, 5 Apr 2026 16:27:22 -0400 Subject: [PATCH v11 1/2] Fix BitmapHeapScan non-parallel-aware EXPLAIN ANALYZE Allocate shared bitmap table scan instrumentation in its own DSM chunk, separate from ParallelBitmapHeapState. Previously, shared instrumentation was only allocated for parallel-aware nodes, so a non-parallel-aware bitmap table scan in a parallel query had no shared instrumentation and EXPLAIN ANALYZE didn't report exact/lossy pages. This affected queries like bitmap table scans on the outside of a parallel join or those run with debug_parallel_query=regress. Fix by allocating a separate DSM chunk for shared instrumentation and doing so regardless of parallel-awareness. --- src/backend/commands/explain.c | 2 +- src/backend/executor/execParallel.c | 9 ++ src/backend/executor/nodeBitmapHeapscan.c | 111 +++++++++++++--------- src/include/executor/nodeBitmapHeapscan.h | 6 ++ 4 files changed, 83 insertions(+), 45 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 73eaaf176ac..f151f21f9b3 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -3948,7 +3948,7 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es) } /* Display stats for each parallel worker */ - if (planstate->pstate != NULL) + if (planstate->sinstrument != NULL) { for (int n = 0; n < planstate->sinstrument->num_workers; n++) { diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index 726aca230a4..1a5ec0c305f 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -303,6 +303,9 @@ ExecParallelEstimate(PlanState *planstate, ExecParallelEstimateContext *e) if (planstate->plan->parallel_aware) ExecBitmapHeapEstimate((BitmapHeapScanState *) planstate, e->pcxt); + /* even when not parallel-aware, for EXPLAIN ANALYZE */ + ExecBitmapHeapInstrumentEstimate((BitmapHeapScanState *) planstate, + e->pcxt); break; case T_HashJoinState: if (planstate->plan->parallel_aware) @@ -542,6 +545,9 @@ ExecParallelInitializeDSM(PlanState *planstate, if (planstate->plan->parallel_aware) ExecBitmapHeapInitializeDSM((BitmapHeapScanState *) planstate, d->pcxt); + /* even when not parallel-aware, for EXPLAIN ANALYZE */ + ExecBitmapHeapInstrumentInitDSM((BitmapHeapScanState *) planstate, + d->pcxt); break; case T_HashJoinState: if (planstate->plan->parallel_aware) @@ -1427,6 +1433,9 @@ ExecParallelInitializeWorker(PlanState *planstate, ParallelWorkerContext *pwcxt) if (planstate->plan->parallel_aware) ExecBitmapHeapInitializeWorker((BitmapHeapScanState *) planstate, pwcxt); + /* even when not parallel-aware, for EXPLAIN ANALYZE */ + ExecBitmapHeapInstrumentInitWorker((BitmapHeapScanState *) planstate, + pwcxt); break; case T_HashJoinState: if (planstate->plan->parallel_aware) diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c index 73831aed451..d65e2a87b42 100644 --- a/src/backend/executor/nodeBitmapHeapscan.c +++ b/src/backend/executor/nodeBitmapHeapscan.c @@ -495,18 +495,8 @@ void ExecBitmapHeapEstimate(BitmapHeapScanState *node, ParallelContext *pcxt) { - Size size; - - size = MAXALIGN(sizeof(ParallelBitmapHeapState)); - - /* account for instrumentation, if required */ - if (node->ss.ps.instrument && pcxt->nworkers > 0) - { - size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument)); - size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation))); - } - - shm_toc_estimate_chunk(&pcxt->estimator, size); + shm_toc_estimate_chunk(&pcxt->estimator, + MAXALIGN(sizeof(ParallelBitmapHeapState))); shm_toc_estimate_keys(&pcxt->estimator, 1); } @@ -521,27 +511,15 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node, ParallelContext *pcxt) { ParallelBitmapHeapState *pstate; - SharedBitmapHeapInstrumentation *sinstrument = NULL; dsa_area *dsa = node->ss.ps.state->es_query_dsa; - char *ptr; - Size size; /* If there's no DSA, there are no workers; initialize nothing. */ if (dsa == NULL) return; - size = MAXALIGN(sizeof(ParallelBitmapHeapState)); - if (node->ss.ps.instrument && pcxt->nworkers > 0) - { - size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument)); - size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation))); - } - - ptr = shm_toc_allocate(pcxt->toc, size); - pstate = (ParallelBitmapHeapState *) ptr; - ptr += MAXALIGN(sizeof(ParallelBitmapHeapState)); - if (node->ss.ps.instrument && pcxt->nworkers > 0) - sinstrument = (SharedBitmapHeapInstrumentation *) ptr; + pstate = (ParallelBitmapHeapState *) + shm_toc_allocate(pcxt->toc, + MAXALIGN(sizeof(ParallelBitmapHeapState))); pstate->tbmiterator = 0; @@ -551,18 +529,8 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node, ConditionVariableInit(&pstate->cv); - if (sinstrument) - { - sinstrument->num_workers = pcxt->nworkers; - - /* ensure any unfilled slots will contain zeroes */ - memset(sinstrument->sinstrument, 0, - pcxt->nworkers * sizeof(BitmapHeapScanInstrumentation)); - } - shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate); node->pstate = pstate; - node->sinstrument = sinstrument; } /* ---------------------------------------------------------------- @@ -600,17 +568,72 @@ void ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node, ParallelWorkerContext *pwcxt) { - char *ptr; - Assert(node->ss.ps.state->es_query_dsa != NULL); - ptr = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false); + node->pstate = (ParallelBitmapHeapState *) + shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false); +} + +/* + * Compute the amount of space we'll need for the shared instrumentation and + * inform pcxt->estimator. + */ +void +ExecBitmapHeapInstrumentEstimate(BitmapHeapScanState *node, + ParallelContext *pcxt) +{ + Size size; + + if (!node->ss.ps.instrument || pcxt->nworkers == 0) + return; + + size = add_size(offsetof(SharedBitmapHeapInstrumentation, sinstrument), + mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation))); + shm_toc_estimate_chunk(&pcxt->estimator, size); + shm_toc_estimate_keys(&pcxt->estimator, 1); +} + +/* + * Set up parallel bitmap heap scan instrumentation. + */ +void +ExecBitmapHeapInstrumentInitDSM(BitmapHeapScanState *node, + ParallelContext *pcxt) +{ + Size size; + + if (!node->ss.ps.instrument || pcxt->nworkers == 0) + return; + + size = add_size(offsetof(SharedBitmapHeapInstrumentation, sinstrument), + mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation))); + node->sinstrument = + (SharedBitmapHeapInstrumentation *) shm_toc_allocate(pcxt->toc, size); + + /* Each per-worker area must start out as zeroes */ + memset(node->sinstrument, 0, size); + node->sinstrument->num_workers = pcxt->nworkers; + shm_toc_insert(pcxt->toc, + node->ss.ps.plan->plan_node_id + + PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET, + node->sinstrument); +} - node->pstate = (ParallelBitmapHeapState *) ptr; - ptr += MAXALIGN(sizeof(ParallelBitmapHeapState)); +/* + * Look up and save the location of the shared instrumentation. + */ +void +ExecBitmapHeapInstrumentInitWorker(BitmapHeapScanState *node, + ParallelWorkerContext *pwcxt) +{ + if (!node->ss.ps.instrument) + return; - if (node->ss.ps.instrument) - node->sinstrument = (SharedBitmapHeapInstrumentation *) ptr; + node->sinstrument = (SharedBitmapHeapInstrumentation *) + shm_toc_lookup(pwcxt->toc, + node->ss.ps.plan->plan_node_id + + PARALLEL_KEY_SCAN_INSTRUMENT_OFFSET, + false); } /* ---------------------------------------------------------------- diff --git a/src/include/executor/nodeBitmapHeapscan.h b/src/include/executor/nodeBitmapHeapscan.h index 5d82f71abff..5c0b6592a1f 100644 --- a/src/include/executor/nodeBitmapHeapscan.h +++ b/src/include/executor/nodeBitmapHeapscan.h @@ -28,6 +28,12 @@ extern void ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node, ParallelContext *pcxt); extern void ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node, ParallelWorkerContext *pwcxt); +extern void ExecBitmapHeapInstrumentEstimate(BitmapHeapScanState *node, + ParallelContext *pcxt); +extern void ExecBitmapHeapInstrumentInitDSM(BitmapHeapScanState *node, + ParallelContext *pcxt); +extern void ExecBitmapHeapInstrumentInitWorker(BitmapHeapScanState *node, + ParallelWorkerContext *pwcxt); extern void ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node); #endif /* NODEBITMAPHEAPSCAN_H */ -- 2.43.0
From 9ab70837dac963dc1ae6721d0f45dc2c9d3f398d Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Mon, 6 Apr 2026 20:06:49 -0400 Subject: [PATCH v11 2/2] Use add_size/mul_size for index instrumentation size calculations Use overflow-safe size arithmetic in the Index[Only]Scan and parallel instrumentation functions, consistent with other executor nodes (Hash, Sort, Agg, Memoize). This was an oversight in dd78e69cfc3. --- src/backend/executor/nodeIndexonlyscan.c | 8 ++++---- src/backend/executor/nodeIndexscan.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c index c4ff422756e..d52012e8a69 100644 --- a/src/backend/executor/nodeIndexonlyscan.c +++ b/src/backend/executor/nodeIndexonlyscan.c @@ -854,8 +854,8 @@ ExecIndexOnlyScanInstrumentEstimate(IndexOnlyScanState *node, * in the IndexOnlyScanState. We'll recalculate the needed size in * ExecIndexOnlyScanInstrumentInitDSM(). */ - size = offsetof(SharedIndexScanInstrumentation, winstrument) + - pcxt->nworkers * sizeof(IndexScanInstrumentation); + size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument), + mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation))); shm_toc_estimate_chunk(&pcxt->estimator, size); shm_toc_estimate_keys(&pcxt->estimator, 1); } @@ -872,8 +872,8 @@ ExecIndexOnlyScanInstrumentInitDSM(IndexOnlyScanState *node, if (!node->ss.ps.instrument || pcxt->nworkers == 0) return; - size = offsetof(SharedIndexScanInstrumentation, winstrument) + - pcxt->nworkers * sizeof(IndexScanInstrumentation); + size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument), + mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation))); node->ioss_SharedInfo = (SharedIndexScanInstrumentation *) shm_toc_allocate(pcxt->toc, size); diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c index 6cc927f2454..39f6691ee35 100644 --- a/src/backend/executor/nodeIndexscan.c +++ b/src/backend/executor/nodeIndexscan.c @@ -1789,8 +1789,8 @@ ExecIndexScanInstrumentEstimate(IndexScanState *node, * in the IndexScanState. We'll recalculate the needed size in * ExecIndexScanInstrumentInitDSM(). */ - size = offsetof(SharedIndexScanInstrumentation, winstrument) + - pcxt->nworkers * sizeof(IndexScanInstrumentation); + size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument), + mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation))); shm_toc_estimate_chunk(&pcxt->estimator, size); shm_toc_estimate_keys(&pcxt->estimator, 1); } @@ -1807,8 +1807,8 @@ ExecIndexScanInstrumentInitDSM(IndexScanState *node, if (!node->ss.ps.instrument || pcxt->nworkers == 0) return; - size = offsetof(SharedIndexScanInstrumentation, winstrument) + - pcxt->nworkers * sizeof(IndexScanInstrumentation); + size = add_size(offsetof(SharedIndexScanInstrumentation, winstrument), + mul_size(pcxt->nworkers, sizeof(IndexScanInstrumentation))); node->iss_SharedInfo = (SharedIndexScanInstrumentation *) shm_toc_allocate(pcxt->toc, size); -- 2.43.0
