On Fri, Sep 26, 2025 at 2:12 PM Tom Lane <[email protected]> wrote:
> Maciek Sakrejda <[email protected]> writes:
> > The page you link says
>
> > In some query plans, it is possible for a subplan node to be
> > executed more than once. For example, the inner index scan will be
> > executed once per outer row in the above nested-loop plan. In such
> > cases, the loops value reports the total number of executions of the
> > node, and the actual time and rows values shown are averages
> > per-execution. This is done to make the numbers comparable with the
> > way that the cost estimates are shown. Multiply by the loops value to
> > get the total time actually spent in the node. In the above example,
> > we spent a total of 0.030 milliseconds executing the index scans on
> > tenk2.
>
> > in the second paragraph after the example in this section. Do you
> > think that's not sufficiently clear?
>
> It's not wrong, but it feels a little incomplete now. Maybe change
> the last two sentences to
>
> Multiply by the loops value to get the total time actually spent in
> the node and the total number of rows processed by the node across all
> executions. In the above example, we spent a total of 0.030
> milliseconds executing the index scans on tenk2, and they handled a
> total of 10 rows.
>
> A bigger gap in perform.sgml is that it doesn't address parallel
> query cases at all AFAICS. I think that was one of the main drivers
> of this change, so it feels a little sad that it's not covered here.
Fair point. I included your proposed change and took a stab at briefly
covering parallelism in the attached (admittedly, my understanding of how
that works is a little shaky, so apologies if I'm way off on some of this).

However, to get a parallel query in the regression database (I chose
EXPLAIN ANALYZE SELECT * FROM tenk2), I had to change some settings:

SET min_parallel_table_scan_size = 0;
SET parallel_tuple_cost = 0;
SET parallel_setup_cost = 0;

Should I mention that in the example? Or should I generate a bigger table
so that these settings are not necessary? If we say nothing and keep the
example as-is, I think it may be confusing for anyone who wants to use it
as a starting point for their own exploration of how this works. Or is
there a better query that works out of the box, without any changes to the
settings?

It also seems like the EXPLAIN ANALYZE section is getting a little
unwieldy. Should we subdivide it, or is this still okay?

Thanks,
Maciek
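P.S. For anyone who wants to try this themselves, the full sequence I used
in the regression database boils down to:

```sql
-- Make a parallel plan attractive even for the small regression tables.
-- These SETs are session-local, so a new session restores the defaults.
SET min_parallel_table_scan_size = 0;
SET parallel_tuple_cost = 0;
SET parallel_setup_cost = 0;

EXPLAIN ANALYZE SELECT * FROM tenk2;
```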
From 1c777b07ff1c085884f273dfd40290a06921e224 Mon Sep 17 00:00:00 2001
From: Maciek Sakrejda <[email protected]>
Date: Fri, 26 Sep 2025 17:17:41 -0700
Subject: [PATCH v1] Improve EXPLAIN docs

- clarify fractional rows in loops
- add some basic info on parallel execution
---
 doc/src/sgml/perform.sgml | 80 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 106583fb296..82b71571b86 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -756,12 +756,84 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
    <literal>loops</literal> value reports the total number of
    executions of the node, and the actual time and rows values shown
    are averages per-execution. This is done to make the numbers
-   comparable with the way that the cost estimates are shown. Multiply by
-   the <literal>loops</literal> value to get the total time actually spent in
-   the node. In the above example, we spent a total of 0.030 milliseconds
-   executing the index scans on <literal>tenk2</literal>.
+   comparable with the way that the cost estimates are shown. Multiply by the
+   <literal>loops</literal> value to get the total time actually spent in the
+   node and the total number of rows processed by the node across all
+   executions. In the above example, we spent a total of 0.030
+   milliseconds executing the index scans on <literal>tenk2</literal>, and
+   they handled a total of 10 rows.
   </para>
 
+  <para>
+   Parallel execution will also cause nodes to be executed more than once.
+   This is also indicated with the <literal>loops</literal> value:
+  </para>
+
+<screen>
+EXPLAIN ANALYZE SELECT * FROM tenk2;
+                                                         QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------
+ Gather  (cost=0.00..386.67 rows=10000 width=244) (actual time=1.291..19.308 rows=10000.00 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   Buffers: shared hit=345
+   ->  Parallel Seq Scan on tenk2  (cost=0.00..386.67 rows=4167 width=244) (actual time=0.006..0.556 rows=3333.33 loops=3)
+         Buffers: shared hit=345
+ Planning:
+   Buffers: shared hit=126
+ Planning Time: 1.375 ms
+ Execution Time: 20.214 ms
+(10 rows)
+</screen>
+
+  <para>
+   The sequential scan was executed three separate times in parallel: once
+   in the leader (since <xref linkend="guc-parallel-leader-participation"/>
+   is on by default), and once in each of the two launched workers. Similarly
+   to sequential repeated executions, rows and actual time are averages
+   per-worker. Multiply by the <literal>loops</literal> value to get the
+   total number of rows processed by the node across all workers. The total
+   time spent in all workers can be similarly calculated, but note this time
+   is spent concurrently, so it is not equivalent to total time spent in the
+   node.
+  </para>
+
+  <para>
+   More detailed information about parallel execution is available with the
+   VERBOSE option:
+  </para>
+
+<screen>
+EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM tenk2;
+                                                                       QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Gather  (cost=0.00..386.67 rows=10000 width=244) (actual time=0.901..8.389 rows=10000.00 loops=1)
+   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+   Workers Planned: 2
+   Workers Launched: 2
+   Buffers: shared hit=345
+   ->  Parallel Seq Scan on public.tenk2  (cost=0.00..386.67 rows=4167 width=244) (actual time=0.011..0.669 rows=3333.33 loops=3)
+         Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+         Buffers: shared hit=345
+         Worker 0: actual time=0.009..0.071 rows=517.00 loops=1
+           Buffers: shared hit=18
+         Worker 1: actual time=0.009..0.062 rows=435.00 loops=1
+           Buffers: shared hit=15
+ Planning Time: 0.187 ms
+ Execution Time: 9.524 ms
+(14 rows)
+</screen>
+
+  <para>
+   This shows actual time, rows, loops, and buffer usage for each worker.
+   When <xref linkend="guc-parallel-leader-participation"/> is on, the
+   corresponding stats for the leader can be calculated by subtracting the
+   sum of the workers' stats from the total. In the above example, the leader
+   spent 1.874 milliseconds (0.669 * 3 - 0.071 - 0.062) executing its portion
+   of the work and processed 9048 rows (3333.33 * 3 - 517.00 - 435.00, rounded
+   up).
+  </para>
+
   <para>
    In some cases <command>EXPLAIN ANALYZE</command> shows additional
    execution statistics beyond the plan node execution times and row counts.
-- 
2.43.0
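As a quick sanity check on the leader-stats arithmetic in that last hunk,
here is the subtraction spelled out in plain Python (numbers copied from
the EXPLAIN (ANALYZE, VERBOSE) example output above; purely illustrative):

```python
import math

# Per-loop averages and loop count for the Parallel Seq Scan node,
# as reported in the EXPLAIN (ANALYZE, VERBOSE) example output.
avg_time_ms = 0.669   # upper "actual time" bound, averaged per loop
avg_rows = 3333.33    # rows, averaged per loop
loops = 3             # leader + 2 launched workers

# Per-worker figures from the "Worker 0" / "Worker 1" detail lines.
worker_times_ms = [0.071, 0.062]
worker_rows = [517.00, 435.00]

# Leader stats = node totals (average * loops) minus the workers' stats.
leader_time_ms = avg_time_ms * loops - sum(worker_times_ms)
leader_rows = math.ceil(avg_rows * loops - sum(worker_rows))

print(round(leader_time_ms, 3))  # 1.874
print(leader_rows)               # 9048
```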
