Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > Fair enough. I have no further comments. Pushed then. Thanks for reviewing! I gave you credit as co-author. regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
On Tue, 19 Mar 2024 at 21:40, Tom Lane wrote: > > I'm inclined to leave that alone. The actual source sub-SELECT > could only have had one column, so no real confusion is possible. > Yeah, there's a resjunk grouping column visible in the plan as well, > but those exist in many other queries, and we've not gotten questions > about them. > Fair enough. I have no further comments. Regards, Dean
Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > One final case that could possibly be improved is this one from > aggregates.out: > explain (verbose, costs off) > select array(select sum(x+y) s > from generate_series(1,3) y group by y order by s) > from generate_series(1,3) x; > QUERY PLAN > --- > Function Scan on pg_catalog.generate_series x >Output: ARRAY(SubPlan 1) >Function Call: generate_series(1, 3) >SubPlan 1 > -> Sort >Output: (sum((x.x + y.y))), y.y >Sort Key: (sum((x.x + y.y))) >-> HashAggregate > Output: sum((x.x + y.y)), y.y > Group Key: y.y > -> Function Scan on pg_catalog.generate_series y >Output: y.y >Function Call: generate_series(1, 3) > ARRAY operates on a SELECT with a single targetlist item, but in this > case it looks like the subplan output has 2 columns, which might > confuse people. I'm inclined to leave that alone. The actual source sub-SELECT could only have had one column, so no real confusion is possible. Yeah, there's a resjunk grouping column visible in the plan as well, but those exist in many other queries, and we've not gotten questions about them. (Perhaps some documentation about junk columns needs to be added? I'm not eager to write it though.) I had actually had a similar thought about sticking ".col1" onto EXPR_SUBLINK cases, but I concluded it was mostly pedantry. Nobody's likely to get confused. regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
On Tue, 19 Mar 2024 at 16:42, Tom Lane wrote: > > Here's a hopefully-final version that makes that adjustment and > tweaks a couple of comments. > This looks very good to me. One final case that could possibly be improved is this one from aggregates.out: explain (verbose, costs off) select array(select sum(x+y) s from generate_series(1,3) y group by y order by s) from generate_series(1,3) x; QUERY PLAN --- Function Scan on pg_catalog.generate_series x Output: ARRAY(SubPlan 1) Function Call: generate_series(1, 3) SubPlan 1 -> Sort Output: (sum((x.x + y.y))), y.y Sort Key: (sum((x.x + y.y))) -> HashAggregate Output: sum((x.x + y.y)), y.y Group Key: y.y -> Function Scan on pg_catalog.generate_series y Output: y.y Function Call: generate_series(1, 3) ARRAY operates on a SELECT with a single targetlist item, but in this case it looks like the subplan output has 2 columns, which might confuse people. I wonder if we should output "ARRAY((SubPlan 1).col1)" to make it clearer. Since ARRAY_SUBLINK is a special case, which always collects the first column's values, we could just always output "col1" for ARRAY. Regards, Dean
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > I won't update the patch right now, but "(rescan SubPlan N)" > seems like a winner to me. Here's a hopefully-final version that makes that adjustment and tweaks a couple of comments. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 58a603ac56..acbbf3b56c 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3062,10 +3062,10 @@ select exists(select 1 from pg_enum), sum(c1) from ft1; QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).col1, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" - InitPlan 1 (returns $0) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum (6 rows) @@ -3080,8 +3080,8 @@ select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1; QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) - InitPlan 1 (returns $0) + Output: (InitPlan 1).col1, sum(ft1.c1) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 Output: ft1.c1 @@ -3305,10 +3305,10 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +--- Aggregate - Output: sum(ft1.c2) FILTER (WHERE (hashed SubPlan 1)) + Output: sum(ft1.c2) FILTER (WHERE (ANY (ft1.c2 = (hashed SubPlan 1).col1))) -> Foreign Scan on public.ft1 Output: ft1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" @@ -6171,9 +6171,9 @@ UPDATE ft2 AS target SET (c2, c7) = ( Update on public.ft2 target Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1 -> Foreign Scan on public.ft2 target - Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.* + Output: (SubPlan 1).col1, (SubPlan 1).col2, (rescan SubPlan 1), target.ctid, target.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE - SubPlan 1 (returns $1,$2) + SubPlan 1 -> Foreign Scan on public.ft2 src Output: (src.c2 * 10), src.c7 Remote SQL: SELECT c2, c7 FROM "S 1"."T 1" WHERE (($1::integer = "C 1")) @@ -11685,9 +11685,9 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W QUERY PLAN Nested Loop Left Join - Output: t1.a, t1.b, t1.c, async_pt.a, async_pt.b, async_pt.c, ($0) + Output: t1.a, t1.b, t1.c, async_pt.a, async_pt.b, async_pt.c, ((InitPlan 1).col1) Join Filter: (t1.a = async_pt.a) - InitPlan 1 (returns $0) + InitPlan 1 -> Aggregate Output: count(*) -> Append @@ -11699,10 +11699,10 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W Output: t1.a, t1.b, t1.c -> Append -> Async Foreign Scan on public.async_p1 async_pt_1 - Output: async_pt_1.a, async_pt_1.b, async_pt_1.c, $0 + Output: async_pt_1.a, async_pt_1.b, async_pt_1.c, (InitPlan 1).col1 Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE ((a < 3000)) -> Async Foreign Scan on public.async_p2 async_pt_2 - Output: async_pt_2.a, async_pt_2.b, async_pt_2.c, $0 + Output: async_pt_2.a, async_pt_2.b, async_pt_2.c, (InitPlan 1).col1 Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE ((a < 3000)) (20 rows) @@ -11713,7 +11713,7 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W Nested Loop Left Join (actual rows=1 loops=1) Join Filter: (t1.a = async_pt.a) Rows Removed by Join Filter: 399 - InitPlan 1 (returns $0) + InitPlan 1 -> Aggregate (actual rows=1 loops=1) -> Append (actual rows=400 loops=1) -> Async Foreign Scan on async_p1 async_pt_4 (actual rows=200 loops=1) @@ -11936,11 +11936,11 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl) SERVER loopback OPTIONS (table_name 'base_tbl'); EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > On Mon, 18 Mar 2024 at 23:19, Tom Lane wrote: >> After thinking a bit more, I understood what was bothering me about >> that notation: it looks too much like a call of a user-defined >> function named "rescan()". I think we'd be better off with the >> all-caps "RESCAN()". > Or perhaps move the parentheses, and write "(rescan SubPlan N)" or > "(reset SubPlan N)". Dunno. Oh, I like that! It seems rather parallel to the existing "hashed" annotation. If I had it to do over, I'd likely do the "hashed" bit differently --- but as the proposal currently stands, we are not changing "hashed", so we might as well double down on that. I won't update the patch right now, but "(rescan SubPlan N)" seems like a winner to me. regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
On Mon, 18 Mar 2024 at 23:19, Tom Lane wrote: > > > Hm. I used "rescan(SubPlan)" in the attached, but it still looks > > a bit odd to my eye. > > After thinking a bit more, I understood what was bothering me about > that notation: it looks too much like a call of a user-defined > function named "rescan()". I think we'd be better off with the > all-caps "RESCAN()". > Or perhaps move the parentheses, and write "(rescan SubPlan N)" or "(reset SubPlan N)". Dunno. Regards, Dean
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > Hm. I used "rescan(SubPlan)" in the attached, but it still looks > a bit odd to my eye. After thinking a bit more, I understood what was bothering me about that notation: it looks too much like a call of a user-defined function named "rescan()". I think we'd be better off with the all-caps "RESCAN()". regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > The get_rule_expr() code could perhaps be simplified a bit, getting > rid of the show_subplan_name variable and moving the recursive calls > to get_rule_expr() to after the switch statement -- if testexpr is > non-NULL, print it, else print the subplan name probably works for all > subplan types. Oooh, good idea. The symmetry wasn't apparent when we started, but it's there now, and the code does look nicer this way. > The "colN" notation has grown on me, especially when you look at > examples like those in partition_prune.out with a mix of Param types. OK, I've left it like that in the attached v5, but I'm still open to other opinions. >> The undecorated reference to (SubPlan 1) is fairly confusing, since >> it doesn't correspond to anything that will actually get output. >> I suggest that perhaps instead this should read >> Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), i.tableoid, >> i.ctid >> or >> Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), i.tableoid, >> i.ctid > I think "RESET()" or "RESCAN()" or something like that is better than > "INGORE()", because it indicates that it is actually doing something. > I don't really have a better idea. Perhaps not all uppercase though, > since that seems to go against the rest of the EXPLAIN output. Hm. I used "rescan(SubPlan)" in the attached, but it still looks a bit odd to my eye. I did some more work on the documentation too, to show the difference between hashed and not-hashed subplans. I feel like we're pretty close here, with the possible exception of how to show MULTIEXPR. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 58a603ac56..8d75ca56c9 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3062,10 +3062,10 @@ select exists(select 1 from pg_enum), sum(c1) from ft1; QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).col1, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" - InitPlan 1 (returns $0) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum (6 rows) @@ -3080,8 +3080,8 @@ select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1; QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) - InitPlan 1 (returns $0) + Output: (InitPlan 1).col1, sum(ft1.c1) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 Output: ft1.c1 @@ -3305,10 +3305,10 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +--- Aggregate - Output: sum(ft1.c2) FILTER (WHERE (hashed SubPlan 1)) + Output: sum(ft1.c2) FILTER (WHERE (ANY (ft1.c2 = (hashed SubPlan 1).col1))) -> Foreign Scan on public.ft1 Output: ft1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" @@ -6171,9 +6171,9 @@ UPDATE ft2 AS target SET (c2, c7) = ( Update on public.ft2 target Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1 -> Foreign Scan on public.ft2 target - Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.* + Output: (SubPlan 1).col1, (SubPlan 1).col2, rescan(SubPlan 1), target.ctid, target.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE - SubPlan 1 (returns $1,$2) + SubPlan 1 -> Foreign Scan on public.ft2 src Output: (src.c2 * 10), src.c7 Remote SQL: SELECT c2, c7 FROM "S 1"."T 1" WHERE (($1::integer = "C 1")) @@ -11685,9 +11685,9 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W QUERY PLAN Nested Loop Left Join - Output: t1.a, t1.b, t1.c, async_pt.a, async_pt.b, async_pt.c, ($0) + Output: t1.a, t1.b, t1.c, async_pt.a, async_pt.b, async_pt.c, ((InitPlan 1).col1) Join Filter: (t1.a = async_pt.a) - InitPlan 1 (returns $0) + InitPlan 1 -> Aggregate Output: count(*) -> Append @@ -11699,10 +11699,10 @@ SELECT * FROM local_tbl t1 LEFT JOIN
Re: Improving EXPLAIN's display of SubPlan nodes
On Sun, 17 Mar 2024 at 19:39, Tom Lane wrote: > > Here's a cleaned-up version that seems to successfully resolve > PARAM_EXEC references in all cases. I haven't changed the > "(plan_name).colN" notation, but that's an easy fix once we have > consensus on the spelling. I took a more detailed look at this and the code and doc changes all look good to me. There's a comment at the end of find_param_generator() that should probably say "No generator found", rather than "No referent found". The get_rule_expr() code could perhaps be simplified a bit, getting rid of the show_subplan_name variable and moving the recursive calls to get_rule_expr() to after the switch statement -- if testexpr is non-NULL, print it, else print the subplan name probably works for all subplan types. The "colN" notation has grown on me, especially when you look at examples like those in partition_prune.out with a mix of Param types. Not using "$n" for 2 different purposes is good, and I much prefer this to the original "FROM [HASHED] SubPlan N (returns ...)" notation. > There are two other loose ends bothering me: > > 1. I see that Gather nodes sometimes print things like > >-> Gather (actual rows=N loops=N) > Workers Planned: 2 > Params Evaluated: $0, $1 > Workers Launched: N > > I propose we nuke the "Params Evaluated" output altogether. +1 > 2. MULTIEXPR_SUBLINK subplans now result in EXPLAIN output like > >-> Result > Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), i.tableoid, > i.ctid > > The undecorated reference to (SubPlan 1) is fairly confusing, since > it doesn't correspond to anything that will actually get output. > I suggest that perhaps instead this should read > > Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), > i.tableoid, i.ctid > > or > > Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), > i.tableoid, i.ctid I think "RESET()" or "RESCAN()" or something like that is better than "INGORE()", because it indicates that it is actually doing something. I don't really have a better idea. Perhaps not all uppercase though, since that seems to go against the rest of the EXPLAIN output. Regards, Dean
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > Dean Rasheed writes: >> Overall, I think this is heading in the right direction. I think we >> just need a good way to say "the n'th output column of the subplan", >> that can't be confused with anything else in the output. > We could consider notations like "(SubPlan 1 column 2)", which > couldn't be confused with anything else, if only because a name > like that would have to be double-quoted. It's a little more > verbose but not that much. I fear "(SubPlan 1 col 2)" is too > short to be clear. Here's a cleaned-up version that seems to successfully resolve PARAM_EXEC references in all cases. I haven't changed the "(plan_name).colN" notation, but that's an easy fix once we have consensus on the spelling. There are two other loose ends bothering me: 1. I see that Gather nodes sometimes print things like -> Gather (actual rows=N loops=N) Workers Planned: 2 Params Evaluated: $0, $1 Workers Launched: N This now sticks out like a sore thumb, because there's no other reference to $0 or $1 in the EXPLAIN output. We could possibly adjust the code to print something like Params Evaluated: (InitPlan 1).col1, (InitPlan 2).col1 but I think that's pretty silly. This looks to me like a code debugging aid that shouldn't have survived past initial development. It's of zero use to end users, and it doesn't correspond to anything we bother to mention in EXPLAIN output in any other case: initplans just magically get evaluated at the right times. I propose we nuke the "Params Evaluated" output altogether. 2. MULTIEXPR_SUBLINK subplans now result in EXPLAIN output like explain (verbose, costs off) update inhpar i set (f1, f2) = (select i.f1, i.f2 || '-' from int4_tbl limit 1); ... -> Result Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), i.tableoid, i.ctid The undecorated reference to (SubPlan 1) is fairly confusing, since it doesn't correspond to anything that will actually get output. I suggest that perhaps instead this should read Output: (SubPlan 1).col1, (SubPlan 1).col2, IGNORE(SubPlan 1), i.tableoid, i.ctid or Output: (SubPlan 1).col1, (SubPlan 1).col2, RESET(SubPlan 1), i.tableoid, i.ctid the point of "RESET()" being that what the executor actually does there is to re-arm the SubPlan to be evaluated again on the next pass through the targetlist. I'm not greatly in love with either of those ideas, though. Any thoughts? regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 58a603ac56..2609255da6 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3062,10 +3062,10 @@ select exists(select 1 from pg_enum), sum(c1) from ft1; QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).col1, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" - InitPlan 1 (returns $0) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum (6 rows) @@ -3080,8 +3080,8 @@ select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1; QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) - InitPlan 1 (returns $0) + Output: (InitPlan 1).col1, sum(ft1.c1) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 Output: ft1.c1 @@ -3305,10 +3305,10 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +--- Aggregate - Output: sum(ft1.c2) FILTER (WHERE (hashed SubPlan 1)) + Output: sum(ft1.c2) FILTER (WHERE (ANY (ft1.c2 = (hashed SubPlan 1).col1))) -> Foreign Scan on public.ft1 Output: ft1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" @@ -6171,9 +6171,9 @@ UPDATE ft2 AS target SET (c2, c7) = ( Update on public.ft2 target Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1 -> Foreign Scan on public.ft2 target - Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.* + Output: (SubPlan 1).col1, (SubPlan 1).col2, (SubPlan 1), target.ctid, target.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE - SubPlan 1 (returns $1,$2) + SubPlan 1 ->
Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > On Sat, 16 Mar 2024 at 17:25, Tom Lane wrote: >> After looking at your draft some more, it occurred to me that we're >> not that far from getting rid of showing "$n" entirely in this >> context. Instead of (subplan_name).$n, we could write something like >> (subplan_name).colN or (subplan_name).columnN or (subplan_name).fN, > I think it would be confusing if there were tables whose columns are > named "colN". In that case, a SQL qual like "t1.col2 = t2.col2" might > be output as something like "t1.col2 = (SubPlan 1).col3", since the > subplan's targetlist wouldn't necessarily just output the table > columns in order. Perhaps. I think people who are using columns named like that are already accustomed to having to pay close attention to which table the column is shown as qualified by. So I'm not sure there'd really be much problem in practice. > I actually think "$n" is not so bad (especially if we make n the > column number). The fact that it's qualified by the subplan name ought > to be sufficient to avoid it being confused with an external > parameter. That's an interesting compromise position, but I'm not sure that it buys much compared to the approach shown in your draft (that is, continuing to use the real param IDs). The real IDs at least have the advantage of being unique. > Whatever name is chosen, I think we should still output "(returns > ...)" on the subplan nodes. We should do that if we continue to show real param IDs, but if we change to using column numbers then I think it's pointless. Output like SubPlan 1 (returns $1,$2) ... SubPlan 2 (returns $1,$2,$3) seems to me that it'd be more confusing not less so. Does SubPlan 2 return the same values as SubPlan 1 plus more? > Overall, I think this is heading in the right direction. I think we > just need a good way to say "the n'th output column of the subplan", > that can't be confused with anything else in the output. We could consider notations like "(SubPlan 1 column 2)", which couldn't be confused with anything else, if only because a name like that would have to be double-quoted. It's a little more verbose but not that much. I fear "(SubPlan 1 col 2)" is too short to be clear. regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
On Sat, 16 Mar 2024 at 17:25, Tom Lane wrote: > > After looking at your draft some more, it occurred to me that we're > not that far from getting rid of showing "$n" entirely in this > context. Instead of (subplan_name).$n, we could write something like > (subplan_name).colN or (subplan_name).columnN or (subplan_name).fN, > depending on your taste for verboseness. "fN" would correspond to the > names we assign to columns of anonymous record types, but it hasn't > got much else to recommend it. In the attached I used "colN"; > "columnN" would be my next choice. Using the column number rather than the parameter index looks a lot neater, especially in output with multiple subplans. Of those choices, "colN" looks nicest, however... I think it would be confusing if there were tables whose columns are named "colN". In that case, a SQL qual like "t1.col2 = t2.col2" might be output as something like "t1.col2 = (SubPlan 1).col3", since the subplan's targetlist wouldn't necessarily just output the table columns in order. I actually think "$n" is not so bad (especially if we make n the column number). The fact that it's qualified by the subplan name ought to be sufficient to avoid it being confused with an external parameter. Maybe there are other options, but I think it's important to choose something that's unlikely to be confused with a real column name. Whatever name is chosen, I think we should still output "(returns ...)" on the subplan nodes. In a complex query there might be a lot of output columns, and the expressions might be quite complex, making it hard to see how many columns the subplan is returning. Besides, without that, it might not be obvious to everyone what "colN" (or whatever we settle on) means in places that refer to the subplan. > You could also imagine trying to use the sub-SELECT's actual output > column names, but I fear that would be ambiguous --- too often it'd > be "?column?" or some other non-unique name. Yeah, I think that's a non-starter, because the output column names aren't necessarily unique. > The attached proof-of-concept is incomplete: it fails to replace some > $n occurrences with subplan references, as is visible in some of the > test cases. I believe your quick hack in get_parameter() is not > correct in detail, but for the moment I didn't bother to debug it. Yeah, that's exactly what it was, a quick hack. I just wanted to get some output to see what it would look like in a few real cases. Overall, I think this is heading in the right direction. I think we just need a good way to say "the n'th output column of the subplan", that can't be confused with anything else in the output. Regards, Dean
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > Dean Rasheed writes: >> One thing that concerns me about making even greater use of "$n" is >> the potential for confusion with generic plan parameters. > True. After looking at your draft some more, it occurred to me that we're not that far from getting rid of showing "$n" entirely in this context. Instead of (subplan_name).$n, we could write something like (subplan_name).colN or (subplan_name).columnN or (subplan_name).fN, depending on your taste for verboseness. "fN" would correspond to the names we assign to columns of anonymous record types, but it hasn't got much else to recommend it. In the attached I used "colN"; "columnN" would be my next choice. You could also imagine trying to use the sub-SELECT's actual output column names, but I fear that would be ambiguous --- too often it'd be "?column?" or some other non-unique name. > Hmm. I guess what bothers me about that is that it could be read to > suggest that the initplan or subplan is evaluated again for each > output parameter. This objection seems like it could be solved through documentation, so I wrote some. The attached proof-of-concept is incomplete: it fails to replace some $n occurrences with subplan references, as is visible in some of the test cases. I believe your quick hack in get_parameter() is not correct in detail, but for the moment I didn't bother to debug it. I'm just presenting this as a POC to see if this is the direction people would like to go in. If there's not objections, I'll try to make a bulletproof implementation. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 58a603ac56..585ba60ae3 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3062,10 +3062,10 @@ select exists(select 1 from pg_enum), sum(c1) from ft1; QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).col1, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" - InitPlan 1 (returns $0) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum (6 rows) @@ -3080,8 +3080,8 @@ select exists(select 1 from pg_enum), sum(c1) from ft1 group by 1; QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) - InitPlan 1 (returns $0) + Output: (InitPlan 1).col1, sum(ft1.c1) + InitPlan 1 -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 Output: ft1.c1 @@ -3305,10 +3305,10 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +--- Aggregate - Output: sum(ft1.c2) FILTER (WHERE (hashed SubPlan 1)) + Output: sum(ft1.c2) FILTER (WHERE (ANY (ft1.c2 = (hashed SubPlan 1).col1))) -> Foreign Scan on public.ft1 Output: ft1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" @@ -6171,9 +6171,9 @@ UPDATE ft2 AS target SET (c2, c7) = ( Update on public.ft2 target Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c7 = $3 WHERE ctid = $1 -> Foreign Scan on public.ft2 target - Output: $1, $2, (SubPlan 1 (returns $1,$2)), target.ctid, target.* + Output: $1, $2, (SubPlan 1), target.ctid, target.* Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1100)) FOR UPDATE - SubPlan 1 (returns $1,$2) + SubPlan 1 -> Foreign Scan on public.ft2 src Output: (src.c2 * 10), src.c7 Remote SQL: SELECT c2, c7 FROM "S 1"."T 1" WHERE (($1::integer = "C 1")) @@ -11687,7 +11687,7 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W Nested Loop Left Join Output: t1.a, t1.b, t1.c, async_pt.a, async_pt.b, async_pt.c, ($0) Join Filter: (t1.a = async_pt.a) - InitPlan 1 (returns $0) + InitPlan 1 -> Aggregate Output: count(*) -> Append @@ -11713,7 +11713,7 @@ SELECT * FROM local_tbl t1 LEFT JOIN (SELECT *, (SELECT count(*) FROM async_pt W Nested Loop Left Join (actual rows=1 loops=1) Join Filter: (t1.a = async_pt.a) Rows Removed by Join Filter: 399 - InitPlan 1 (returns $0) + InitPlan 1 -> Aggregate (actual rows=1 loops=1) -> Append (actual rows=400 loops=1) -> Async Foreign Scan on async_p1 async_pt_4
Re: Improving EXPLAIN's display of SubPlan nodes
Dean Rasheed writes: > One thing that concerns me about making even greater use of "$n" is > the potential for confusion with generic plan parameters. True. > Another possibility is to put the SubPlan and InitPlan names inline, > rather than outputting "FROM SubPlan ...". I had a go at hacking that > up and this was the result: >Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND > ((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4 Hmm. I guess what bothers me about that is that it could be read to suggest that the initplan or subplan is evaluated again for each output parameter. Perhaps it'll be sufficiently clear as long as we keep the labeling >InitPlan 1 (returns $0,$1) >SubPlan 2 (returns $3,$4) but I'm not sure. Anybody else have an opinion? (I didn't read your changes to the code yet --- I think at this point we can just debate proposed output without worrying about how to implement it.) regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
On Fri, 16 Feb 2024 at 19:39, Tom Lane wrote: > > > So now I'm thinking that we do have enough detail in the present > > proposal, and we just need to think about whether there's some > > nicer way to present it than the particular spelling I used here. > One thing that concerns me about making even greater use of "$n" is the potential for confusion with generic plan parameters. Maybe it's always possible to work out which is which from context, but still it looks messy: drop table if exists foo; create table foo(id int, x int, y int); explain (verbose, costs off, generic_plan) select row($3,$4) = (select x,y from foo where id=y) and row($1,$2) = (select min(x+y),max(x+y) from generate_series(1,3) x) from generate_series(1,3) y; QUERY PLAN --- Function Scan on pg_catalog.generate_series y Output: (($3 = $0) AND ($4 = $1) AND (ROWCOMPARE (($1 = $3) AND ($2 = $4)) FROM SubPlan 2 (returns $3,$4))) Function Call: generate_series(1, 3) InitPlan 1 (returns $0,$1) -> Seq Scan on public.foo Output: foo.x, foo.y Filter: (foo.id = foo.y) SubPlan 2 (returns $3,$4) -> Aggregate Output: min((x.x + y.y)), max((x.x + y.y)) -> Function Scan on pg_catalog.generate_series x Output: x.x Function Call: generate_series(1, 3) Another odd thing about that is the inconsistency between how the SubPlan and InitPlan expressions are displayed. I think "ROWCOMPARE" is really just an internal detail that could be omitted without losing anything. But the "FROM SubPlan ..." is useful to work out where it's coming from. Should it also output "FROM InitPlan ..."? I think that would risk making it harder to read. Another possibility is to put the SubPlan and InitPlan names inline, rather than outputting "FROM SubPlan ...". I had a go at hacking that up and this was the result: QUERY PLAN --- Function Scan on pg_catalog.generate_series y Output: (($3 = (InitPlan 1).$0) AND ($4 = (InitPlan 1).$1) AND ((($1 = (SubPlan 2).$3) AND ($2 = (SubPlan 2).$4 Function Call: generate_series(1, 3) InitPlan 1 (returns $0,$1) -> Seq Scan on public.foo Output: foo.x, foo.y Filter: (foo.id = foo.y) SubPlan 2 (returns $3,$4) -> Aggregate Output: min((x.x + y.y)), max((x.x + y.y)) -> Function Scan on pg_catalog.generate_series x Output: x.x Function Call: generate_series(1, 3) It's a little more verbose in this case, but in a lot of other cases it ended up being more compact. The code is a bit messy, but I think the regression test output (attached) is clearer and easier to interpret. SubPlans and InitPlans are displayed consistently, and it's easier to distinguish SubPlan/InitPlan outputs from external parameters. There are a few more regression test changes, corresponding to cases where InitPlans are referenced, such as: Seq Scan on document - Filter: ((dlevel <= $0) AND f_leak(dtitle)) + Filter: ((dlevel <= (InitPlan 1).$0) AND f_leak(dtitle)) InitPlan 1 (returns $0) -> Index Scan using uaccount_pkey on uaccount Index Cond: (pguser = CURRENT_USER) but I think that's useful extra clarification. Regards, Dean diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out new file mode 100644 index c355e8f..f0ff936 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3021,7 +3021,7 @@ select exists(select 1 from pg_enum), su QUERY PLAN -- Foreign Scan - Output: $0, (sum(ft1.c1)) + Output: (InitPlan 1).$0, (sum(ft1.c1)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum("C 1") FROM "S 1"."T 1" InitPlan 1 (returns $0) @@ -3039,7 +3039,7 @@ select exists(select 1 from pg_enum), su QUERY PLAN --- GroupAggregate - Output: $0, sum(ft1.c1) + Output: (InitPlan 1).$0, sum(ft1.c1) InitPlan 1 (returns $0) -> Seq Scan on pg_catalog.pg_enum -> Foreign Scan on public.ft1 @@ -3264,14 +3264,14 @@ select sum(c1) filter (where (c1 / c1) * explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN
Re: Improving EXPLAIN's display of SubPlan nodes
> On 17 Feb 2024, at 00:38, Tom Lane wrote: > > Here's a rebase over 9f1337639 --- no code changes, but this affects > some of the new or changed expected outputs from that commit. Aleksander, as long as your was reviewing this previously, I’ve added you to reviewers of this CF entry [0]. Please, ping me or remove yourself, it it’s actually not a case. BTW, as long as there’s a new version and some input from Tom, would you be willing to post fleshier review? Thanks for working on this! Best regards, Andrey Borodin. [0] https://commitfest.postgresql.org/47/4782/
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > So now I'm thinking that we do have enough detail in the present > proposal, and we just need to think about whether there's some > nicer way to present it than the particular spelling I used here. Here's a rebase over 9f1337639 --- no code changes, but this affects some of the new or changed expected outputs from that commit. regards, tom lane From f84343864e74501df627f986e326f766072398cd Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Fri, 16 Feb 2024 14:32:23 -0500 Subject: [PATCH v2] Improve EXPLAIN's display of SubPlan nodes. Represent the SubLinkType as best we can, and show the testexpr where relevant. To aid in interpreting testexprs, add the output parameter IDs to the subplan name for subplans as well as initplans. Discussion: https://postgr.es/m/2838538.1705692...@sss.pgh.pa.us --- .../postgres_fdw/expected/postgres_fdw.out| 16 +- src/backend/optimizer/plan/subselect.c| 30 ++-- src/backend/utils/adt/ruleutils.c | 45 - src/test/regress/expected/aggregates.out | 2 +- src/test/regress/expected/insert_conflict.out | 2 +- src/test/regress/expected/join.out| 24 +-- src/test/regress/expected/memoize.out | 2 +- src/test/regress/expected/rowsecurity.out | 56 +++--- src/test/regress/expected/select_parallel.out | 22 +-- src/test/regress/expected/subselect.out | 159 -- src/test/regress/expected/updatable_views.out | 24 +-- src/test/regress/sql/subselect.sql| 14 ++ 12 files changed, 255 insertions(+), 141 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index c355e8f3f7..bf067738e3 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3264,14 +3264,14 @@ select sum(c1) filter (where (c1 / c1) * random() <= 1) from ft1 group by c2 ord explain (verbose, costs off) select sum(c2) filter (where c2 in (select c2 from ft1 where c2 < 5)) from ft1; -QUERY PLAN + QUERY PLAN +- Aggregate - Output: sum(ft1.c2) FILTER (WHERE (hashed SubPlan 1)) + Output: sum(ft1.c2) FILTER (WHERE (ANY (ft1.c2 = $0) FROM HASHED SubPlan 1 (returns $0))) -> Foreign Scan on public.ft1 Output: ft1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" - SubPlan 1 + SubPlan 1 (returns $0) -> Foreign Scan on public.ft1 ft1_1 Output: ft1_1.c2 Remote SQL: SELECT c2 FROM "S 1"."T 1" WHERE ((c2 < 5)) @@ -11895,12 +11895,12 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl) SERVER loopback OPTIONS (table_name 'base_tbl'); EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl); - QUERY PLAN -- +QUERY PLAN +-- Seq Scan on public.base_tbl Output: base_tbl.a - Filter: (SubPlan 1) - SubPlan 1 + Filter: (ANY ((base_tbl.a = $1) AND ((random() > '0'::double precision) = $2)) FROM SubPlan 1 (returns $1,$2)) + SubPlan 1 (returns $1,$2) -> Result Output: base_tbl.a, (random() > '0'::double precision) -> Append diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 47e14723d2..d5919a9164 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -326,6 +326,7 @@ build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, Node *result; SubPlan*splan; bool isInitPlan; + StringInfoData splanname; ListCell *lc; /* @@ -560,22 +561,31 @@ build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, splan->plan_id); /* Label the subplan for EXPLAIN purposes */ - splan->plan_name = palloc(32 + 12 * list_length(splan->setParam)); - sprintf(splan->plan_name, "%s %d", - isInitPlan ? "InitPlan" : "SubPlan", - splan->plan_id); + initStringInfo(); + appendStringInfo(, "%s %d", + isInitPlan ? "InitPlan" : "SubPlan", + splan->plan_id); if (splan->setParam) { - char *ptr = splan->plan_name + strlen(splan->plan_name); - - ptr += sprintf(ptr, " (returns "); + appendStringInfoString(, " (returns "); foreach(lc, splan->setParam) { - ptr += sprintf(ptr, "$%d%s", - lfirst_int(lc), -
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > The main thing that's still missing compared to what is in the plan > data structure is information about which Param is which. I think > we have the subplan output Params relatively well covered through > the expedient of listing them in the generated plan_name, but it's > still not apparent to me how we could shoehorn subplan input > Params into this (or whether it's worth doing). Actually ... it looks like it probably isn't worth doing, because it's already the case that we don't expose input Params as such. EXPLAIN searches for the referent of an input Param and displays that (cf find_param_referent()). Just for experimental purposes, I wrote a follow-on patch to add printout of the parParam and args list (attached, as .txt so the cfbot doesn't think it's a patch). This produces results like explain (verbose, costs off) select array(select sum(x+y) s from generate_series(1,3) y group by y order by s) from generate_series(1,3) x; QUERY PLAN --- Function Scan on pg_catalog.generate_series x Output: ARRAY(SubPlan 1 PASSING $0 := x.x) ^ added by delta patch Function Call: generate_series(1, 3) SubPlan 1 -> Sort Output: (sum((x.x + y.y))), y.y Sort Key: (sum((x.x + y.y))) -> HashAggregate Output: sum((x.x + y.y)), y.y ^^^ actual reference to $0 Group Key: y.y -> Function Scan on pg_catalog.generate_series y Output: y.y Function Call: generate_series(1, 3) (13 rows) As you can see, it's not necessary to explain what $0 is because that name isn't shown anywhere else --- the references to "x.x" in the subplan are actually uses of $0. So now I'm thinking that we do have enough detail in the present proposal, and we just need to think about whether there's some nicer way to present it than the particular spelling I used here. One idea I considered briefly is to pull the same trick with regards to output parameters --- that is, instead of adding all the "returns $n" annotations to subplans, drill down and print the subplan's relevant targetlist expression instead of "$n". On balance I think that might be more confusing not less so, though. SQL users are used to the idea that a sub-select can "see" variables from the outer query, but not at all vice-versa. I think it probably wouldn't be formally ambiguous, because ruleutils already de-duplicates table aliases across the whole tree, but it still seems likely to be confusing. Also, people are already pretty used to seeing $n to represent the outputs of InitPlans, and I've not heard many complaints suggesting that we should change that. regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 9ee35640ba..b3da98a2f7 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -8911,9 +8911,30 @@ get_rule_expr(Node *node, deparse_context *context, break; } if (subplan->useHashTable) - appendStringInfo(buf, "HASHED %s)", subplan->plan_name); + appendStringInfo(buf, "HASHED %s", subplan->plan_name); else - appendStringInfo(buf, "%s)", subplan->plan_name); + appendStringInfo(buf, "%s", subplan->plan_name); + if (subplan->parParam) + { + ListCell *lc3; + ListCell *lc4; + bool first = true; + + appendStringInfoString(buf, " PASSING "); + forboth(lc3, subplan->parParam, lc4, subplan->args) + { + int paramid = lfirst_int(lc3); + Node *arg = (Node *) lfirst(lc4); + + if (first) + first = false; + else + appendStringInfoString(buf, ", "); + appendStringInfo(buf, "$%d := ", paramid); + get_rule_expr(arg, context, showimplicit); + } + } +
Re: Improving EXPLAIN's display of SubPlan nodes
Hi Aleksander and Tom I do confirm that I requested to get this information, in order to recover the formula to filter on. Thanks to both of you Chantal Le 22/01/2024 à 18:07, Tom Lane a écrit : Aleksander Alekseev writes: Although something like: ``` + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) + SubPlan 1 (returns $1) ``` ... arguably doesn't give much more information to the user comparing to what we have now: ``` - Filter: (SubPlan 1) - SubPlan 1 ``` Yeah, I would probably not have thought to do this on my own; but we had an actual user request for it. I think arguably the main benefit is to confirm "yes, this is the sub-select you think it is". The main thing that's still missing compared to what is in the plan data structure is information about which Param is which. I think we have the subplan output Params relatively well covered through the expedient of listing them in the generated plan_name, but it's still not apparent to me how we could shoehorn subplan input Params into this (or whether it's worth doing). regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
Aleksander Alekseev writes: > Although something like: > ``` > + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) > + SubPlan 1 (returns $1) > ``` > ... arguably doesn't give much more information to the user comparing > to what we have now: > ``` > - Filter: (SubPlan 1) > - SubPlan 1 > ``` Yeah, I would probably not have thought to do this on my own; but we had an actual user request for it. I think arguably the main benefit is to confirm "yes, this is the sub-select you think it is". The main thing that's still missing compared to what is in the plan data structure is information about which Param is which. I think we have the subplan output Params relatively well covered through the expedient of listing them in the generated plan_name, but it's still not apparent to me how we could shoehorn subplan input Params into this (or whether it's worth doing). regards, tom lane
Re: Improving EXPLAIN's display of SubPlan nodes
Hi, > EXPLAIN has always been really poor at displaying SubPlan nodes > in expressions: you don't get much more than "(SubPlan N)". > This is mostly because every time I thought about it, I despaired > of trying to represent all the information in a SubPlan accurately. > However, a recent discussion[1] made me realize that we could do > a lot better just by displaying the SubLinkType and the testexpr > (if relevant). So here's a proposed patch. You can see what > it does by examining the regression test changes. > > There's plenty of room to bikeshed about exactly how to display > this stuff, and I'm open to suggestions. > > BTW, I was somewhat depressed to discover that we have exactly > zero regression coverage of the ROWCOMPARE_SUBLINK code paths; > not only was EXPLAIN output not covered, but the planner and > executor too. So I added some simple tests for that. Otherwise > I think existing coverage is enough for this. I reviewed the code and tested the patch on MacOS. It looks good to me. Although something like: ``` + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) + SubPlan 1 (returns $1) ``` ... arguably doesn't give much more information to the user comparing to what we have now: ``` - Filter: (SubPlan 1) - SubPlan 1 ``` ... I believe this is the right step toward more detailed EXPLAINs, and perhaps could be useful for debugging and/or educational purposes. Also the patch improves code coverage. -- Best regards, Aleksander Alekseev