Re: pg16: XX000: could not find pathkey item to sort
On Mon, 18 Mar 2024 at 18:50, Ashutosh Bapat wrote: > If the problem you speculate is different from this one, I am not able to see > it. It might help give an example query or explain more. I looked at this again and I might have been wrong about there being a problem. I set a breakpoint in create_gather_merge_path() and adjusted the startup and total cost to 1 when I saw the pathkeys containing {a,b}. It turns out this is the non-partitionwise aggregate path, and of course, the targetlist there does contain the "b" column, so it's fine in that case that the pathkeys are {a,b}. I had previously thought that this was for the partition-wise aggregate plan, in which case the targetlist would contain a, sum(b order by b), of which there's no single value of "b" that we can legally sort by. Here's the full plan. postgres=# explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; QUERY PLAN --- GroupAggregate (cost=1.00..25.60 rows=200 width=12) Output: t.a, sum(t.b ORDER BY t.b) Group Key: t.a -> Gather Merge (cost=1.00..1.00 rows=4520 width=8) Output: t.a, t.b Workers Planned: 2 -> Sort (cost=158.36..163.07 rows=1882 width=8) Output: t.a, t.b Sort Key: t.a, t.b -> Parallel Append (cost=0.00..56.00 rows=1882 width=8) -> Parallel Seq Scan on public.tp1 t_1 (cost=0.00..23.29 rows=1329 width=8) Output: t_1.a, t_1.b -> Parallel Seq Scan on public.td t_2 (cost=0.00..23.29 rows=1329 width=8) Output: t_2.a, t_2.b (14 rows) David
Re: pg16: XX000: could not find pathkey item to sort
On Thu, Mar 14, 2024 at 3:45 PM David Rowley wrote: > On Thu, 14 Mar 2024 at 18:23, Ashutosh Bapat > wrote: > > I don't understand why root->query_pathkeys has both a and b. "a" is > there because of GROUP BY and ORDER BY clause. But why "b"? > > So that the ORDER BY aggregate function can be evaluated without > nodeAgg.c having to perform the sort. See > adjust_group_pathkeys_for_groupagg(). > Thanks. To me, it looks like we are gathering pathkeys, which if used to sort the result of overall join, would avoid sorting in as many as aggregates as possible. relation_can_be_sorted_early() finds, pathkeys which if used to sort the given relation, would help sorting the overall join. Contrary to what I said earlier, it might help if the base relation is sorted on "a" and "b". What I find weird is that the sorting is not pushed down to the partitions, where it would help most. #explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; QUERY PLAN GroupAggregate (cost=362.21..398.11 rows=200 width=12) Output: t.a, sum(t.b ORDER BY t.b) Group Key: t.a -> Sort (cost=362.21..373.51 rows=4520 width=8) Output: t.a, t.b Sort Key: t.a, t.b -> Append (cost=0.00..87.80 rows=4520 width=8) -> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8) Output: t_1.a, t_1.b -> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8) Output: t_2.a, t_2.b (11 rows) and that's the case even without parallel plans #explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; QUERY PLAN GroupAggregate (cost=362.21..398.11 rows=200 width=12) Output: t.a, sum(t.b ORDER BY t.b) Group Key: t.a -> Sort (cost=362.21..373.51 rows=4520 width=8) Output: t.a, t.b Sort Key: t.a, t.b -> Append (cost=0.00..87.80 rows=4520 width=8) -> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8) Output: t_1.a, t_1.b -> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8) Output: t_2.a, t_2.b (11 rows) But it could be just because the corresponding plan was not found to be optimal. May be because there isn't enough data in those tables. If the problem you speculate is different from this one, I am not able to see it. It might help give an example query or explain more. -- Best Wishes, Ashutosh Bapat
Re: pg16: XX000: could not find pathkey item to sort
On Thu, 14 Mar 2024 at 12:00, David Rowley wrote: > I've attached a patch which fixes the problem for me. I've pushed the patch to fix gather_grouping_paths(). The issue with the RelOptInfo having the incorrect PathTarget->exprs after the partial phase of partition-wise aggregate remains. David
Re: pg16: XX000: could not find pathkey item to sort
On Thu, 14 Mar 2024 at 18:23, Ashutosh Bapat wrote: > I don't understand why root->query_pathkeys has both a and b. "a" is there > because of GROUP BY and ORDER BY clause. But why "b"? So that the ORDER BY aggregate function can be evaluated without nodeAgg.c having to perform the sort. See adjust_group_pathkeys_for_groupagg(). David
Re: pg16: XX000: could not find pathkey item to sort
On Thu, Mar 14, 2024 at 4:30 AM David Rowley wrote: > On Thu, 14 Mar 2024 at 06:00, Alexander Lakhin > wrote: > > I've stumbled upon the same error, but this time it apparently has > another > > cause. It can be produced (on REL_16_STABLE and master) as follows: > > CREATE TABLE t (a int, b int) PARTITION BY RANGE (a); > > CREATE TABLE td PARTITION OF t DEFAULT; > > CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2); > > SET enable_partitionwise_aggregate = on; > > SET parallel_setup_cost = 0; > > SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; > > > > ERROR: could not find pathkey item to sort > > > > `git bisect` for this anomaly blames the same commit 1349d2790. > > Thanks for finding and for the recreator script. > > I've attached a patch which fixes the problem for me. > > On debugging this I uncovered some other stuff that looks broken which > seems to caused by partition-wise aggregates. With your example > query, in get_useful_pathkeys_for_relation(), we call > relation_can_be_sorted_early() to check if the pathkey can be used as > a set of pathkeys in useful_pathkeys_list. The problem is that in > your query the 'rel' is the base relation belonging to the partitioned > table and relation_can_be_sorted_early() looks through the targetlist > for that relation and finds columns "a" and "b" in there. The problem > is "b" has been aggregated away as partial aggregation has taken place > due to the partition-wise aggregation. I believe whichever rel we > should be using there should have an Aggref in the target exprs rather > than the plain unaggregated column. I've added Robert and Ashutosh to > see what their thoughts are on this. > I don't understand why root->query_pathkeys has both a and b. "a" is there because of GROUP BY and ORDER BY clause. But why "b"? Under the debugger this is what I observed: generate_useful_gather_paths() gets called twice, once for the base relation and second time for the upper relation. When it's called for base relation, it includes "a" and "b" both in the useful pathkeys. The plan doesn't use sortedness on b. But I don't think that's the problem of the relation used. It looks like root->query_pathkeys containing "b" may be a problem. When it's called for upper relation, the reltarget has "a" and Aggref() and it includes only "a" in the useful pathkeys which is as per your expectation. -- Best Wishes, Ashutosh Bapat
Re: pg16: XX000: could not find pathkey item to sort
On Thu, 14 Mar 2024 at 06:00, Alexander Lakhin wrote: > I've stumbled upon the same error, but this time it apparently has another > cause. It can be produced (on REL_16_STABLE and master) as follows: > CREATE TABLE t (a int, b int) PARTITION BY RANGE (a); > CREATE TABLE td PARTITION OF t DEFAULT; > CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2); > SET enable_partitionwise_aggregate = on; > SET parallel_setup_cost = 0; > SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; > > ERROR: could not find pathkey item to sort > > `git bisect` for this anomaly blames the same commit 1349d2790. Thanks for finding and for the recreator script. I've attached a patch which fixes the problem for me. On debugging this I uncovered some other stuff that looks broken which seems to caused by partition-wise aggregates. With your example query, in get_useful_pathkeys_for_relation(), we call relation_can_be_sorted_early() to check if the pathkey can be used as a set of pathkeys in useful_pathkeys_list. The problem is that in your query the 'rel' is the base relation belonging to the partitioned table and relation_can_be_sorted_early() looks through the targetlist for that relation and finds columns "a" and "b" in there. The problem is "b" has been aggregated away as partial aggregation has taken place due to the partition-wise aggregation. I believe whichever rel we should be using there should have an Aggref in the target exprs rather than the plain unaggregated column. I've added Robert and Ashutosh to see what their thoughts are on this. David diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 443ab08d75..eaba6ddc03 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -7320,6 +7320,15 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel) { ListCell *lc; Path *cheapest_partial_path; + List *groupby_pathkeys; + + + /* trim off any pathkeys added for ORDER BY / DISTINCT aggregates */ + if (list_length(root->group_pathkeys) > root->num_groupby_pathkeys) + groupby_pathkeys = list_copy_head(root->group_pathkeys, + root->num_groupby_pathkeys); + else + groupby_pathkeys = root->group_pathkeys; /* Try Gather for unordered paths and Gather Merge for ordered ones. */ generate_useful_gather_paths(root, rel, true); @@ -7334,7 +7343,7 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel) int presorted_keys; double total_groups; - is_sorted = pathkeys_count_contained_in(root->group_pathkeys, + is_sorted = pathkeys_count_contained_in(groupby_pathkeys, path->pathkeys, _keys); @@ -7366,7 +7375,7 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel) path = (Path *) create_incremental_sort_path(root, rel, path, - root->group_pathkeys, + groupby_pathkeys, presorted_keys, -1.0); @@ -7375,7 +7384,7 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel) rel, path, rel->reltarget, - root->group_pathkeys, + groupby_pathkeys, NULL, _groups);
Re: pg16: XX000: could not find pathkey item to sort
Hello David, 09.10.2023 07:13, David Rowley wrote: On Mon, 9 Oct 2023 at 12:42, David Rowley wrote: Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case". I've now pushed the patch that trims off the Pathkeys for the ORDER BY / DISTINCT aggregates. I've stumbled upon the same error, but this time it apparently has another cause. It can be produced (on REL_16_STABLE and master) as follows: CREATE TABLE t (a int, b int) PARTITION BY RANGE (a); CREATE TABLE td PARTITION OF t DEFAULT; CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2); SET enable_partitionwise_aggregate = on; SET parallel_setup_cost = 0; SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a; ERROR: could not find pathkey item to sort `git bisect` for this anomaly blames the same commit 1349d2790. Best regards, Alexander
Re: pg16: XX000: could not find pathkey item to sort
On Mon, Oct 9, 2023 at 12:13 PM David Rowley wrote: > I've now pushed the patch that trims off the Pathkeys for the ORDER BY > / DISTINCT aggregates. Thanks for pushing! > Those results are a bit noisy. Perhaps a few more runs might yield > more consistency, but it seems that there's not too much overhead to > it. If I take the minimum value out of the 3 runs from each, it comes > to about 1.5% extra time spent in planning. Perhaps that's OK. I agree that the overhead is acceptable, especially it only happens in USE_ASSERT_CHECKING builds. Thanks Richard
Re: pg16: XX000: could not find pathkey item to sort
On Mon, 9 Oct 2023 at 12:42, David Rowley wrote: > Maybe it's worth checking the total planning time spent in a run of > the regression tests with and without the patch to see how much > overhead it adds to the "average case". I've now pushed the patch that trims off the Pathkeys for the ORDER BY / DISTINCT aggregates. As for the patch to verify the pathkeys during create plan, I patched master with the attached plan_times.patch.txt and used the following to check the time spent in the planner for 3 runs of make installcheck. $ for i in {1..3}; do pg_ctl start -D pgdata -l plantime.log > /dev/null && cd pg_src && make installcheck > /dev/null && cd .. && grep "planning time in" plantime.log|sed -E -e 's/.*planning time in (.*) nanoseconds/\1/'|awk '{nanoseconds += $1} END{print nanoseconds}' && pg_ctl stop -D pgdata > /dev/null && rm plantime.log; done Master: 1855788104 1839655412 1740769066 Patched: 1917797221 1766606115 1881322655 Those results are a bit noisy. Perhaps a few more runs might yield more consistency, but it seems that there's not too much overhead to it. If I take the minimum value out of the 3 runs from each, it comes to about 1.5% extra time spent in planning. Perhaps that's OK. David diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 1e4dd27dba..3c713782f1 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -17,6 +17,7 @@ #include #include +#include #include "access/genam.h" #include "access/htup_details.h" @@ -274,11 +275,22 @@ planner(Query *parse, const char *query_string, int cursorOptions, ParamListInfo boundParams) { PlannedStmt *result; + struct timespec start, end; + + clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ); if (planner_hook) result = (*planner_hook) (parse, query_string, cursorOptions, boundParams); else result = standard_planner(parse, query_string, cursorOptions, boundParams); + + clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ); + + elog(LOG, +"planning time in %f nanoseconds", +((double) (end.tv_sec * 10 + end.tv_nsec) - +(double) (start.tv_sec * 10 + start.tv_nsec))); + return result; }
Re: pg16: XX000: could not find pathkey item to sort
On Mon, Oct 9, 2023 at 7:42 AM David Rowley wrote: > On Sun, 8 Oct 2023 at 23:52, Richard Guo wrote: > > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() > > are computable from the targetlist, it seems that we do not need to trim > > them off, because prepare_sort_from_pathkeys() will add resjunk target > > entries for them. But it's also no harm if we trim them off. So I > > think the patch is a pretty safe fix. +1 to it. > > hmm, I think one of us does not understand what is going on here. I > tried to explain in [1] why we *need* to strip off the pathkeys added > by adjust_group_pathkeys_for_groupagg(). Sorry I didn't make myself clear. I understand why we need to trim off the pathkeys added by adjust_group_pathkeys_for_groupagg(). What I meant was that if the new added pathkeys are *computable* from the existing target entries, then prepare_sort_from_pathkeys() will add resjunk target entries for them, so there seems to be no problem even if we do not trim them off. For example explain (verbose, costs off) select a, count(distinct a+1) from prt1 group by a order by a; QUERY PLAN Result Output: prt1.a, (count(DISTINCT ((prt1.a + 1 -> Merge Append Sort Key: prt1.a, ((prt1.a + 1)) -> GroupAggregate Output: prt1.a, count(DISTINCT ((prt1.a + 1))), ((prt1.a + 1)) Group Key: prt1.a -> Sort Output: prt1.a, ((prt1.a + 1)) Sort Key: prt1.a, ((prt1.a + 1)) -> Seq Scan on public.prt1_p1 prt1 Output: prt1.a, (prt1.a + 1) ... Expression 'a+1' is *computable* from the existing entry 'a', so we just add a new resjunk target entry for 'a+1', and there is no error planning this query. But if we change 'a+1' to something that is not computable, then we would have problems (without your fix), and the reason has been well explained by your messages. explain (verbose, costs off) select a, count(distinct b) from prt1 group by a order by a; ERROR: could not find pathkey item to sort Having said that, I think it's the right thing to do to trim off the new added pathkeys, even if they are *computable*. In the plan above, the '(prt1.a + 1)' in GroupAggregate's targetlist and MergeAppend's pathkeys are actually redundant. It's good to remove it. > Can you explain why you think we can put a resjunk "b" in the target > list of the GroupAggregate in the above case? Hmm, I don't think we can do that, because 'b' is not *computable* from the existing target entries, as I explained above. Thanks Richard
Re: pg16: XX000: could not find pathkey item to sort
On Sun, 8 Oct 2023 at 23:52, Richard Guo wrote: > On Thu, Oct 5, 2023 at 2:26 PM David Rowley wrote: >> >> So in short, I propose the attached fix without any regression tests >> because I feel that any regression test would just mark that there was >> a big in create_agg_path() and not really help with ensuring we don't >> end up with some similar problem in the future. > > > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() > are computable from the targetlist, it seems that we do not need to trim > them off, because prepare_sort_from_pathkeys() will add resjunk target > entries for them. But it's also no harm if we trim them off. So I > think the patch is a pretty safe fix. +1 to it. hmm, I think one of us does not understand what is going on here. I tried to explain in [1] why we *need* to strip off the pathkeys added by adjust_group_pathkeys_for_groupagg(). Given the following example: create table ab (a int,b int); explain (costs off) select a,count(distinct b) from ab group by a; QUERY PLAN GroupAggregate Group Key: a -> Sort Sort Key: a, b -> Seq Scan on ab (5 rows) adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b" column and that results in the Sort node sorting on {a,b}. It's simply not at all valid to have the GroupAggregate path claim that its pathkeys are also (effectively) {a,b}" as "b" does not and cannot legally exist after the aggregation takes place. We cannot put a resjunk "b" in the targetlist of the GroupAggregate either as there could be any number "b" values aggregated. Can you explain why you think we can put a resjunk "b" in the target list of the GroupAggregate in the above case? >> >> I have some concerns that the assert_pathkeys_in_target() function >> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm >> not proposing to commit that without further discussion. > > > Yeah, it looks like some heavy to call assert_pathkeys_in_target() for > each path node. Can we run some benchmarks to see how much overhead it > would bring to USE_ASSERT_CHECKING build? I think it'll be easy to show that there is an overhead to it. It'll be in the realm of the ~41ms patched vs ~24ms unpatched that I showed in [2]. That's quite an extreme case. Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case". David [1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=n-tb6...@mail.gmail.com [2] https://postgr.es/m/caaphdvo7rzcqyw-gnkzr6qcijcqf8vjlkj4xfk-kawvyaw1...@mail.gmail.com
Re: pg16: XX000: could not find pathkey item to sort
On Thu, Oct 5, 2023 at 2:26 PM David Rowley wrote: > So in short, I propose the attached fix without any regression tests > because I feel that any regression test would just mark that there was > a big in create_agg_path() and not really help with ensuring we don't > end up with some similar problem in the future. If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() are computable from the targetlist, it seems that we do not need to trim them off, because prepare_sort_from_pathkeys() will add resjunk target entries for them. But it's also no harm if we trim them off. So I think the patch is a pretty safe fix. +1 to it. > I have some concerns that the assert_pathkeys_in_target() function > might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm > not proposing to commit that without further discussion. Yeah, it looks like some heavy to call assert_pathkeys_in_target() for each path node. Can we run some benchmarks to see how much overhead it would bring to USE_ASSERT_CHECKING build? Thanks Richard
Re: pg16: XX000: could not find pathkey item to sort
On Tue, 3 Oct 2023 at 20:16, David Rowley wrote: > I wonder if the attached patch is too much of a special case fix. I > guess from the lack of complaints previously that there are no other > cases where we could possibly have pathkeys that belong to columns > that are aggregated. I've not gone to much effort to see if I can > craft a case that hits this without the ORDER BY/DISTINCT aggregate > optimisation, however. I spent more time on this today. I'd been wondering if there was any reason why create_agg_path() would receive a subpath with pathkeys that were anything but the PlannerInfo's group_pathkeys. I mean, how could we do Group Aggregate if it wasn't? I wondered if grouping sets might change that, but it seems the group_pathkeys will be set to the initial grouping set. Given that, it would seem it's safe to just trim off any pathkey that was added to the group_pathkeys by adjust_group_pathkeys_for_groupagg(). PlannerInfo.num_groupby_pathkeys marks the number of pathkeys that existed in group_pathkeys before adjust_group_pathkeys_for_groupagg() made any additions, so we can just trim the list length back to that. I've done this in the attached patch. I also considered if it was worth adding a regression test for this and I concluded that there are better ways to test for this and considered if we should add some code to createplan.c to check that all Path pathkeys have corresponding items in the PathTarget. I've included an additional patch which adds some code in USE_ASSERT_CHECKING builds to verify this. Without the fix it's simple enough to trigger this with a query such as: select two,count(distinct four) from tenk1 group by two order by two; Without the fix the additional asserts cause the regression tests to fail, but with the fix everything passes. Justin's case is quite an obscure way to hit this as it requires partitionwise aggregation plus a single partition so that the Append is removed due to only having a single subplan in setrefs.c. If there had been 2 partitions, then the AppendPath wouldn't have inherited the subpath's pathkeys per code at the end of create_append_path(). So in short, I propose the attached fix without any regression tests because I feel that any regression test would just mark that there was a big in create_agg_path() and not really help with ensuring we don't end up with some similar problem in the future. I have some concerns that the assert_pathkeys_in_target() function might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm not proposing to commit that without further discussion. Does anyone feel differently? If not, I plan to push the attached strip_aggregate_pathkeys_from_aggpaths_v2.patch early next week. David strip_aggregated_pathkeys_from_aggpaths_v2.patch Description: Binary data assert_pathkeys_in_target.patch Description: Binary data
Re: pg16: XX000: could not find pathkey item to sort
On Tue, 3 Oct 2023 at 09:11, David Rowley wrote: > I'm concerned that this patch will be too much overhead when creating > paths when a PathKey's EquivalenceClass has a large number of members > from partitioned tables. I just tried out the patch to see how much it affects the performance of the planner. I think we need to find a better way to strip off the pathkeys for the columns that have been aggregated. Setup: create table lp (a int, b int) partition by list (a); select 'create table lp'||x||' partition of lp for values in('||x||')' from generate_series(0,999)x; \gexec \pset pager off set enable_partitionwise_aggregate=1; Benchmark query: explain (summary on) select a,count(*) from lp group by a; Master: Planning Time: 23.945 ms Planning Time: 23.887 ms Planning Time: 23.927 ms perf top: 7.39% libc.so.6 [.] __memmove_avx_unaligned_erms 6.98% [kernel] [k] clear_page_rep 5.69% postgres [.] bms_is_subset 5.07% postgres [.] fetch_upper_rel 4.41% postgres [.] bms_equal Patched: Planning Time: 41.410 ms Planning Time: 41.474 ms Planning Time: 41.488 ms perf top: 19.02% postgres [.] bms_is_subset 6.91% postgres [.] find_ec_member_matching_expr 5.93% libc.so.6 [.] __memmove_avx_unaligned_erms 5.55% [kernel] [k] clear_page_rep 4.07% postgres [.] fetch_upper_rel 3.46% postgres [.] bms_equal > I wondered if we should instead just check > if the subpath's pathkeys match root->group_pathkeys and if they do > set the AggPath's pathkeys to list_copy_head(subpath->pathkeys, > root->num_groupby_pathkeys), that'll be much cheaper, but it just > feels a bit too much like a special case. I tried this approach (patch attached) and it does perform better than the other patch: create_agg_path_fix2.patch: Planning Time: 24.357 ms Planning Time: 24.293 ms Planning Time: 24.259 ms 7.45% libc.so.6 [.] __memmove_avx_unaligned_erms 6.90% [kernel] [k] clear_page_rep 5.56% postgres [.] bms_is_subset 5.38% postgres [.] bms_equal I wonder if the attached patch is too much of a special case fix. I guess from the lack of complaints previously that there are no other cases where we could possibly have pathkeys that belong to columns that are aggregated. I've not gone to much effort to see if I can craft a case that hits this without the ORDER BY/DISTINCT aggregate optimisation, however. David create_agg_path_fix2.patch Description: Binary data
Re: pg16: XX000: could not find pathkey item to sort
On Tue, 19 Sept 2023 at 23:45, Richard Guo wrote: > My first thought about the fix is that we artificially add resjunk > target entries to parse->targetList for the ordered aggregates' > arguments that are ORDER BY expressions, as attached. While this can > fix the given query, it would cause Assert failure for the query in > sql/triggers.sql. > Any thoughts? Unfortunately, we can't do that as it'll lead to target entries existing in the GroupAggregate's target list that have been aggregated. postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a; QUERY PLAN -- Append (cost=88.17..201.39 rows=400 width=44) -> GroupAggregate (cost=88.17..99.70 rows=200 width=44) Output: rp.a, count(DISTINCT rp.b), rp.b Your patch adds rp.b as an output column of the GroupAggregate. Logically, that column cannot exist there as there is no correct single value of rp.b after aggregation. I think the fix needs to go into create_agg_path(). The problem is that for AGG_SORTED we do: if (aggstrategy == AGG_SORTED) pathnode->path.pathkeys = subpath->pathkeys; /* preserves order */ which assumes that all of the columns before the aggregate will be available after the aggregate. That likely used to work ok before 1349d2790 as the planner wouldn't have requested any Pathkeys for columns that were not available below the Agg node. We can no longer take the subpath pathkey's verbatim. We need to strip off pathkeys for columns that are not in pathnode's targetlist. I've attached a patch which adds a new function to pathkeys.c to strip off any PathKeys in a list that don't have a corresponding item in the given PathTarget and just return a prefix of the input pathkey list up until the first expr that can't be found. I'm concerned that this patch will be too much overhead when creating paths when a PathKey's EquivalenceClass has a large number of members from partitioned tables. I wondered if we should instead just check if the subpath's pathkeys match root->group_pathkeys and if they do set the AggPath's pathkeys to list_copy_head(subpath->pathkeys, root->num_groupby_pathkeys), that'll be much cheaper, but it just feels a bit too much like a special case. David strip_aggregated_pathkeys_from_aggpaths.patch Description: Binary data
Re: pg16: XX000: could not find pathkey item to sort
On Mon, Sep 18, 2023 at 10:02 PM Justin Pryzby wrote: > This fails since 1349d2790b > > commit 1349d2790bf48a4de072931c722f39337e72055e > Author: David Rowley > Date: Tue Aug 2 23:11:45 2022 +1200 > > Improve performance of ORDER BY / DISTINCT aggregates > > ts=# CREATE TABLE t (a int, b text) PARTITION BY RANGE (a); > ts=# CREATE TABLE td PARTITION OF t DEFAULT; > ts=# INSERT INTO t SELECT 1 AS a, '' AS b; > ts=# SET enable_partitionwise_aggregate=on; > ts=# explain SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a; > ERROR: XX000: could not find pathkey item to sort > LOCATION: prepare_sort_from_pathkeys, createplan.c:6235 Thanks for the report! I've looked at it a little bit. In function adjust_group_pathkeys_for_groupagg we add the pathkeys in ordered aggregates to root->group_pathkeys. But if the new added pathkeys do not have EC members that match the targetlist or can be computed from the targetlist, prepare_sort_from_pathkeys would have problem computing sort column info for the new added pathkeys. In the given example, the pathkey representing 'b' can not match or be computed from the current targetlist, so prepare_sort_from_pathkeys emits the error. My first thought about the fix is that we artificially add resjunk target entries to parse->targetList for the ordered aggregates' arguments that are ORDER BY expressions, as attached. While this can fix the given query, it would cause Assert failure for the query in sql/triggers.sql. -- inserts only insert into my_table values (1, 'AAA'), (2, 'BBB') on conflict (a) do update set b = my_table.b || ':' || excluded.b; I haven't looked into how that happens. Any thoughts? Thanks Richard v1-0001-Include-ordered-aggregates-arguments-in-targetList.patch Description: Binary data
pg16: XX000: could not find pathkey item to sort
This fails since 1349d2790b commit 1349d2790bf48a4de072931c722f39337e72055e Author: David Rowley Date: Tue Aug 2 23:11:45 2022 +1200 Improve performance of ORDER BY / DISTINCT aggregates ts=# CREATE TABLE t (a int, b text) PARTITION BY RANGE (a); ts=# CREATE TABLE td PARTITION OF t DEFAULT; ts=# INSERT INTO t SELECT 1 AS a, '' AS b; ts=# SET enable_partitionwise_aggregate=on; ts=# explain SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a; ERROR: XX000: could not find pathkey item to sort LOCATION: prepare_sort_from_pathkeys, createplan.c:6235 -- Justin