fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-19 03:43:
Hi Mr.Pyhalov, hackers.

3)
I modified the patch to safely do a partial aggregate pushdown for
queries which contain having clauses.


Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns wrong result:

CREATE FUNCTION f() RETURNS INT AS $$
begin
  return 10;
end
$$ LANGUAGE PLPGSQL;

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1;
 b  | sum
----+-----
  0 |   0
 10 |   0
 20 |   0
 30 |   0
 40 |   0
+(5 rows)

In fact the above query should have returned 0 rows, as

SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
 b  | sum
----+------
  0 |  600
  1 |  660
  2 |  720
  3 |  780
  4 |  840
  5 |  900
  6 |  960
  7 | 1020
  8 | 1080
  9 | 1140
 10 |  600
 11 |  660
 12 |  720
....
shows no such rows.

Or, on the same data

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 1;

You'll get 0 rows.

But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
 b  | sum
----+------
 42 |  720
 29 | 1140
  4 |  840
 34 |  840
 41 |  660
  0 |  600
 40 |  600
gives.

The issue is that you can't calculate "partial" having. You should compare full aggregate in filter, but it's not possible on the level of one partition.
And you have this in plans

 Finalize GroupAggregate
   Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
   Group Key: pagg_tab.b
   Filter: (sum(pagg_tab.a) < 700)
   ->  Sort
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
         Sort Key: pagg_tab.b
         ->  Append
               ->  Foreign Scan
Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a)) Filter: ((PARTIAL sum(pagg_tab.a)) < 700) !!!! <--- here we can't compare anything yet, sum is incomplete. Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
               ->  Foreign Scan
Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_1.a))
                     Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
               ->  Foreign Scan
Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab_2.a))
                     Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2) Remote SQL: SELECT b, avg_p_int4(a), max(a), count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1

In foreign_grouping_ok()
6586                         if (IsA(expr, Aggref))
6587                         {
6588                                 if (partial)
6589                                 {
6590 mark_partial_aggref((Aggref *) expr, AGGSPLIT_INITIAL_SERIAL);
6591                                         continue;
6592                                 }
6593 else if (!is_foreign_expr(root, grouped_rel, expr))
6594                                         return false;
6595
6596 tlist = add_to_flat_tlist(tlist, list_make1(expr));
6597                         }

at least you shouldn't do anything with expr, if is_foreign_expr() returned false. If we restrict pushing down queries with havingQuals, I'm not quite sure how Aggref can appear in local_conds.

As for changes in planner.c (setGroupClausePartial()) I have several questions.

1) Why don't we add non_group_exprs to pathtarget->exprs when partial_target->exprs is not set?

2) We replace extra->partial_target->exprs with partial_target->exprs after processing. Why are we sure that after this tleSortGroupRef is correct?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1824cb67fe9..c6f613019c3 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3025,6 +3025,22 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
 
+CREATE FUNCTION f() RETURNS INT AS $$
+begin
+  return 10;
+end 
+$$ LANGUAGE PLPGSQL;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1;
+SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 1;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
+SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
+
+DROP function f();
+
 -- Partial aggregates are safe to push down without having clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;

Reply via email to