fujii.y...@df.mitsubishielectric.co.jp писал 2023-07-10 10:35:
I have modified the program except for the point "if the version of
the remote server is less than PG17".
Instead, we have addressed the following.
"If check_partial_aggregate_support is true and the remote server
version is older than the local server
version, postgres_fdw does not assume that the partial aggregate
function is on the remote server unless
the partial aggregate function and the aggregate function match."
The reason for this is to maintain compatibility with any aggregate
function that does not support partial
aggregate in one version of V1 (V1 is PG17 or higher), even if the
next version supports partial aggregate.
For example, string_agg does not support partial aggregation in PG15,
but it will support partial aggregation
in PG16.


Hi.

1) In foreign_join_ok() should we set fpinfo->user if fpinfo->check_partial_aggregate_support is set like it's done for fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user = NULL if use_remote_estimate is not set.

2) It seeems we found an additional issue with original patch, which is present in current one. I'm attaching a patch which seems to fix it, but I'm not quite sure in it.


We have not been able to add a test for the case where the remote
server version is older than the
local server version to the regression test. Is there any way to add
such tests to the existing regression
tests?


--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 187d15185200aabc22c5219bbe636bc950670a78 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <a.pyha...@postgrespro.ru>
Date: Fri, 14 Jul 2023 16:34:02 +0300
Subject: [PATCH] For partial aggregation we can't rely on the fact that every
 var is a part of some GROUP BY expression

---
 .../postgres_fdw/expected/postgres_fdw.out    | 120 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.c           |  31 +++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   9 ++
 3 files changed, 152 insertions(+), 8 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 8d80ba0a6be..31ee461045c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9955,6 +9955,126 @@ SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
  49 | 19.0000000000000000 |  29 |    60
 (50 rows)
 
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+                                                           QUERY PLAN                                                           
+--------------------------------------------------------------------------------------------------------------------------------
+ Sort
+   Output: ((pagg_tab.b / 2)), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*))
+   Sort Key: ((pagg_tab.b / 2))
+   ->  Finalize HashAggregate
+         Output: ((pagg_tab.b / 2)), avg(pagg_tab.a), max(pagg_tab.a), count(*)
+         Group Key: ((pagg_tab.b / 2))
+         ->  Append
+               ->  Foreign Scan
+                     Output: ((pagg_tab.b / 2)), (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*))
+                     Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+                     Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p1 GROUP BY 1
+               ->  Foreign Scan
+                     Output: ((pagg_tab_1.b / 2)), (PARTIAL avg(pagg_tab_1.a)), (PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*))
+                     Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab_1)
+                     Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p2 GROUP BY 1
+               ->  Foreign Scan
+                     Output: ((pagg_tab_2.b / 2)), (PARTIAL avg(pagg_tab_2.a)), (PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*))
+                     Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab_2)
+                     Remote SQL: SELECT (b / 2), avg_p_int4(a), max(a), count(*) FROM public.pagg_tab_p3 GROUP BY 1
+(19 rows)
+
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+ ?column? |         avg         | max | count 
+----------+---------------------+-----+-------
+        0 | 10.5000000000000000 |  21 |   120
+        1 | 12.5000000000000000 |  23 |   120
+        2 | 14.5000000000000000 |  25 |   120
+        3 | 16.5000000000000000 |  27 |   120
+        4 | 18.5000000000000000 |  29 |   120
+        5 | 10.5000000000000000 |  21 |   120
+        6 | 12.5000000000000000 |  23 |   120
+        7 | 14.5000000000000000 |  25 |   120
+        8 | 16.5000000000000000 |  27 |   120
+        9 | 18.5000000000000000 |  29 |   120
+       10 | 10.5000000000000000 |  21 |   120
+       11 | 12.5000000000000000 |  23 |   120
+       12 | 14.5000000000000000 |  25 |   120
+       13 | 16.5000000000000000 |  27 |   120
+       14 | 18.5000000000000000 |  29 |   120
+       15 | 10.5000000000000000 |  21 |   120
+       16 | 12.5000000000000000 |  23 |   120
+       17 | 14.5000000000000000 |  25 |   120
+       18 | 16.5000000000000000 |  27 |   120
+       19 | 18.5000000000000000 |  29 |   120
+       20 | 10.5000000000000000 |  21 |   120
+       21 | 12.5000000000000000 |  23 |   120
+       22 | 14.5000000000000000 |  25 |   120
+       23 | 16.5000000000000000 |  27 |   120
+       24 | 18.5000000000000000 |  29 |   120
+(25 rows)
+
+-- It's unsafe to pushdown variables if we need both variable and variable-based expression
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+                                                                  QUERY PLAN                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+   Output: ((((pagg_tab.b / 2)))::numeric), (avg(pagg_tab.a)), (max(pagg_tab.a)), (count(*)), ((pagg_tab.b / 2))
+   Sort Key: ((((pagg_tab.b / 2)))::numeric)
+   ->  Finalize GroupAggregate
+         Output: (((pagg_tab.b / 2)))::numeric, avg(pagg_tab.a), max(pagg_tab.a), count(*), ((pagg_tab.b / 2))
+         Group Key: ((pagg_tab.b / 2))
+         ->  Sort
+               Output: ((pagg_tab.b / 2)), pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL max(pagg_tab.a)), (PARTIAL count(*))
+               Sort Key: ((pagg_tab.b / 2))
+               ->  Append
+                     ->  Partial HashAggregate
+                           Output: ((pagg_tab.b / 2)), pagg_tab.b, PARTIAL avg(pagg_tab.a), PARTIAL max(pagg_tab.a), PARTIAL count(*)
+                           Group Key: (pagg_tab.b / 2)
+                           ->  Foreign Scan on public.fpagg_tab_p1 pagg_tab
+                                 Output: (pagg_tab.b / 2), pagg_tab.b, pagg_tab.a
+                                 Remote SQL: SELECT a, b FROM public.pagg_tab_p1
+                     ->  Partial HashAggregate
+                           Output: ((pagg_tab_1.b / 2)), pagg_tab_1.b, PARTIAL avg(pagg_tab_1.a), PARTIAL max(pagg_tab_1.a), PARTIAL count(*)
+                           Group Key: (pagg_tab_1.b / 2)
+                           ->  Foreign Scan on public.fpagg_tab_p2 pagg_tab_1
+                                 Output: (pagg_tab_1.b / 2), pagg_tab_1.b, pagg_tab_1.a
+                                 Remote SQL: SELECT a, b FROM public.pagg_tab_p2
+                     ->  Partial HashAggregate
+                           Output: ((pagg_tab_2.b / 2)), pagg_tab_2.b, PARTIAL avg(pagg_tab_2.a), PARTIAL max(pagg_tab_2.a), PARTIAL count(*)
+                           Group Key: (pagg_tab_2.b / 2)
+                           ->  Foreign Scan on public.fpagg_tab_p3 pagg_tab_2
+                                 Output: (pagg_tab_2.b / 2), pagg_tab_2.b, pagg_tab_2.a
+                                 Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(28 rows)
+
+SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+ numeric |         avg         | max | count 
+---------+---------------------+-----+-------
+       0 | 10.5000000000000000 |  21 |   120
+       1 | 12.5000000000000000 |  23 |   120
+       2 | 14.5000000000000000 |  25 |   120
+       3 | 16.5000000000000000 |  27 |   120
+       4 | 18.5000000000000000 |  29 |   120
+       5 | 10.5000000000000000 |  21 |   120
+       6 | 12.5000000000000000 |  23 |   120
+       7 | 14.5000000000000000 |  25 |   120
+       8 | 16.5000000000000000 |  27 |   120
+       9 | 18.5000000000000000 |  29 |   120
+      10 | 10.5000000000000000 |  21 |   120
+      11 | 12.5000000000000000 |  23 |   120
+      12 | 14.5000000000000000 |  25 |   120
+      13 | 16.5000000000000000 |  27 |   120
+      14 | 18.5000000000000000 |  29 |   120
+      15 | 10.5000000000000000 |  21 |   120
+      16 | 12.5000000000000000 |  23 |   120
+      17 | 14.5000000000000000 |  25 |   120
+      18 | 16.5000000000000000 |  27 |   120
+      19 | 18.5000000000000000 |  29 |   120
+      20 | 10.5000000000000000 |  21 |   120
+      21 | 12.5000000000000000 |  23 |   120
+      22 | 14.5000000000000000 |  25 |   120
+      23 | 16.5000000000000000 |  27 |   120
+      24 | 18.5000000000000000 |  29 |   120
+(25 rows)
+
 -- Partial aggregates are safe to push down for all built-in aggregates
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT /* aggregate function <> aggpartialfunc */
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5c48c5d50dc..eec76a00488 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6381,13 +6381,17 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 	ListCell   *lc;
 	int			i;
 	List	   *tlist = NIL;
+	bool	partial = false;
 
 	/* We currently don't support pushing Grouping Sets. */
 	if (query->groupingSets)
 		return false;
 
+	if (patype == PARTITIONWISE_AGGREGATE_PARTIAL)
+		partial = true;
+
 	/* It's unsafe to push having statements with partial aggregates */
-	if ((patype == PARTITIONWISE_AGGREGATE_PARTIAL) && havingQual)
+	if (partial && havingQual)
 		return false;
 
 	/* Get the fpinfo of the underlying scan relation. */
@@ -6461,11 +6465,17 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 		}
 		else
 		{
+			/*
+			 * For partial aggregation we can't rely on the fact that every
+			 * var is a part of some GROUP BY expression, so extract variables
+			 * and if we found some, avoid to push it down.
+			 */
+
 			/*
 			 * Non-grouping expression we need to compute.  Can we ship it
 			 * as-is to the foreign server?
 			 */
-			if (is_foreign_expr(root, grouped_rel, expr) &&
+			if (!partial && is_foreign_expr(root, grouped_rel, expr) &&
 				!is_foreign_param(root, grouped_rel, expr))
 			{
 				/* Yes, so add to tlist as-is; OK to suppress duplicates */
@@ -6489,14 +6499,17 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 					return false;
 
 				/*
-				 * Add aggregates, if any, into the targetlist.  Plain Vars
-				 * outside an aggregate can be ignored, because they should be
+				 * Add aggregates, if any, into the targetlist.  When dealing
+				 * with usual aggregations,  plain Vars outside an aggregate
+				 * can be ignored, because they should be
 				 * either same as some GROUP BY column or part of some GROUP
 				 * BY expression.  In either case, they are already part of
-				 * the targetlist and thus no need to add them again.  In fact
-				 * including plain Vars in the tlist when they do not match a
-				 * GROUP BY column would cause the foreign server to complain
-				 * that the shipped query is invalid.
+				 * the targetlist and thus no need to add them again. When
+				 * looking at partial aggregation, we can get additional vars
+				 * needed for partial aggregation itself, which do not match
+				 * group by column. In fact including plain Vars in the tlist
+				 * when they do not match a GROUP BY column would cause the
+				 * foreign server to complain that the shipped query is invalid.
 				 */
 				foreach(l, aggvars)
 				{
@@ -6504,6 +6517,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 
 					if (IsA(aggref, Aggref))
 						tlist = add_to_flat_tlist(tlist, list_make1(aggref));
+					else if (partial)
+						return false;
 				}
 			}
 		}
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 827637b001a..4f93e1931de 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3030,6 +3030,15 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
 SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
 
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+SELECT b/2, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+
+-- It's unsafe to pushdown variables if we need both variable and variable-based expression
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2 ORDER BY 1;
+
 -- Partial aggregates are safe to push down for all built-in aggregates
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT /* aggregate function <> aggpartialfunc */
-- 
2.34.1

Reply via email to