On Sun, 29 Oct 2023 at 12:45, Bruce Momjian <br...@momjian.us> wrote: > Has anything been done about this issue?
Nothing has been done. I was hoping to get the attention of a few people who have dealt more with postgres_fdw in the past. I've attached a patch with adjusts DEFAULT_FDW_TUPLE_COST and sets it to 0.2. I set it to this because my experiment in [1] showed that it was about 21x lower than the actual costs (on my machine with a loopback fdw connecting to the same instance and database using my example query). Given that we have parallel_tuple_cost set to 0.1 by default, the network cost of a tuple from an FDW of 0.2 seems reasonable to me. Slightly higher is probably also reasonable, but given the seeming lack of complaints, I think I'd rather err on the low side. Changing it to 0.2, I see 4 plans change in postgres_fdw's regression tests. All of these changes are due to STD_FUZZ_FACTOR causing some other plan to win in add_path(). For example the query EXPLAIN (VERBOSE, ANALYZE) SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1; the plan switches from a HashAggregate to a GroupAggregate. This is because after increasing the DEFAULT_FDW_TUPLE_COST to 0.2 the sorted append child (fuzzily) costs the same as the unsorted seq scan path and the sorted path wins in add_path due to having better pathkeys. The seq scan path is then thrown away and we end up doing the Group Aggregate using the sorted append children. If I change STD_FUZZ_FACTOR to something like 1.0000001 then the plans no longer change when I do: alter server loopback options (add fdw_tuple_cost '0.01'); <run the query> alter server loopback options (drop fdw_tuple_cost); <run the query> Ordinarily, I'd not care too much about that, but I did test the performance of one of the plans and the new plan came out slower than the old one. I'm not exactly sure how best to proceed on this in the absence of any feedback. David [1] https://postgr.es/m/caaphdvopvjjfh5c1ed2hrvddfom2depmwwiu5-f1anmyprj...@mail.gmail.com
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 144c114d0f..e689394807 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9729,21 +9729,19 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE -- test FOR UPDATE; partitionwise join does not apply EXPLAIN (COSTS OFF) SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1; - QUERY PLAN --------------------------------------------------------------- + QUERY PLAN +-------------------------------------------------------- LockRows - -> Sort - Sort Key: t1.a - -> Hash Join - Hash Cond: (t2.b = t1.a) + -> Nested Loop + Join Filter: (t1.a = t2.b) + -> Append + -> Foreign Scan on ftprt1_p1 t1_1 + -> Foreign Scan on ftprt1_p2 t1_2 + -> Materialize -> Append -> Foreign Scan on ftprt2_p1 t2_1 -> Foreign Scan on ftprt2_p2 t2_2 - -> Hash - -> Append - -> Foreign Scan on ftprt1_p1 t1_1 - -> Foreign Scan on ftprt1_p2 t1_2 -(12 rows) +(10 rows) SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1; a | b @@ -9778,18 +9776,16 @@ ANALYZE fpagg_tab_p3; SET enable_partitionwise_aggregate TO false; EXPLAIN (COSTS OFF) SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1; - QUERY PLAN ------------------------------------------------------------ - Sort - Sort Key: pagg_tab.a - -> HashAggregate - Group Key: pagg_tab.a - Filter: (avg(pagg_tab.b) < '22'::numeric) - -> Append - -> Foreign Scan on fpagg_tab_p1 pagg_tab_1 - -> Foreign Scan on fpagg_tab_p2 pagg_tab_2 - -> Foreign Scan on fpagg_tab_p3 pagg_tab_3 -(9 rows) + QUERY PLAN +----------------------------------------------------- + GroupAggregate + Group Key: pagg_tab.a + Filter: (avg(pagg_tab.b) < '22'::numeric) + -> Append + -> Foreign Scan on fpagg_tab_p1 pagg_tab_1 + -> Foreign Scan on fpagg_tab_p2 pagg_tab_2 + -> Foreign Scan on fpagg_tab_p3 pagg_tab_3 +(7 rows) -- Plan with partitionwise aggregates is enabled SET enable_partitionwise_aggregate TO true; @@ -9823,34 +9819,32 @@ SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 O -- Should have all the columns in the target list for the given relation EXPLAIN (VERBOSE, COSTS OFF) SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1; - QUERY PLAN ------------------------------------------------------------------------- - Sort - Output: t1.a, (count(((t1.*)::pagg_tab))) + QUERY PLAN +-------------------------------------------------------------------------------------------- + Merge Append Sort Key: t1.a - -> Append - -> HashAggregate - Output: t1.a, count(((t1.*)::pagg_tab)) - Group Key: t1.a - Filter: (avg(t1.b) < '22'::numeric) - -> Foreign Scan on public.fpagg_tab_p1 t1 - Output: t1.a, t1.*, t1.b - Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1 - -> HashAggregate - Output: t1_1.a, count(((t1_1.*)::pagg_tab)) - Group Key: t1_1.a - Filter: (avg(t1_1.b) < '22'::numeric) - -> Foreign Scan on public.fpagg_tab_p2 t1_1 - Output: t1_1.a, t1_1.*, t1_1.b - Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2 - -> HashAggregate - Output: t1_2.a, count(((t1_2.*)::pagg_tab)) - Group Key: t1_2.a - Filter: (avg(t1_2.b) < '22'::numeric) - -> Foreign Scan on public.fpagg_tab_p3 t1_2 - Output: t1_2.a, t1_2.*, t1_2.b - Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3 -(25 rows) + -> GroupAggregate + Output: t1.a, count(((t1.*)::pagg_tab)) + Group Key: t1.a + Filter: (avg(t1.b) < '22'::numeric) + -> Foreign Scan on public.fpagg_tab_p1 t1 + Output: t1.a, t1.*, t1.b + Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1 ORDER BY a ASC NULLS LAST + -> GroupAggregate + Output: t1_1.a, count(((t1_1.*)::pagg_tab)) + Group Key: t1_1.a + Filter: (avg(t1_1.b) < '22'::numeric) + -> Foreign Scan on public.fpagg_tab_p2 t1_1 + Output: t1_1.a, t1_1.*, t1_1.b + Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2 ORDER BY a ASC NULLS LAST + -> GroupAggregate + Output: t1_2.a, count(((t1_2.*)::pagg_tab)) + Group Key: t1_2.a + Filter: (avg(t1_2.b) < '22'::numeric) + -> Foreign Scan on public.fpagg_tab_p3 t1_2 + Output: t1_2.a, t1_2.*, t1_2.b + Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3 ORDER BY a ASC NULLS LAST +(23 rows) SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1; a | count @@ -9866,24 +9860,23 @@ SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1; -- When GROUP BY clause does not match with PARTITION KEY. EXPLAIN (COSTS OFF) SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1; - QUERY PLAN ------------------------------------------------------------------ - Sort - Sort Key: pagg_tab.b - -> Finalize HashAggregate - Group Key: pagg_tab.b - Filter: (sum(pagg_tab.a) < 700) - -> Append - -> Partial HashAggregate - Group Key: pagg_tab.b - -> Foreign Scan on fpagg_tab_p1 pagg_tab - -> Partial HashAggregate - Group Key: pagg_tab_1.b - -> Foreign Scan on fpagg_tab_p2 pagg_tab_1 - -> Partial HashAggregate - Group Key: pagg_tab_2.b - -> Foreign Scan on fpagg_tab_p3 pagg_tab_2 -(15 rows) + QUERY PLAN +----------------------------------------------------------- + Finalize GroupAggregate + Group Key: pagg_tab.b + Filter: (sum(pagg_tab.a) < 700) + -> Merge Append + Sort Key: pagg_tab.b + -> Partial GroupAggregate + Group Key: pagg_tab.b + -> Foreign Scan on fpagg_tab_p1 pagg_tab + -> Partial GroupAggregate + Group Key: pagg_tab_1.b + -> Foreign Scan on fpagg_tab_p2 pagg_tab_1 + -> Partial GroupAggregate + Group Key: pagg_tab_2.b + -> Foreign Scan on fpagg_tab_p3 pagg_tab_2 +(14 rows) -- =================================================================== -- access rights and superuser diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 8b3206ceaa..6de2bec3b7 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -57,7 +57,7 @@ PG_MODULE_MAGIC; #define DEFAULT_FDW_STARTUP_COST 100.0 /* Default CPU cost to process 1 row (above and beyond cpu_tuple_cost). */ -#define DEFAULT_FDW_TUPLE_COST 0.01 +#define DEFAULT_FDW_TUPLE_COST 0.2 /* If no remote estimates, assume a sort costs 20% extra */ #define DEFAULT_FDW_SORT_MULTIPLIER 1.2