(2018/11/27 21:55), Etsuro Fujita wrote:
> While working on [1], I noticed that since we don't set the selectivity
> and cost of the local_conds (i.e., fpinfo->local_conds_sel and
> fpinfo->local_conds_cost) properly in add_foreign_grouping_paths and
> foreign_grouping_ok, estimate_path_cost_size produces considerably
> underestimated results when use_remote_estimate. I think this would
> cause problems in later planning steps. So, why not add something like
> this to add_foreign_grouping_paths, as done in other functions such as
> postgresGetForeignJopinPaths?
>
> --- a/contrib/postgres_fdw/postgres_fdw.c
> +++ b/contrib/postgres_fdw/postgres_fdw.c
> @@ -5496,6 +5496,19 @@ add_foreign_grouping_paths(PlannerInfo *root,
> RelOptInfo\
> *input_rel,
> if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
> return;
>
> + /*
> + * Compute the selectivity and cost of the local_conds, so we don't have
> + * to do it over again for each path. The best we can do for these
> + * conditions is to estimate selectivity on the basis of local
> statistics.
> + */
> + fpinfo->local_conds_sel = clauselist_selectivity(root,
> + fpinfo->local_conds,
> + 0,
> + JOIN_INNER,
> + NULL);
> +
> + cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
> +
> /* Estimate the cost of push down */
> estimate_path_cost_size(root, grouped_rel, NIL, NIL,&rows,
> &width,&startup_cost,&total_cost);
Here is a patch for that.
BTW another thing I noticed is this comment on costing aggregate
pushdown paths using local statistics in estimate_path_cost_size:
* Also, core does not care about costing HAVING expressions and
* adding that to the costs. So similarly, here too we are not
* considering remote and local conditions for costing.
I think this was true when aggregate pushdown went in, but isn't anymore
because of commit 7b6c07547190f056b0464098bb5a2247129d7aa2. So we
should update estimate_path_cost_size so that it accounts for the
selectivity and cost of the HAVING expressions as well? I think this
comment needs to be updated at least, though.
Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 3209,3214 **** select array_agg(c1 order by c1 using operator(public.<^)) from ft2 where c2 = 6
--- 3209,3216 ----
Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
(6 rows)
+ -- Update local stats on ft2
+ ANALYZE ft2;
-- Add into extension
alter extension postgres_fdw add operator class my_op_class using btree;
alter extension postgres_fdw add function my_op_cmp(a int, b int);
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 5496,5501 **** add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
--- 5496,5514 ----
if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
return;
+ /*
+ * Compute the selectivity and cost of the local_conds, so we don't have
+ * to do it over again for each path. The best we can do for these
+ * conditions is to estimate selectivity on the basis of local statistics.
+ */
+ fpinfo->local_conds_sel = clauselist_selectivity(root,
+ fpinfo->local_conds,
+ 0,
+ JOIN_INNER,
+ NULL);
+
+ cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
+
/* Estimate the cost of push down */
estimate_path_cost_size(root, grouped_rel, NIL, NIL, &rows,
&width, &startup_cost, &total_cost);
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 807,812 **** create operator class my_op_class for type int using btree family my_op_family a
--- 807,815 ----
explain (verbose, costs off)
select array_agg(c1 order by c1 using operator(public.<^)) from ft2 where c2 = 6 and c1 < 100 group by c2;
+ -- Update local stats on ft2
+ ANALYZE ft2;
+
-- Add into extension
alter extension postgres_fdw add operator class my_op_class using btree;
alter extension postgres_fdw add function my_op_cmp(a int, b int);