(2018/11/27 21:55), Etsuro Fujita wrote:
> While working on [1], I noticed that since we don't set the selectivity
> and cost of the local_conds (i.e., fpinfo->local_conds_sel and
> fpinfo->local_conds_cost) properly in add_foreign_grouping_paths and
> foreign_grouping_ok, estimate_path_cost_size produces considerably
> underestimated results when use_remote_estimate.  I think this would
> cause problems in later planning steps.  So, why not add something like
> this to add_foreign_grouping_paths, as done in other functions such as
> postgresGetForeignJopinPaths?
> 
> --- a/contrib/postgres_fdw/postgres_fdw.c
> +++ b/contrib/postgres_fdw/postgres_fdw.c
> @@ -5496,6 +5496,19 @@ add_foreign_grouping_paths(PlannerInfo *root,
> RelOptInfo\
>   *input_rel,
>      if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
>          return;
> 
> +   /*
> +    * Compute the selectivity and cost of the local_conds, so we don't have
> +    * to do it over again for each path.  The best we can do for these
> +    * conditions is to estimate selectivity on the basis of local
> statistics.
> +    */
> +   fpinfo->local_conds_sel = clauselist_selectivity(root,
> +                                                    fpinfo->local_conds,
> +                                                    0,
> +                                                    JOIN_INNER,
> +                                                    NULL);
> +
> +   cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
> +
>      /* Estimate the cost of push down */
>      estimate_path_cost_size(root, grouped_rel, NIL, NIL,&rows,
>                              &width,&startup_cost,&total_cost);

Here is a patch for that.

BTW another thing I noticed is this comment on costing aggregate
pushdown paths using local statistics in estimate_path_cost_size:

             * Also, core does not care about costing HAVING expressions and
             * adding that to the costs.  So similarly, here too we are not
             * considering remote and local conditions for costing.

I think this was true when aggregate pushdown went in, but isn't anymore
because of commit 7b6c07547190f056b0464098bb5a2247129d7aa2.  So we
should update estimate_path_cost_size so that it accounts for the
selectivity and cost of the HAVING expressions as well?  I think this
comment needs to be updated at least, though.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 3209,3214 **** select array_agg(c1 order by c1 using operator(public.<^)) from ft2 where c2 = 6
--- 3209,3216 ----
           Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
  (6 rows)
  
+ -- Update local stats on ft2
+ ANALYZE ft2;
  -- Add into extension
  alter extension postgres_fdw add operator class my_op_class using btree;
  alter extension postgres_fdw add function my_op_cmp(a int, b int);
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 5496,5501 **** add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
--- 5496,5514 ----
  	if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
  		return;
  
+ 	/*
+ 	 * Compute the selectivity and cost of the local_conds, so we don't have
+ 	 * to do it over again for each path.  The best we can do for these
+ 	 * conditions is to estimate selectivity on the basis of local statistics.
+ 	 */
+ 	fpinfo->local_conds_sel = clauselist_selectivity(root,
+ 													 fpinfo->local_conds,
+ 													 0,
+ 													 JOIN_INNER,
+ 													 NULL);
+ 
+ 	cost_qual_eval(&fpinfo->local_conds_cost, fpinfo->local_conds, root);
+ 
  	/* Estimate the cost of push down */
  	estimate_path_cost_size(root, grouped_rel, NIL, NIL, &rows,
  							&width, &startup_cost, &total_cost);
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 807,812 **** create operator class my_op_class for type int using btree family my_op_family a
--- 807,815 ----
  explain (verbose, costs off)
  select array_agg(c1 order by c1 using operator(public.<^)) from ft2 where c2 = 6 and c1 < 100 group by c2;
  
+ -- Update local stats on ft2
+ ANALYZE ft2;
+ 
  -- Add into extension
  alter extension postgres_fdw add operator class my_op_class using btree;
  alter extension postgres_fdw add function my_op_cmp(a int, b int);

Reply via email to