Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-03-06 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Maybe I'm missing something.  But I don't think
 postgresGetForeignPaths() marks the parameterized path with the correct
 row estimate.  Also, that function doesn't seem to estimate the cost of
 the parameterized path correctly.  Please find attached a patch.

[ looks at that... ]  Oh, I see what you're worried about: we select
a specific safe join clause to drive creation of the parameterized
path, but then there might be additional join clauses that come along
with that because they're also movable to the parameterized path.
You're right, the current postgres_fdw code is not accounting for that.

I experimented with this patch using this test case in the database
created by the postgres_fdw regression test:

explain verbose SELECT * FROM ft2 a, ft2 b WHERE
  a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=a.c7;

What I saw in this example was that with the patch, the join clause got
counted *twice* in the cost estimate, because the clause returned by
generate_implied_equalities_for_column() isn't pointer-equal to the one in
the ppi_clauses list (note the comment about such cases in
postgresGetForeignPlan).  So that's not right.

However, we've got bigger problems than that: with or without the patch,

explain verbose SELECT * FROM ft2 a, ft2 b WHERE
  a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=upper(a.c7);

fails with
TRAP: FailedAssertion(!(is_foreign_expr(root, baserel, rinfo-clause)), File: 
postgres_fdw.c, Line: 759)

This happens because we select the safe joinclause b.c1=a.c1, and then
the PPI machinery adds all the other movable join clauses for that outer
relation, including the unsafe one involving an upper() call.
postgresGetForeignPlan believes that the only joinclauses it can find in
scan_clauses are the safe one(s) selected by postgresGetForeignPaths, but
that's completely wrong.  I think this is probably relatively simple to
fix: in postgresGetForeignPlan, we need to actually test whether a join
clause is safe or not, and shove it into the appropriate list.

So far as the original issue goes, I think what this consideration shows
is that postgresGetForeignPaths is going about things incorrectly.
I thought that considering only one join clause per path would be valid,
if perhaps sometimes dumb.  But the way the movable-clause machinery works,
we don't get to consider only one join clause; it's all-or-nothing for a
given outer relation.  So what we really ought to be doing here is
generating one parameterized path per valid outer relation.

What we could possibly do to resolve this without much extra code is to
keep using the same logic as a guide to which outer relations are
interesting, but for each such relation, do get_baserel_parampathinfo()
and then use the ppi_clauses list as the quals-to-apply for purposes of
cost/selectivity estimation.  postgresGetForeignPaths would have to check
each of those clauses for safety rather than assuming anything.  One
possible benefit is that we could record the results in the path and not
have to repeat the is_foreign_expr tests in postgresGetForeignPlan.
I'm not entirely sure if that's worth the trouble though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-03-03 Thread Etsuro Fujita
(2014/02/18 12:37), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 (2014/02/18 12:03), Tom Lane wrote:
 The calling FDW is supposed to do that; note the header comment.
 
 However, ISTM postgresGetForeignPaths() doesn't work like
 that.  It uses the same rowcount for all paths of a same parameterization?
 
 That's what we want no?

Maybe I'm missing something.  But I don't think
postgresGetForeignPaths() marks the parameterized path with the correct
row estimate.  Also, that function doesn't seem to estimate the cost of
the parameterized path correctly.  Please find attached a patch.

 Anyway, the point of using ppi_rows would be to enforce that all the
 rowcount estimates for a given parameterized relation are the same.
 In the FDW case, all those estimates are the FDW's responsibility,
 and so making them consistent is also its responsibility IMO.
 
 Another way of looking at this is that none of the pathnode creation
 routines in pathnode.c are responsible for setting rowcount estimates.
 That's done by whatever is setting the cost estimate; this must be so,
 else the cost estimate is surely bogus.  So any way you slice it, the
 FDW has to get it right.

Understood.  Thank you for the analysis.

Sorry for the delay.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 295,300  static bool postgresAnalyzeForeignTable(Relation relation,
--- 295,301 
  static void estimate_path_cost_size(PlannerInfo *root,
RelOptInfo *baserel,
List *join_conds,
+   List *param_conds,
double *p_rows, int *p_width,
Cost *p_startup_cost, Cost 
*p_total_cost);
  static void get_remote_estimate(const char *sql,
***
*** 493,499  postgresGetForeignRelSize(PlannerInfo *root,
 * values in fpinfo so we don't need to do it again to generate 
the
 * basic foreign path.
 */
!   estimate_path_cost_size(root, baserel, NIL,
fpinfo-rows, 
fpinfo-width,

fpinfo-startup_cost, fpinfo-total_cost);
  
--- 494,500 
 * values in fpinfo so we don't need to do it again to generate 
the
 * basic foreign path.
 */
! estimate_path_cost_size(root, baserel, NIL, NIL,
fpinfo-rows, 
fpinfo-width,

fpinfo-startup_cost, fpinfo-total_cost);
  
***
*** 523,529  postgresGetForeignRelSize(PlannerInfo *root,
set_baserel_size_estimates(root, baserel);
  
/* Fill in basically-bogus cost estimates for use later. */
!   estimate_path_cost_size(root, baserel, NIL,
fpinfo-rows, 
fpinfo-width,

fpinfo-startup_cost, fpinfo-total_cost);
}
--- 524,530 
set_baserel_size_estimates(root, baserel);
  
/* Fill in basically-bogus cost estimates for use later. */
!   estimate_path_cost_size(root, baserel, NIL, NIL,
fpinfo-rows, 
fpinfo-width,

fpinfo-startup_cost, fpinfo-total_cost);
}
***
*** 541,547  postgresGetForeignPaths(PlannerInfo *root,
--- 542,550 
PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel-fdw_private;
ForeignPath *path;
List   *join_quals;
+   List   *param_quals;
Relids  required_outer;
+   ParamPathInfo *param_info;
double  rows;
int width;
Coststartup_cost;
***
*** 591,604  postgresGetForeignPaths(PlannerInfo *root,
if (!is_foreign_expr(root, baserel, rinfo-clause))
continue;
  
-   /*
-* OK, get a cost estimate from the remote, and make a path.
-*/
-   join_quals = list_make1(rinfo);
-   estimate_path_cost_size(root, baserel, join_quals,
-   rows, width,
-   startup_cost, 
total_cost);
- 
/* Must calculate required outer rels for this path */
required_outer = bms_union(rinfo-clause_relids,
 

Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-02-17 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Why does create_foreignscan_path() not set the rowcounts based on
 ParamPathInfo when the path is a parameterized path?

The calling FDW is supposed to do that; note the header comment.
I'm not sure that it'd be an improvement to change the API spec
to be create_foreignscan_path has no intelligence, except that
sometimes it will decide to override your rows estimate anyway;
nonetheless, it takes your cost estimate as gospel.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-02-17 Thread Etsuro Fujita
(2014/02/18 12:03), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Why does create_foreignscan_path() not set the rowcounts based on
 ParamPathInfo when the path is a parameterized path?

 The calling FDW is supposed to do that; note the header comment.

Understood.  However, ISTM postgresGetForeignPaths() doesn't work like
that.  It uses the same rowcount for all paths of a same parameterization?

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-02-17 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 (2014/02/18 12:03), Tom Lane wrote:
 The calling FDW is supposed to do that; note the header comment.

 Understood.  However, ISTM postgresGetForeignPaths() doesn't work like
 that.  It uses the same rowcount for all paths of a same parameterization?

That's what we want no?

Anyway, the point of using ppi_rows would be to enforce that all the
rowcount estimates for a given parameterized relation are the same.
In the FDW case, all those estimates are the FDW's responsibility,
and so making them consistent is also its responsibility IMO.

Another way of looking at this is that none of the pathnode creation
routines in pathnode.c are responsible for setting rowcount estimates.
That's done by whatever is setting the cost estimate; this must be so,
else the cost estimate is surely bogus.  So any way you slice it, the
FDW has to get it right.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers