PFA the patch with all the comments addressed.
On Tue, Oct 13, 2015 at 10:07 PM, Robert Haas <[email protected]> wrote:
> On Tue, Oct 13, 2015 at 3:29 AM, Ashutosh Bapat
> <[email protected]> wrote:
> >> - You consider pushing down ORDER BY if any prefix of the query
> >> pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me.
> >> If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a),
> >> ordering the result set by a doesn't help us much. We've talked a few
> >> times about an incremental sort capability that would take a stream of
> >> tuples already ordered by one or more columns and sort each group by
> >> additional columns, but I don't think we have that currently. Without
> >> that capability, I don't think there's much benefit in sorting by a
> >> prefix of the pathkeys. I suspect that if we can't get them all, it's
> >> not worth doing.
> >
> > I somehow thought, we are using output, which is ordered by prefix of
> > pathkeys in Sort nodes. But as you rightly pointed out that's not the
> case.
> > Only complete pathkeys are useful.
>
> A truncated list of pathkeys is useful for merge joins, but not for
> toplevel ordering.
>
Ok. Taken care in the attached patch.
>
> >> - Right now, you have this code below the point where we bail out if
> >> use_remote_estimate is not set. If we keep it like that, the comment
> >> needs updating. But I suggest that we consider an ordered path even
> >> if we are not using remote estimates. Estimate the sorted path to
> >> cost somewhat more than the unsorted path, so that we only choose that
> >> path if the sort actually benefits us. I don't know exactly how to
> >> come up with a principled estimate given that we don't actually know
> >> whether the remote side will need an extra sort or not, but maybe a
> >> dumb estimate is still better than not trying a sorted path.
> >
> > I like that idea, although there are two questions
> > 1. How can we estimate cost of getting the data sorted? If there is an
> > appropriate index on foreign server we can get the data sorted at no
> extra
> > cost. If there isn't the cost of sorting is proportionate to NlogN where
> N
> > is the size of data. It seems unreasonable to arrive at the cost of
> sorting
> > by multiplying with some constant multiplier. Also, the constant
> multiplier
> > to the NlogN estimate depends heavily upon the properties of foreign
> server
> > e.g. size of memory available for sorting, disk and CPU speed etc. The
> last
> > two might have got factored into fdw_tuple_cost and fdw_startup_cost, so
> > that's probably taken care of. If the estimate we come up turns out to be
> > too pessimistic, we will not get sorted data even if that's the right
> thing
> > to do. If too optimistic, we will incur heavy cost at the time of
> execution.
> > Setting the cost estimate to some constant factor of NlogN would be too
> > pessimistic if there is an appropriate index on foreign server. Otherway
> > round if there isn't an appropriate index on foreign server.
> >
> > Even if we leave these arguments aside for a while, the question remains
> as
> > to what should be the constant factor 10% or 20% or 50% or 100% or
> something
> > else on top of the estimate for simple foreign table scan estimates (or
> > NlogN of that)? I am unable to justify any of these factors myself. What
> do
> > you say?
>
> I think we want to estimate the cost in such a way that we'll tend to
> pick the ordered path if it's useful, but skip it if it's not. So,
> say we pick 10%. That's definitely enough that we won't pick a remote
> sort when it's useless, but it's small enough that if a remote sort is
> useful, we will probably choose to do it. I think that's what we
> want. I believe we should err on the side of a small estimate because
> it's generally better to do as much work as possible on the remote
> side. In some cases the sort may turn out to be free at execution
> time because the remote server was going to generate the results in
> that order anyway, and it may know that because of its own pathkeys,
> and thus be able to skip the explicit ordering step.
>
The patch uses a factor of 1.1 (10% increase) to multiple the startup and
total costs in fpinfo for unsorted data.
This change has caused the plans for few queries in the test postgres_fdw
to change. There is ORDER BY and LIMIT clause in the queries in
postgres_fdw testcase to keep test outputs sane and consistent. These ORDER
BY clauses are not being pushed down the foreign servers. I tried using
values upto 2 for this but still the foreign paths with pathkeys won over
those without pathkeys.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 697de60..25d8650 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -186,23 +186,26 @@ is_foreign_expr(PlannerInfo *root,
* Check that the expression consists of nodes that are safe to execute
* remotely.
*/
glob_cxt.root = root;
glob_cxt.foreignrel = baserel;
loc_cxt.collation = InvalidOid;
loc_cxt.state = FDW_COLLATE_NONE;
if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
return false;
- /* Expressions examined here should be boolean, ie noncollatable */
- Assert(loc_cxt.collation == InvalidOid);
- Assert(loc_cxt.state == FDW_COLLATE_NONE);
+ /*
+ * The collation of the expression should be none or originate from a
+ * foreign var.
+ */
+ Assert(loc_cxt.state == FDW_COLLATE_NONE ||
+ loc_cxt.state == FDW_COLLATE_SAFE);
/*
* An expression which includes any mutable functions can't be sent over
* because its result is not stable. For example, sending now() remote
* side could cause confusion from clock offsets. Future versions might
* be able to make this choice with more granularity. (We check this last
* because it requires a lot of expensive catalog lookups.)
*/
if (contain_mutable_functions((Node *) expr))
return false;
@@ -1870,10 +1873,50 @@ printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
*/
static void
printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
char *ptypename = format_type_with_typemod(paramtype, paramtypmod);
appendStringInfo(buf, "((SELECT null::%s)::%s)", ptypename, ptypename);
}
+
+void
+appendOrderByClause(StringInfo buf, PlannerInfo *root, RelOptInfo *baserel, List *pathkeys)
+{
+ ListCell *lcell;
+ deparse_expr_cxt context;
+ int nestlevel;
+ char *delim = " ";
+
+ /* Set up context struct for recursion */
+ context.root = root;
+ context.foreignrel = baserel;
+ context.buf = buf;
+ context.params_list = NULL;
+
+ /* Make sure any constants in the exprs are printed portably */
+ nestlevel = set_transmission_modes();
+
+ appendStringInfo(buf, " ORDER BY");
+ foreach(lcell, pathkeys)
+ {
+ PathKey *pathkey = lfirst(lcell);
+ Expr *em_expr;
+
+ em_expr = find_em_expr_for_rel(pathkey->pk_eclass, baserel);
+ Assert(em_expr);
+ appendStringInfo(buf, "%s", delim);
+ deparseExpr(em_expr, &context);
+ if (pathkey->pk_strategy == BTLessStrategyNumber)
+ appendStringInfo(buf, " ASC");
+ else
+ appendStringInfo(buf, " DESC");
+
+ if (pathkey->pk_nulls_first)
+ appendStringInfo(buf, " NULLS FIRST");
+
+ delim = ", ";
+ }
+ reset_transmission_modes(nestlevel);
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 65ea6e8..e8d9e21 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -129,84 +129,76 @@ ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
-- Now we should be able to run ANALYZE.
-- To exercise multiple code paths, we use local stats on ft1
-- and remote-estimate mode on ft2.
ANALYZE ft1;
ALTER FOREIGN TABLE ft2 OPTIONS (use_remote_estimate 'true');
-- ===================================================================
-- simple queries
-- ===================================================================
-- single table, with/without alias
EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
- QUERY PLAN
----------------------------------
+ QUERY PLAN
+---------------------------
Limit
- -> Sort
- Sort Key: c3, c1
- -> Foreign Scan on ft1
-(4 rows)
+ -> Foreign Scan on ft1
+(2 rows)
SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
-----+----+-------+------------------------------+--------------------------+----+------------+-----
101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
(10 rows)
EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN
--------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
Limit
Output: c1, c2, c3, c4, c5, c6, c7, c8
- -> Sort
+ -> Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
- Sort Key: t1.c3, t1.c1
- -> Foreign Scan on public.ft1 t1
- Output: c1, c2, c3, c4, c5, c6, c7, c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(8 rows)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c3 ASC, "C 1" ASC
+(5 rows)
SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
-----+----+-------+------------------------------+--------------------------+----+------------+-----
101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
(10 rows)
-- whole-row reference
EXPLAIN (VERBOSE, COSTS false) SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- QUERY PLAN
--------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
Limit
Output: t1.*, c3, c1
- -> Sort
+ -> Foreign Scan on public.ft1 t1
Output: t1.*, c3, c1
- Sort Key: t1.c3, t1.c1
- -> Foreign Scan on public.ft1 t1
- Output: t1.*, c3, c1
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(8 rows)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c3 ASC, "C 1" ASC
+(5 rows)
SELECT t1 FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
t1
--------------------------------------------------------------------------------------------
(101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1 ",foo)
(102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2 ",foo)
(103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3 ",foo)
(104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4 ",foo)
(105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5 ",foo)
(106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6 ",foo)
@@ -643,20 +635,46 @@ SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5));
SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----+-------+------------------------------+--------------------------+----+------------+-----
1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
(4 rows)
+-- pushing down pathkeys to the foreign server
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ -> Foreign Scan on public.ft2 t2
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY "C 1" ASC
+(5 rows)
+
+SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
-- ===================================================================
-- parameterized queries
-- ===================================================================
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
QUERY PLAN
--------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
@@ -676,71 +694,62 @@ EXECUTE st1(1, 1);
EXECUTE st1(101, 101);
c3 | c3
-------+-------
00101 | 00101
(1 row)
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
- Sort
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------
+ Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Sort Key: t1.c1
- -> Nested Loop Semi Join
+ -> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Join Filter: (t1.c3 = t2.c3)
- -> Foreign Scan on public.ft1 t1
- Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
- -> Materialize
- Output: t2.c3
- -> Foreign Scan on public.ft2 t2
- Output: t2.c3
- Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
-(15 rows)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC
+ -> Foreign Scan on public.ft2 t2
+ Output: t2.c3
+ Filter: (date(t2.c4) = '01-17-1970'::date)
+ Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND (($1::text = c3))
+(9 rows)
EXECUTE st2(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----+-------+------------------------------+--------------------------+----+------------+-----
16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
(1 row)
EXECUTE st2(101, 121);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
-----+----+-------+------------------------------+--------------------------+----+------------+-----
116 | 6 | 00116 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
(1 row)
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
- Sort
+ Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Sort Key: t1.c1
- -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Join Filter: (t1.c3 = t2.c3)
- -> Foreign Scan on public.ft1 t1
- Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20))
- -> Materialize
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ORDER BY "C 1" ASC
+ -> Materialize
+ Output: t2.c3
+ -> Foreign Scan on public.ft2 t2
Output: t2.c3
- -> Foreign Scan on public.ft2 t2
- Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
-(14 rows)
+ Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+(11 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----+-------+------------------------------+--------------------------+----+------------+-----
16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
(1 row)
EXECUTE st3(20, 30);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
----+----+----+----+----+----+----+----
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e4d799c..d2ead22 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -40,20 +40,21 @@
#include "utils/sampling.h"
PG_MODULE_MAGIC;
/* Default CPU cost to start up a foreign query. */
#define DEFAULT_FDW_STARTUP_COST 100.0
/* Default CPU cost to process 1 row (above and beyond cpu_tuple_cost). */
#define DEFAULT_FDW_TUPLE_COST 0.01
+#define DEFAULT_FDW_SORT_MULTIPLIER 1.1
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
* foreign table. This information is collected by postgresGetForeignRelSize.
*/
typedef struct PgFdwRelationInfo
{
/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
List *remote_conds;
List *local_conds;
@@ -289,20 +290,21 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
/*
* Helper functions
*/
static void estimate_path_cost_size(PlannerInfo *root,
RelOptInfo *baserel,
List *join_conds,
+ List *pathkeys,
double *p_rows, int *p_width,
Cost *p_startup_cost, Cost *p_total_cost);
static void get_remote_estimate(const char *sql,
PGconn *conn,
double *rows,
int *width,
Cost *startup_cost,
Cost *total_cost);
static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
EquivalenceClass *ec, EquivalenceMember *em,
@@ -490,21 +492,21 @@ postgresGetForeignRelSize(PlannerInfo *root,
* average row width. Otherwise, estimate using whatever statistics we
* have locally, in a way similar to ordinary tables.
*/
if (fpinfo->use_remote_estimate)
{
/*
* Get cost/size estimates with help of remote server. Save the
* values in fpinfo so we don't need to do it again to generate the
* basic foreign path.
*/
- estimate_path_cost_size(root, baserel, NIL,
+ estimate_path_cost_size(root, baserel, NIL, NIL,
&fpinfo->rows, &fpinfo->width,
&fpinfo->startup_cost, &fpinfo->total_cost);
/* Report estimated baserel size to planner. */
baserel->rows = fpinfo->rows;
baserel->width = fpinfo->width;
}
else
{
/*
@@ -520,57 +522,126 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
baserel->pages = 10;
baserel->tuples =
(10 * BLCKSZ) / (baserel->width + MAXALIGN(SizeofHeapTupleHeader));
}
/* Estimate baserel size as best we can with local statistics. */
set_baserel_size_estimates(root, baserel);
/* Fill in basically-bogus cost estimates for use later. */
- estimate_path_cost_size(root, baserel, NIL,
+ estimate_path_cost_size(root, baserel, NIL, NIL,
&fpinfo->rows, &fpinfo->width,
&fpinfo->startup_cost, &fpinfo->total_cost);
}
}
/*
* postgresGetForeignPaths
* Create possible scan paths for a scan on the foreign table
*/
static void
postgresGetForeignPaths(PlannerInfo *root,
RelOptInfo *baserel,
Oid foreigntableid)
{
PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
ForeignPath *path;
List *ppi_list;
ListCell *lc;
+ List *usable_pathkeys = NIL;
/*
* Create simplest ForeignScan path node and add it to baserel. This path
* corresponds to SeqScan path of regular tables (though depending on what
* baserestrict conditions we were able to send to remote, there might
* actually be an indexscan happening there). We already did all the work
* to estimate cost and size of this path.
*/
path = create_foreignscan_path(root, baserel,
fpinfo->rows,
fpinfo->startup_cost,
fpinfo->total_cost,
NIL, /* no pathkeys */
NULL, /* no outer rel either */
NIL); /* no fdw_private list */
add_path(baserel, (Path *) path);
/*
+ * If root->query_pathkeys belongs entirely to this baserel and
+ * the expressions involved can be evaluated on the foreign server, getting
+ * the rows sorted according to those pathkeys might be better than sorting
+ * the data locally. There can be indexes on the foreign server which can
+ * be used to sort the data faster at the foreign server.
+ */
+ foreach(lc, root->query_pathkeys)
+ {
+ PathKey *pathkey = (PathKey *)lfirst(lc);
+ EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+ Expr *em_expr = find_em_expr_for_rel(pathkey_ec, baserel);
+
+ if (em_expr && is_foreign_expr(root, baserel, em_expr))
+ usable_pathkeys = lappend(usable_pathkeys, pathkey);
+ else
+ {
+ /* We found a pathkey which doesn't belong to given relation. */
+ list_free(usable_pathkeys);
+ usable_pathkeys = NIL;
+ break;
+ }
+ }
+
+ /* Create a path with useful pathkeys, if we found one. */
+ if (usable_pathkeys)
+ {
+ double rows;
+ int width;
+ Cost startup_cost;
+ Cost total_cost;
+
+ /*
+ * Use estimates from the foreign server if allowed, otherwise, cook
+ * them locally.
+ */
+ if (fpinfo->use_remote_estimate)
+ estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+ &rows, &width, &startup_cost, &total_cost);
+ else
+ {
+ /*
+ * We are not allowed to consult the foreign server for estimating
+ * cost of sorting. If there are indexes on the columns on which we
+ * expect the data to be sorted and they cover the columns we fetch,
+ * we will get the data sorted at no extra cost. But if that's not
+ * the case, foreign server will incur cost for sorting it. Assume
+ * it to be 10% higher than the cost for getting the data unsorted.
+ * That is large enough not to make remote sort attractive when it's
+ * useless, but it's small enough that to make a remote sort
+ * attracive when useful. Erring on the smaller estimate side is
+ * generally better to do as much work as possible on the
+ * remote side.
+ */
+ fpinfo->startup_cost = fpinfo->startup_cost * DEFAULT_FDW_SORT_MULTIPLIER;
+ fpinfo->total_cost = fpinfo->total_cost * DEFAULT_FDW_SORT_MULTIPLIER;
+ }
+
+ path = create_foreignscan_path(root, baserel,
+ rows,
+ startup_cost,
+ total_cost,
+ usable_pathkeys,
+ NULL,
+ NIL);
+ add_path(baserel, (Path *)path);
+ }
+
+ /*
* If we're not using remote estimates, stop here. We have no way to
* estimate whether any join clauses would be worth sending across, so
* don't bother building parameterized paths.
*/
if (!fpinfo->use_remote_estimate)
return;
/*
* Thumb through all join clauses for the rel to identify which outer
* relations could supply one or more safe-to-send-to-remote join clauses.
@@ -703,21 +774,21 @@ postgresGetForeignPaths(PlannerInfo *root,
foreach(lc, ppi_list)
{
ParamPathInfo *param_info = (ParamPathInfo *) lfirst(lc);
double rows;
int width;
Cost startup_cost;
Cost total_cost;
/* Get a cost estimate from the remote */
estimate_path_cost_size(root, baserel,
- param_info->ppi_clauses,
+ param_info->ppi_clauses, NIL,
&rows, &width,
&startup_cost, &total_cost);
/*
* ppi_rows currently won't get looked at by anything, but still we
* may as well ensure that it matches our idea of the rowcount.
*/
param_info->ppi_rows = rows;
/* Make the path */
@@ -797,20 +868,24 @@ postgresGetForeignPlan(PlannerInfo *root,
* Build the query string to be sent for execution, and identify
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (remote_conds)
appendWhereClause(&sql, root, baserel, remote_conds,
true, ¶ms_list);
+ /* Add ORDER BY clause if we found any useful pathkeys */
+ if (best_path->path.pathkeys)
+ appendOrderByClause(&sql, root, baserel, best_path->path.pathkeys);
+
/*
* Add FOR UPDATE/SHARE if appropriate. We apply locking during the
* initial row fetch, rather than later on as is done for local tables.
* The extra roundtrips involved in trying to duplicate the local
* semantics exactly don't seem worthwhile (see also comments for
* RowMarkType).
*
* Note: because we actually run the query as a cursor, this assumes that
* DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
*/
@@ -1713,20 +1788,21 @@ postgresExplainForeignModify(ModifyTableState *mtstate,
* estimate_path_cost_size
* Get cost and size estimates for a foreign scan
*
* We assume that all the baserestrictinfo clauses will be applied, plus
* any join clauses listed in join_conds.
*/
static void
estimate_path_cost_size(PlannerInfo *root,
RelOptInfo *baserel,
List *join_conds,
+ List *pathkeys,
double *p_rows, int *p_width,
Cost *p_startup_cost, Cost *p_total_cost)
{
PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
double rows;
double retrieved_rows;
int width;
Cost startup_cost;
Cost total_cost;
Cost run_cost;
@@ -1765,20 +1841,23 @@ estimate_path_cost_size(PlannerInfo *root,
appendStringInfoString(&sql, "EXPLAIN ");
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
true, NULL);
if (remote_join_conds)
appendWhereClause(&sql, root, baserel, remote_join_conds,
(fpinfo->remote_conds == NIL), NULL);
+ if (pathkeys)
+ appendOrderByClause(&sql, root, baserel, pathkeys);
+
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
get_remote_estimate(sql.data, conn, &rows, &width,
&startup_cost, &total_cost);
ReleaseConnection(conn);
retrieved_rows = rows;
/* Factor in the selectivity of the locally-checked quals */
local_sel = clauselist_selectivity(root,
@@ -2987,10 +3066,39 @@ static void
conversion_error_callback(void *arg)
{
ConversionLocation *errpos = (ConversionLocation *) arg;
TupleDesc tupdesc = RelationGetDescr(errpos->rel);
if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
errcontext("column \"%s\" of foreign table \"%s\"",
NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
RelationGetRelationName(errpos->rel));
}
+
+/*
+ * find_em_expr_for_rel
+ * Find an equivalence class member expression, all Vars in which, belong to the
+ * given relation.
+ */
+extern Expr *
+find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+ ListCell *lc_em;
+ foreach(lc_em, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst(lc_em);
+
+ if (bms_equal(em->em_relids, rel->relids))
+ {
+ /*
+ * Found at least one expression whose all Vars come from given
+ * relation. If there are more than one of those, all of them
+ * will be equivalent. It's sufficient to find any one of such
+ * expressions.
+ */
+ return em->em_expr;
+ }
+ }
+
+ /* We didn't find any suitable equivalence class expression */
+ return NULL;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 3835ddb..8956cd2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -67,12 +67,15 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
List *targetAttrs, List *returningList,
List **retrieved_attrs);
extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
extern void deparseStringLiteral(StringInfo buf, const char *val);
+extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
+ RelOptInfo *baserel, List *pathkeys);
#endif /* POSTGRES_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 11160f8..8f45238 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -207,20 +207,23 @@ EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
-- check both safe and unsafe join conditions
EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b
WHERE a.c2 = 6 AND b.c1 = a.c1 AND a.c8 = 'foo' AND b.c7 = upper(a.c7);
SELECT * FROM ft2 a, ft2 b
WHERE a.c2 = 6 AND b.c1 = a.c1 AND a.c8 = 'foo' AND b.c7 = upper(a.c7);
-- bug before 9.3.5 due to sloppy handling of remote-estimate parameters
SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5));
SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
+-- pushing down pathkeys to the foreign server
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft2 t2 ORDER BY t2.c1 OFFSET 100 LIMIT 10;
-- ===================================================================
-- parameterized queries
-- ===================================================================
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
EXECUTE st1(1, 1);
EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers