Re: [HACKERS] Push down more full joins in postgres_fdw

Etsuro Fujita Tue, 31 Jan 2017 01:00:08 -0800

On 2017/01/30 21:05, Ashutosh Bapat wrote:

On Mon, Jan 30, 2017 at 5:00 PM, Etsuro Fujita
<[email protected]> wrote:

On 2017/01/27 21:25, Etsuro Fujita wrote:

Sorry, I started thinking we went in the wrong direction.  I added to
deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
each subquery present in a given join tree prior to deparsing a whole
remote query.  But that's nothing but an overhead; we need to create a
tlist for the top-level query because we use it as fdw_scan_tlist, but
not for subqueries, and we need to create retrieved_attrs for the
top-level query while deparsing the targetlist in
deparseExplicitTargetList, but not for subqueries.  So, we should need
some work to avoid such a useless overhead.

I think we can avoid the useless overhead by adding a new function,
deparseSubqueryTargetList, that deparses expressions in the given relation's
reltarget, not the tlist, as a SELECT clause of the subquery representing
the given relation.  That would also allow us to make the 1-to-1
relationship between the subquery output columns and their aliases more
explicit, which was your original comment.  Please find attached the new
version.  (The patch doesn't need the patch to avoid duplicate construction
of the tlist, discussed upthread.)

I have not looked at the patch, but the reason we use a tlist instead
of list of expressions is because fdw_scan_tlist is expected in that
form

Actually, we wouldn't need that to deparse a remote join query; a listof expressions would be enough.

and we don't want two different representations one to deparse
SELECT and one to interpret results from the foreign server. What you
describe above seems to introduce exactly that hazard.


I agree with you to some extent, BUT:

* I don't think it's a good idea to create a tlist for each base/joinrelation that is deparsed as a subquery, to just avoid that hazard. AsI said above, that's nothing but an overhead.* I think we would need to have two different representations for atleast base relations; we use fpinfo->attrs_used to deparse a simpleforeign table scan query for a base relation, but when deparsing thebase relation as a subquery, we would need to use the list ofexpressions in the base relation's reltarget, to deparse a SELECT clauseof the subquery, because we need to deparse a whole-row reference to thebase relation as ROW, not all the user columns expanded, as shown inthis extracted from the regression tests in the patch:


+ EXPLAIN (VERBOSE, COSTS OFF)

+ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 =50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL)ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;+QUERY PLAN

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

+  LockRows
+    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+    ->  Nested Loop
+          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+          ->  Foreign Scan
+                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                Relations: (public.ft4) FULL JOIN (public.ft5)

+ Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 =s9.c1)))) WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BYs8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST

+                ->  Hash Full Join
+                      Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                      Hash Cond: (ft4.c1 = ft5.c1)
+                      Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.c1, ft4.*

+ Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T3" WHERE ((c1 >= 50)) AND ((c1 <= 60))

+                      ->  Hash
+                            Output: ft5.c1, ft5.*
+                            ->  Foreign Scan on public.ft5
+                                  Output: ft5.c1, ft5.*

+ Remote SQL: SELECT c1, c2, c3 FROM "S1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))

+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (25 rows)

Other changes:
* I went back to make_outerrel_subquery and make_innerrel_subquery, which
are flags to indicate whether to deparse the input relations as subqueries.
is_subquery_rel would work well for handling the cases of full joins with
restrictions on the input relations, but I noticed that that wouldn't work
well when extending to handle the cases where we deparse the input relations
as subqueries to evaluate PHVs remotely.

I had objected to using a single variable instead of two previously
and you had argued against it in [1]. There you had mentioned that PHV
doesn't need two variables, but now you are taking the other stand,
without any apparent reason. Can you please clarify it?

Sorry, I missed some cases. Consider the join {A, B, C} satisfying theidentity 3 in optimizer/README, ie,


    (A leftjoin B on (Pab)) leftjoin C on (Pbc)
    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

where predicate Pbc fails for all-null B rows. Assume that B has a PHVin the relation's reltarget. While considering 2-way joins,foreign_join_ok would determine to deparse B as a subquery and hence setthe B's is_subquery_rel to true for the join relation {A, B}. However,if the planner selects the join order {A leftjoin (B leftjoin C on(Pbc)) on (Pab)} as the cheapest path for the join {A, B, C}, we woulddeparse B as a subquery according to the is_subquery_rel flag whilecreating the FROM clause entry for the join relation {B, C}. Thatwouldn't look good. That would be logically correct, though. To avoidthis, I'd like to go back to the two variables previously proposed.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Push down more full joins in postgres_fdw

Reply via email to