On 2017/01/30 21:05, Ashutosh Bapat wrote:
On Mon, Jan 30, 2017 at 5:00 PM, Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:
On 2017/01/27 21:25, Etsuro Fujita wrote:
Sorry, I started thinking we went in the wrong direction.  I added to
deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
each subquery present in a given join tree prior to deparsing a whole
remote query.  But that's nothing but an overhead; we need to create a
tlist for the top-level query because we use it as fdw_scan_tlist, but
not for subqueries, and we need to create retrieved_attrs for the
top-level query while deparsing the targetlist in
deparseExplicitTargetList, but not for subqueries.  So, we should need
some work to avoid such a useless overhead.

I think we can avoid the useless overhead by adding a new function,
deparseSubqueryTargetList, that deparses expressions in the given relation's
reltarget, not the tlist, as a SELECT clause of the subquery representing
the given relation.  That would also allow us to make the 1-to-1
relationship between the subquery output columns and their aliases more
explicit, which was your original comment.  Please find attached the new
version.  (The patch doesn't need the patch to avoid duplicate construction
of the tlist, discussed upthread.)

I have not looked at the patch, but the reason we use a tlist instead
of list of expressions is because fdw_scan_tlist is expected in that
form

Actually, we wouldn't need that to deparse a remote join query; a list of expressions would be enough.

and we don't want two different representations one to deparse
SELECT and one to interpret results from the foreign server. What you
describe above seems to introduce exactly that hazard.

I agree with you to some extent, BUT:
* I don't think it's a good idea to create a tlist for each base/join relation that is deparsed as a subquery, to just avoid that hazard. As I said above, that's nothing but an overhead. * I think we would need to have two different representations for at least base relations; we use fpinfo->attrs_used to deparse a simple foreign table scan query for a base relation, but when deparsing the base relation as a subquery, we would need to use the list of expressions in the base relation's reltarget, to deparse a SELECT clause of the subquery, because we need to deparse a whole-row reference to the base relation as ROW, not all the user columns expanded, as shown in this extracted from the regression tests in the patch:

+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1; + QUERY PLAN


+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+    ->  Nested Loop
+          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+          ->  Foreign Scan
+                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                Relations: (public.ft4) FULL JOIN (public.ft5)
+ Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM ((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 = s9.c1)))) WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BY s8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                      Hash Cond: (ft4.c1 = ft5.c1)
+                      Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+                      ->  Foreign Scan on public.ft4
+                            Output: ft4.c1, ft4.*
+ Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                      ->  Hash
+                            Output: ft5.c1, ft5.*
+                            ->  Foreign Scan on public.ft5
+                                  Output: ft5.c1, ft5.*
+ Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (25 rows)

Other changes:
* I went back to make_outerrel_subquery and make_innerrel_subquery, which
are flags to indicate whether to deparse the input relations as subqueries.
is_subquery_rel would work well for handling the cases of full joins with
restrictions on the input relations, but I noticed that that wouldn't work
well when extending to handle the cases where we deparse the input relations
as subqueries to evaluate PHVs remotely.

I had objected to using a single variable instead of two previously
and you had argued against it in [1]. There you had mentioned that PHV
doesn't need two variables, but now you are taking the other stand,
without any apparent reason. Can you please clarify it?

Sorry, I missed some cases. Consider the join {A, B, C} satisfying the identity 3 in optimizer/README, ie,

    (A leftjoin B on (Pab)) leftjoin C on (Pbc)
    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

where predicate Pbc fails for all-null B rows. Assume that B has a PHV in the relation's reltarget. While considering 2-way joins, foreign_join_ok would determine to deparse B as a subquery and hence set the B's is_subquery_rel to true for the join relation {A, B}. However, if the planner selects the join order {A leftjoin (B leftjoin C on (Pbc)) on (Pab)} as the cheapest path for the join {A, B, C}, we would deparse B as a subquery according to the is_subquery_rel flag while creating the FROM clause entry for the join relation {B, C}. That wouldn't look good. That would be logically correct, though. To avoid this, I'd like to go back to the two variables previously proposed.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to