Hi Ashutosh,

Thanks for the review.

2015/04/22 19:28、Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> のメール:
> Tests
> -------
> 1.The postgres_fdw test is re/setting enable_mergejoin at various places. The 
> goal of these tests seems to be to test the sanity of foreign plans 
> generated. So, it might be better to reset enable_mergejoin (and may be all 
> of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of 
> the testcase and set them again at the end. That way, we will also make sure 
> that foreign plans are chosen irrespective of future planner changes.

I have different, rather opposite opinion about it.  I disabled other join 
types as least as the tests pass, because I worry oversights come from planner 
changes.  I hope to eliminate enable_foo from the test script, by improving 
costing model smarter.

> 2. In the patch, I see that the inheritance testcases have been deleted from 
> postgres_fdw.sql, is that intentional? I do not see those being replaced 
> anywhere else.

It’s accidental removal, I restored the tests about inheritance feature.

> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) 
> where there are unsafe to push conditions in ON clause along-with 
> safe-to-push conditions. For INNER join, the join should get pushed down with 
> the safe conditions and for OUTER join it shouldn't be. Same goes for WHERE 
> clause, in which case the join will be pushed down but the unsafe-to-push 
> conditions will be applied locally.

Currently INNER JOINs with unsafe join conditions are not pushed down, so such 
test is not in the suit.  As you say, in theory, INNER JOINs can be pushed down 
even they have push-down-unsafe join conditions, because such conditions can be 
evaluated no local side against rows retrieved without those conditions.

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being 
> exercised. But, something like aggregates would test the setref code better. 
> So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 
> join ft2 on (ft1.c1 = ft2.c1).

Added an aggregate case, and also added an UNION case for Append.

> 5. It will be good to add some test which contain join between few foreign 
> and few local tables to see whether we are able to push down the largest 
> possible foreign join tree to the foreign server. 
> 





> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause 
> rather than RestrictInfo itself. But the callers of classifyConditions() have 
> not changed. Is this change intentional?

Yes, the purpose of the change is to make appendConditions (former name is 
appendWhereClause) can handle JOIN ON clause, list of Expr.

> The functions which consume the lists produced by this function handle 
> expressions as well RestrictInfo, so you may not have noticed it. Because of 
> this change, we might be missing some optimizations e.g. in function 
> postgresGetForeignPlan()
>  793         if (list_member_ptr(fpinfo->remote_conds, rinfo))
>  794             remote_conds = lappend(remote_conds, rinfo->clause);
>  795         else if (list_member_ptr(fpinfo->local_conds, rinfo))
>  796             local_exprs = lappend(local_exprs, rinfo->clause);
>  797         else if (is_foreign_expr(root, baserel, rinfo->clause))
>  798             remote_conds = lappend(remote_conds, rinfo->clause);
>  799         else
>  800             local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to 
> is_foreign_expr(). So with this change, I think we are doing an extra call to 
> is_foreign_expr().
> 

Hm, it seems better to revert my change and make appendConditions downcast 
given information into RestrictInfo or Expr according to the node tag.

> The function get_jointype_name() returns an empty string for unsupported join 
> types. Instead of that it should throw an error, if some code path 
> accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

Agreed, fixed.

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE 
> clause in the original query is not being honored, which means that we will 
> end up locking the rows which are not part of the join result even when the 
> join is pushed to the foreign server. E.g take the following query (it uses 
> the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = 
> ft2.c1) for update of ft1;
>                                                                               
>                                                                          
>                                                                               
>                                                                          
>                                              QUERY PLAN                       
>                                                                          
>                                                                               
>                                                                          
>                                                                               
>                       
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, 
> ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, 
> ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*,
>  ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, 
> l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a
> 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, 
> l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12
> , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l 
> (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12, 
> r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, 
> r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6
>  a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, 
> a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked 
> by FOR UPDATE clause. The query sent to the foreign server has attached the 
> FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign 
> server). As per the postgresql documentation, "When a locking clause appears 
> in a sub-SELECT, the rows locked are those returned to the outer query by the 
> sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than 
> only the rows which are part of join. This is going to increase probability 
> of deadlocks, if the join is between a big table and small table where big 
> table is being used in many queries and the join is going to have only a 
> single row in the result.
> 



> Since there is no is_first argument to appendConditions(), we should remove 
> corresponding line from the function prologue.
> 

Oops, replaced with the description of prefix.


> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be 
> GET_RELATIVE_ATTNO() or something like that.

Fixed.

> 
> In postgresGetForeignJoinPaths(), while separating the conditions into join 
> quals and other quals,
> 3014     if (IS_OUTER_JOIN(jointype))
> 3015     {
> 3016         extract_actual_join_clauses(joinclauses, &joinclauses, 
> &otherclauses);
> 3017     }
> 3018     else
> 3019     {
> 3020         joinclauses = extract_actual_clauses(joinclauses, false);
> 3021         otherclauses = NIL;
> 3022     }
> we shouldn't differentiate between outer and inner join. For inner join the 
> join quals can be treated as other clauses and they will be returned as other 
> clauses, which is fine. Also, the following condition
> 3050     /*
> 3051      * Other condition for the join must be safe to push down.
> 3052      */
> 3053     foreach(lc, otherclauses)
> 3054     {
> 3055         Expr *expr = (Expr *) lfirst(lc);
> 3056 
> 3057         if (!is_foreign_expr(root, joinrel, expr))
> 3058         {
> 3059             ereport(DEBUG3, (errmsg("filter contains unsafe 
> conditions")));
> 3060             return;
> 3061         }
> 3062     }
> is unnecessary. I there are filter conditions which are unsafe to push down, 
> they can be applied locally after obtaining the join result from the foreign 
> server. The join quals are all needed to be safe to push down, since they 
> decide which rows will contain NULL inner side in an OUTER join. 

I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t 
need* to differentiate if we are talking about only the result of filtering.

IMO we *should* differentiate inner and outer (or differentiate join conditions 
and filter conditions) because all conditions of typical INNER JOINs go into 
otherclauses because their is_pushed_down flag is on, so such joins look like 
CROSS JOIN + WHERE filter.  In the latest patch EXPLAIN shows the join 
combinations of a foreign join scan node with join type, but your suggestion 
makes it looks like this:

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t 
on t.bid = b.bid;
WARNING:  restrictlist: ({RESTRICTINFO :clause {OPEXPR :opno 96 :opfuncid 65 
:opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 
1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 
:varoattno 1 :location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 
:varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location 
-1} :is_pushed_down true :outerjoin_delayed false :can_join true 
:pseudoconstant false :clause_relids (b 1 2) :required_relids (b 1 2) 
:outer_relids (b) :nullable_relids (b) :left_relids (b 1) :right_relids (b 2) 
:orclause <> :norm_selec 0.2000 :outer_selec -1.0000 :mergeopfamilies (o 1976) 
:left_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 1 :varattno 1 :vartype 23 
:vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 
85} :em_relids (b 1) :em_nullable_relids (b) :em_is_const false :em_is_child 
false :em_datatype 23} :right_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 2 
:varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 
:varoattno 2 :location 77} :em_relids (b 2) :em_nullable_relids (b) 
:em_is_const false :em_is_child false :em_datatype 23} :outer_is_left false 
:hashjoinoperator 96})
WARNING:  joinclauses: <>
WARNING:  otherclauses: ({OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 
:opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 
:vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 
:location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 
:varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location -1})

   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=716)
   Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
   Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT 
l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM 
public.pgbench_branches) l)
 l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid 
a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, 
a2, a3, a4) WHERE
((l.a1 = r.a2))
(4 rows)

Thoughts?


Regards,
--
Shigeru HANADA
shigeru.han...@gmail.com






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to