[jira] [Commented] (CALCITE-457) Non-ansi join should not be processed as a filter on top of "on (true)" join

Jinfeng Ni (JIRA) Sat, 13 Jun 2015 08:27:38 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584655#comment-14584655
 ]


Jinfeng Ni commented on CALCITE-457:
------------------------------------

Hi Julian,

I modified the code based on your comments.  The PR is updated at 
https://github.com/apache/incubator-calcite/pull/92. Please take a look. 

Here are responses to your comments.


Comment 1 . In misc.oq, a field that was called "DESC" is now called "EXPR$1". 
Can you restore the old name?

[jni] I guess you are talking about the JdbcTest.testInnerJoinValues(), where 
"DESC" is changed to "EXPR$1" in EXPLAIN.

Actually, the field name change happens only in EXPLAIN PLAN output; the query 
result is still using "DESC". I spent quite some time trying to figure out why 
the field name is changed in EXPLAIN. I do not have a completely clear idea why 
it happened. But seems the cause might be related to the fact that Calcite put 
two RelNodes with different rowType under the same RelSet. 

The following is from the calcite trace "before" my code change:

Set#1, type: RecordType(INTEGER EXPR$0, CHAR(8) EXPR$1)
  rel#11:Subset#1.NONE.[], best=null, importance=0.5904900000000001
    rel#1:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(INTEGER EXPR$0, 
CHAR(8) EXPR$1),tuples=[{ 10, 'SameName' }]), rowcount=1.0, cumulative 
cost={inf}
    rel#12:LogicalProject.NONE.[[0, 1], 
[1]](input=rel#11:Subset#1.NONE.[],ID=$0,DESC=$1), rowcount=1.0, cumulative 
cost={inf}


In the above, the LogicalValues has (EXPR$0, EXPR$1), LogicalProject has (ID, 
DESC). Yet Calcite still put them together under the same RelSet. My guess is 
CALCITE-457 patch will impact the order of some rules firing, and somehow the 
rowType of downstream RelNode is messed up.

On the other hand, I think Calcite's API does not guarantee that the 
intermediate RelNode will keep the same field name; the output field name is 
ensured to be "DESC" via validator.getValidatedNodeType(). That's why the query 
result still has "DESC" in the output. Therefore, I feel it might be ok to have 
different field name in the EXPLAIN plan output.

Another comment : this testcase does not have the join condition pushed, simply 
because f costing : Values() has one row only. I tried at least with two rows, 
and verified the condition will be pushed down.

Comment 2:  It's appropriate to always create a LogicalJoin when in 
SqlToRelConverter, less so in general-purpose planner rules. Can you use a join 
factory and not add method RelOptUtil.createJoin?

[jni] 
I tried to add join factory to FilterJoinRule, that seems require more change. 
In stead, I chose to passing in originalJoin, and use join.copy() method. That 
way, in the general-purpose planner rule or SqlToRelConverter, the new created 
Join will be of the same class of original join node.

Comment 3:  re-format 
[jni] Now I pass in param originalJoin. No need for such format.

Comment 4:  join.oq should end with newline
[jni] Done.

Comment 5: Can you add a test for FilterJoinRule's new capability to 
RelOptRulesTest.
[jni] testPushJoinCondDownToProject() is Added.

> Non-ansi join should not be processed as a filter on top of "on (true)" join
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-457
>                 URL: https://issues.apache.org/jira/browse/CALCITE-457
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 0.9.1-incubating
>            Reporter: Vladimir Sitnikov
>            Assignee: Julian Hyde
>
> I've tested two plans and it turns out the query with non-ansi joins has 
> extremely bad plan (note {{EnumerableJoinRel(condition=\[true\]}}):
> {code:sql}
> explain plan for select d."deptno", e."empid"
> from "hr"."emps" as e
>   , "hr"."depts" as d
> where e."deptno" = d."deptno"+0
> PLAN=EnumerableCalcRel(expr#0..2=[{inputs}], expr#3=[CAST($t1):INTEGER NOT 
> NULL], expr#4=[0], expr#5=[+($t2, $t4)], expr#6=[=($t3, $t5)], deptno=[$t2], 
> empid=[$t0], $condition=[$t6])
>   EnumerableJoinRel(condition=[true], joinType=[inner])
>     EnumerableCalcRel(expr#0..4=[{inputs}], proj#0..1=[{exprs}])
>       EnumerableTableAccessRel(table=[[hr, emps]])
>     EnumerableCalcRel(expr#0..2=[{inputs}], deptno=[$t0])
>       EnumerableTableAccessRel(table=[[hr, depts]])
> {code}
> Same works fine with ANSI style:
> {code:sql}
> explain plan for select d."deptno", e."empid"
> from "hr"."emps" as e
>   join "hr"."depts" as d
>  on (e."deptno" = d."deptno"+0)
> PLAN=EnumerableCalcRel(expr#0..3=[{inputs}], deptno=[$t2], empid=[$t0])
>   EnumerableJoinRel(condition=[=($1, $3)], joinType=[inner])
>     EnumerableCalcRel(expr#0..4=[{inputs}], expr#5=[CAST($t1):INTEGER NOT 
> NULL], empid=[$t0], $f5=[$t5])
>       EnumerableTableAccessRel(table=[[hr, emps]])
>     EnumerableCalcRel(expr#0..2=[{inputs}], expr#3=[0], expr#4=[+($t0, $t3)], 
> deptno=[$t0], $f3=[$t4])
>       EnumerableTableAccessRel(table=[[hr, depts]])
> {code}
> The query that does not use calculations works fine even with non-ansi style:
> {code:sql}
> explain plan for select d."deptno", e."empid"
> from "hr"."emps" as e
>   , "hr"."depts" as d
> where e."deptno" = d."deptno"
> PLAN=EnumerableCalcRel(expr#0..2=[{inputs}], deptno=[$t2], empid=[$t0])
>   EnumerableJoinRel(condition=[=($1, $2)], joinType=[inner])
>     EnumerableCalcRel(expr#0..4=[{inputs}], proj#0..1=[{exprs}])
>       EnumerableTableAccessRel(table=[[hr, emps]])
>     EnumerableCalcRel(expr#0..2=[{inputs}], deptno=[$t0])
>       EnumerableTableAccessRel(table=[[hr, depts]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CALCITE-457) Non-ansi join should not be processed as a filter on top of "on (true)" join

Reply via email to