[ 
https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656440#comment-16656440
 ] 

pengzhiwei commented on CALCITE-2630:
-------------------------------------

_you can have your optimized physical IN operator in your generated code_

[~julianhyde] "In Operator" has been translated to "OR" or "Join" after logical 
plan,If people want to build a physical IN operator in there own engine ,they 
may need to rebuild IN operator from the "OR" or "join" logical plan.It need 
lot of work.

So why we not keep "IN" as it is and leave the space for engine to implement 
there own physical IN.

> Convert SqlInOperator to In-Expression
> --------------------------------------
>
>                 Key: CALCITE-2630
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2630
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's 
> operands less than "inSubQueryThreshold" or  to "Join" when the operands 
> count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" 
> expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), 
> true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], 
> DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>      LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>     LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is 
> no translation on "IN" and just keep it as it is.This would be much clear in 
> the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the 
> computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to