Re: Doubts about VolcanoPlannerPhase
Hi Aron,

there was a small discussion about this several months ago [1].

[1] https://github.com/apache/calcite/pull/1840#discussion_r387967624

--
Roman Kondakov

On 07.07.2020 09:52, JiaTao Tao wrote:
> Seems only use OPTIMIZE phase, can we remove this mechanism?
>
> Regards!
>
> Aron Tao
[jira] [Created] (CALCITE-4110) Provide more info when log rule pop
Jiatao Tao created CALCITE-4110:
---

Summary: Provide more info when log rule pop
Key: CALCITE-4110
URL: https://issues.apache.org/jira/browse/CALCITE-4110
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Jiatao Tao
Assignee: Jiatao Tao

--
This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Doubts about VolcanoPlannerPhase
Hi Roman,

Thanks for your information. In this case, my opinion is to delete the related logic in the Volcano planner, but keep the VolcanoPlannerPhase mechanism for future use.

Regards!

Aron Tao

On Tue, Jul 7, 2020 at 3:03 PM, Roman Kondakov wrote:
> Hi Aron,
>
> there was a small discussion about this several months ago [1].
>
> [1] https://github.com/apache/calcite/pull/1840#discussion_r387967624
>
> --
> Roman Kondakov
>
> On 07.07.2020 09:52, JiaTao Tao wrote:
> > Seems only use OPTIMIZE phase, can we remove this mechanism?
> >
> > Regards!
> >
> > Aron Tao
Re: Doubts about VolcanoPlannerPhase
Sorry, I think I was wrong; maybe I overcomplicated things. I don't mind removing it, since it seems pretty useless.

On 2020/07/07 07:03:20, Roman Kondakov wrote:
> Hi Aron,
>
> there was a small discussion about this several months ago [1].
>
> [1] https://github.com/apache/calcite/pull/1840#discussion_r387967624
>
> --
> Roman Kondakov
>
> On 07.07.2020 09:52, JiaTao Tao wrote:
> > Seems only use OPTIMIZE phase, can we remove this mechanism?
> >
> > Regards!
> >
> > Aron Tao
[DISCUSS] Make RexNode serializable
Hi Community,

In Apache Beam we are facing a use case where we need to keep RexNode in our distributed primitives. Because of the nature of distributed computing, Beam requires those primitives to be serializable, so that they can be sent over the network to backends/workers for further execution.

In the Java world, this requirement means making RexNode implement the Java Serializable interface.

The workaround right now is to create a set of classes that "clone" RexNode and implement the Serializable interface themselves.

So what do you think of the idea of making RexNode implement the Serializable interface?

-Rui
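The "clone" workaround described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Beam's or Calcite's code: PlainExpr is a hypothetical stand-in for a non-serializable expression node (it is not RexNode), and SerializableExpr and MirrorSketch are invented names for the mirror class and its helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical non-serializable expression node (a stand-in, NOT Calcite's RexNode). */
class PlainExpr {
  final String op;                 // operator name, or the text of a literal/field reference
  final List<PlainExpr> operands;  // empty for leaves

  PlainExpr(String op, List<PlainExpr> operands) {
    this.op = op;
    this.operands = operands;
  }
}

/** Serializable mirror of PlainExpr: the kind of "clone" class the workaround needs. */
class SerializableExpr implements Serializable {
  private static final long serialVersionUID = 1L;
  final String op;
  final ArrayList<SerializableExpr> operands = new ArrayList<>();

  SerializableExpr(String op) {
    this.op = op;
  }

  /** Recursively copies a PlainExpr tree into its serializable mirror. */
  static SerializableExpr from(PlainExpr e) {
    SerializableExpr copy = new SerializableExpr(e.op);
    for (PlainExpr operand : e.operands) {
      copy.operands.add(from(operand));
    }
    return copy;
  }

  /** Rebuilds a PlainExpr tree on the receiving side. */
  PlainExpr toPlain() {
    List<PlainExpr> plainOperands = new ArrayList<>();
    for (SerializableExpr operand : operands) {
      plainOperands.add(operand.toPlain());
    }
    return new PlainExpr(op, plainOperands);
  }

  @Override public String toString() {
    if (operands.isEmpty()) {
      return op;
    }
    StringBuilder sb = new StringBuilder(op).append('(');
    for (int i = 0; i < operands.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(operands.get(i));
    }
    return sb.append(')').toString();
  }
}

class MirrorSketch {
  /** Java-serializes the mirror, as a framework shipping it to a worker might. */
  static byte[] toBytes(SerializableExpr e) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(e);
    } catch (IOException ex) {
      throw new IllegalStateException(ex);
    }
    return bytes.toByteArray();
  }

  /** Reads the mirror back from bytes. */
  static SerializableExpr fromBytes(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (SerializableExpr) in.readObject();
    } catch (IOException | ClassNotFoundException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    PlainExpr expr = new PlainExpr("=", List.of(
        new PlainExpr("$7", List.of()), new PlainExpr("10", List.of())));
    SerializableExpr roundTripped = fromBytes(toBytes(SerializableExpr.from(expr)));
    System.out.println(roundTripped); // prints =($7, 10)
  }
}
```

The cost of this approach is visible even in the toy version: every node type needs a hand-written mirror plus conversions in both directions, which is presumably why making the original class serializable looks attractive.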
Correlated subquery losing aggregator operator in inner join?
Hi,

I am following up on JIRA-4100. I am trying to understand the plan generated by the SQL query

  select e.empno, e.sal, e.deptno emp_dept, d.deptno dep_dept
  from emp e
  left join dept d
  on e.deptno = (
    select max(sal)
    from emp
    where deptno = e.deptno)

The plan produced is as follows:

  LogicalProject(EMPNO=[$0], SAL=[$5], EMP_DEPT=[$7], DEP_DEPT=[$9])
    LogicalJoin(condition=[=($0, $7)], joinType=[left])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

It seems to me that the MAX operator is still needed in the correlated subquery before the join, but it is dropped in the aggregator evaluation.

Interestingly, a similar query using EXISTS retains the aggregator operator:

  select * from emp where exists (select 1 from dept where emp.deptno = dept.deptno)

This results in the following plan:

  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
      LogicalJoin(condition=[=($7, $9)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          LogicalProject(DEPTNO=[$0], $f0=[true])
            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

Based on the EXISTS example, I expect the logical plan to also contain something like the following after the scan on dept:

  LogicalAggregate(group=[{0}], agg#0=[MAX($5)])
    LogicalProject(DEPTNO=[$9], $7=[true])

Thanks in advance for your input,
Sean
Re: [DISCUSS] Make RexNode serializable
Rui,

On Tue, Jul 7, 2020, 20:30, Rui Wang wrote:
> Hi Community,
>
> In Apache Beam we are facing a use case where we need to keep RexNode in our distributed primitives. Because of the nature of distributed computing, Beam requires the usage of those primitives be serializable (thus those primitives can be sent over the network to backend/workers for further execution).
>
> In the Java world this requirement means to make RexNode implement the Java Serializable interface.
>
> A workaround right now is to create a bunch of classes to "clone" RexNode while making those classes implement the Serializable interface.

Did you evaluate using a framework like Kryo, which allows you to serialize non-serializable classes?

I think that, in general, Java serialization is inefficient because it is too general-purpose. It also brings in a few security issues.

Maybe an alternative idea is to add an ad-hoc serialization mechanism to RexNode. We should also ensure that every RexNode can be serialized and deserialized.

Enrico

> So what do you think of the idea that makes RexNode implement the Serializable interface?
>
> -Rui
Re: [DISCUSS] Make RexNode serializable
Hi Rui,

AFAIK, RelNodes can be serialized to and deserialized from JSON format. See test [1] as an example. If I understand it correctly, RelNodes are serialized along with their enclosed RexNodes, so you can transfer them over the network as plain strings.

[1] https://github.com/apache/calcite/blob/f64cdcbb9f6535650f0227da19640e736496a9c3/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L88

--
Roman Kondakov

On 07.07.2020 22:13, Enrico Olivelli wrote:
> Rui
>
> On Tue, Jul 7, 2020, 20:30, Rui Wang wrote:
> > Hi Community,
> >
> > In Apache Beam we are facing a use case where we need to keep RexNode in our distributed primitives. Because of the nature of distributed computing, Beam requires the usage of those primitives be serializable (thus those primitives can be sent over the network to backend/workers for further execution).
> >
> > In the Java world this requirement means to make RexNode implement the Java Serializable interface.
> >
> > A workaround right now is to create a bunch of classes to "clone" RexNode while making those classes implement the Serializable interface.
>
> Did you evaluate to use some framework like Kryo that allows you to serialize non-serializable classes?
>
> I think that in general Java serialisation is not efficient as it is too general purpose. It also brings in a few Security issues.
>
> Maybe an alternative idea is to add some serialisation ad-hoc mechanism in RexNode. We should also ensure that every RexNode will be able to be serialized and deserialized.
>
> Enrico
>
> > So what do you think of the idea that makes RexNode implement the Serializable interface?
> >
> > -Rui
Re: Correlated subquery losing aggregator operator in inner join?
Since this discussion has moved to the email thread, I am copying what I found and replied in that JIRA for reference:

SqlToRelConverter ends up creating a RexSubQuery for that subquery and leaves it in the JOIN condition. The string representation of the RexSubQuery seems to be just $7. If you check RexSubQuery, it hosts a RelNode [1] for that subquery, so that operator is not dropped. This might be intended by design: engines that implement LogicalJoin can still access the subquery's RelNode and create the correct execution plan.

There is some special logic that replaces a subquery and inserts logical nodes into the plan tree (e.g. for IN) [2]. If a change is needed, it should likely happen in [2].

[1]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSubQuery.java#L39
[2]: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1054

-Rui

On Tue, Jul 7, 2020 at 11:57 AM Sean Broeder wrote:
> Hi,
>
> I am following up on JIRA-4100. I am trying to understand the plan generated by the SQL query
>
>   select e.empno, e.sal, e.deptno emp_dept, d.deptno dep_dept
>   from emp e
>   left join dept d
>   on e.deptno = (
>     select max(sal)
>     from emp
>     where deptno = e.deptno)
>
> The plan produced is as follows
>
>   LogicalProject(EMPNO=[$0], SAL=[$5], EMP_DEPT=[$7], DEP_DEPT=[$9])
>     LogicalJoin(condition=[=($0, $7)], joinType=[left])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> It seems to me that the MAX operator is still needed in the correlated subquery before the join, but it is dropped in the aggregator evaluation.
>
> Interestingly, a similar query using EXISTS retains the aggregator operator:
>
>   select * from emp where exists (select 1 from dept where emp.deptno = dept.deptno)
>
> This results in the following plan
>
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
>       LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>           LogicalProject(DEPTNO=[$0], $f0=[true])
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> Based on the exists example, I expect the logical plan should also contain something like after the scan on dept:
>
>   LogicalAggregate(group=[{0}], agg#0=[MAX($5)])
>     LogicalProject(DEPTNO=[$9], $7=[true])
>
> Thanks in advance for your input,
> Sean
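The point in the reply above, that an expression can carry a whole sub-plan that a one-line plan dump does not show, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyPlan and ToySubQueryExpr are invented classes, not Calcite's RelNode or RexSubQuery, and the short digest "$7" simply mirrors what the thread reports seeing.

```java
import java.util.List;

/** Toy plan node (hypothetical; NOT Calcite's RelNode). */
class ToyPlan {
  final String description;
  final List<ToyPlan> inputs;

  ToyPlan(String description, List<ToyPlan> inputs) {
    this.description = description;
    this.inputs = inputs;
  }

  /** Renders this node and its inputs as an indented tree, one node per line. */
  String explain() {
    StringBuilder sb = new StringBuilder();
    explain(sb, 0);
    return sb.toString();
  }

  private void explain(StringBuilder sb, int depth) {
    sb.append("  ".repeat(depth)).append(description).append('\n');
    for (ToyPlan input : inputs) {
      input.explain(sb, depth + 1);
    }
  }
}

/** Toy expression that prints as a short digest but still holds a nested plan. */
class ToySubQueryExpr {
  final String digest;  // what a plan dump would print for this expression
  final ToyPlan plan;   // the sub-plan the expression actually carries

  ToySubQueryExpr(String digest, ToyPlan plan) {
    this.digest = digest;
    this.plan = plan;
  }

  @Override public String toString() {
    return digest;
  }
}

class SubQuerySketch {
  public static void main(String[] args) {
    ToyPlan aggregate = new ToyPlan("Aggregate(MAX(sal))", List.of(
        new ToyPlan("Filter(deptno = e.deptno)", List.of(
            new ToyPlan("Scan(EMP)", List.of())))));
    ToySubQueryExpr subQuery = new ToySubQueryExpr("$7", aggregate);

    // The join condition prints without any trace of the aggregate...
    System.out.println("=($0, " + subQuery + ")");
    // ...but the aggregate is still reachable through the expression.
    System.out.print(subQuery.plan.explain());
  }
}
```

In this toy version, anything that consumes the join condition has to walk into the expression explicitly to find the aggregate, which matches the suggestion that engines implementing the join can still reach the subquery's plan.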
Re: [DISCUSS] Make RexNode serializable
Serializing the RexNode to JSON is a solution, but I'm afraid it cannot solve the problem completely.

One problem is how to parse the JSON back into a RexNode: the current RelJsonReader can only re-parse a RelNode, not a RexNode, and it needs the RelOptSchema to look up the operators. In Beam's distributed scenarios, I'm afraid it is hard to get the RelOptSchema at execution time; we usually only see the RelOptSchema at SQL compile time.

Best,
Danny Chan

On Jul 8, 2020, 3:39 AM +0800, Roman Kondakov wrote:
> Hi Rui,
>
> AFAIK, RelNodes can be serialized to and deserialized from JSON format. See test [1] as an example. If I understand it correct, RelNodes are serialized along with enclosed RexNodes, so you can transfer them over the network as plain strings.
>
> [1] https://github.com/apache/calcite/blob/f64cdcbb9f6535650f0227da19640e736496a9c3/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L88
>
> --
> Roman Kondakov
>
> On 07.07.2020 22:13, Enrico Olivelli wrote:
> > Rui
> >
> > On Tue, Jul 7, 2020, 20:30, Rui Wang wrote:
> > > Hi Community,
> > >
> > > In Apache Beam we are facing a use case where we need to keep RexNode in our distributed primitives. Because of the nature of distributed computing, Beam requires the usage of those primitives be serializable (thus those primitives can be sent over the network to backend/workers for further execution).
> > >
> > > In the Java world this requirement means to make RexNode implement the Java Serializable interface.
> > >
> > > A workaround right now is to create a bunch of classes to "clone" RexNode while making those classes implement the Serializable interface.
> >
> > Did you evaluate to use some framework like Kryo that allows you to serialize non-serializable classes?
> >
> > I think that in general Java serialisation is not efficient as it is too general purpose. It also brings in a few Security issues.
> >
> > Maybe an alternative idea is to add some serialisation ad-hoc mechanism in RexNode. We should also ensure that every RexNode will be able to be serialized and deserialized.
> >
> > Enrico
> >
> > > So what do you think of the idea that makes RexNode implement the Serializable interface?
> > >
> > > -Rui
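The concern in the reply above, that reading an expression back requires a lookup context which may only exist at compile time, can be shown with a toy sketch. Everything here is an assumption made for illustration: the wire format, the operator table, and the class name OperatorLookupSketch are invented and do not reflect how RelJsonReader or RelOptSchema actually work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class OperatorLookupSketch {
  /** Writes an expression as text; only the operator NAME goes on the wire. */
  static String serialize(String operatorName, int left, int right) {
    return operatorName + " " + left + " " + right;
  }

  /**
   * Reads the text back and evaluates it. The operator name must be resolved
   * against a table supplied by the reader's environment; without an entry,
   * the expression cannot be rebuilt.
   */
  static int readAndEvaluate(String text, Map<String, IntBinaryOperator> operatorTable) {
    String[] parts = text.split(" ");
    IntBinaryOperator operator = operatorTable.get(parts[0]);
    if (operator == null) {
      throw new IllegalStateException("No operator registered for name: " + parts[0]);
    }
    return operator.applyAsInt(Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
  }

  public static void main(String[] args) {
    String wire = serialize("PLUS", 2, 3);

    // Where the lookup table is available, the round trip works.
    Map<String, IntBinaryOperator> compileTimeTable = new HashMap<>();
    compileTimeTable.put("PLUS", (a, b) -> a + b);
    System.out.println(readAndEvaluate(wire, compileTimeTable)); // prints 5

    // On a worker without that table, the same bytes cannot be resolved.
    try {
      readAndEvaluate(wire, new HashMap<>());
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A name-based format therefore moves the problem rather than removing it: the worker still needs some way to obtain the table, either shipped with the job or rebuilt locally.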
[jira] [Created] (CALCITE-4111) Remove VolcanoPlannerPhase in VolcanoPlanner
Jiatao Tao created CALCITE-4111:
---

Summary: Remove VolcanoPlannerPhase in VolcanoPlanner
Key: CALCITE-4111
URL: https://issues.apache.org/jira/browse/CALCITE-4111
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Jiatao Tao
Assignee: Jiatao Tao