Re: Doubts about VolcanoPlannerPhase

2020-07-07 Thread Roman Kondakov

Hi Aron,

there was a small discussion about this several months ago [1].

[1] https://github.com/apache/calcite/pull/1840#discussion_r387967624

--
Roman Kondakov

On 07.07.2020 09:52, JiaTao Tao wrote:

It seems only the OPTIMIZE phase is used. Can we remove this mechanism?


Regards!

Aron Tao



[jira] [Created] (CALCITE-4110) Provide more info when log rule pop

2020-07-07 Thread Jiatao Tao (Jira)
Jiatao Tao created CALCITE-4110:
---

 Summary: Provide more info when log rule pop
 Key: CALCITE-4110
 URL: https://issues.apache.org/jira/browse/CALCITE-4110
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Jiatao Tao
Assignee: Jiatao Tao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Doubts about VolcanoPlannerPhase

2020-07-07 Thread JiaTao Tao
Hi Roman
Thanks for your information. In this case, my opinion is to delete the
related logic in the Volcano planner, but keep the VolcanoPlannerPhase mechanism
for future use.


Regards!

Aron Tao


On Tue, Jul 7, 2020 at 3:03 PM, Roman Kondakov wrote:

> Hi Aron,
>
> there was a small discussion about this several months ago [1].
>
> [1] https://github.com/apache/calcite/pull/1840#discussion_r387967624
>
> --
> Roman Kondakov
>
> On 07.07.2020 09:52, JiaTao Tao wrote:
> > It seems only the OPTIMIZE phase is used. Can we remove this mechanism?
> >
> >
> > Regards!
> >
> > Aron Tao
> >
>


Re: Doubts about VolcanoPlannerPhase

2020-07-07 Thread Haisheng Yuan
Sorry, I think I was wrong; maybe I overcomplicated things.
I don't mind removing it, since it seems pretty useless.

On 2020/07/07 07:03:20, Roman Kondakov  wrote: 
> Hi Aron,
> 
> there was a small discussion about this several months ago [1].
> 
> [1] https://github.com/apache/calcite/pull/1840#discussion_r387967624
> 
> -- 
> Roman Kondakov
> 
> On 07.07.2020 09:52, JiaTao Tao wrote:
> > It seems only the OPTIMIZE phase is used. Can we remove this mechanism?
> > 
> > 
> > Regards!
> > 
> > Aron Tao
> > 
> 


[DISCUSS] Make RexNode serializable

2020-07-07 Thread Rui Wang
Hi Community,

In Apache Beam we are facing a use case where we need to keep RexNode in
our distributed primitives. Because of the nature of distributed computing,
Beam requires the usage of those primitives be serializable (thus those
primitives can be sent over the network to backend/workers for
further execution).

In the Java world this requirement means to make RexNode implement the Java
Serializable interface.

A workaround right now is to create a bunch of classes to "clone" RexNode
while making those classes implement the Serializable interface.

So what do you think of the idea that makes RexNode implement the
Serializable interface?


-Rui
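
For readers following the thread, the "clone" workaround described above might look roughly like the sketch below. It is only an illustration under stated assumptions: SerializableExpr and SerializableExprDemo are hypothetical names, not Calcite or Beam classes, and the tree is a simplified stand-in for a RexNode (an operator name plus operands). It shows a mirror class implementing java.io.Serializable and a round trip through Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; round-trips a mirror expression through Java serialization.
final class SerializableExprDemo {
  static byte[] toBytes(SerializableExpr expr) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(expr);
    } catch (IOException e) {
      throw new IllegalStateException("serialization failed", e);
    }
    return bytes.toByteArray();
  }

  static SerializableExpr fromBytes(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (SerializableExpr) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("deserialization failed", e);
    }
  }

  public static void main(String[] args) {
    SerializableExpr expr = SerializableExpr.call(
        "=", SerializableExpr.leaf("$0"), SerializableExpr.leaf("$7"));
    // The copy is a distinct object with the same structure.
    System.out.println(fromBytes(toBytes(expr)));
  }
}

// Hypothetical stand-in for a RexNode-like expression tree; not a Calcite class.
final class SerializableExpr implements Serializable {
  private static final long serialVersionUID = 1L;

  final String op;                        // operator name, or the text of a leaf
  final List<SerializableExpr> operands;  // empty for leaves

  private SerializableExpr(String op, List<SerializableExpr> operands) {
    this.op = op;
    this.operands = new ArrayList<>(operands); // ArrayList is itself Serializable
  }

  static SerializableExpr leaf(String text) {
    return new SerializableExpr(text, List.of());
  }

  static SerializableExpr call(String op, SerializableExpr... operands) {
    return new SerializableExpr(op, List.of(operands));
  }

  @Override public String toString() {
    if (operands.isEmpty()) {
      return op;
    }
    StringBuilder sb = new StringBuilder(op).append('(');
    for (int i = 0; i < operands.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(operands.get(i));
    }
    return sb.append(')').toString();
  }
}
```

The cost of this approach is the one the message points out: every expression type needs a parallel mirror class plus conversion code in both directions.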


Correlated subquery losing aggregator operator in inner join?

2020-07-07 Thread Sean Broeder
Hi,
I am following up on JIRA-4100.  I am trying to understand the plan generated
by the SQL query

select e.empno, e.sal, e.deptno emp_dept, d.deptno dep_dept
from emp e
left join dept d
on e.deptno = (
  select max(sal)
  from emp
  where deptno = e.deptno)

The plan produced is as follows
LogicalProject(EMPNO=[$0], SAL=[$5], EMP_DEPT=[$7], DEP_DEPT=[$9])
  LogicalJoin(condition=[=($0, $7)], joinType=[left])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

It seems to me that the MAX operator is still needed in the correlated subquery 
before the join, but it is dropped in the aggregator evaluation.  

Interestingly, a similar query using EXISTS retains the aggregator operator:
select * from emp where exists (select 1 from dept where emp.deptno = dept.deptno)

This results in the following plan
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
    LogicalJoin(condition=[=($7, $9)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        LogicalProject(DEPTNO=[$0], $f0=[true])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])


Based on the EXISTS example, I expect the logical plan to also contain
something like the following after the scan on dept:
  LogicalAggregate(group=[{0}], agg#0=[MAX($5)])
    LogicalProject(DEPTNO=[$9], $7=[true])

Thanks in advance for your input,
Sean




Re: [DISCUSS] Make RexNode serializable

2020-07-07 Thread Enrico Olivelli
Rui

On Tue, Jul 7, 2020, 20:30 Rui Wang wrote:

> Hi Community,
>
> In Apache Beam we are facing a use case where we need to keep RexNode in
> our distributed primitives. Because of the nature of distributed computing,
> Beam requires the usage of those primitives be serializable (thus those
> primitives can be sent over the network to backend/workers for
> further execution).
>
> In the Java world this requirement means to make RexNode implement the Java
> Serializable interface.
>
> A workaround right now is to create a bunch of classes to "clone" RexNode
> while making those classes implement the Serializable interface.
>

Did you evaluate using a framework like Kryo, which allows you to
serialize non-serializable classes?

I think that in general Java serialisation is not efficient, as it is too
general purpose.
It also brings in a few security issues.

Maybe an alternative idea is to add an ad-hoc serialisation mechanism to
RexNode.
We should also ensure that every RexNode can be serialized and
deserialized.

Enrico


> So what do you think of the idea that makes RexNode implement the
> Serializable interface?
>
>
> -Rui
>


Re: [DISCUSS] Make RexNode serializable

2020-07-07 Thread Roman Kondakov

Hi Rui,

AFAIK, RelNodes can be serialized to and deserialized from JSON format. 
See test [1] as an example. If I understand it correctly, RelNodes are
serialized along with enclosed RexNodes, so you can transfer them over 
the network as plain strings.


[1] 
https://github.com/apache/calcite/blob/f64cdcbb9f6535650f0227da19640e736496a9c3/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L88


--
Roman Kondakov

On 07.07.2020 22:13, Enrico Olivelli wrote:
> Rui
>
> On Tue, Jul 7, 2020, 20:30 Rui Wang wrote:
>
> > Hi Community,
> >
> > In Apache Beam we are facing a use case where we need to keep RexNode in
> > our distributed primitives. Because of the nature of distributed computing,
> > Beam requires the usage of those primitives be serializable (thus those
> > primitives can be sent over the network to backend/workers for
> > further execution).
> >
> > In the Java world this requirement means to make RexNode implement the Java
> > Serializable interface.
> >
> > A workaround right now is to create a bunch of classes to "clone" RexNode
> > while making those classes implement the Serializable interface.
> >
>
> Did you evaluate using a framework like Kryo, which allows you to
> serialize non-serializable classes?
>
> I think that in general Java serialisation is not efficient, as it is too
> general purpose.
> It also brings in a few security issues.
>
> Maybe an alternative idea is to add an ad-hoc serialisation mechanism to
> RexNode.
> We should also ensure that every RexNode can be serialized and
> deserialized.
>
> Enrico
>
>
> > So what do you think of the idea that makes RexNode implement the
> > Serializable interface?
> >
> >
> > -Rui





Re: Correlated subquery losing aggregator operator in inner join?

2020-07-07 Thread Rui Wang
Since this discussion has moved to the email thread, I am copying what
I found and posted in that JIRA for reference:

SqlToRelConverter ends up creating a RexSubQuery for that subquery
and leaves it in the JOIN condition. The string representation of the
RexSubQuery seems to be just $7. If you check RexSubQuery, it holds a
RelNode [1] for that subquery, so the operator is not dropped. This
might be intentional by design: engines that implement LogicalJoin can
still access the subquery's RelNode and create the correct execution
plan.

There is some special logic that replaces subqueries and inserts logical
nodes into the plan tree (e.g. for IN) [2]. If a change is needed, it
should likely happen in [2].

[1]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexSubQuery.java#L39
[2]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1054


-Rui
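
The point above, that an expression can print as a short token while still holding a whole sub-plan, can be illustrated with a toy model. This is a sketch only: Plan and SubQueryExpr are hypothetical classes made up for illustration, they do not reproduce Calcite's RexSubQuery or RelNode APIs, and the labels are placeholders.

```java
import java.util.List;

// Hypothetical demo: the printed form of the expression hides the nested plan.
final class SubQueryDemo {
  public static void main(String[] args) {
    Plan sub = new Plan("Aggregate(MAX(sal))", List.of(new Plan("Scan(EMP)", List.of())));
    SubQueryExpr condition = new SubQueryExpr("$7", sub);
    System.out.println(condition);                  // only the short form is printed
    System.out.print(condition.plan.explain(0));    // the nested plan is still reachable
  }
}

// Hypothetical plan node: a label plus child inputs.
final class Plan {
  final String label;
  final List<Plan> inputs;

  Plan(String label, List<Plan> inputs) {
    this.label = label;
    this.inputs = inputs;
  }

  // Renders the plan as an indented tree, one node per line.
  String explain(int depth) {
    StringBuilder sb = new StringBuilder("  ".repeat(depth)).append(label).append('\n');
    for (Plan input : inputs) {
      sb.append(input.explain(depth + 1));
    }
    return sb.toString();
  }
}

// Hypothetical expression that carries a sub-plan but prints only a short token.
final class SubQueryExpr {
  final String shortForm;
  final Plan plan;

  SubQueryExpr(String shortForm, Plan plan) {
    this.shortForm = shortForm;
    this.plan = plan;
  }

  @Override public String toString() {
    return shortForm;
  }
}
```

In this model, a consumer that only reads the printed condition would conclude the aggregate is gone, while a consumer that walks the expression object can still find it.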


On Tue, Jul 7, 2020 at 11:57 AM Sean Broeder  wrote:
>
> Hi,
> I am following up on JIRA-4100.  I am trying to understand the plan generated 
> by the SQL query select  e.empno, e.sal, e.deptno emp_dept, d.deptno dep_dept
> from emp e
> left join
> dept d
> on e.deptno = (
>  select max(sal)
>  from emp
>  where deptno = e.deptno)
>
> The plan produced is as follows
> LogicalProject(EMPNO=[$0], SAL=[$5], EMP_DEPT=[$7], DEP_DEPT=[$9])
>   LogicalJoin(condition=[=($0, $7)], joinType=[left])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> It seems to me that the MAX operator is still needed in the correlated 
> subquery before the join, but it is dropped in the aggregator evaluation.
>
> Interestingly, a similar query using EXISTS retains the aggregator operator:
> select * from emp where exists (select 1 from dept where emp.deptno = dept.deptno)
>
> This results in the following plan
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[CAST($9):INTEGER], $f1=[CAST($10):BOOLEAN])
>     LogicalJoin(condition=[=($7, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject(DEPTNO=[$0], $f0=[true])
>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
>
> Based on the EXISTS example, I expect the logical plan to also contain
> something like the following after the scan on dept:
>   LogicalAggregate(group=[{0}], agg#0=[MAX($5)])
>     LogicalProject(DEPTNO=[$9], $7=[true])
>
> Thanks in advance for your input,
> Sean
>
>


Re: [DISCUSS] Make RexNode serializable

2020-07-07 Thread Danny Chan
Serializing the RexNode to JSON is a solution, but I'm afraid it cannot
solve the problem completely.
One problem is how to parse the JSON back into a RexNode: the current
RelJsonReader can only parse a RelNode, not a standalone RexNode, and it
needs the RelOptSchema to look up the operators.

In Beam's distributed scenario, I'm afraid it is hard to get the
RelOptSchema at execution time; we usually only see the RelOptSchema at
SQL compile time.

Best,
Danny Chan
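
The concern above, that reading a serialized expression back requires some lookup context on the receiving side, can be sketched with a toy reader. Everything here is an assumption for illustration: OperatorLookupDemo and its OPERATORS table are hypothetical, the text format is a made-up prefix notation rather than Calcite's JSON, and the reader evaluates integers instead of rebuilding RexNodes. The serialized text carries only operator names, so the reader fails for any name that is not registered.

```java
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical reader: resolves operator names in serialized text against a lookup table.
final class OperatorLookupDemo {
  // Assumed operator table; only names registered here can be read back.
  static final Map<String, BinaryOperator<Integer>> OPERATORS = Map.of(
      "PLUS", Integer::sum,
      "TIMES", (a, b) -> a * b);

  // Reads and evaluates text such as "PLUS(2,TIMES(3,4))".
  static int eval(String text) {
    String s = text.replace(" ", "");
    int[] pos = {0};
    int value = parse(s, pos);
    if (pos[0] != s.length()) {
      throw new IllegalArgumentException("Unexpected trailing input at position " + pos[0]);
    }
    return value;
  }

  private static int parse(String s, int[] pos) {
    int start = pos[0];
    while (pos[0] < s.length() && Character.isLetterOrDigit(s.charAt(pos[0]))) {
      pos[0]++;
    }
    String token = s.substring(start, pos[0]);
    if (token.isEmpty()) {
      throw new IllegalArgumentException("Expected a token at position " + start);
    }
    if (pos[0] >= s.length() || s.charAt(pos[0]) != '(') {
      return Integer.parseInt(token); // leaf: an integer literal
    }
    // The text only names the operator; its meaning must come from the table.
    BinaryOperator<Integer> op = OPERATORS.get(token);
    if (op == null) {
      throw new IllegalArgumentException("Unknown operator: " + token);
    }
    pos[0]++; // consume '('
    int left = parse(s, pos);
    expect(s, pos, ',');
    int right = parse(s, pos);
    expect(s, pos, ')');
    return op.apply(left, right);
  }

  private static void expect(String s, int[] pos, char c) {
    if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at position " + pos[0]);
    }
    pos[0]++;
  }

  public static void main(String[] args) {
    System.out.println(eval("PLUS(2,TIMES(3,4))"));
    try {
      eval("MINUS(1,2)"); // not in the table, so it cannot be resolved
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A worker that receives only the string would need the same table (or an equivalent schema object) shipped or reconstructed alongside it, which is the difficulty raised for the execution side.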
On Jul 8, 2020 at 3:39 AM +0800, Roman Kondakov wrote:
> Hi Rui,
>
> AFAIK, RelNodes can be serialized to and deserialized from JSON format.
> See test [1] as an example. If I understand it correctly, RelNodes are
> serialized along with enclosed RexNodes, so you can transfer them over
> the network as plain strings.
>
> [1]
> https://github.com/apache/calcite/blob/f64cdcbb9f6535650f0227da19640e736496a9c3/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L88
>
> --
> Roman Kondakov
>
> On 07.07.2020 22:13, Enrico Olivelli wrote:
> > Rui
> >
> > On Tue, Jul 7, 2020, 20:30 Rui Wang wrote:
> >
> > > Hi Community,
> > >
> > > In Apache Beam we are facing a use case where we need to keep RexNode in
> > > our distributed primitives. Because of the nature of distributed 
> > > computing,
> > > Beam requires the usage of those primitives be serializable (thus those
> > > primitives can be sent over the network to backend/workers for
> > > further execution).
> > >
> > > In the Java world this requirement means to make RexNode implement the 
> > > Java
> > > Serializable interface.
> > >
> > > A workaround right now is to create a bunch of classes to "clone" RexNode
> > > while making those classes implement the Serializable interface.
> > >
> >
> > Did you evaluate using a framework like Kryo, which allows you to
> > serialize non-serializable classes?
> >
> > I think that in general Java serialisation is not efficient, as it is too
> > general purpose.
> > It also brings in a few security issues.
> >
> > Maybe an alternative idea is to add an ad-hoc serialisation mechanism to
> > RexNode.
> > We should also ensure that every RexNode can be serialized and
> > deserialized.
> >
> > Enrico
> >
> >
> > > So what do you think of the idea that makes RexNode implement the
> > > Serializable interface?
> > >
> > >
> > > -Rui
> > >
> >


[jira] [Created] (CALCITE-4111) Remove VolcanoPlannerPhase in VolcanoPlanner

2020-07-07 Thread Jiatao Tao (Jira)
Jiatao Tao created CALCITE-4111:
---

 Summary: Remove VolcanoPlannerPhase in VolcanoPlanner
 Key: CALCITE-4111
 URL: https://issues.apache.org/jira/browse/CALCITE-4111
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Jiatao Tao
Assignee: Jiatao Tao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)