Translation of SQL EXISTS

2020-02-19 Thread Christian Beikov

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
assume that an EXISTS is translated in relational algebra to a SEMI- and 
NOT EXISTS to an ANTI-join, but it's not.


PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
from order_position p where o.id = p.order_id)");

p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
the EXISTS part is not rendered. This is the plan I get.


LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])

  LogicalProject(inputs=[0])
    LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
  LogicalJoin(condition=[=($0, $1)], joinType=[inner])
    JdbcTableScan(table=[[adhoc, _order]])
    LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
  LogicalProject(exprs=[[$1, true]])
    JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])

  LogicalProject(inputs=[0])
  LogicalJoin(condition=[=($0, $1)], joinType=[semi])
    JdbcTableScan(table=[[adhoc, _order]])
    JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])

  LogicalProject(inputs=[0])
  LogicalJoin(condition=[=($0, $1)], joinType=[anti])
    JdbcTableScan(table=[[adhoc, _order]])
    JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation 
makes sense?


I constructed relational algebra structures for some other statements 
with SEMI- and ANTI-joins and already noticed that these join types 
weren't handled in 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join), 
which I fixed locally. Is the lack of a translation intentional?


Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
over-simplification or would you say it's correct? As far as I 
understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
correct.


I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
assume that's not that hard and I could add that.


Regards,

Christian



Re: Translation of SQL EXISTS

2020-02-19 Thread Haisheng Yuan
Hi Christian,

For the query in your example, Calcite first generates inner join plan with 
aggregate child, then through SemJoinRule transform the inner join to semi or 
antisemi join. The reason to have inner join is that it allows join 
commutativity, which is good for generating a potential better plan with 
nestedloop join or hash join.

Admittedly, this process in Calcite is counter intuitive. It should be in 
reverse order, first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
发件人:Christian Beikov
日 期:2020年02月19日 21:12:13
收件人:
主 题:Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
assume that an EXISTS is translated in relational algebra to a SEMI- and 
NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
 LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
   LogicalJoin(condition=[=($0, $1)], joinType=[inner])
 JdbcTableScan(table=[[adhoc, _order]])
 LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
   LogicalProject(exprs=[[$1, true]])
 JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
   LogicalJoin(condition=[=($0, $1)], joinType=[semi])
 JdbcTableScan(table=[[adhoc, _order]])
 JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
flattened=[true])
   LogicalProject(inputs=[0])
   LogicalJoin(condition=[=($0, $1)], joinType=[anti])
 JdbcTableScan(table=[[adhoc, _order]])
 JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation 
makes sense?

I constructed relational algebra structures for some other statements 
with SEMI- and ANTI-joins and already noticed that these join types 
weren't handled in 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
 
which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
over-simplification or would you say it's correct? As far as I 
understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
assume that's not that hard and I could add that.

Regards,

Christian




Re: Translation of SQL EXISTS

2020-02-19 Thread Christian Beikov

Hey Haisheng,

it is nice to have a rule that detects such patterns but do you agree 
that it would be better to generate SEMI- and ANTI-joins already during 
Sql-to-Rel conversion?


I implemented the Rel-to-Sql conversion for SEMI- and ANTI-join and it 
seems to work for my use cases, but it feels a bit ugly as I have to do 
some transformations which might be ad-hoc. I had to replace identifiers 
that refer to the outcome of the SEMI- and ANTI-joins with the select 
items of the LHS because the SEMI- and ANTI-join does not get an alias.


See for yourself: 
https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271


I'll add a few tests and a comment, but could anyone please comment on 
whether this approach looks ok?


Regards,

Christian

Am 19.02.2020 um 19:03 schrieb Haisheng Yuan:

Hi Christian,

For the query in your example, Calcite first generates inner join plan 
with aggregate child, then through SemJoinRule transform the inner 
join to semi or antisemi join. The reason to have inner join is that 
it allows join commutativity, which is good for generating a potential 
better plan with nestedloop join or hash join.


Admittedly, this process in Calcite is counter intuitive. It should be 
in reverse order, first generate a semi or anti-semi join, then 
generate an inner/outer join.


- Haisheng

--
发件人:Christian Beikov
日 期:2020年02月19日 21:12:13
收件人:
    主 题:Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd

assume that an EXISTS is translated in relational algebra to a SEMI- and

NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1

from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.

the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
   LogicalProject(inputs=[0])
 LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
   LogicalJoin(condition=[=($0, $1)], joinType=[inner])
 JdbcTableScan(table=[[adhoc, _order]])
 LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
   LogicalProject(exprs=[[$1, true]])
 JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
   LogicalProject(inputs=[0])
   LogicalJoin(condition=[=($0, $1)], joinType=[semi])
 JdbcTableScan(table=[[adhoc, _order]])
 JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
   LogicalProject(inputs=[0])
   LogicalJoin(condition=[=($0, $1)], joinType=[anti])
 JdbcTableScan(table=[[adhoc, _order]])
 JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation
makes sense?

I constructed relational algebra structures for some other statements
with SEMI- and ANTI-joins and already noticed that these join types
weren't handled in

org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),

which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an

over-simplification or would you say it's correct? As far as I
understood from https://en.wikipedia.org/wiki/Relational_algebra this is

correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel

translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I

assume that's not that hard and I could add that.

Regards,

Christian




Re: Translation of SQL EXISTS

2020-02-19 Thread Julian Hyde
Agreed, it should be in reverse order. Translate to semi-join (or
anti-join for NOT EXISTS), then optionally use a rule to rewrite semi-
or anti-join to Join+Aggregate.

Note that if the EXISTS is in a disjunction (e.g. "delete from orders
where exists (select null from order_items where ...) or shipping >
20") we cannot use a semi-join. We have to use a left join, using an
indicator column on the right-hand side that will be null iff there is
no match. Which is what we do currently.

On Wed, Feb 19, 2020 at 10:03 AM Haisheng Yuan  wrote:
>
> Hi Christian,
>
> For the query in your example, Calcite first generates inner join plan with 
> aggregate child, then through SemJoinRule transform the inner join to semi or 
> antisemi join. The reason to have inner join is that it allows join 
> commutativity, which is good for generating a potential better plan with 
> nestedloop join or hash join.
>
> Admittedly, this process in Calcite is counter intuitive. It should be in 
> reverse order, first generate a semi or anti-semi join, then generate an 
> inner/outer join.
>
> - Haisheng
>
> --
> 发件人:Christian Beikov
> 日 期:2020年02月19日 21:12:13
> 收件人:
> 主 题:Translation of SQL EXISTS
>
> Hello,
>
> I'm a bit confused about how the SQL EXISTS predicate is translated. I'd
> assume that an EXISTS is translated in relational algebra to a SEMI- and
> NOT EXISTS to an ANTI-join, but it's not.
>
> PlannerImpl p = new PlannerImpl(config);
> SqlNode sqlNode = p.parse("delete from _order o where exists (select 1
> from order_position p where o.id = p.order_id)");
> p.validate(sqlNode);
> RelRoot rel = p.rel(sqlNode);
> RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
> SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
> sqlWriter.format(result.asStatement());
>
> Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.
> the EXISTS part is not rendered. This is the plan I get.
>
> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
> flattened=[true])
>LogicalProject(inputs=[0])
>  LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
>LogicalJoin(condition=[=($0, $1)], joinType=[inner])
>  JdbcTableScan(table=[[adhoc, _order]])
>  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>LogicalProject(exprs=[[$1, true]])
>  JdbcTableScan(table=[[adhoc, order_position]])
>
> I'd expect something along the lines of
>
> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
> flattened=[true])
>LogicalProject(inputs=[0])
>LogicalJoin(condition=[=($0, $1)], joinType=[semi])
>  JdbcTableScan(table=[[adhoc, _order]])
>  JdbcTableScan(table=[[adhoc, order_position]])
>
> and for NOT EXISTS
>
> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
> flattened=[true])
>LogicalProject(inputs=[0])
>LogicalJoin(condition=[=($0, $1)], joinType=[anti])
>  JdbcTableScan(table=[[adhoc, _order]])
>  JdbcTableScan(table=[[adhoc, order_position]])
>
> Am I missing something and the current aggregate function translation
> makes sense?
>
> I constructed relational algebra structures for some other statements
> with SEMI- and ANTI-joins and already noticed that these join types
> weren't handled in
> org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
> which I fixed locally. Is the lack of a translation intentional?
>
> Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an
> over-simplification or would you say it's correct? As far as I
> understood from https://en.wikipedia.org/wiki/Relational_algebra this is
> correct.
>
> I'd be happy to contribute that back. I didn't look into the Sql-to-Rel
> translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I
> assume that's not that hard and I could add that.
>
> Regards,
>
> Christian
>
>


Re: Translation of SQL EXISTS

2020-02-20 Thread Christian Beikov

Alright, so here is the PR for the Rel-to-Sql translation.

I thought about what you wrote here already Julian, but thanks for 
mentioning it.


Would be great if this could make it into 1.22.0, or is it too late for 
that already?


Am 19.02.2020 um 22:49 schrieb Julian Hyde:

Agreed, it should be in reverse order. Translate to semi-join (or
anti-join for NOT EXISTS), then optionally use a rule to rewrite semi-
or anti-join to Join+Aggregate.

Note that if the EXISTS is in a disjunction (e.g. "delete from orders
where exists (select null from order_items where ...) or shipping >
20") we cannot use a semi-join. We have to use a left join, using an
indicator column on the right-hand side that will be null iff there is
no match. Which is what we do currently.

On Wed, Feb 19, 2020 at 10:03 AM Haisheng Yuan  wrote:

Hi Christian,

For the query in your example, Calcite first generates inner join plan with 
aggregate child, then through SemJoinRule transform the inner join to semi or 
antisemi join. The reason to have inner join is that it allows join 
commutativity, which is good for generating a potential better plan with 
nestedloop join or hash join.

Admittedly, this process in Calcite is counter intuitive. It should be in 
reverse order, first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
发件人:Christian Beikov
日 期:2020年02月19日 21:12:13
收件人:
主 题:Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd
assume that an EXISTS is translated in relational algebra to a SEMI- and
NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.
the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
  LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
LogicalJoin(condition=[=($0, $1)], joinType=[inner])
  JdbcTableScan(table=[[adhoc, _order]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(exprs=[[$1, true]])
  JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[semi])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[anti])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation
makes sense?

I constructed relational algebra structures for some other statements
with SEMI- and ANTI-joins and already noticed that these join types
weren't handled in
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an
over-simplification or would you say it's correct? As far as I
understood from https://en.wikipedia.org/wiki/Relational_algebra this is
correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I
assume that's not that hard and I could add that.

Regards,

Christian




Re: Translation of SQL EXISTS

2020-02-20 Thread Julian Hyde
Thanks. I guess you mean https://issues.apache.org/jira/browse/CALCITE-3810 
<https://issues.apache.org/jira/browse/CALCITE-3810>.


> On Feb 20, 2020, at 11:21 AM, Christian Beikov  
> wrote:
> 
> Alright, so here is the PR for the Rel-to-Sql translation.
> 
> I thought about what you wrote here already Julian, but thanks for mentioning 
> it.
> 
> Would be great if this could make it into 1.22.0, or is it too late for that 
> already?
> 
> Am 19.02.2020 um 22:49 schrieb Julian Hyde:
>> Agreed, it should be in reverse order. Translate to semi-join (or
>> anti-join for NOT EXISTS), then optionally use a rule to rewrite semi-
>> or anti-join to Join+Aggregate.
>> 
>> Note that if the EXISTS is in a disjunction (e.g. "delete from orders
>> where exists (select null from order_items where ...) or shipping >
>> 20") we cannot use a semi-join. We have to use a left join, using an
>> indicator column on the right-hand side that will be null iff there is
>> no match. Which is what we do currently.
>> 
>> On Wed, Feb 19, 2020 at 10:03 AM Haisheng Yuan  
>> wrote:
>>> Hi Christian,
>>> 
>>> For the query in your example, Calcite first generates inner join plan with 
>>> aggregate child, then through SemJoinRule transform the inner join to semi 
>>> or antisemi join. The reason to have inner join is that it allows join 
>>> commutativity, which is good for generating a potential better plan with 
>>> nestedloop join or hash join.
>>> 
>>> Admittedly, this process in Calcite is counter intuitive. It should be in 
>>> reverse order, first generate a semi or anti-semi join, then generate an 
>>> inner/outer join.
>>> 
>>> - Haisheng
>>> 
>>> --
>>> 发件人:Christian Beikov
>>> 日 期:2020年02月19日 21:12:13
>>> 收件人:
>>> 主 题:Translation of SQL EXISTS
>>> 
>>> Hello,
>>> 
>>> I'm a bit confused about how the SQL EXISTS predicate is translated. I'd
>>> assume that an EXISTS is translated in relational algebra to a SEMI- and
>>> NOT EXISTS to an ANTI-join, but it's not.
>>> 
>>> PlannerImpl p = new PlannerImpl(config);
>>> SqlNode sqlNode = p.parse("delete from _order o where exists (select 1
>>> from order_position p where o.id = p.order_id)");
>>> p.validate(sqlNode);
>>> RelRoot rel = p.rel(sqlNode);
>>> RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
>>> SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
>>> sqlWriter.format(result.asStatement());
>>> 
>>> Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.
>>> the EXISTS part is not rendered. This is the plan I get.
>>> 
>>> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
>>> flattened=[true])
>>>LogicalProject(inputs=[0])
>>>  LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
>>>LogicalJoin(condition=[=($0, $1)], joinType=[inner])
>>>  JdbcTableScan(table=[[adhoc, _order]])
>>>  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>>>LogicalProject(exprs=[[$1, true]])
>>>  JdbcTableScan(table=[[adhoc, order_position]])
>>> 
>>> I'd expect something along the lines of
>>> 
>>> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
>>> flattened=[true])
>>>LogicalProject(inputs=[0])
>>>LogicalJoin(condition=[=($0, $1)], joinType=[semi])
>>>  JdbcTableScan(table=[[adhoc, _order]])
>>>  JdbcTableScan(table=[[adhoc, order_position]])
>>> 
>>> and for NOT EXISTS
>>> 
>>> LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
>>> flattened=[true])
>>>LogicalProject(inputs=[0])
>>>LogicalJoin(condition=[=($0, $1)], joinType=[anti])
>>>  JdbcTableScan(table=[[adhoc, _order]])
>>>  JdbcTableScan(table=[[adhoc, order_position]])
>>> 
>>> Am I missing something and the current aggregate function translation
>>> makes sense?
>>> 
>>> I constructed relational algebra structures for some other statements
>>> with SEMI- and ANTI-joins and already noticed that these join types
>>> weren't handled in
>>> org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
>>> which I fixed locally. Is the lack of a translation intentional?
>>> 
>>> Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an
>>> over-simplification or would you say it's correct? As far as I
>>> understood from https://en.wikipedia.org/wiki/Relational_algebra this is
>>> correct.
>>> 
>>> I'd be happy to contribute that back. I didn't look into the Sql-to-Rel
>>> translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I
>>> assume that's not that hard and I could add that.
>>> 
>>> Regards,
>>> 
>>> Christian
>>> 
>>> 



Re: Translation of SQL EXISTS

2020-02-20 Thread Christian Beikov

Yeah right, somehow forgot to add the link.

Am 20.02.2020 um 21:46 schrieb Julian Hyde:

Thanks. I guess you mean https://issues.apache.org/jira/browse/CALCITE-3810 
<https://issues.apache.org/jira/browse/CALCITE-3810>.



On Feb 20, 2020, at 11:21 AM, Christian Beikov  
wrote:

Alright, so here is the PR for the Rel-to-Sql translation.

I thought about what you wrote here already Julian, but thanks for mentioning 
it.

Would be great if this could make it into 1.22.0, or is it too late for that 
already?

Am 19.02.2020 um 22:49 schrieb Julian Hyde:

Agreed, it should be in reverse order. Translate to semi-join (or
anti-join for NOT EXISTS), then optionally use a rule to rewrite semi-
or anti-join to Join+Aggregate.

Note that if the EXISTS is in a disjunction (e.g. "delete from orders
where exists (select null from order_items where ...) or shipping >
20") we cannot use a semi-join. We have to use a left join, using an
indicator column on the right-hand side that will be null iff there is
no match. Which is what we do currently.

On Wed, Feb 19, 2020 at 10:03 AM Haisheng Yuan  wrote:

Hi Christian,

For the query in your example, Calcite first generates inner join plan with 
aggregate child, then through SemJoinRule transform the inner join to semi or 
antisemi join. The reason to have inner join is that it allows join 
commutativity, which is good for generating a potential better plan with 
nestedloop join or hash join.

Admittedly, this process in Calcite is counter intuitive. It should be in 
reverse order, first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
发件人:Christian Beikov
日 期:2020年02月19日 21:12:13
收件人:
主 题:Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd
assume that an EXISTS is translated in relational algebra to a SEMI- and
NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e.
the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
  LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
LogicalJoin(condition=[=($0, $1)], joinType=[inner])
  JdbcTableScan(table=[[adhoc, _order]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(exprs=[[$1, true]])
  JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[semi])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE],
flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[anti])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something and the current aggregate function translation
makes sense?

I constructed relational algebra structures for some other statements
with SEMI- and ANTI-joins and already noticed that these join types
weren't handled in
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an
over-simplification or would you say it's correct? As far as I
understood from https://en.wikipedia.org/wiki/Relational_algebra this is
correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I
assume that's not that hard and I could add that.

Regards,

Christian






Re: Re: Translation of SQL EXISTS

2020-02-19 Thread Haisheng Yuan
Hi Christian,

Yes, I do agree it is better to generate SEMI/ANTI during Sql-to-Rel conversion 
if possible.

With regard to the missing Rel-to-Sql conversion for SEMI/ANTI joins, you can 
log a JIRA and open a pull request for it, people are happy to take a look.

- Haisheng

--
发件人:Christian Beikov
日 期:2020年02月20日 03:11:02
收件人:Haisheng Yuan; Apache Calcite dev 
list
主 题:Re: Translation of SQL EXISTS


Hey Haisheng,
it is nice to have a rule that detects such patterns but do you agree that it 
would be better to generate SEMI- and ANTI-joins already during Sql-to-Rel 
conversion?
I implemented the Rel-to-Sql conversion for SEMI- and ANTI-join and it seems to 
work for my use cases, but it feels a bit ugly as I have to do some 
transformations which might be ad-hoc. I had to replace identifiers that refer 
to the outcome of the SEMI- and ANTI-joins with the select items of the LHS 
because the SEMI- and ANTI-join does not get an alias.
See for yourself: 
https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271
I'll add a few tests and a comment, but could anyone please comment on whether 
this approach looks ok?
Regards,
Christian
Am 19.02.2020 um 19:03 schrieb Haisheng Yuan:
 
Hi Christian,

For the query in your example, Calcite first generates inner join plan with 
aggregate child, then through SemJoinRule transform the inner join to semi or 
antisemi join. The reason to have inner join is that it allows join 
commutativity, which is good for generating a potential better plan with 
nestedloop join or hash join.

Admittedly, this process in Calcite is counter intuitive. It should be in 
reverse order, first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
 发件人:Christian Beikov
 日 期:2020年02月19日 21:12:13
 收件人:
 主 题:Translation of SQL EXISTS

 Hello,

 I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
 assume that an EXISTS is translated in relational algebra to a SEMI- and 
 NOT EXISTS to an ANTI-join, but it's not.

 PlannerImpl p = new PlannerImpl(config);
 SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
 from order_position p where o.id = p.order_id)");
 p.validate(sqlNode);
 RelRoot rel = p.rel(sqlNode);
 RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
 SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
 sqlWriter.format(result.asStatement());

 Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
 the EXISTS part is not rendered. This is the plan I get.

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
 flattened=[true])
LogicalProject(inputs=[0])
  LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
LogicalJoin(condition=[=($0, $1)], joinType=[inner])
  JdbcTableScan(table=[[adhoc, _order]])
  LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(exprs=[[$1, true]])
  JdbcTableScan(table=[[adhoc, order_position]])

 I'd expect something along the lines of

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
 flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[semi])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

 and for NOT EXISTS

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], 
 flattened=[true])
LogicalProject(inputs=[0])
LogicalJoin(condition=[=($0, $1)], joinType=[anti])
  JdbcTableScan(table=[[adhoc, _order]])
  JdbcTableScan(table=[[adhoc, order_position]])

 Am I missing something and the current aggregate function translation 
 makes sense?

 I constructed relational algebra structures for some other statements 
 with SEMI- and ANTI-joins and already noticed that these join types 
 weren't handled in 
 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
 
 which I fixed locally. Is the lack of a translation intentional?

 Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
 over-simplification or would you say it's correct? As far as I 
 understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
 correct.

 I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
 translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
 assume that's not that hard and I could add that.

 Regards,

 Christian