Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Danny Chan
If it is only constant NOT IN predicate, how difficult it is to rewrite it into 
a normal composite AND predicate before entering the planning phrase ?

Best,
Danny Chan
在 2020年7月21日 +0800 PM12:35,Haisheng Yuan ,写道:
> Thanks Jinpeng for providing a good example for not in subquery.
>
> I 100% agree with you that correlated query won't be represented by 
> ANTI_NOTIN join type, and it is not the proposal's intention. Here what we 
> are discussing is not to use ANTI_NOTIN to represent all the NOT IN 
> sub-queries, that is impossible. Instead, if you take a close look at the 
> example query, it is a simple uncorrelated NOT IN sub-query. That is the 
> target. Let's focus on that kind of query, ask ourselves this question: Can 
> such a simple query be transformed into a ANTI join to make the plan 
> efficient?
>
> Sadly no. The reality is that this kind of query is not uncommon, may be much 
> more common than correlated NOT IN sub-queries.
>
>
> Reply to Julian:
> > > How about making a sub-query type (in RexSubQuery), so it is gone
> > > before we reach algebra.
> It will be nice to have a NOT_IN subquery type, without expanding NOT IN to 
> NOT(IN).
> However, if there is no ANTI_NOTIN in the join type (without reaching 
> algebra), does that mean the optimizer still can't generate efficient plan 
> for simple NOT IN sub-queries?
>
> > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > > the opposite of NOT IN?!
> It depends how people interpret ANTI. You interpret it as "opposite", I 
> interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT 
> EXISTS". But it is just a naming issue, I am OK to change it whatever name 
> that makes sense to the community, as long as it can convey the meaning.
>
> Thanks,
> Haisheng
>
> On 2020/07/21 03:02:20, Jinpeng Wu  wrote:
> > Hi.
> >
> > In some SQL engine, the query
> > select * from A where c1 not in ( select c1 from B where B.c2 = A.c2);
> > is transformed to a plan like
> > select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is
> > null OR B.c1 is null);
> >
> > Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One
> > thing seems to be a problem is that A.c1 cannot be used as a join key in
> > the new plan. However, the problem is also there for ANTI_NOTIN, and even
> > other NOT-IN-SUBQUERY physical implementations.
> >
> > Thanks,
> > Qiupeng
> >
> > On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde  wrote:
> >
> > > How about making a sub-query type (in RexSubQuery), so it is gone
> > > before we reach algebra.
> > >
> > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > > the opposite of NOT IN?!
> > >
> > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan  wrote:
> > > >
> > > > Typo:
> > > > We can just add a security guard saying that it is supported.
> > > > Should be
> > > > We can just add a security guard saying that it is NOT supported.
> > > >
> > > > On 2020/07/20 19:57:34, Haisheng Yuan  wrote:
> > > > > I am not sure I got your implication by "pollute". If you mean
> > > changes, yes, it requires some changes in rules. Do we need to change
> > > enumerables? Not necessary. We can just add a security guard saying that 
> > > it
> > > is supported. Not everyone requires the Enumerable operators to support
> > > everything. More importantly, currently there is no logic or rules to
> > > translate sub-query directly to SEMI/ANTI joins, let alone translating
> > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before
> > > entering RelNode land. That means we don't even have the chance to 
> > > generate
> > > the NOT IN anti join. Is that still a concern?
> > > > >
> > > > > Even if some day, some contributor extends Calcite's parser and
> > > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN
> > > anti join, we still have chance to disable it. Is that still a concern?
> > > > >
> > > > > There are many ways to play it safe.
> > > > >
> > > > > > Brainstorming: maybe we could consider it as a separate logical
> > > operator
> > > > > > (with its corresponding enumerable implementation)?
> > > > > It doesn't sound cool. It requires much more work. You have to
> > > duplicate all the rules, metadata handler that deal with LogicalJoin, and
> > > for some rule that matches Join base class, you have to check it is a
> > > LogicalJoin or the logical operator for ANTI_NOTIN.
> > > > >
> > > > > On 2020/07/20 08:28:42, Ruben Q L  wrote:
> > > > > > I have some concerns that this new type would "pollute" the existing
> > > Join
> > > > > > logic, rules and enumerable implementations.
> > > > > >
> > > > > > Brainstorming: maybe we could consider it as a separate logical
> > > operator
> > > > > > (with its corresponding enumerable implementation)?
> > > > > >
> > > > > >
> > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan 
> > > > > > 
> > > a
> > > > > > écrit :
> > > > > >
> > > > > > > I agree that 

Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Haisheng Yuan
Thanks Jinpeng for providing a good example for not in subquery.

I 100% agree with you that correlated query won't be represented by ANTI_NOTIN 
join type, and it is not the proposal's intention. Here what we are discussing 
is not to use ANTI_NOTIN to represent all the NOT IN sub-queries, that is 
impossible. Instead, if you take a close look at the example query, it is a 
simple uncorrelated NOT IN sub-query. That is the target. Let's focus on that 
kind of query, ask ourselves this question:  Can such a simple query be 
transformed into a ANTI join to make the plan efficient?

Sadly no. The reality is that this kind of query is not uncommon, may be much 
more common than correlated NOT IN sub-queries.


Reply to Julian:
> > How about making a sub-query type (in RexSubQuery), so it is gone
> > before we reach algebra.
It will be nice to have a NOT_IN subquery type, without expanding NOT IN to 
NOT(IN). 
However, if there is no  ANTI_NOTIN in the join type (without reaching 
algebra), does that mean the optimizer still can't generate efficient plan for 
simple NOT IN sub-queries?

> > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > the opposite of NOT IN?!
It depends how people interpret ANTI. You interpret it as "opposite", I 
interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT 
EXISTS". But it is just a naming issue, I am OK to change it whatever name that 
makes sense to the community, as long as it can convey the meaning.

Thanks,
Haisheng

On 2020/07/21 03:02:20, Jinpeng Wu  wrote: 
> Hi.
> 
> In some SQL engine, the query
> select * from A where c1 not in ( select c1 from B where B.c2 = A.c2);
> is transformed to a plan like
> select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is
> null OR B.c1 is null);
> 
> Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One
> thing seems to be a problem is that A.c1 cannot be used as a join key in
> the new plan. However, the problem is also there for ANTI_NOTIN, and even
> other NOT-IN-SUBQUERY physical implementations.
> 
> Thanks,
> Qiupeng
> 
> On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde  wrote:
> 
> > How about making a sub-query type (in RexSubQuery), so it is gone
> > before we reach algebra.
> >
> > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > the opposite of NOT IN?!
> >
> > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan  wrote:
> > >
> > > Typo:
> > > We can just add a security guard saying that it is supported.
> > > Should be
> > > We can just add a security guard saying that it is NOT supported.
> > >
> > > On 2020/07/20 19:57:34, Haisheng Yuan  wrote:
> > > > I am not sure I got your implication by "pollute". If you mean
> > changes, yes, it requires some changes in rules. Do we need to change
> > enumerables? Not necessary. We can just add a security guard saying that it
> > is supported. Not everyone requires the Enumerable operators to support
> > everything. More importantly, currently there is no logic or rules to
> > translate sub-query directly to SEMI/ANTI joins, let alone translating
> > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before
> > entering RelNode land. That means we don't even have the chance to generate
> > the NOT IN anti join. Is that still a concern?
> > > >
> > > > Even if some day, some contributor extends Calcite's parser and
> > SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN
> > anti join, we still have chance to disable it. Is that still a concern?
> > > >
> > > > There are many ways to play it safe.
> > > >
> > > > > Brainstorming: maybe we could consider it as a separate logical
> > operator
> > > > > (with its corresponding enumerable implementation)?
> > > > It doesn't sound cool. It requires much more work. You have to
> > duplicate all the rules, metadata handler that deal with LogicalJoin, and
> > for some rule that matches Join base class, you have to check it is a
> > LogicalJoin or the logical operator for ANTI_NOTIN.
> > > >
> > > > On 2020/07/20 08:28:42, Ruben Q L  wrote:
> > > > > I have some concerns that this new type would "pollute" the existing
> > Join
> > > > > logic, rules and enumerable implementations.
> > > > >
> > > > > Brainstorming: maybe we could consider it as a separate logical
> > operator
> > > > > (with its corresponding enumerable implementation)?
> > > > >
> > > > >
> > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan 
> > a
> > > > > écrit :
> > > > >
> > > > > > I agree that NOT IN is toxic, and it is error-prone.
> > > > > > But you can't prevent people writing SQL with not in sub-queries,
> > would
> > > > > > you rather let optimizer generate inefficient plan?
> > > > > >
> > > > > > - Haisheng
> > > > > >
> > > > > > --
> > > > > > 发件人:Julian Hyde
> > > > > > 日 期:2020年07月20日 11:56:35
> > > > > > 收件人:dev@calcite.apache.org
> > > > > > 主 题:Re: 

Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Francis Chuang

Thanks for making this release available for voting, Chunwei!

Verified GPG Signature - OK
Verified SHA512 - OK
Ran tests per HOWTO (./gradlew check) - OK
Quickly skimmed release notes - Looks good, but I agree with Julian's 
comments.

Spotted checked a few JARs in the Maven repository - OK

Environment (OpenJDK:latest docker container):
Gradle 6.3 (via gradlew)
Oracle Linux Server 7.8
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

My vote is: +1 (binding)

Francis

On 21/07/2020 12:07 pm, Haisheng Yuan wrote:

Environment:
Mac OS X 10.15.1, JDK 1.8.0_162

- Checked signatures and checksums, OK
- Ran unit tests (./gradlew build), OK

+1 (binding)


* why is 4032 'breaking'?

With that change, the CalcMergeRule won't match PhysicalNode(including 
EnumerableCalc) in VolcanoPlanner. Perhaps I should elaborate in the release 
notes.


* why is 3786 breaking? (recomputeDigest was not present in 1.23; the
remarks about caching digests are useful, so why aren't they in the
javadoc?)

recomputeDigest() has been there since b0dab68 (2012-05-07). I will add the 
remarks into the javadoc after release.

Thanks,
Haisheng

On 2020/07/21 01:14:17, Julian Hyde  wrote:

Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14;
checked distro against git (see issue 1); reviewed release notes (see
issue 2).

+1 (binding) but issues 1 and 2 need to be fixed right after the release.

Issue 1. License file is not the same as in source control:

diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE
177a178,189


Additional License files can be found in the 'licenses' folder located in the 
same directory as the LICENSE file (i.e. this file)

- Software produced outside the ASF which is available under other licenses 
(not Apache-2.0)

MIT
* cobyism:html5shiv:3.7.2
* font-awesome:font-awesome-code:4.2.0
* gridsim:gridsim:
* jekyll:jekyll:
* normalize:normalize:3.0.2
* respond:respond:1.4.2


Can you fix the release instructions that the generated LICENSE needs
to be committed (probably at the same time you revise the release
notes).

Issue 2. Release notes

For the 'highlights', I prefer a paragraph with hyperlinks over a list
(see 
https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21).

Regarding categorization:
* why is 4032 'breaking'?
* why is 3786 breaking? (recomputeDigest was not present in 1.23; the
remarks about caching digests are useful, so why aren't they in the
javadoc?)
* we need a note that a bunch of methods are deprecated in this
release and will be removed before 1.25 (see 3923, 4023 and 4079).
This will break semantic versioning in 1.25, so is a big deal.
* 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I
think they are bug fixes or improved implementations
* 3946, 4089, 4087 are listed as fixes but could be listed as new features
* 4075 should be under 'test suite'
* 4094 description does not need 'follow-up after review comments'
* 4086 is an upgrade, so should be in 'bug fixes', not documentation
* A few places SQL and Java keywords are not in code font (e.g. NPE,
IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP,
CAST)

Julian

On Mon, Jul 20, 2020 at 12:01 PM Michael Mior  wrote:


+1

Checked hash and signature and compiled and ran tests. Thanks Chunwei!

--
Michael Mior
mm...@apache.org

Le lun. 20 juil. 2020 à 11:41, Chunwei Lei  a écrit :


Hi all,

I have created a build for Apache Calcite 1.24.0, release
candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:
https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md

The commit to be voted upon:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c

Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c

Tag:
https://github.com/apache/calcite/tree/calcite-1.24.0-rc0

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
(revision 40574)

The hashes of the artifacts are as follows:
ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
*apache-calcite-1.24.0-src.tar.gz

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/

Release artifacts are signed with the following key:
https://www.apache.org/dist/calcite/KEYS

N.B.
To create the jars and test Apache Calcite: "./gradlew build".

If you do not have a Java environment available, you can run the tests
using docker. To do so, install docker and docker-compose, then run
"docker-compose run test" from the root of the directory.

Please vote on releasing this package as Apache Calcite 1.24.0.

The vote is open for the next 72 hours and passes if a majority of 

Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Jinpeng Wu
Hi.

In some SQL engine, the query
select * from A where c1 not in ( select c1 from B where B.c2 = A.c2);
is transformed to a plan like
select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is
null OR B.c1 is null);

Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One
thing seems to be a problem is that A.c1 cannot be used as a join key in
the new plan. However, the problem is also there for ANTI_NOTIN, and even
other NOT-IN-SUBQUERY physical implementations.

Thanks,
Qiupeng

On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde  wrote:

> How about making a sub-query type (in RexSubQuery), so it is gone
> before we reach algebra.
>
> ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> the opposite of NOT IN?!
>
> On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan  wrote:
> >
> > Typo:
> > We can just add a security guard saying that it is supported.
> > Should be
> > We can just add a security guard saying that it is NOT supported.
> >
> > On 2020/07/20 19:57:34, Haisheng Yuan  wrote:
> > > I am not sure I got your implication by "pollute". If you mean
> changes, yes, it requires some changes in rules. Do we need to change
> enumerables? Not necessary. We can just add a security guard saying that it
> is supported. Not everyone requires the Enumerable operators to support
> everything. More importantly, currently there is no logic or rules to
> translate sub-query directly to SEMI/ANTI joins, let alone translating
> directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before
> entering RelNode land. That means we don't even have the chance to generate
> the NOT IN anti join. Is that still a concern?
> > >
> > > Even if some day, some contributor extends Calcite's parser and
> SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN
> anti join, we still have chance to disable it. Is that still a concern?
> > >
> > > There are many ways to play it safe.
> > >
> > > > Brainstorming: maybe we could consider it as a separate logical
> operator
> > > > (with its corresponding enumerable implementation)?
> > > It doesn't sound cool. It requires much more work. You have to
> duplicate all the rules, metadata handler that deal with LogicalJoin, and
> for some rule that matches Join base class, you have to check it is a
> LogicalJoin or the logical operator for ANTI_NOTIN.
> > >
> > > On 2020/07/20 08:28:42, Ruben Q L  wrote:
> > > > I have some concerns that this new type would "pollute" the existing
> Join
> > > > logic, rules and enumerable implementations.
> > > >
> > > > Brainstorming: maybe we could consider it as a separate logical
> operator
> > > > (with its corresponding enumerable implementation)?
> > > >
> > > >
> > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan 
> a
> > > > écrit :
> > > >
> > > > > I agree that NOT IN is toxic, and it is error-prone.
> > > > > But you can't prevent people writing SQL with not in sub-queries,
> would
> > > > > you rather let optimizer generate inefficient plan?
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > --
> > > > > 发件人:Julian Hyde
> > > > > 日 期:2020年07月20日 11:56:35
> > > > > 收件人:dev@calcite.apache.org
> > > > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN
> > > > >
> > > > > Yuck!
> > > > >
> > > > > NOT IN is toxic. I'd rather keep it out of the algebra.
> > > > >
> > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan 
> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery
> (thanks
> > > > > to Ruben for reminding).
> > > > > > For some simple boolean context NOT_IN subquery, we can't
> transform it
> > > > > to ANTI join. e.g.:
> > > > > >
> > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is
> > > > > nullable
> > > > > >
> > > > > > Because if there is a null value in the results of subquery, the
> NOT IN
> > > > > predicate will return false, the whole query returns empty. And in
> Calcite,
> > > > > the plan for this kind of query is inefficient.
> > > > > >
> > > > > > If we have ANTI_NOTIN to represent this kind of join, we can
> generate
> > > > > more efficient plan, as long as the query executor support it.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Haisheng Yuan
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Haisheng Yuan
Environment: 
Mac OS X 10.15.1, JDK 1.8.0_162

- Checked signatures and checksums, OK
- Ran unit tests (./gradlew build), OK

+1 (binding)

> * why is 4032 'breaking'?
With that change, the CalcMergeRule won't match PhysicalNode(including 
EnumerableCalc) in VolcanoPlanner. Perhaps I should elaborate in the release 
notes.

> * why is 3786 breaking? (recomputeDigest was not present in 1.23; the
> remarks about caching digests are useful, so why aren't they in the
> javadoc?)
recomputeDigest() has been there since b0dab68 (2012-05-07). I will add the 
remarks into the javadoc after release.

Thanks,
Haisheng

On 2020/07/21 01:14:17, Julian Hyde  wrote: 
> Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14;
> checked distro against git (see issue 1); reviewed release notes (see
> issue 2).
> 
> +1 (binding) but issues 1 and 2 need to be fixed right after the release.
> 
> Issue 1. License file is not the same as in source control:
> 
> diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE
> 177a178,189
> >
> > Additional License files can be found in the 'licenses' folder located in 
> > the same directory as the LICENSE file (i.e. this file)
> >
> > - Software produced outside the ASF which is available under other licenses 
> > (not Apache-2.0)
> >
> > MIT
> > * cobyism:html5shiv:3.7.2
> > * font-awesome:font-awesome-code:4.2.0
> > * gridsim:gridsim:
> > * jekyll:jekyll:
> > * normalize:normalize:3.0.2
> > * respond:respond:1.4.2
> 
> Can you fix the release instructions that the generated LICENSE needs
> to be committed (probably at the same time you revise the release
> notes).
> 
> Issue 2. Release notes
> 
> For the 'highlights', I prefer a paragraph with hyperlinks over a list
> (see 
> https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21).
> 
> Regarding categorization:
> * why is 4032 'breaking'?
> * why is 3786 breaking? (recomputeDigest was not present in 1.23; the
> remarks about caching digests are useful, so why aren't they in the
> javadoc?)
> * we need a note that a bunch of methods are deprecated in this
> release and will be removed before 1.25 (see 3923, 4023 and 4079).
> This will break semantic versioning in 1.25, so is a big deal.
> * 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I
> think they are bug fixes or improved implementations
> * 3946, 4089, 4087 are listed as fixes but could be listed as new features
> * 4075 should be under 'test suite'
> * 4094 description does not need 'follow-up after review comments'
> * 4086 is an upgrade, so should be in 'bug fixes', not documentation
> * A few places SQL and Java keywords are not in code font (e.g. NPE,
> IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP,
> CAST)
> 
> Julian
> 
> On Mon, Jul 20, 2020 at 12:01 PM Michael Mior  wrote:
> >
> > +1
> >
> > Checked hash and signature and compiled and ran tests. Thanks Chunwei!
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > Le lun. 20 juil. 2020 à 11:41, Chunwei Lei  a écrit 
> > :
> > >
> > > Hi all,
> > >
> > > I have created a build for Apache Calcite 1.24.0, release
> > > candidate 0.
> > >
> > > Thanks to everyone who has contributed to this release.
> > >
> > > You can read the release notes here:
> > > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md
> > >
> > > The commit to be voted upon:
> > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c
> > >
> > > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c
> > >
> > > Tag:
> > > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0
> > >
> > > The artifacts to be voted on are located here:
> > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
> > > (revision 40574)
> > >
> > > The hashes of the artifacts are as follows:
> > > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
> > > *apache-calcite-1.24.0-src.tar.gz
> > >
> > > A staged Maven repository is available for review at:
> > > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/
> > >
> > > Release artifacts are signed with the following key:
> > > https://www.apache.org/dist/calcite/KEYS
> > >
> > > N.B.
> > > To create the jars and test Apache Calcite: "./gradlew build".
> > >
> > > If you do not have a Java environment available, you can run the tests
> > > using docker. To do so, install docker and docker-compose, then run
> > > "docker-compose run test" from the root of the directory.
> > >
> > > Please vote on releasing this package as Apache Calcite 1.24.0.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Calcite 1.24.0
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do 

Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Julian Hyde
Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14;
checked distro against git (see issue 1); reviewed release notes (see
issue 2).

+1 (binding) but issues 1 and 2 need to be fixed right after the release.

Issue 1. License file is not the same as in source control:

diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE
177a178,189
>
> Additional License files can be found in the 'licenses' folder located in the 
> same directory as the LICENSE file (i.e. this file)
>
> - Software produced outside the ASF which is available under other licenses 
> (not Apache-2.0)
>
> MIT
> * cobyism:html5shiv:3.7.2
> * font-awesome:font-awesome-code:4.2.0
> * gridsim:gridsim:
> * jekyll:jekyll:
> * normalize:normalize:3.0.2
> * respond:respond:1.4.2

Can you fix the release instructions that the generated LICENSE needs
to be committed (probably at the same time you revise the release
notes).

Issue 2. Release notes

For the 'highlights', I prefer a paragraph with hyperlinks over a list
(see 
https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21).

Regarding categorization:
* why is 4032 'breaking'?
* why is 3786 breaking? (recomputeDigest was not present in 1.23; the
remarks about caching digests are useful, so why aren't they in the
javadoc?)
* we need a note that a bunch of methods are deprecated in this
release and will be removed before 1.25 (see 3923, 4023 and 4079).
This will break semantic versioning in 1.25, so is a big deal.
* 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I
think they are bug fixes or improved implementations
* 3946, 4089, 4087 are listed as fixes but could be listed as new features
* 4075 should be under 'test suite'
* 4094 description does not need 'follow-up after review comments'
* 4086 is an upgrade, so should be in 'bug fixes', not documentation
* A few places SQL and Java keywords are not in code font (e.g. NPE,
IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP,
CAST)

Julian

On Mon, Jul 20, 2020 at 12:01 PM Michael Mior  wrote:
>
> +1
>
> Checked hash and signature and compiled and ran tests. Thanks Chunwei!
>
> --
> Michael Mior
> mm...@apache.org
>
> Le lun. 20 juil. 2020 à 11:41, Chunwei Lei  a écrit :
> >
> > Hi all,
> >
> > I have created a build for Apache Calcite 1.24.0, release
> > candidate 0.
> >
> > Thanks to everyone who has contributed to this release.
> >
> > You can read the release notes here:
> > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md
> >
> > The commit to be voted upon:
> > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c
> >
> > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c
> >
> > Tag:
> > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
> > (revision 40574)
> >
> > The hashes of the artifacts are as follows:
> > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
> > *apache-calcite-1.24.0-src.tar.gz
> >
> > A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/
> >
> > Release artifacts are signed with the following key:
> > https://www.apache.org/dist/calcite/KEYS
> >
> > N.B.
> > To create the jars and test Apache Calcite: "./gradlew build".
> >
> > If you do not have a Java environment available, you can run the tests
> > using docker. To do so, install docker and docker-compose, then run
> > "docker-compose run test" from the root of the directory.
> >
> > Please vote on releasing this package as Apache Calcite 1.24.0.
> >
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Calcite 1.24.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (non-binding)
> >
> >
> >
> > Best,
> > Chunwei


[jira] [Created] (CALCITE-4135) Cannot find SqlDdlParserImpl class, which seems necessary for parsing DDL.

2020-07-20 Thread Stewart Bryson (Jira)
Stewart Bryson created CALCITE-4135:
---

 Summary: Cannot find SqlDdlParserImpl class, which seems necessary 
for parsing DDL.
 Key: CALCITE-4135
 URL: https://issues.apache.org/jira/browse/CALCITE-4135
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.23.0
 Environment: Custom Java classes built using Gradle.
Reporter: Stewart Bryson


I am trying to use parse DDL statements, and tried code similar to the 
following:

[https://stackoverflow.com/questions/53801005/apache-calcite-cant-seem-to-parse-ddl-statements]

However, I cannot find the SqlDdlParserImpl class anywhere. I see several 
references to it in the GitHub repo: 
[https://github.com/apache/calcite/search?q=SqlDdlParserImpl_q=SqlDdlParserImpl]

I have tried using the calcite-core and calcite-server dependencies. What am I 
missing?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Stamatis Zampetakis
Another quick thought as far as it concerns the IN operator would be to use
RexCall as it is right now where the first operand in the list is a
RexInputRef for instance and the rest are the literals.
I assume that taking this direction would need to change a bit the
respective SqlOperator.

I haven't thought of this thoroughly so maybe there are important things
that I am missing.

Best,
Stamatis


On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde  wrote:

> The name isn't very intuitive.
>
> The concept of a list and a comparison operator seems OK. As Vladimir
> points out, it is somewhat similar to RexSubQuery, so maybe this could
> be a sub-class (but organizing the data a bit more efficiently).
>
> I would be very wary of null semantics. RexNode scalar operators are
> forced to do 3-valued logic, but this is almost a relational operator
> and it would be better without that burden.
>
> Julian
>
>
>
> On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
>  wrote:
> >
> > >Do you know what is the impact on Enumerable implementation?
> >
> > I guess there are plenty of options there.
> >
> > The key question regarding RexListCmp is as we introduce a new Rex node,
> > all the planning rules and all engines
> > must support it somehow.
> >
> > Technically speaking, we have RexSubQuery.
> > Haisheng, have you considered an option to stick with RexSubQuery to
> avoid
> > having two more-or-less the same rex classes?
> >
> > Vladimir
>


Re: custom metadata logic

2020-07-20 Thread Stamatis Zampetakis
Hi Taz,

If you are relying on the RelMetadataQuery [1] API then you may need to set
your provided into THREAD_PROVIDERS in a similar way that it is done in
RelMetadataTest [2].

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/7a462f2b2f78aa12068b691c1e423ea4c8a825e4/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L75
[2]
https://github.com/apache/calcite/blob/7a462f2b2f78aa12068b691c1e423ea4c8a825e4/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L893

On Mon, Jul 20, 2020 at 6:15 PM Tal Glanzman  wrote:

> hi,
>
> I am trying to figure out how to add custom logic for providing metadata to
> nodes inside my adapter with no luck.
>
> I have an implementation of my own RelMetadataProvider, as described in the
> docs.
>
> After looking around, i still couldn't find a way to use this provider in
> the planning phase
> - no property in the jdbc connection string
> - tried to invoke setMetadataProvider(provider) on the cluster of the
> initial nodes of the tree
>
> drilling a bit, i notice that the provider in use is
> JaninoRelMetadataProvider with provider=DefaultRelMetadataProvider.
>
> Is there a way to do such a thing using the standard execution?
>
> any guidance will be much appreciated, thanks!
>


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Julian Hyde
The name isn't very intuitive.

The concept of a list and a comparison operator seems OK. As Vladimir
points out, it is somewhat similar to RexSubQuery, so maybe this could
be a sub-class (but organizing the data a bit more efficiently).

I would be very wary of null semantics. RexNode scalar operators are
forced to do 3-valued logic, but this is almost a relational operator
and it would be better without that burden.

Julian



On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov
 wrote:
>
> >Do you know what is the impact on Enumerable implementation?
>
> I guess there are plenty of options there.
>
> The key question regarding RexListCmp is as we introduce a new Rex node,
> all the planning rules and all engines
> must support it somehow.
>
> Technically speaking, we have RexSubQuery.
> Haisheng, have you considered an option to stick with RexSubQuery to avoid
> having two more-or-less the same rex classes?
>
> Vladimir


Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Haisheng Yuan
Typo:
We can just add a security guard saying that it is supported.
Should be 
We can just add a security guard saying that it is NOT supported.

On 2020/07/20 19:57:34, Haisheng Yuan  wrote: 
> I am not sure I got your implication by "pollute". If you mean changes, yes, 
> it requires some changes in rules. Do we need to change enumerables? Not 
> necessary. We can just add a security guard saying that it is supported. Not 
> everyone requires the Enumerable operators to support everything. More 
> importantly, currently there is no logic or rules to translate sub-query 
> directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. 
> Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. 
> That means we don't even have the chance to generate the NOT IN anti join. Is 
> that still a concern?
> 
> Even if some day, some contributor extends Calcite's parser and 
> SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN anti 
> join, we still have chance to disable it. Is that still a concern?
> 
> There are many ways to play it safe.
> 
> > Brainstorming: maybe we could consider it as a separate logical operator
> > (with its corresponding enumerable implementation)?
> It doesn't sound cool. It requires much more work. You have to duplicate all 
> the rules, metadata handler that deal with LogicalJoin, and for some rule 
> that matches Join base class, you have to check it is a LogicalJoin or the 
> logical operator for ANTI_NOTIN.
> 
> On 2020/07/20 08:28:42, Ruben Q L  wrote: 
> > I have some concerns that this new type would "pollute" the existing Join
> > logic, rules and enumerable implementations.
> > 
> > Brainstorming: maybe we could consider it as a separate logical operator
> > (with its corresponding enumerable implementation)?
> > 
> > 
> > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan  a
> > écrit :
> > 
> > > I agree that NOT IN is toxic, and it is error-prone.
> > > But you can't prevent people writing SQL with not in sub-queries, would
> > > you rather let optimizer generate inefficient plan?
> > >
> > > - Haisheng
> > >
> > > --
> > > 发件人:Julian Hyde
> > > 日 期:2020年07月20日 11:56:35
> > > 收件人:dev@calcite.apache.org
> > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN
> > >
> > > Yuck!
> > >
> > > NOT IN is toxic. I'd rather keep it out of the algebra.
> > >
> > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks
> > > to Ruben for reminding).
> > > > For some simple boolean context NOT_IN subquery, we can't transform it
> > > to ANTI join. e.g.:
> > > >
> > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is
> > > nullable
> > > >
> > > > Because if there is a null value in the results of subquery, the NOT IN
> > > predicate will return false, the whole query returns empty. And in 
> > > Calcite,
> > > the plan for this kind of query is inefficient.
> > > >
> > > > If we have ANTI_NOTIN to represent this kind of join, we can generate
> > > more efficient plan, as long as the query executor support it.
> > > >
> > > > Thoughts?
> > > >
> > > > Haisheng Yuan
> > > >
> > > >
> > >
> > 
> 


Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Haisheng Yuan
I am not sure I got your implication by "pollute". If you mean changes, yes, it 
requires some changes in rules. Do we need to change enumerables? Not 
necessary. We can just add a security guard saying that it is supported. Not 
everyone requires the Enumerable operators to support everything. More 
importantly, currently there is no logic or rules to translate sub-query 
directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. 
Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. That 
means we don't even have the chance to generate the NOT IN anti join. Is that 
still a concern?

Even if some day, some contributor extends Calcite's parser and 
SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN anti 
join, we still have chance to disable it. Is that still a concern?

There are many ways to play it safe.

> Brainstorming: maybe we could consider it as a separate logical operator
> (with its corresponding enumerable implementation)?
It doesn't sound cool. It requires much more work. You have to duplicate all 
the rules, metadata handler that deal with LogicalJoin, and for some rule that 
matches Join base class, you have to check it is a LogicalJoin or the logical 
operator for ANTI_NOTIN.

On 2020/07/20 08:28:42, Ruben Q L  wrote: 
> I have some concerns that this new type would "pollute" the existing Join
> logic, rules and enumerable implementations.
> 
> Brainstorming: maybe we could consider it as a separate logical operator
> (with its corresponding enumerable implementation)?
> 
> 
> Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan  a
> écrit :
> 
> > I agree that NOT IN is toxic, and it is error-prone.
> > But you can't prevent people writing SQL with not in sub-queries, would
> > you rather let optimizer generate inefficient plan?
> >
> > - Haisheng
> >
> > --
> > 发件人:Julian Hyde
> > 日 期:2020年07月20日 11:56:35
> > 收件人:dev@calcite.apache.org
> > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN
> >
> > Yuck!
> >
> > NOT IN is toxic. I'd rather keep it out of the algebra.
> >
> > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan  wrote:
> > >
> > > Hi all,
> > >
> > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks
> > to Ruben for reminding).
> > > For some simple boolean context NOT_IN subquery, we can't transform it
> > to ANTI join. e.g.:
> > >
> > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is
> > nullable
> > >
> > > Because if there is a null value in the results of subquery, the NOT IN
> > predicate will return false, the whole query returns empty. And in Calcite,
> > the plan for this kind of query is inefficient.
> > >
> > > If we have ANTI_NOTIN to represent this kind of join, we can generate
> > more efficient plan, as long as the query executor support it.
> > >
> > > Thoughts?
> > >
> > > Haisheng Yuan
> > >
> > >
> >
> 


Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Michael Mior
+1

Checked hash and signature and compiled and ran tests. Thanks Chunwei!

--
Michael Mior
mm...@apache.org

Le lun. 20 juil. 2020 à 11:41, Chunwei Lei  a écrit :
>
> Hi all,
>
> I have created a build for Apache Calcite 1.24.0, release
> candidate 0.
>
> Thanks to everyone who has contributed to this release.
>
> You can read the release notes here:
> https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md
>
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c
>
> Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c
>
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.24.0-rc0
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
> (revision 40574)
>
> The hashes of the artifacts are as follows:
> ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
> *apache-calcite-1.24.0-src.tar.gz
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/
>
> Release artifacts are signed with the following key:
> https://www.apache.org/dist/calcite/KEYS
>
> N.B.
> To create the jars and test Apache Calcite: "./gradlew build".
>
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
>
> Please vote on releasing this package as Apache Calcite 1.24.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite 1.24.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (non-binding)
>
>
>
> Best,
> Chunwei


[VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Chunwei Lei
Hi all,

I have created a build for Apache Calcite 1.24.0, release
candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:
https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md

The commit to be voted upon:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c

Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c

Tag:
https://github.com/apache/calcite/tree/calcite-1.24.0-rc0

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
(revision 40574)

The hashes of the artifacts are as follows:
ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
*apache-calcite-1.24.0-src.tar.gz

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/

Release artifacts are signed with the following key:
https://www.apache.org/dist/calcite/KEYS

N.B.
To create the jars and test Apache Calcite: "./gradlew build".

If you do not have a Java environment available, you can run the tests
using docker. To do so, install docker and docker-compose, then run
"docker-compose run test" from the root of the directory.

Please vote on releasing this package as Apache Calcite 1.24.0.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Calcite 1.24.0
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (non-binding)



Best,
Chunwei


custom metadata logic

2020-07-20 Thread Tal Glanzman
hi,

I am trying to figure out how to add custom logic for providing metadata to
nodes inside my adapter with no luck.

I have an implementation of my own RelMetadataProvider, as described in the
docs.

After looking around, i still couldn't find a way to use this provider in
the planning phase
- no property in the jdbc connection string
- tried to invoke setMetadataProvider(provider) on the cluster of the
initial nodes of the tree

drilling a bit, i notice that the provider in use is
JaninoRelMetadataProvider with provider=DefaultRelMetadataProvider.

Is there a way to do such a thing using the standard execution?

any guidance will be much appreciated, thanks!


[jira] [Created] (CALCITE-4133) Shouldn't trim fields when it's under Union

2020-07-20 Thread Jiatao Tao (Jira)
Jiatao Tao created CALCITE-4133:
---

 Summary: Shouldn't trim fields when it's under Union
 Key: CALCITE-4133
 URL: https://issues.apache.org/jira/browse/CALCITE-4133
 Project: Calcite
  Issue Type: Bug
Reporter: Jiatao Tao
Assignee: Jiatao Tao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] SqlDialect.getCastSpec() doesn't work when unparsing SqlNode tree

2020-07-20 Thread Pavel Gubin
Currently SqlDialect.getCastSpec() is used to amend types to cast to for 
different dialects and works during Rel to Sql conversion. That means it cannot 
be used when unparsing SqlNode tree to a specific dialect. Does somebody know 
was it made intentionally and would it be better to provide alternative method 
to convert cast types fully in SqlNode tree that can be used during 
SqlNode.unparse()?

Thanks,
Pavel Gubin

Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Vladimir Sitnikov
>Do you know what is the impact on Enumerable implementation?

I guess there are plenty of options there.

The key question regarding RexListCmp is as we introduce a new Rex node,
all the planning rules and all engines
must support it somehow.

Technically speaking, we have RexSubQuery.
Haisheng, have you considered an option to stick with RexSubQuery to avoid
having two more-or-less the same rex classes?

Vladimir


Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Ruben Q L
I have some concerns that this new type would "pollute" the existing Join
logic, rules and enumerable implementations.

Brainstorming: maybe we could consider it as a separate logical operator
(with its corresponding enumerable implementation)?


Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan  a
écrit :

> I agree that NOT IN is toxic, and it is error-prone.
> But you can't prevent people writing SQL with not in sub-queries, would
> you rather let optimizer generate inefficient plan?
>
> - Haisheng
>
> --
> 发件人:Julian Hyde
> 日 期:2020年07月20日 11:56:35
> 收件人:dev@calcite.apache.org
> 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN
>
> Yuck!
>
> NOT IN is toxic. I'd rather keep it out of the algebra.
>
> On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan  wrote:
> >
> > Hi all,
> >
> > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks
> to Ruben for reminding).
> > For some simple boolean context NOT_IN subquery, we can't transform it
> to ANTI join. e.g.:
> >
> > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is
> nullable
> >
> > Because if there is a null value in the results of subquery, the NOT IN
> predicate will return false, the whole query returns empty. And in Calcite,
> the plan for this kind of query is inefficient.
> >
> > If we have ANTI_NOTIN to represent this kind of join, we can generate
> more efficient plan, as long as the query executor support it.
> >
> > Thoughts?
> >
> > Haisheng Yuan
> >
> >
>


Re: [DISCUSS] New RexNode: RexListCmp

2020-07-20 Thread Rui Wang
Hi Haisheng,

Do you know what is the impact on Enumerable implementation? Will
RexListCmp still be converted AND/OR something else in that layer or there
will be new logic introduced to generate code for RexListCmp?

-Rui

On Sun, Jul 19, 2020 at 10:13 PM Enrico Olivelli 
wrote:

> Il Lun 20 Lug 2020, 03:00 Haisheng Yuan  ha scritto:
>
> > Hi all,
> >
> > This is a rough idea, I'd like to see how the community think about it.
> >
> > RexListCmp extends RexNode / RexCall {
> >   public final SqlOperator op;
> >   public final RexNode left;
> >   public final ImmutableList list;
> >   public final RexQuantifier quantifier;
> >   public final RelDataType type;
> > }
> >
> > Enum RexQuantifier {
> >   ALL,
> >   ANY
> > }
> >
> > Background:
> >
> > It is not uncommon that the query contains large number of constant IN
> > list, e.g.
> > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1);
> > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1);
> >
> > Currently, Calcite either translates it into a Join, or expand to OR/AND,
> > which is inefficient, and may cause problems.
> >
>
> Yes. It is not efficient.
>
> I would love to see this new feature
>
> Thanks
> Enrico
>
>
>
>
> > With RexListCmp, the predicate in query 1) will be represented as:
> > RexListCmp {
> >   op = "<>",
> >   left = "a"
> >   list = "1,2,3...1"
> >   quantifier = "ALL"
> > }
> >
> > The predicate in query 2) will be represented as:
> > RexListCmp {
> >   op = "=",
> >   left = "b"
> >   list = "1,2,3...1"
> >   quantifier = "ANY"
> > }
> >
> > It may also be used to represent the predicate in the following query:
> >
> > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ...
> (1000,
> > 1000));
> >
> > Further more, it is extensible. The op is not limited to be equals or not
> > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql
> operator
> > like geospatial operator intersect:
> > boolean &&( geometry A , geometry B )
> >
> > Thoughts?
> >
> > Thanks,
> > Haisheng Yuan
> >
> >
> >
>