Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
If it is only constant NOT IN predicate, how difficult it is to rewrite it into a normal composite AND predicate before entering the planning phrase ? Best, Danny Chan 在 2020年7月21日 +0800 PM12:35,Haisheng Yuan ,写道: > Thanks Jinpeng for providing a good example for not in subquery. > > I 100% agree with you that correlated query won't be represented by > ANTI_NOTIN join type, and it is not the proposal's intention. Here what we > are discussing is not to use ANTI_NOTIN to represent all the NOT IN > sub-queries, that is impossible. Instead, if you take a close look at the > example query, it is a simple uncorrelated NOT IN sub-query. That is the > target. Let's focus on that kind of query, ask ourselves this question: Can > such a simple query be transformed into a ANTI join to make the plan > efficient? > > Sadly no. The reality is that this kind of query is not uncommon, may be much > more common than correlated NOT IN sub-queries. > > > Reply to Julian: > > > How about making a sub-query type (in RexSubQuery), so it is gone > > > before we reach algebra. > It will be nice to have a NOT_IN subquery type, without expanding NOT IN to > NOT(IN). > However, if there is no ANTI_NOTIN in the join type (without reaching > algebra), does that mean the optimizer still can't generate efficient plan > for simple NOT IN sub-queries? > > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > > the opposite of NOT IN?! > It depends how people interpret ANTI. You interpret it as "opposite", I > interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT > EXISTS". But it is just a naming issue, I am OK to change it whatever name > that makes sense to the community, as long as it can convey the meaning. > > Thanks, > Haisheng > > On 2020/07/21 03:02:20, Jinpeng Wu wrote: > > Hi. > > > > In some SQL engine, the query > > select * from A where c1 not in ( select c1 from B where B.c2 = A.c2); > > is transformed to a plan like > > select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is > > null OR B.c1 is null); > > > > Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One > > thing seems to be a problem is that A.c1 cannot be used as a join key in > > the new plan. However, the problem is also there for ANTI_NOTIN, and even > > other NOT-IN-SUBQUERY physical implementations. > > > > Thanks, > > Qiupeng > > > > On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde wrote: > > > > > How about making a sub-query type (in RexSubQuery), so it is gone > > > before we reach algebra. > > > > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > > the opposite of NOT IN?! > > > > > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan wrote: > > > > > > > > Typo: > > > > We can just add a security guard saying that it is supported. > > > > Should be > > > > We can just add a security guard saying that it is NOT supported. > > > > > > > > On 2020/07/20 19:57:34, Haisheng Yuan wrote: > > > > > I am not sure I got your implication by "pollute". If you mean > > > changes, yes, it requires some changes in rules. Do we need to change > > > enumerables? Not necessary. We can just add a security guard saying that > > > it > > > is supported. Not everyone requires the Enumerable operators to support > > > everything. More importantly, currently there is no logic or rules to > > > translate sub-query directly to SEMI/ANTI joins, let alone translating > > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before > > > entering RelNode land. That means we don't even have the chance to > > > generate > > > the NOT IN anti join. Is that still a concern? > > > > > > > > > > Even if some day, some contributor extends Calcite's parser and > > > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN > > > anti join, we still have chance to disable it. Is that still a concern? > > > > > > > > > > There are many ways to play it safe. > > > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > > operator > > > > > > (with its corresponding enumerable implementation)? > > > > > It doesn't sound cool. It requires much more work. You have to > > > duplicate all the rules, metadata handler that deal with LogicalJoin, and > > > for some rule that matches Join base class, you have to check it is a > > > LogicalJoin or the logical operator for ANTI_NOTIN. > > > > > > > > > > On 2020/07/20 08:28:42, Ruben Q L wrote: > > > > > > I have some concerns that this new type would "pollute" the existing > > > Join > > > > > > logic, rules and enumerable implementations. > > > > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > > operator > > > > > > (with its corresponding enumerable implementation)? > > > > > > > > > > > > > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan > > > > > > > > > a > > > > > > écrit : > > > > > > > > > > > > > I agree that
Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
Thanks Jinpeng for providing a good example for not in subquery. I 100% agree with you that correlated query won't be represented by ANTI_NOTIN join type, and it is not the proposal's intention. Here what we are discussing is not to use ANTI_NOTIN to represent all the NOT IN sub-queries, that is impossible. Instead, if you take a close look at the example query, it is a simple uncorrelated NOT IN sub-query. That is the target. Let's focus on that kind of query, ask ourselves this question: Can such a simple query be transformed into a ANTI join to make the plan efficient? Sadly no. The reality is that this kind of query is not uncommon, may be much more common than correlated NOT IN sub-queries. Reply to Julian: > > How about making a sub-query type (in RexSubQuery), so it is gone > > before we reach algebra. It will be nice to have a NOT_IN subquery type, without expanding NOT IN to NOT(IN). However, if there is no ANTI_NOTIN in the join type (without reaching algebra), does that mean the optimizer still can't generate efficient plan for simple NOT IN sub-queries? > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > the opposite of NOT IN?! It depends how people interpret ANTI. You interpret it as "opposite", I interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT EXISTS". But it is just a naming issue, I am OK to change it whatever name that makes sense to the community, as long as it can convey the meaning. Thanks, Haisheng On 2020/07/21 03:02:20, Jinpeng Wu wrote: > Hi. > > In some SQL engine, the query > select * from A where c1 not in ( select c1 from B where B.c2 = A.c2); > is transformed to a plan like > select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is > null OR B.c1 is null); > > Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One > thing seems to be a problem is that A.c1 cannot be used as a join key in > the new plan. However, the problem is also there for ANTI_NOTIN, and even > other NOT-IN-SUBQUERY physical implementations. > > Thanks, > Qiupeng > > On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde wrote: > > > How about making a sub-query type (in RexSubQuery), so it is gone > > before we reach algebra. > > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > the opposite of NOT IN?! > > > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan wrote: > > > > > > Typo: > > > We can just add a security guard saying that it is supported. > > > Should be > > > We can just add a security guard saying that it is NOT supported. > > > > > > On 2020/07/20 19:57:34, Haisheng Yuan wrote: > > > > I am not sure I got your implication by "pollute". If you mean > > changes, yes, it requires some changes in rules. Do we need to change > > enumerables? Not necessary. We can just add a security guard saying that it > > is supported. Not everyone requires the Enumerable operators to support > > everything. More importantly, currently there is no logic or rules to > > translate sub-query directly to SEMI/ANTI joins, let alone translating > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before > > entering RelNode land. That means we don't even have the chance to generate > > the NOT IN anti join. Is that still a concern? > > > > > > > > Even if some day, some contributor extends Calcite's parser and > > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN > > anti join, we still have chance to disable it. Is that still a concern? > > > > > > > > There are many ways to play it safe. > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > operator > > > > > (with its corresponding enumerable implementation)? > > > > It doesn't sound cool. It requires much more work. You have to > > duplicate all the rules, metadata handler that deal with LogicalJoin, and > > for some rule that matches Join base class, you have to check it is a > > LogicalJoin or the logical operator for ANTI_NOTIN. > > > > > > > > On 2020/07/20 08:28:42, Ruben Q L wrote: > > > > > I have some concerns that this new type would "pollute" the existing > > Join > > > > > logic, rules and enumerable implementations. > > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > operator > > > > > (with its corresponding enumerable implementation)? > > > > > > > > > > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan > > a > > > > > écrit : > > > > > > > > > > > I agree that NOT IN is toxic, and it is error-prone. > > > > > > But you can't prevent people writing SQL with not in sub-queries, > > would > > > > > > you rather let optimizer generate inefficient plan? > > > > > > > > > > > > - Haisheng > > > > > > > > > > > > -- > > > > > > 发件人:Julian Hyde > > > > > > 日 期:2020年07月20日 11:56:35 > > > > > > 收件人:dev@calcite.apache.org > > > > > > 主 题:Re:
Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)
Thanks for making this release available for voting, Chunwei! Verified GPG Signature - OK Verified SHA512 - OK Ran tests per HOWTO (./gradlew check) - OK Quickly skimmed release notes - Looks good, but I agree with Julian's comments. Spotted checked a few JARs in the Maven repository - OK Environment (OpenJDK:latest docker container): Gradle 6.3 (via gradlew) Oracle Linux Server 7.8 openjdk version "14.0.2" 2020-07-14 OpenJDK Runtime Environment (build 14.0.2+12-46) OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing) My vote is: +1 (binding) Francis On 21/07/2020 12:07 pm, Haisheng Yuan wrote: Environment: Mac OS X 10.15.1, JDK 1.8.0_162 - Checked signatures and checksums, OK - Ran unit tests (./gradlew build), OK +1 (binding) * why is 4032 'breaking'? With that change, the CalcMergeRule won't match PhysicalNode(including EnumerableCalc) in VolcanoPlanner. Perhaps I should elaborate in the release notes. * why is 3786 breaking? (recomputeDigest was not present in 1.23; the remarks about caching digests are useful, so why aren't they in the javadoc?) recomputeDigest() has been there since b0dab68 (2012-05-07). I will add the remarks into the javadoc after release. Thanks, Haisheng On 2020/07/21 01:14:17, Julian Hyde wrote: Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14; checked distro against git (see issue 1); reviewed release notes (see issue 2). +1 (binding) but issues 1 and 2 need to be fixed right after the release. Issue 1. License file is not the same as in source control: diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE 177a178,189 Additional License files can be found in the 'licenses' folder located in the same directory as the LICENSE file (i.e. this file) - Software produced outside the ASF which is available under other licenses (not Apache-2.0) MIT * cobyism:html5shiv:3.7.2 * font-awesome:font-awesome-code:4.2.0 * gridsim:gridsim: * jekyll:jekyll: * normalize:normalize:3.0.2 * respond:respond:1.4.2 Can you fix the release instructions that the generated LICENSE needs to be committed (probably at the same time you revise the release notes). Issue 2. Release notes For the 'highlights', I prefer a paragraph with hyperlinks over a list (see https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21). Regarding categorization: * why is 4032 'breaking'? * why is 3786 breaking? (recomputeDigest was not present in 1.23; the remarks about caching digests are useful, so why aren't they in the javadoc?) * we need a note that a bunch of methods are deprecated in this release and will be removed before 1.25 (see 3923, 4023 and 4079). This will break semantic versioning in 1.25, so is a big deal. * 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I think they are bug fixes or improved implementations * 3946, 4089, 4087 are listed as fixes but could be listed as new features * 4075 should be under 'test suite' * 4094 description does not need 'follow-up after review comments' * 4086 is an upgrade, so should be in 'bug fixes', not documentation * A few places SQL and Java keywords are not in code font (e.g. NPE, IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP, CAST) Julian On Mon, Jul 20, 2020 at 12:01 PM Michael Mior wrote: +1 Checked hash and signature and compiled and ran tests. Thanks Chunwei! -- Michael Mior mm...@apache.org Le lun. 20 juil. 2020 à 11:41, Chunwei Lei a écrit : Hi all, I have created a build for Apache Calcite 1.24.0, release candidate 0. Thanks to everyone who has contributed to this release. You can read the release notes here: https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md The commit to be voted upon: https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c Tag: https://github.com/apache/calcite/tree/calcite-1.24.0-rc0 The artifacts to be voted on are located here: https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0 (revision 40574) The hashes of the artifacts are as follows: ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec *apache-calcite-1.24.0-src.tar.gz A staged Maven repository is available for review at: https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/ Release artifacts are signed with the following key: https://www.apache.org/dist/calcite/KEYS N.B. To create the jars and test Apache Calcite: "./gradlew build". If you do not have a Java environment available, you can run the tests using docker. To do so, install docker and docker-compose, then run "docker-compose run test" from the root of the directory. Please vote on releasing this package as Apache Calcite 1.24.0. The vote is open for the next 72 hours and passes if a majority of
Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
Hi. In some SQL engine, the query select * from A where c1 not in ( select c1 from B where B.c2 = A.c2); is transformed to a plan like select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is null OR B.c1 is null); Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One thing seems to be a problem is that A.c1 cannot be used as a join key in the new plan. However, the problem is also there for ANTI_NOTIN, and even other NOT-IN-SUBQUERY physical implementations. Thanks, Qiupeng On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde wrote: > How about making a sub-query type (in RexSubQuery), so it is gone > before we reach algebra. > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > the opposite of NOT IN?! > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan wrote: > > > > Typo: > > We can just add a security guard saying that it is supported. > > Should be > > We can just add a security guard saying that it is NOT supported. > > > > On 2020/07/20 19:57:34, Haisheng Yuan wrote: > > > I am not sure I got your implication by "pollute". If you mean > changes, yes, it requires some changes in rules. Do we need to change > enumerables? Not necessary. We can just add a security guard saying that it > is supported. Not everyone requires the Enumerable operators to support > everything. More importantly, currently there is no logic or rules to > translate sub-query directly to SEMI/ANTI joins, let alone translating > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before > entering RelNode land. That means we don't even have the chance to generate > the NOT IN anti join. Is that still a concern? > > > > > > Even if some day, some contributor extends Calcite's parser and > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN > anti join, we still have chance to disable it. Is that still a concern? > > > > > > There are many ways to play it safe. > > > > > > > Brainstorming: maybe we could consider it as a separate logical > operator > > > > (with its corresponding enumerable implementation)? > > > It doesn't sound cool. It requires much more work. You have to > duplicate all the rules, metadata handler that deal with LogicalJoin, and > for some rule that matches Join base class, you have to check it is a > LogicalJoin or the logical operator for ANTI_NOTIN. > > > > > > On 2020/07/20 08:28:42, Ruben Q L wrote: > > > > I have some concerns that this new type would "pollute" the existing > Join > > > > logic, rules and enumerable implementations. > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > operator > > > > (with its corresponding enumerable implementation)? > > > > > > > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan > a > > > > écrit : > > > > > > > > > I agree that NOT IN is toxic, and it is error-prone. > > > > > But you can't prevent people writing SQL with not in sub-queries, > would > > > > > you rather let optimizer generate inefficient plan? > > > > > > > > > > - Haisheng > > > > > > > > > > -- > > > > > 发件人:Julian Hyde > > > > > 日 期:2020年07月20日 11:56:35 > > > > > 收件人:dev@calcite.apache.org > > > > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > > > > > > > Yuck! > > > > > > > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > > > > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan > wrote: > > > > > > > > > > > > Hi all, > > > > > > > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery > (thanks > > > > > to Ruben for reminding). > > > > > > For some simple boolean context NOT_IN subquery, we can't > transform it > > > > > to ANTI join. e.g.: > > > > > > > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > > > > > nullable > > > > > > > > > > > > Because if there is a null value in the results of subquery, the > NOT IN > > > > > predicate will return false, the whole query returns empty. And in > Calcite, > > > > > the plan for this kind of query is inefficient. > > > > > > > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can > generate > > > > > more efficient plan, as long as the query executor support it. > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > Haisheng Yuan > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)
Environment: Mac OS X 10.15.1, JDK 1.8.0_162 - Checked signatures and checksums, OK - Ran unit tests (./gradlew build), OK +1 (binding) > * why is 4032 'breaking'? With that change, the CalcMergeRule won't match PhysicalNode(including EnumerableCalc) in VolcanoPlanner. Perhaps I should elaborate in the release notes. > * why is 3786 breaking? (recomputeDigest was not present in 1.23; the > remarks about caching digests are useful, so why aren't they in the > javadoc?) recomputeDigest() has been there since b0dab68 (2012-05-07). I will add the remarks into the javadoc after release. Thanks, Haisheng On 2020/07/21 01:14:17, Julian Hyde wrote: > Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14; > checked distro against git (see issue 1); reviewed release notes (see > issue 2). > > +1 (binding) but issues 1 and 2 need to be fixed right after the release. > > Issue 1. License file is not the same as in source control: > > diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE > 177a178,189 > > > > Additional License files can be found in the 'licenses' folder located in > > the same directory as the LICENSE file (i.e. this file) > > > > - Software produced outside the ASF which is available under other licenses > > (not Apache-2.0) > > > > MIT > > * cobyism:html5shiv:3.7.2 > > * font-awesome:font-awesome-code:4.2.0 > > * gridsim:gridsim: > > * jekyll:jekyll: > > * normalize:normalize:3.0.2 > > * respond:respond:1.4.2 > > Can you fix the release instructions that the generated LICENSE needs > to be committed (probably at the same time you revise the release > notes). > > Issue 2. Release notes > > For the 'highlights', I prefer a paragraph with hyperlinks over a list > (see > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21). > > Regarding categorization: > * why is 4032 'breaking'? > * why is 3786 breaking? (recomputeDigest was not present in 1.23; the > remarks about caching digests are useful, so why aren't they in the > javadoc?) > * we need a note that a bunch of methods are deprecated in this > release and will be removed before 1.25 (see 3923, 4023 and 4079). > This will break semantic versioning in 1.25, so is a big deal. > * 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I > think they are bug fixes or improved implementations > * 3946, 4089, 4087 are listed as fixes but could be listed as new features > * 4075 should be under 'test suite' > * 4094 description does not need 'follow-up after review comments' > * 4086 is an upgrade, so should be in 'bug fixes', not documentation > * A few places SQL and Java keywords are not in code font (e.g. NPE, > IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP, > CAST) > > Julian > > On Mon, Jul 20, 2020 at 12:01 PM Michael Mior wrote: > > > > +1 > > > > Checked hash and signature and compiled and ran tests. Thanks Chunwei! > > > > -- > > Michael Mior > > mm...@apache.org > > > > Le lun. 20 juil. 2020 à 11:41, Chunwei Lei a écrit > > : > > > > > > Hi all, > > > > > > I have created a build for Apache Calcite 1.24.0, release > > > candidate 0. > > > > > > Thanks to everyone who has contributed to this release. > > > > > > You can read the release notes here: > > > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md > > > > > > The commit to be voted upon: > > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c > > > > > > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c > > > > > > Tag: > > > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0 > > > > > > The artifacts to be voted on are located here: > > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0 > > > (revision 40574) > > > > > > The hashes of the artifacts are as follows: > > > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec > > > *apache-calcite-1.24.0-src.tar.gz > > > > > > A staged Maven repository is available for review at: > > > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/ > > > > > > Release artifacts are signed with the following key: > > > https://www.apache.org/dist/calcite/KEYS > > > > > > N.B. > > > To create the jars and test Apache Calcite: "./gradlew build". > > > > > > If you do not have a Java environment available, you can run the tests > > > using docker. To do so, install docker and docker-compose, then run > > > "docker-compose run test" from the root of the directory. > > > > > > Please vote on releasing this package as Apache Calcite 1.24.0. > > > > > > The vote is open for the next 72 hours and passes if a majority of at > > > least three +1 PMC votes are cast. > > > > > > [ ] +1 Release this package as Apache Calcite 1.24.0 > > > [ ] 0 I don't feel strongly about it, but I'm okay with the release > > > [ ] -1 Do
Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)
Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14; checked distro against git (see issue 1); reviewed release notes (see issue 2). +1 (binding) but issues 1 and 2 need to be fixed right after the release. Issue 1. License file is not the same as in source control: diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE 177a178,189 > > Additional License files can be found in the 'licenses' folder located in the > same directory as the LICENSE file (i.e. this file) > > - Software produced outside the ASF which is available under other licenses > (not Apache-2.0) > > MIT > * cobyism:html5shiv:3.7.2 > * font-awesome:font-awesome-code:4.2.0 > * gridsim:gridsim: > * jekyll:jekyll: > * normalize:normalize:3.0.2 > * respond:respond:1.4.2 Can you fix the release instructions that the generated LICENSE needs to be committed (probably at the same time you revise the release notes). Issue 2. Release notes For the 'highlights', I prefer a paragraph with hyperlinks over a list (see https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21). Regarding categorization: * why is 4032 'breaking'? * why is 3786 breaking? (recomputeDigest was not present in 1.23; the remarks about caching digests are useful, so why aren't they in the javadoc?) * we need a note that a bunch of methods are deprecated in this release and will be removed before 1.25 (see 3923, 4023 and 4079). This will break semantic versioning in 1.25, so is a big deal. * 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I think they are bug fixes or improved implementations * 3946, 4089, 4087 are listed as fixes but could be listed as new features * 4075 should be under 'test suite' * 4094 description does not need 'follow-up after review comments' * 4086 is an upgrade, so should be in 'bug fixes', not documentation * A few places SQL and Java keywords are not in code font (e.g. NPE, IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP, CAST) Julian On Mon, Jul 20, 2020 at 12:01 PM Michael Mior wrote: > > +1 > > Checked hash and signature and compiled and ran tests. Thanks Chunwei! > > -- > Michael Mior > mm...@apache.org > > Le lun. 20 juil. 2020 à 11:41, Chunwei Lei a écrit : > > > > Hi all, > > > > I have created a build for Apache Calcite 1.24.0, release > > candidate 0. > > > > Thanks to everyone who has contributed to this release. > > > > You can read the release notes here: > > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md > > > > The commit to be voted upon: > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c > > > > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c > > > > Tag: > > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0 > > > > The artifacts to be voted on are located here: > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0 > > (revision 40574) > > > > The hashes of the artifacts are as follows: > > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec > > *apache-calcite-1.24.0-src.tar.gz > > > > A staged Maven repository is available for review at: > > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/ > > > > Release artifacts are signed with the following key: > > https://www.apache.org/dist/calcite/KEYS > > > > N.B. > > To create the jars and test Apache Calcite: "./gradlew build". > > > > If you do not have a Java environment available, you can run the tests > > using docker. To do so, install docker and docker-compose, then run > > "docker-compose run test" from the root of the directory. > > > > Please vote on releasing this package as Apache Calcite 1.24.0. > > > > The vote is open for the next 72 hours and passes if a majority of at > > least three +1 PMC votes are cast. > > > > [ ] +1 Release this package as Apache Calcite 1.24.0 > > [ ] 0 I don't feel strongly about it, but I'm okay with the release > > [ ] -1 Do not release this package because... > > > > > > Here is my vote: > > > > +1 (non-binding) > > > > > > > > Best, > > Chunwei
[jira] [Created] (CALCITE-4135) Cannot find SqlDdlParserImpl class, which seems necessary for parsing DDL.
Stewart Bryson created CALCITE-4135: --- Summary: Cannot find SqlDdlParserImpl class, which seems necessary for parsing DDL. Key: CALCITE-4135 URL: https://issues.apache.org/jira/browse/CALCITE-4135 Project: Calcite Issue Type: Bug Components: core Affects Versions: 1.23.0 Environment: Custom Java classes built using Gradle. Reporter: Stewart Bryson I am trying to use parse DDL statements, and tried code similar to the following: [https://stackoverflow.com/questions/53801005/apache-calcite-cant-seem-to-parse-ddl-statements] However, I cannot find the SqlDdlParserImpl class anywhere. I see several references to it in the GitHub repo: [https://github.com/apache/calcite/search?q=SqlDdlParserImpl_q=SqlDdlParserImpl] I have tried using the calcite-core and calcite-server dependencies. What am I missing? -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] New RexNode: RexListCmp
Another quick thought as far as it concerns the IN operator would be to use RexCall as it is right now where the first operand in the list is a RexInputRef for instance and the rest are the literals. I assume that taking this direction would need to change a bit the respective SqlOperator. I haven't thought of this thoroughly so maybe there are important things that I am missing. Best, Stamatis On Tue, Jul 21, 2020 at 12:41 AM Julian Hyde wrote: > The name isn't very intuitive. > > The concept of a list and a comparison operator seems OK. As Vladimir > points out, it is somewhat similar to RexSubQuery, so maybe this could > be a sub-class (but organizing the data a bit more efficiently). > > I would be very wary of null semantics. RexNode scalar operators are > forced to do 3-valued logic, but this is almost a relational operator > and it would be better without that burden. > > Julian > > > > On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov > wrote: > > > > >Do you know what is the impact on Enumerable implementation? > > > > I guess there are plenty of options there. > > > > The key question regarding RexListCmp is as we introduce a new Rex node, > > all the planning rules and all engines > > must support it somehow. > > > > Technically speaking, we have RexSubQuery. > > Haisheng, have you considered an option to stick with RexSubQuery to > avoid > > having two more-or-less the same rex classes? > > > > Vladimir >
Re: custom metadata logic
Hi Taz, If you are relying on the RelMetadataQuery [1] API then you may need to set your provided into THREAD_PROVIDERS in a similar way that it is done in RelMetadataTest [2]. Best, Stamatis [1] https://github.com/apache/calcite/blob/7a462f2b2f78aa12068b691c1e423ea4c8a825e4/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L75 [2] https://github.com/apache/calcite/blob/7a462f2b2f78aa12068b691c1e423ea4c8a825e4/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L893 On Mon, Jul 20, 2020 at 6:15 PM Tal Glanzman wrote: > hi, > > I am trying to figure out how to add custom logic for providing metadata to > nodes inside my adapter with no luck. > > I have an implementation of my own RelMetadataProvider, as described in the > docs. > > After looking around, i still couldn't find a way to use this provider in > the planning phase > - no property in the jdbc connection string > - tried to invoke setMetadataProvider(provider) on the cluster of the > initial nodes of the tree > > drilling a bit, i notice that the provider in use is > JaninoRelMetadataProvider with provider=DefaultRelMetadataProvider. > > Is there a way to do such a thing using the standard execution? > > any guidance will be much appreciated, thanks! >
Re: [DISCUSS] New RexNode: RexListCmp
The name isn't very intuitive. The concept of a list and a comparison operator seems OK. As Vladimir points out, it is somewhat similar to RexSubQuery, so maybe this could be a sub-class (but organizing the data a bit more efficiently). I would be very wary of null semantics. RexNode scalar operators are forced to do 3-valued logic, but this is almost a relational operator and it would be better without that burden. Julian On Mon, Jul 20, 2020 at 3:45 AM Vladimir Sitnikov wrote: > > >Do you know what is the impact on Enumerable implementation? > > I guess there are plenty of options there. > > The key question regarding RexListCmp is as we introduce a new Rex node, > all the planning rules and all engines > must support it somehow. > > Technically speaking, we have RexSubQuery. > Haisheng, have you considered an option to stick with RexSubQuery to avoid > having two more-or-less the same rex classes? > > Vladimir
Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
Typo: We can just add a security guard saying that it is supported. Should be We can just add a security guard saying that it is NOT supported. On 2020/07/20 19:57:34, Haisheng Yuan wrote: > I am not sure I got your implication by "pollute". If you mean changes, yes, > it requires some changes in rules. Do we need to change enumerables? Not > necessary. We can just add a security guard saying that it is supported. Not > everyone requires the Enumerable operators to support everything. More > importantly, currently there is no logic or rules to translate sub-query > directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. > Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. > That means we don't even have the chance to generate the NOT IN anti join. Is > that still a concern? > > Even if some day, some contributor extends Calcite's parser and > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN anti > join, we still have chance to disable it. Is that still a concern? > > There are many ways to play it safe. > > > Brainstorming: maybe we could consider it as a separate logical operator > > (with its corresponding enumerable implementation)? > It doesn't sound cool. It requires much more work. You have to duplicate all > the rules, metadata handler that deal with LogicalJoin, and for some rule > that matches Join base class, you have to check it is a LogicalJoin or the > logical operator for ANTI_NOTIN. > > On 2020/07/20 08:28:42, Ruben Q L wrote: > > I have some concerns that this new type would "pollute" the existing Join > > logic, rules and enumerable implementations. > > > > Brainstorming: maybe we could consider it as a separate logical operator > > (with its corresponding enumerable implementation)? > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan a > > écrit : > > > > > I agree that NOT IN is toxic, and it is error-prone. > > > But you can't prevent people writing SQL with not in sub-queries, would > > > you rather let optimizer generate inefficient plan? > > > > > > - Haisheng > > > > > > -- > > > 发件人:Julian Hyde > > > 日 期:2020年07月20日 11:56:35 > > > 收件人:dev@calcite.apache.org > > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > > > Yuck! > > > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan wrote: > > > > > > > > Hi all, > > > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks > > > to Ruben for reminding). > > > > For some simple boolean context NOT_IN subquery, we can't transform it > > > to ANTI join. e.g.: > > > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > > > nullable > > > > > > > > Because if there is a null value in the results of subquery, the NOT IN > > > predicate will return false, the whole query returns empty. And in > > > Calcite, > > > the plan for this kind of query is inefficient. > > > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can generate > > > more efficient plan, as long as the query executor support it. > > > > > > > > Thoughts? > > > > > > > > Haisheng Yuan > > > > > > > > > > > > > >
Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
I am not sure I got your implication by "pollute". If you mean changes, yes, it requires some changes in rules. Do we need to change enumerables? Not necessary. We can just add a security guard saying that it is supported. Not everyone requires the Enumerable operators to support everything. More importantly, currently there is no logic or rules to translate sub-query directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. That means we don't even have the chance to generate the NOT IN anti join. Is that still a concern? Even if some day, some contributor extends Calcite's parser and SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN anti join, we still have chance to disable it. Is that still a concern? There are many ways to play it safe. > Brainstorming: maybe we could consider it as a separate logical operator > (with its corresponding enumerable implementation)? It doesn't sound cool. It requires much more work. You have to duplicate all the rules, metadata handler that deal with LogicalJoin, and for some rule that matches Join base class, you have to check it is a LogicalJoin or the logical operator for ANTI_NOTIN. On 2020/07/20 08:28:42, Ruben Q L wrote: > I have some concerns that this new type would "pollute" the existing Join > logic, rules and enumerable implementations. > > Brainstorming: maybe we could consider it as a separate logical operator > (with its corresponding enumerable implementation)? > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan a > écrit : > > > I agree that NOT IN is toxic, and it is error-prone. > > But you can't prevent people writing SQL with not in sub-queries, would > > you rather let optimizer generate inefficient plan? > > > > - Haisheng > > > > -- > > 发件人:Julian Hyde > > 日 期:2020年07月20日 11:56:35 > > 收件人:dev@calcite.apache.org > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > Yuck! > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan wrote: > > > > > > Hi all, > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks > > to Ruben for reminding). > > > For some simple boolean context NOT_IN subquery, we can't transform it > > to ANTI join. e.g.: > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > > nullable > > > > > > Because if there is a null value in the results of subquery, the NOT IN > > predicate will return false, the whole query returns empty. And in Calcite, > > the plan for this kind of query is inefficient. > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can generate > > more efficient plan, as long as the query executor support it. > > > > > > Thoughts? > > > > > > Haisheng Yuan > > > > > > > > >
Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)
+1 Checked hash and signature and compiled and ran tests. Thanks Chunwei! -- Michael Mior mm...@apache.org Le lun. 20 juil. 2020 à 11:41, Chunwei Lei a écrit : > > Hi all, > > I have created a build for Apache Calcite 1.24.0, release > candidate 0. > > Thanks to everyone who has contributed to this release. > > You can read the release notes here: > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md > > The commit to be voted upon: > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c > > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c > > Tag: > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0 > > The artifacts to be voted on are located here: > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0 > (revision 40574) > > The hashes of the artifacts are as follows: > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec > *apache-calcite-1.24.0-src.tar.gz > > A staged Maven repository is available for review at: > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/ > > Release artifacts are signed with the following key: > https://www.apache.org/dist/calcite/KEYS > > N.B. > To create the jars and test Apache Calcite: "./gradlew build". > > If you do not have a Java environment available, you can run the tests > using docker. To do so, install docker and docker-compose, then run > "docker-compose run test" from the root of the directory. > > Please vote on releasing this package as Apache Calcite 1.24.0. > > The vote is open for the next 72 hours and passes if a majority of at > least three +1 PMC votes are cast. > > [ ] +1 Release this package as Apache Calcite 1.24.0 > [ ] 0 I don't feel strongly about it, but I'm okay with the release > [ ] -1 Do not release this package because... > > > Here is my vote: > > +1 (non-binding) > > > > Best, > Chunwei
[VOTE] Release apache-calcite-1.24.0 (release candidate 0)
Hi all, I have created a build for Apache Calcite 1.24.0, release candidate 0. Thanks to everyone who has contributed to this release. You can read the release notes here: https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md The commit to be voted upon: https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c Tag: https://github.com/apache/calcite/tree/calcite-1.24.0-rc0 The artifacts to be voted on are located here: https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0 (revision 40574) The hashes of the artifacts are as follows: ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec *apache-calcite-1.24.0-src.tar.gz A staged Maven repository is available for review at: https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/ Release artifacts are signed with the following key: https://www.apache.org/dist/calcite/KEYS N.B. To create the jars and test Apache Calcite: "./gradlew build". If you do not have a Java environment available, you can run the tests using docker. To do so, install docker and docker-compose, then run "docker-compose run test" from the root of the directory. Please vote on releasing this package as Apache Calcite 1.24.0. The vote is open for the next 72 hours and passes if a majority of at least three +1 PMC votes are cast. [ ] +1 Release this package as Apache Calcite 1.24.0 [ ] 0 I don't feel strongly about it, but I'm okay with the release [ ] -1 Do not release this package because... Here is my vote: +1 (non-binding) Best, Chunwei
custom metadata logic
hi, I am trying to figure out how to add custom logic for providing metadata to nodes inside my adapter with no luck. I have an implementation of my own RelMetadataProvider, as described in the docs. After looking around, i still couldn't find a way to use this provider in the planning phase - no property in the jdbc connection string - tried to invoke setMetadataProvider(provider) on the cluster of the initial nodes of the tree drilling a bit, i notice that the provider in use is JaninoRelMetadataProvider with provider=DefaultRelMetadataProvider. Is there a way to do such a thing using the standard execution? any guidance will be much appreciated, thanks!
[jira] [Created] (CALCITE-4133) Shouldn't trim fields when it's under Union
Jiatao Tao created CALCITE-4133: --- Summary: Shouldn't trim fields when it's under Union Key: CALCITE-4133 URL: https://issues.apache.org/jira/browse/CALCITE-4133 Project: Calcite Issue Type: Bug Reporter: Jiatao Tao Assignee: Jiatao Tao -- This message was sent by Atlassian Jira (v8.3.4#803005)
[DISCUSS] SqlDialect.getCastSpec() doesn't work when unparsing SqlNode tree
Currently SqlDialect.getCastSpec() is used to amend types to cast to for different dialects and works during Rel to Sql conversion. That means it cannot be used when unparsing SqlNode tree to a specific dialect. Does somebody know was it made intentionally and would it be better to provide alternative method to convert cast types fully in SqlNode tree that can be used during SqlNode.unparse()? Thanks, Pavel Gubin
Re: [DISCUSS] New RexNode: RexListCmp
>Do you know what is the impact on Enumerable implementation? I guess there are plenty of options there. The key question regarding RexListCmp is as we introduce a new Rex node, all the planning rules and all engines must support it somehow. Technically speaking, we have RexSubQuery. Haisheng, have you considered an option to stick with RexSubQuery to avoid having two more-or-less the same rex classes? Vladimir
Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN
I have some concerns that this new type would "pollute" the existing Join logic, rules and enumerable implementations. Brainstorming: maybe we could consider it as a separate logical operator (with its corresponding enumerable implementation)? Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan a écrit : > I agree that NOT IN is toxic, and it is error-prone. > But you can't prevent people writing SQL with not in sub-queries, would > you rather let optimizer generate inefficient plan? > > - Haisheng > > -- > 发件人:Julian Hyde > 日 期:2020年07月20日 11:56:35 > 收件人:dev@calcite.apache.org > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > Yuck! > > NOT IN is toxic. I'd rather keep it out of the algebra. > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan wrote: > > > > Hi all, > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks > to Ruben for reminding). > > For some simple boolean context NOT_IN subquery, we can't transform it > to ANTI join. e.g.: > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > nullable > > > > Because if there is a null value in the results of subquery, the NOT IN > predicate will return false, the whole query returns empty. And in Calcite, > the plan for this kind of query is inefficient. > > > > If we have ANTI_NOTIN to represent this kind of join, we can generate > more efficient plan, as long as the query executor support it. > > > > Thoughts? > > > > Haisheng Yuan > > > > >
Re: [DISCUSS] New RexNode: RexListCmp
Hi Haisheng, Do you know what is the impact on Enumerable implementation? Will RexListCmp still be converted AND/OR something else in that layer or there will be new logic introduced to generate code for RexListCmp? -Rui On Sun, Jul 19, 2020 at 10:13 PM Enrico Olivelli wrote: > Il Lun 20 Lug 2020, 03:00 Haisheng Yuan ha scritto: > > > Hi all, > > > > This is a rough idea, I'd like to see how the community think about it. > > > > RexListCmp extends RexNode / RexCall { > > public final SqlOperator op; > > public final RexNode left; > > public final ImmutableList list; > > public final RexQuantifier quantifier; > > public final RelDataType type; > > } > > > > Enum RexQuantifier { > > ALL, > > ANY > > } > > > > Background: > > > > It is not uncommon that the query contains large number of constant IN > > list, e.g. > > 1) SELECT * FROM foo WHERE a NOT IN (1, 2, 3, , 1); > > 2) SELECT * FROM bar WHERE b IN (1, 2, 3, , 1); > > > > Currently, Calcite either translates it into a Join, or expand to OR/AND, > > which is inefficient, and may cause problems. > > > > Yes. It is not efficient. > > I would love to see this new feature > > Thanks > Enrico > > > > > > With RexListCmp, the predicate in query 1) will be represented as: > > RexListCmp { > > op = "<>", > > left = "a" > > list = "1,2,3...1" > > quantifier = "ALL" > > } > > > > The predicate in query 2) will be represented as: > > RexListCmp { > > op = "=", > > left = "b" > > list = "1,2,3...1" > > quantifier = "ANY" > > } > > > > It may also be used to represent the predicate in the following query: > > > > SELECT * FROM bar WHERE (a,b) IN / NOT IN ((1,1), (2,2), (3,3), ... > (1000, > > 1000)); > > > > Further more, it is extensible. The op is not limited to be equals or not > > equals, it also be >, <, >=, <=, IDF, INDF or even customized sql > operator > > like geospatial operator intersect: > > boolean &&( geometry A , geometry B ) > > > > Thoughts? > > > > Thanks, > > Haisheng Yuan > > > > > > >