Re: [DISCUSS] Publish test JARS

2020-08-10 Thread Vladimir Sitnikov
Michael>I'm not sure what the changes to the Gradle config would look like
to
support this.

Here you go: https://github.com/apache/calcite/pull/2104/files
Then core/src/test should be able to see testkit classes, and testkit
should see core classes.

Vladimir


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Vladimir Sitnikov
Julian>I cannot see any cases that would become more expensive.

I mean the optimization passes might be complicated, not the storage of
sargs themselves.
For instance, CALCITE-4155 converts a in (1, 2, 3, 4, 5) to a >= 1 and a <=
5
Is it a significant improvement?
Is between representation always better?
Does the case appear very often in practice?

However, the optimization does take time, so it looks like extra logic with
doubtful gains.
Of course, it is unlikely sargs would be a major time consumer, however, if
we keep adding tiny simplifications we might
end up with a condition where the planning is slow and we have no way to
fix it.

I'm ok with people updating RexSimplify (and 4155 looks like an innocent
feature), however, I think the current design is not really scalable
(e.g. it might process the same expression multiple times).

Vladimir


Re: [DISCUSS] Publish test JARS

2020-08-10 Thread Michael Mior
Thanks for pointing back to the previous discussion. I'm fine with
publishing separate modules. No real preference on my end since I
haven't found consuming test artifacts to cause any problems. Although
I'm not sure what the changes to the Gradle config would look like to
support this.

--
Michael Mior
mm...@apache.org

Le lun. 10 août 2020 à 14:49, Vladimir Sitnikov
 a écrit :
>
> Hi Michael,
>
> I suggest we go with adding explicitly published modules rather than
> publishing tests.
> Test artifacts do not have their own pom.xml, so they are not really
> convenient for consumers.
>
> Here's the relevant thread:
> https://lists.apache.org/thread.html/2ab9652ea855dce0b5b75cc221a5f74ebec536818847ae5ce6d284f5%40%3Cdev.calcite.apache.org%3E
>
> I guess the naming could be calcite-testkit or calcite-test-framework.
>
> Vladimir


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Julian Hyde



> Vladimir wrote:
> 
> 5. The optimization of disjoint conditions might take significant
> optimization time with insignificant gains.

Can you give an example?

I am proposing to use sargs only where we would today use RexCall(IN). The data 
structures have about the same size. Sargs are sorted. I cannot see any cases 
that would become more expensive.

> Sargs might be helpful as databases often need to implement "integer/string
> index range scan", however, sargs do not seem to suit well for `col
> intersect any(...)` and similar cases.

Requiring sargs to support geospatial operations seems an unreasonably high 
bar. Many techniques that we use (sorting, hashing, sketches) do not support 
geospatial.

> I remember Calcite sources did include sargs package, however, it was
> removed for some reason.

That is correct. We once had b-tree indexes and column store; when we removed 
these, sargs were no longer used. Guava RangeSet is superior to the sargs 
classes we threw away.

Julian



Re: [DISCUSS] Publish test JARS

2020-08-10 Thread Vladimir Sitnikov
Hi Michael,

I suggest we go with adding explicitly published modules rather than
publishing tests.
Test artifacts do not have their own pom.xml, so they are not really
convenient for consumers.

Here's the relevant thread:
https://lists.apache.org/thread.html/2ab9652ea855dce0b5b75cc221a5f74ebec536818847ae5ce6d284f5%40%3Cdev.calcite.apache.org%3E

I guess the naming could be calcite-testkit or calcite-test-framework.

Vladimir


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Vladimir Sitnikov
+1 to Haisheng.

5. The optimization of disjoint conditions might take significant
optimization time with insignificant gains.

Sargs might be helpful as databases often need to implement "integer/string
index range scan", however, sargs do not seem to suit well for `col
intersect any(...)` and similar cases.

I remember Calcite sources did include sargs package, however, it was
removed for some reason.

Vladimir


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Haisheng Yuan
I am against the proposal, for the following reasons:

1. It has to support range operations, not every type supports, especially for 
User Defined Types that are used in IN constant lists, they might only support 
equal operation.

2. The range optimization of ">2 AND <4" to "3” is only valid for integer-like 
types, but I don't think it will bring much gains. Optimizing  col in 
(1,2,3,4,5) to a >=1 and a <=5 is ok, but only ok for a single or very limited 
number of disjoint ranges, but this is a very limited corner case, most likely 
we may end up with many disjoint ranges, especially when there are 10k values. 
I don't think it is worth doing SARG for IN for this sake.

3. For non-integer data types, like double or string, we will end up with 
ranges {[a,a], [b,b], [c,c]...}, the stats derivation e.g. inferring 
selectivity from histogram may take much longer time depends on the 
implementation. And when passing to executor, we have to transform it back to 
form of col = ANY(a,b,c) or col IN (a,b,c) to be able to utilize hash look up.

4. It is not extensible. It can only be used for iN or NOT IN. What about 
customized operators like geospatial intersect? e.g. col intersect ANY(area1, 
area2, area3)

Haisheng

On 2020/08/10 16:15:56, Michael Mior  wrote: 
> The simplifications you present would also of course assume that the
> column has integer type. In any case, overall the proposal seems solid
> to me.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> Le lun. 10 août 2020 à 04:21, Fan Liya  a écrit :
> >
> > Hi Julian,
> >
> > Thanks for opening the discussion.
> >
> > In general, I am convinced of the value and importance of the proposal.
> >
> > I want to discuss more about the algebra involved, as the simplified range
> > sets may not have the minimum computational costs.
> > Some examples:
> >
> > 1. Suppose we have expression x in (1, 2, 4, 5, 6).
> >
> > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> > 4 and x <= 6), which represents 4 comparisons and 3 logical operations.
> > Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
> > which represents 5 comparisons and 4 logical operations.
> > It's hard to say which one has a smaller computational cost.
> >
> > 2. Suppose we have expression x in (1, 2, 4, 5)
> > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> > 4 and x <= 5).
> > Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
> > I am not sure if the range set implementation supports this type of
> > simplification.
> >
> > Therefore, to address above problems,
> > 1. We may need a model to estimate the cost, and the simplification process
> > should be guided by the cost model.
> > 2. Maybe we need to extend the range set implementation so it produces
> > expressions with minimum computational cost.
> >
> > Best,
> > Liya Fan
> >
> >
> >
> >
> >
> > On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde  wrote:
> >
> > > We have had several discussions over the years about how to represent
> > > IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
> > >
> > > I have generally taken the position that we should expand to ORs (e.g. “x
> > > = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> > > IN in RexCall.
> > >
> > > I have given this some further thought, as part of a couple of RexSimplify
> > > bugs [1] [2] and I now think we should replace IN with something more
> > > powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> > > of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> > > "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> > > but also ranges.
> > >
> > > Today you would write
> > >
> > >   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
> > >
> > > With sargs, you would instead write
> > >
> > >   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> > > [3..3], [5..5]]”)))
> > >
> > > There is a new operator IN_SARG, and a new node type RexSarg (it could
> > > almost be a RexLiteral).
> > >
> > > Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> > > we invest in sarg support in RexSimplify and RelOptPredicateList, that
> > > investment is likely to pay dividends in better plans.
> > >
> > > Guava RangeSets, and hence sargs, have support for discrete domains, so
> > > they can easily optimize ">2 AND <4" to "3”.
> > >
> > > Sargs would be the preferred (therefore canonical) form for any AND or OR
> > > list that has more than one comparison on the same operand (e.g. "x > 3 
> > > AND
> > > x < 17 AND x <> 10”).
> > >
> > > This proposal would subsume IN and therefore we would stop supporting IN
> > > in RexCall.
> > >
> > > Julian
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> > > https://issues.apache.org/jira/browse/CALCITE-4155>
> > >
> > > [2] 

Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Julian Hyde
Integer is of course the main example but when I say I like “algebraic” 
approach of Guava range sets I mean that it operates based on the properties of 
data types, not any hard-coded list.

Sargs (and Guava range sets) merely require the values to be comparable (i.e. 
totally ordered). So they would apply to string and float types, for instance. 
Or lexically ordered tuples, (‘a’, 1), (‘b’, 2).

Null does not fit neatly into a total order, so we’d have to find a way to 
finesse expressions like ‘x between 3 and 5 or x is null’.

Sargs (and Guava range sets) are able to do additional optimizations if the 
domain is discrete. The optimization example I gave,  ">2 AND <4" to “3”, uses 
that discrete property. Integers are the main example of discrete domains, but 
“INTERVAL DAY” and “BOOLEAN” are other examples.

Bounded-length strings (e.g. VARCHAR(10)) are also a discrete domain, but the 
Guava documentation recommends against it. Maybe there are just too many 
values. I do think it would be worth treating constrained types, e.g.

  paymentType VARCHAR(6) CHECK (paymentType IN (‘CASH’,’CREDIT’))

as discrete domains.

Julian


> On Aug 10, 2020, at 9:15 AM, Michael Mior  wrote:
> 
> The simplifications you present would also of course assume that the
> column has integer type. In any case, overall the proposal seems solid
> to me.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> Le lun. 10 août 2020 à 04:21, Fan Liya  a écrit :
>> 
>> Hi Julian,
>> 
>> Thanks for opening the discussion.
>> 
>> In general, I am convinced of the value and importance of the proposal.
>> 
>> I want to discuss more about the algebra involved, as the simplified range
>> sets may not have the minimum computational costs.
>> Some examples:
>> 
>> 1. Suppose we have expression x in (1, 2, 4, 5, 6).
>> 
>> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
>> 4 and x <= 6), which represents 4 comparisons and 3 logical operations.
>> Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
>> which represents 5 comparisons and 4 logical operations.
>> It's hard to say which one has a smaller computational cost.
>> 
>> 2. Suppose we have expression x in (1, 2, 4, 5)
>> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
>> 4 and x <= 5).
>> Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
>> I am not sure if the range set implementation supports this type of
>> simplification.
>> 
>> Therefore, to address above problems,
>> 1. We may need a model to estimate the cost, and the simplification process
>> should be guided by the cost model.
>> 2. Maybe we need to extend the range set implementation so it produces
>> expressions with minimum computational cost.
>> 
>> Best,
>> Liya Fan
>> 
>> 
>> 
>> 
>> 
>> On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde  wrote:
>> 
>>> We have had several discussions over the years about how to represent
>>> IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
>>> 
>>> I have generally taken the position that we should expand to ORs (e.g. “x
>>> = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
>>> IN in RexCall.
>>> 
>>> I have given this some further thought, as part of a couple of RexSimplify
>>> bugs [1] [2] and I now think we should replace IN with something more
>>> powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
>>> of intervals, and can be represented as a Guava ImmutableRangeSet, such as
>>> "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
>>> but also ranges.
>>> 
>>> Today you would write
>>> 
>>>  RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
>>> 
>>> With sargs, you would instead write
>>> 
>>>  RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
>>> [3..3], [5..5]]”)))
>>> 
>>> There is a new operator IN_SARG, and a new node type RexSarg (it could
>>> almost be a RexLiteral).
>>> 
>>> Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
>>> we invest in sarg support in RexSimplify and RelOptPredicateList, that
>>> investment is likely to pay dividends in better plans.
>>> 
>>> Guava RangeSets, and hence sargs, have support for discrete domains, so
>>> they can easily optimize ">2 AND <4" to "3”.
>>> 
>>> Sargs would be the preferred (therefore canonical) form for any AND or OR
>>> list that has more than one comparison on the same operand (e.g. "x > 3 AND
>>> x < 17 AND x <> 10”).
>>> 
>>> This proposal would subsume IN and therefore we would stop supporting IN
>>> in RexCall.
>>> 
>>> Julian
>>> 
>>> [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
>>> https://issues.apache.org/jira/browse/CALCITE-4155>
>>> 
>>> [2] https://github.com/julianhyde/calcite/tree/4159-simplify <
>>> https://github.com/julianhyde/calcite/tree/4159-simplify>
>>> 
>>> [3] https://en.wikipedia.org/wiki/Sargable <
>>> https://en.wikipedia.org/wiki/Sargable>



Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Andrei Sereda
Yes I'm aware that for some reason site artifact wasn't built / uploaded.

Meanwhile here are the release notes:
https://github.com/apache/calcite/blob/calcite-1.25.0-rc0/site/_docs/history.md

I'll double-check the build script / instructions, maybe I have missed
something.

On Mon, Aug 10, 2020 at 12:59 PM Haisheng Yuan  wrote:

> +1 (binding)
>
> Environment:
> Mac OS X 10.15.1 , JDK 1.8.0_162
> - Ran unit tests (./gradlew build), OK
> - Checked release notes, CALCITE-4156, CALCITE-4022 CALCITE-4115
> CALCITE-4129 CALCITE-4111 are not part of Build and test suite changes,
> those can be updated after release.
>
> - Haisheng
>
> On 2020/08/09 03:22:28, Andrei Sereda  wrote:
> > Hi all,
> >
> > I have created a build for Apache Calcite 1.25.0, release
> > candidate 0.
> >
> > Thanks to everyone who has contributed to this release.
> >
> > You can read the release notes here:
> > https://apache.github.io/calcite-site-preview/docs/history.html
> >
> > The commit to be voted upon:
> >
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> >
> > Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> >
> > Tag:
> > https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> > (revision 40922)
> >
> > RAT report:
> > https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> >
> > Site preview is here:
> > https://apache.github.io/calcite-site-preview/
> >
> > JavaDoc API preview is here:
> > https://apache.github.io/calcite-site-preview/api
> >
> > The hashes of the artifacts are as follows:
> >
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> > *apache-calcite-1.25.0-src.tar.gz
> >
> > A staged Maven repository is available for review at:
> >
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/sereda.asc
> > https://www.apache.org/dist/calcite/KEYS
> >
> > N.B.
> > To create the jars and test Apache Calcite: "./gradlew build".
> >
> > If you do not have a Java environment available, you can run the tests
> > using docker. To do so, install docker and docker-compose, then run
> > "docker-compose run test" from the root of the directory.
> >
> > Please vote on releasing this package as Apache Calcite 1.25.0.
> >
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Calcite 1.25.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (non binding)
> >
>


Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Haisheng Yuan
+1 (binding)

Environment: 
Mac OS X 10.15.1 , JDK 1.8.0_162
- Ran unit tests (./gradlew build), OK
- Checked release notes, CALCITE-4156, CALCITE-4022 CALCITE-4115 CALCITE-4129 
CALCITE-4111 are not part of Build and test suite changes, those can be updated 
after release.

- Haisheng

On 2020/08/09 03:22:28, Andrei Sereda  wrote: 
> Hi all,
> 
> I have created a build for Apache Calcite 1.25.0, release
> candidate 0.
> 
> Thanks to everyone who has contributed to this release.
> 
> You can read the release notes here:
> https://apache.github.io/calcite-site-preview/docs/history.html
> 
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> 
> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> 
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> (revision 40922)
> 
> RAT report:
> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> 
> Site preview is here:
> https://apache.github.io/calcite-site-preview/
> 
> JavaDoc API preview is here:
> https://apache.github.io/calcite-site-preview/api
> 
> The hashes of the artifacts are as follows:
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> *apache-calcite-1.25.0-src.tar.gz
> 
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/sereda.asc
> https://www.apache.org/dist/calcite/KEYS
> 
> N.B.
> To create the jars and test Apache Calcite: "./gradlew build".
> 
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
> 
> Please vote on releasing this package as Apache Calcite 1.25.0.
> 
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Calcite 1.25.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 
> 
> Here is my vote:
> 
> +1 (non binding)
> 


[DISCUSS] Publish test JARS

2020-08-10 Thread Michael Mior
I know this discussion had come up before, but it seems on the release
of 1.22.0 we decided to stop publishing test JARs. While these
shouldn't be needed for normal operation, my project linked below
happens to use several of them for demonstration purposes so currently
I'm stuck on Calcite 1.21.0. Of course I could add in more of my own
example code, but when we already have it written for the tests, it
seems redundant. Thoughts?

https://github.com/michaelmior/calcite-notebooks
--
Michael Mior
mm...@apache.org


Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Michael Mior
+1 (binding)

Checked hash and compiled and ran tests. I will note though that all
links in the email to https://apache.github.io/calcite-site-preview/
are broken.

--
Michael Mior
mm...@apache.org

Le sam. 8 août 2020 à 23:22, Andrei Sereda  a écrit :
>
> Hi all,
>
> I have created a build for Apache Calcite 1.25.0, release
> candidate 0.
>
> Thanks to everyone who has contributed to this release.
>
> You can read the release notes here:
> https://apache.github.io/calcite-site-preview/docs/history.html
>
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
>
> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
>
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> (revision 40922)
>
> RAT report:
> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
>
> Site preview is here:
> https://apache.github.io/calcite-site-preview/
>
> JavaDoc API preview is here:
> https://apache.github.io/calcite-site-preview/api
>
> The hashes of the artifacts are as follows:
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> *apache-calcite-1.25.0-src.tar.gz
>
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/sereda.asc
> https://www.apache.org/dist/calcite/KEYS
>
> N.B.
> To create the jars and test Apache Calcite: "./gradlew build".
>
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
>
> Please vote on releasing this package as Apache Calcite 1.25.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite 1.25.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (non binding)


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Michael Mior
The simplifications you present would also of course assume that the
column has integer type. In any case, overall the proposal seems solid
to me.

--
Michael Mior
mm...@apache.org

Le lun. 10 août 2020 à 04:21, Fan Liya  a écrit :
>
> Hi Julian,
>
> Thanks for opening the discussion.
>
> In general, I am convinced of the value and importance of the proposal.
>
> I want to discuss more about the algebra involved, as the simplified range
> sets may not have the minimum computational costs.
> Some examples:
>
> 1. Suppose we have expression x in (1, 2, 4, 5, 6).
>
> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> 4 and x <= 6), which represents 4 comparisons and 3 logical operations.
> Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
> which represents 5 comparisons and 4 logical operations.
> It's hard to say which one has a smaller computational cost.
>
> 2. Suppose we have expression x in (1, 2, 4, 5)
> The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> 4 and x <= 5).
> Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
> I am not sure if the range set implementation supports this type of
> simplification.
>
> Therefore, to address above problems,
> 1. We may need a model to estimate the cost, and the simplification process
> should be guided by the cost model.
> 2. Maybe we need to extend the range set implementation so it produces
> expressions with minimum computational cost.
>
> Best,
> Liya Fan
>
>
>
>
>
> On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde  wrote:
>
> > We have had several discussions over the years about how to represent
> > IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
> >
> > I have generally taken the position that we should expand to ORs (e.g. “x
> > = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> > IN in RexCall.
> >
> > I have given this some further thought, as part of a couple of RexSimplify
> > bugs [1] [2] and I now think we should replace IN with something more
> > powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> > of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> > "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> > but also ranges.
> >
> > Today you would write
> >
> >   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
> >
> > With sargs, you would instead write
> >
> >   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> > [3..3], [5..5]]”)))
> >
> > There is a new operator IN_SARG, and a new node type RexSarg (it could
> > almost be a RexLiteral).
> >
> > Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> > we invest in sarg support in RexSimplify and RelOptPredicateList, that
> > investment is likely to pay dividends in better plans.
> >
> > Guava RangeSets, and hence sargs, have support for discrete domains, so
> > they can easily optimize ">2 AND <4" to "3”.
> >
> > Sargs would be the preferred (therefore canonical) form for any AND or OR
> > list that has more than one comparison on the same operand (e.g. "x > 3 AND
> > x < 17 AND x <> 10”).
> >
> > This proposal would subsume IN and therefore we would stop supporting IN
> > in RexCall.
> >
> > Julian
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> > https://issues.apache.org/jira/browse/CALCITE-4155>
> >
> > [2] https://github.com/julianhyde/calcite/tree/4159-simplify <
> > https://github.com/julianhyde/calcite/tree/4159-simplify>
> >
> > [3] https://en.wikipedia.org/wiki/Sargable <
> > https://en.wikipedia.org/wiki/Sargable>


Re: Adopting Calcite to a new DB

2020-08-10 Thread Michael Mior
Is this an operator that corresponds to new operations that need to be
processed or optimizations of existing operations? If these are simply
optimizations, you may be able to write rules that replace existing
operators with your custom operators. You could then hook into the
planner to add these rules.

If you actually need to represent operations not covered by the
current algebra, you'll probably want to modify either RelBuilder to
construct the expressions manually (or with your own parser) or modify
the parser to accept whatever new constructs you need and then change
SqlToRelConverter to construct the algebra for these queries.

--
Michael Mior
mm...@apache.org

Le lun. 10 août 2020 à 11:20, Yori Lavi  a écrit :
>
> Hi
> I'm working on a DB that focuses on substantial data sets (100s of TB). We
> have built multiple custom operators and are now trying to integrate
> Calcite so we can do better optimization, including CBO.
> We're looking for help regarding best practices for introducing a custom
> operator.
> We're also looking for assistance on the best way to use Calcite in our
> environment.
>
> We'll be happy to consider a consulting engagement or even just pointers to
> Calcite architecture.
>
> Thanks,
> Yori
> yori.lavi@gmail.com


Adopting Calcite to a new DB

2020-08-10 Thread Yori Lavi
Hi
I'm working on a DB that focuses on substantial data sets (100s of TB). We
have built multiple custom operators and are now trying to integrate
Calcite so we can do better optimization, including CBO.
We're looking for help regarding best practices for introducing a custom
operator.
We're also looking for assistance on the best way to use Calcite in our
environment.

We'll be happy to consider a consulting engagement or even just pointers to
Calcite architecture.

Thanks,
Yori
yori.lavi@gmail.com


回复: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread 953396112
+1 (non binding)


Local Calcite build with tests enabled on MacOs: Ok
Calcite-based system test suite: Ok


MacOs (Catalina 10.15.6)
Java version(1.8.0_191)


xzh





--原始邮件--
发件人:
"dev"   
 
https://apache.github.io/calcite-site-preview/docs/history.html.
 
  On Mon, Aug 10, 2020 at 9:47 AM Rui Wang https://github.com/apache/calcite/blob/calcite-1.25.0-rc0/site/_docs/history.md
)

 - OK
 Spotted checked a few JARs in the Maven repository - OK

 Environment (OpenJDK:latest docker container):
 Gradle 6.3 (via gradlew)
 Oracle Linux Server 7.8
 openjdk 14.0.2 2020-07-14
 OpenJDK Runtime Environment (build 14.0.2+12-46)
 OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed 
mode, sharing)

 My vote is: +1 (binding)

 Francis

 On 10/08/2020 4:10 am, Rui Wang wrote:
  +1 (non-binding)
 
  platform: ubuntu16 + java8
 
  run tests locally on: ok
  verify the commit hash in git tag: ok
  check sha512: ok
  verify signature by gpg: ok
 
 
  one thing can be fixed after vote: CALCITE-4114 
does not belong
 to
  "breaking changes" in the release note. That 
change happened in
core/test
  (a testing related change).
 
 
  -Rui
 
 
 
  On Sun, Aug 9, 2020 at 2:02 AM Enrico Olivelli <
  eolive...@gmail.com
 wrote:
 
  +1 (non binding)
  run tests locally on Fedora + JDK14
  run tests of HerdDB just by switching from 
1.24 without any
 change
 
  Enrico
 
  Il giorno dom 9 ago 2020 alle ore 05:22 
Andrei Sereda
https://apache.github.io/calcite-site-preview/docs/history.html
 
  The commit to be voted upon:
 
 
 

   
  
 
 
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
 
  Its hash is 
68b02dfd4af15bc94a91a0cd2a30655d04439555
 
  Tag:
  
https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
 
  The artifacts to be voted on are located 
here:
 

  
 https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
  (revision 40922)
 
  RAT report:
 
 https://apache.github.io/calcite-site-preview/rat/rat-report.txt
 
  Site preview is here:
  
https://apache.github.io/calcite-site-preview/
 
  JavaDoc API preview is here:
  
https://apache.github.io/calcite-site-preview/api
 
  The hashes of the artifacts are as 
follows:
 
 
 

   
  
 
 
a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
  *apache-calcite-1.25.0-src.tar.gz
 
  A staged Maven repository is available 
for review at:
 
 
 

   
  
 
 
https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
 
  Release artifacts are signed with the 
following key:
  
https://people.apache.org/keys/committer/sereda.asc
  https://www.apache.org/dist/calcite/KEYS
 
  N.B.
  To create the jars and test Apache 
Calcite: "./gradlew build".
 
  If you do not have a Java environment 
available, you can run
 the
tests
  using docker. To do so, install docker 
and docker-compose, then
  run
  "docker-compose run test" from the root 
of the directory.
 
  Please vote on releasing this package as 
Apache Calcite 1.25.0.
 
  The vote is open for the next 72 hours 
and passes if a majority
  of
   at
  least three +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache 
Calcite 1.25.0
  [ ] 0 I don't feel strongly about 
it, but I'm okay with the
   release
  [ ] -1 Do not release this package 
because...
 
 
  Here is my vote:
 
  +1 (non binding)
 
 
 

   
  
 



-- 
Best regards,
Anton.

[jira] [Created] (CALCITE-4171) Support named parameters for table window functions

2020-08-10 Thread Danny Chen (Jira)
Danny Chen created CALCITE-4171:
---

 Summary: Support named parameters for table window functions
 Key: CALCITE-4171
 URL: https://issues.apache.org/jira/browse/CALCITE-4171
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.24.0, 1.23.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.25.0


Supports window functions like named params style:
```sql
select *
from table(
session(
  data => table Shipments,
  timecol => descriptor(rowtime),
  key => descriptor(orderId),
  size => INTERVAL '10' MINUTE))
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Anton Haidai
Hello,

Local Calcite build with tests enabled on Linux: *OK*
Calcite-based system (Zoomdata) test suite: *OK*

Vote:
+1 (non-binding)

On Mon, Aug 10, 2020 at 11:09 AM Ruben Q L  wrote:

> +1 (non binding)
> - Local Calcite build with tests (Windows10 + JDK8): OK
> - Calcite-based application test suite: OK
>
>
> Le lun. 10 août 2020 à 07:13, chunwei  a écrit :
>
> > +1 (non-binding).
> >
> > - run tests locally: ok
> > - verify the commit hash in git tag: ok
> > - check sha512: ok
> >
> > Environment:
> > - java version 1.8.0_151
> > - MacOS Mojave 10.14.6
> >
> > BTW, It shows 404 when I open
> > https://apache.github.io/calcite-site-preview/docs/history.html.
> >
> > On Mon, Aug 10, 2020 at 9:47 AM Rui Wang  wrote:
> >
> > > > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > > > presume it can be done after the vote ?
> > >
> > > Absolutely! It does not affect current vote.
> > >
> > >
> > > -Rui
> > >
> > > On Sun, Aug 9, 2020 at 4:55 PM Andrei Sereda  wrote:
> > >
> > > > > one thing can be fixed after vote: CALCITE-4114 does not belong to
> > > > "breaking changes" in the release note.
> > > >
> > > > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > > > presume it can be done after the vote ?
> > > >
> > > > On Sun, Aug 9, 2020 at 7:39 PM Francis Chuang <
> > francischu...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for making this release available for voting, Andrei!
> > > > >
> > > > > Verified GPG Signature - OK
> > > > > Verified SHA512 - OK
> > > > > Ran tests per HOWTO (./gradlew check) - OK
> > > > > Quickly skimmed release notes
> > > > > (
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.25.0-rc0/site/_docs/history.md
> > > > )
> > > > >
> > > > > - OK
> > > > > Spotted checked a few JARs in the Maven repository - OK
> > > > >
> > > > > Environment (OpenJDK:latest docker container):
> > > > > Gradle 6.3 (via gradlew)
> > > > > Oracle Linux Server 7.8
> > > > > openjdk 14.0.2 2020-07-14
> > > > > OpenJDK Runtime Environment (build 14.0.2+12-46)
> > > > > OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
> > > > >
> > > > > My vote is: +1 (binding)
> > > > >
> > > > > Francis
> > > > >
> > > > > On 10/08/2020 4:10 am, Rui Wang wrote:
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > platform: ubuntu16 + java8
> > > > > >
> > > > > > run tests locally on: ok
> > > > > > verify the commit hash in git tag: ok
> > > > > > check sha512: ok
> > > > > > verify signature by gpg:  ok
> > > > > >
> > > > > >
> > > > > > one thing can be fixed after vote: CALCITE-4114 does not belong
> to
> > > > > > "breaking changes" in the release note. That change happened in
> > > > core/test
> > > > > > (a testing related change).
> > > > > >
> > > > > >
> > > > > > -Rui
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Aug 9, 2020 at 2:02 AM Enrico Olivelli <
> > eolive...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> +1 (non binding)
> > > > > >> run tests locally on Fedora + JDK14
> > > > > >> run tests of HerdDB just by switching from 1.24 without any
> change
> > > > > >>
> > > > > >> Enrico
> > > > > >>
> > > > > >> Il giorno dom 9 ago 2020 alle ore 05:22 Andrei Sereda
> > > >  > > > > >
> > > > > >> ha
> > > > > >> scritto:
> > > > > >>
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> I have created a build for Apache Calcite 1.25.0, release
> > > > > >>> candidate 0.
> > > > > >>>
> > > > > >>> Thanks to everyone who has contributed to this release.
> > > > > >>>
> > > > > >>> You can read the release notes here:
> > > > > >>>
> https://apache.github.io/calcite-site-preview/docs/history.html
> > > > > >>>
> > > > > >>> The commit to be voted upon:
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > > > >>>
> > > > > >>> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > > > >>>
> > > > > >>> Tag:
> > > > > >>> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> > > > > >>>
> > > > > >>> The artifacts to be voted on are located here:
> > > > > >>>
> > > > >
> > >
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> > > > > >>> (revision 40922)
> > > > > >>>
> > > > > >>> RAT report:
> > > > > >>>
> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> > > > > >>>
> > > > > >>> Site preview is here:
> > > > > >>> https://apache.github.io/calcite-site-preview/
> > > > > >>>
> > > > > >>> JavaDoc API preview is here:
> > > > > >>> https://apache.github.io/calcite-site-preview/api
> > > > > >>>
> > > > > >>> The hashes of the artifacts are as follows:
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> > > > > >>> *apache-calcite-1.25.0-src.tar.gz
> > > > > >>>
> > 

Re: Gather real world statistics about row count, CPU, and IO cost

2020-08-10 Thread Thomas Rebele
Thank you for your analysis and the references. I would be more interested
in sub-problem 2.
A kind of tuning tool would be nice. If a project using Calcite  introduces
custom operators, it could help to find a cost model for them.

I would start with a simple case, e.g., deciding whether a particular join
should be translated into a hash join, merge join, or correlate.
With a different cardinality of the inputs, one algorithm or the other
might be better. The real-world statistics might help to decide which one
to choose.

Cordialement / Best Regards,
*Thomas Rebele* | R Developer | 18 rue du 4 septembre, 75002 Paris, France
| www.tibco.com


On Fri, Aug 7, 2020 at 11:32 PM Julian Hyde  wrote:

> Also, check out the paper "Learning to Optimize Join Queries With Deep
> Reinforcement Learning" by Krishnan, Yang, Goldberg, Hellerstein,
> Stoica 2019 (which aims to improve the join-order benchmark and uses
> Calcite as one of its test platforms):
> https://arxiv.org/pdf/1808.03196.pdf
>
> On Fri, Aug 7, 2020 at 2:02 PM Julian Hyde  wrote:
> >
> > I consider there to be two fairly independent sub-problems:
> >
> > 1. Improve our cardinality estimates (especially for queries with
> > complex joins and aggregation).
> >
> > 2. Calibrate physical operators so that, given a good estimate of the
> > number of rows they will see, we can come up with a reasonable
> > estimate of the physical cost (e.g. how long the query will take to
> > execute, and how much memory).
> >
> > For 1, the paper you cite, and the join-order benchmark it introduces,
> > is an excellent contribution to the field. It inspired me to do work
> > on profiling [1]. I would encourage you to build on the work I have
> > already done.
> >
> > For 2, I have not done any work personally. An approach would be to
> > give each physical operator (e.g. EnumerableHashJoin) a cost model
> > parameterized by certain constants, and then run experiments to
> > determine the values of those constants empirically. Perhaps we could
> > write a "TuningTool" that generates an "operator constants file", and
> > thereby start to formalize the process.
> >
> > Julian
> >
> > [1]
> https://www.slideshare.net/julianhyde/data-profiling-in-apache-calcite
> >
> > On Fri, Aug 7, 2020 at 6:57 AM Thomas Rebele 
> wrote:
> > >
> > > Hi all,
> > >
> > > I'm working on basic query optimization. I once stumbled on the case
> that
> > > two operators had the same row count but one had a much higher CPU
> cost.
> > > Unfortunately the default cost model only takes the row count into
> account
> > > (see [1]). Stamatis had pointed out in another mail that the row count
> > > might be much more important than the other costs [2]. However, if
> there
> > > are two possible choices with the same row count, we should prefer the
> one
> > > with the least CPU cost. I'm wondering whether the assumption that a
> > > smaller row count is better in most cases is actually correct. Also,
> what
> > > is "better" in this context? The query plan with the least execution
> time?
> > > Maybe there's a plan that is just <10% slower, but consumes much less
> > > CPU/memory/etc.
> > >
> > > So I thought about the cost model in general, and how to improve it. I
> > > assume the better the estimated cost corresponds to the real cost, the
> > > better the optimized plans. So the first step would be to collect the
> real
> > > world statistics and the second step to adapt the cost estimation so
> that
> > > there's a better correspondence. For the beginning I would just
> measure how
> > > many rows have been in the result and how much time has passed for each
> > > RelNode during query execution. Is there already a way to do this in
> > > Calcite? Does this make sense at all?
> > >
> > > [1]
> > >
> https://github.com/apache/calcite/blob/52a57078ba081b24b9d086ed363c715485d1a519/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoCost.java#L100
> > > [2]
> > >
> https://15721.courses.cs.cmu.edu/spring2019/papers/24-costmodels/p204-leis.pdf
> > >
> > > Cordialement / Best Regards,
> > > *Thomas Rebele* | R Developer | 18 rue du 4 septembre, 75002 Paris,
> France
> > > | www.tibco.com
>


Re: RelToSqlConverter NullPointerException

2020-08-10 Thread tonytao

Sorry, the attached file was wrong,please ignore it.

I upload a new file.

On 8/10/20 4:38 PM, tonytao wrote:

hi team,

I met a NullPointerException when I used calcite 1.24.0 to convert a 
relNode to sqlNode.


The trace:

java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:221)
    at 
org.apache.calcite.rel.rel2sql.RelToSqlConverter.result(RelToSqlConverter.java:152)
    at 
org.apache.calcite.rel.rel2sql.SqlImplementor.result(SqlImplementor.java:454)
    at 
org.apache.calcite.rel.rel2sql.SqlImplementor$Builder.result(SqlImplementor.java:1822)
    at 
org.apache.calcite.rel.rel2sql.RelToSqlConverter.visit(RelToSqlConverter.java:392)

    at server.TestRelToSqlConverter.test(TestRelToSqlConverter.java:84)


Attched file is a junit5 test case,you can repeat the issue with this 
code.


Maven configuration:

        
            org.apache.calcite
            calcite-core
            1.24.0
        
        
            org.apache.commons
            commons-dbcp2
            2.7.0
        
        
            org.postgresql
            postgresql
            42.2.12
        




package server;

import static org.junit.jupiter.api.Assertions.*;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.calcite.adapter.jdbc.JdbcSchema;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelRoot;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.tools.RelConversionException;
import org.apache.calcite.tools.ValidationException;
import org.apache.commons.dbcp2.BasicDataSource;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

class TestRelToSqlConverter {
	private static BasicDataSource dataSource = null;
	private static Connection conn;
	private static Statement stmt;

	@BeforeAll
	static void setUpBeforeClass() throws Exception {
		String url = "jdbc:postgresql://127.0.0.1:5432/hdb2";
		DriverManager.registerDriver(DriverManager.getDriver(url));
		dataSource = new BasicDataSource();
		dataSource.setUrl(url);
		dataSource.setUsername("dev");
		dataSource.setPassword("123456");
		conn = dataSource.getConnection();
		stmt = conn.createStatement();
		stmt.execute("drop table if exists testdata;");
		stmt.execute("create table testdata(" + "id int," + "uptime timestamp," + "x int," + "y int,"
+ "cmt text" + ");");
		stmt.execute("insert into testdata " + "select id," + "uptime," + "round(random()*100),"
+ "round(random()*100)," + "md5(uptime::text) " + "from (" + "select"
+ "generate_series id," + "current_timestamp + make_interval(mins => generate_series) uptime"
+ " from generate_series(1,10)) t");
	}

	@AfterAll
	static void tearDownAfterClass() throws Exception {
		stmt.execute("drop table testdata;");
		stmt.close();
		conn.close();
	}

	@Test
	void test() throws SQLException, RelConversionException, SqlParseException, ValidationException {
		SqlParser.Config insensitiveParser = SqlParser.configBuilder().setCaseSensitive(false).build();

		SchemaPlus rootSchema = Frameworks.createRootSchema(true);

		Schema schema = JdbcSchema.create(rootSchema, "public", dataSource, "hdb", "public");
		rootSchema.add("public", schema);

		FrameworkConfig config = Frameworks.newConfigBuilder().parserConfig(insensitiveParser).defaultSchema(rootSchema)
.build();
		Planner planner = Frameworks.getPlanner(config);
		String sql = "select  year(uptime),sum(x) from public.testdata group by year(uptime)";
		SqlNode sqlNode = planner.parse(sql);
		System.out.println(sqlNode);
		SqlNode sqlNodeValidated = planner.validate(sqlNode);
		RelRoot relRoot = planner.rel(sqlNodeValidated);
		RelNode relNode = relRoot.project();
		System.out.println(RelOptUtil.toString(relNode));
		RelToSqlConverter converter = new RelToSqlConverter(PostgresqlSqlDialect.DEFAULT);
		System.out.println("\n optimized  sql :");
		System.out.println(converter.visit((Aggregate) relNode).asSelect());
	}

}


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Fan Liya
Hi Julian,

Thanks for opening the discussion.

In general, I am convinced of the value and importance of the proposal.

I want to discuss more about the algebra involved, as the simplified range
sets may not have the minimum computational costs.
Some examples:

1. Suppose we have expression x in (1, 2, 4, 5, 6).

The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
4 and x <= 6), which represents 4 comparisons and 3 logical operations.
Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
which represents 5 comparisons and 4 logical operations.
It's hard to say which one has a smaller computational cost.

2. Suppose we have expression x in (1, 2, 4, 5)
The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
4 and x <= 5).
Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
I am not sure if the range set implementation supports this type of
simplification.

Therefore, to address above problems,
1. We may need a model to estimate the cost, and the simplification process
should be guided by the cost model.
2. Maybe we need to extend the range set implementation so it produces
expressions with minimum computational cost.

Best,
Liya Fan





On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde  wrote:

> We have had several discussions over the years about how to represent
> IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
>
> I have generally taken the position that we should expand to ORs (e.g. “x
> = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> IN in RexCall.
>
> I have given this some further thought, as part of a couple of RexSimplify
> bugs [1] [2] and I now think we should replace IN with something more
> powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> but also ranges.
>
> Today you would write
>
>   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
>
> With sargs, you would instead write
>
>   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> [3..3], [5..5]]”)))
>
> There is a new operator IN_SARG, and a new node type RexSarg (it could
> almost be a RexLiteral).
>
> Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> we invest in sarg support in RexSimplify and RelOptPredicateList, that
> investment is likely to pay dividends in better plans.
>
> Guava RangeSets, and hence sargs, have support for discrete domains, so
> they can easily optimize ">2 AND <4" to "3”.
>
> Sargs would be the preferred (therefore canonical) form for any AND or OR
> list that has more than one comparison on the same operand (e.g. "x > 3 AND
> x < 17 AND x <> 10”).
>
> This proposal would subsume IN and therefore we would stop supporting IN
> in RexCall.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> https://issues.apache.org/jira/browse/CALCITE-4155>
>
> [2] https://github.com/julianhyde/calcite/tree/4159-simplify <
> https://github.com/julianhyde/calcite/tree/4159-simplify>
>
> [3] https://en.wikipedia.org/wiki/Sargable <
> https://en.wikipedia.org/wiki/Sargable>


Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Ruben Q L
+1 (non binding)
- Local Calcite build with tests (Windows10 + JDK8): OK
- Calcite-based application test suite: OK


Le lun. 10 août 2020 à 07:13, chunwei  a écrit :

> +1 (non-binding).
>
> - run tests locally: ok
> - verify the commit hash in git tag: ok
> - check sha512: ok
>
> Environment:
> - java version 1.8.0_151
> - MacOS Mojave 10.14.6
>
> BTW, It shows 404 when I open
> https://apache.github.io/calcite-site-preview/docs/history.html.
>
> On Mon, Aug 10, 2020 at 9:47 AM Rui Wang  wrote:
>
> > > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > > presume it can be done after the vote ?
> >
> > Absolutely! It does not affect current vote.
> >
> >
> > -Rui
> >
> > On Sun, Aug 9, 2020 at 4:55 PM Andrei Sereda  wrote:
> >
> > > > one thing can be fixed after vote: CALCITE-4114 does not belong to
> > > "breaking changes" in the release note.
> > >
> > > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > > presume it can be done after the vote ?
> > >
> > > On Sun, Aug 9, 2020 at 7:39 PM Francis Chuang <
> francischu...@apache.org>
> > > wrote:
> > >
> > > > Thanks for making this release available for voting, Andrei!
> > > >
> > > > Verified GPG Signature - OK
> > > > Verified SHA512 - OK
> > > > Ran tests per HOWTO (./gradlew check) - OK
> > > > Quickly skimmed release notes
> > > > (
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.25.0-rc0/site/_docs/history.md
> > > )
> > > >
> > > > - OK
> > > > Spotted checked a few JARs in the Maven repository - OK
> > > >
> > > > Environment (OpenJDK:latest docker container):
> > > > Gradle 6.3 (via gradlew)
> > > > Oracle Linux Server 7.8
> > > > openjdk 14.0.2 2020-07-14
> > > > OpenJDK Runtime Environment (build 14.0.2+12-46)
> > > > OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
> > > >
> > > > My vote is: +1 (binding)
> > > >
> > > > Francis
> > > >
> > > > On 10/08/2020 4:10 am, Rui Wang wrote:
> > > > > +1 (non-binding)
> > > > >
> > > > > platform: ubuntu16 + java8
> > > > >
> > > > > run tests locally on: ok
> > > > > verify the commit hash in git tag: ok
> > > > > check sha512: ok
> > > > > verify signature by gpg:  ok
> > > > >
> > > > >
> > > > > one thing can be fixed after vote: CALCITE-4114 does not belong to
> > > > > "breaking changes" in the release note. That change happened in
> > > core/test
> > > > > (a testing related change).
> > > > >
> > > > >
> > > > > -Rui
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Aug 9, 2020 at 2:02 AM Enrico Olivelli <
> eolive...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> +1 (non binding)
> > > > >> run tests locally on Fedora + JDK14
> > > > >> run tests of HerdDB just by switching from 1.24 without any change
> > > > >>
> > > > >> Enrico
> > > > >>
> > > > >> Il giorno dom 9 ago 2020 alle ore 05:22 Andrei Sereda
> > >  > > > >
> > > > >> ha
> > > > >> scritto:
> > > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I have created a build for Apache Calcite 1.25.0, release
> > > > >>> candidate 0.
> > > > >>>
> > > > >>> Thanks to everyone who has contributed to this release.
> > > > >>>
> > > > >>> You can read the release notes here:
> > > > >>> https://apache.github.io/calcite-site-preview/docs/history.html
> > > > >>>
> > > > >>> The commit to be voted upon:
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > > >>>
> > > > >>> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > > >>>
> > > > >>> Tag:
> > > > >>> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> > > > >>>
> > > > >>> The artifacts to be voted on are located here:
> > > > >>>
> > > >
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> > > > >>> (revision 40922)
> > > > >>>
> > > > >>> RAT report:
> > > > >>> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> > > > >>>
> > > > >>> Site preview is here:
> > > > >>> https://apache.github.io/calcite-site-preview/
> > > > >>>
> > > > >>> JavaDoc API preview is here:
> > > > >>> https://apache.github.io/calcite-site-preview/api
> > > > >>>
> > > > >>> The hashes of the artifacts are as follows:
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> > > > >>> *apache-calcite-1.25.0-src.tar.gz
> > > > >>>
> > > > >>> A staged Maven repository is available for review at:
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
> > > > >>>
> > > > >>> Release artifacts are signed with the following key:
> > > > >>> https://people.apache.org/keys/committer/sereda.asc
> > > > >>> https://www.apache.org/dist/calcite/KEYS
> > > > >>>
> > > > >>> N.B.
> > > > >>> To create the jars and test Apache Calcite: "./gradlew build".
> > 

Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread chunwei
+1 (non-binding).

- run tests locally: ok
- verify the commit hash in git tag: ok
- check sha512: ok

Environment:
- java version 1.8.0_151
- MacOS Mojave 10.14.6

BTW, It shows 404 when I open
https://apache.github.io/calcite-site-preview/docs/history.html.

On Mon, Aug 10, 2020 at 9:47 AM Rui Wang  wrote:

> > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > presume it can be done after the vote ?
>
> Absolutely! It does not affect current vote.
>
>
> -Rui
>
> On Sun, Aug 9, 2020 at 4:55 PM Andrei Sereda  wrote:
>
> > > one thing can be fixed after vote: CALCITE-4114 does not belong to
> > "breaking changes" in the release note.
> >
> > I'm happy to change release notes to better reflect 1.25.0 changes. I
> > presume it can be done after the vote ?
> >
> > On Sun, Aug 9, 2020 at 7:39 PM Francis Chuang 
> > wrote:
> >
> > > Thanks for making this release available for voting, Andrei!
> > >
> > > Verified GPG Signature - OK
> > > Verified SHA512 - OK
> > > Ran tests per HOWTO (./gradlew check) - OK
> > > Quickly skimmed release notes
> > > (
> > >
> >
> https://github.com/apache/calcite/blob/calcite-1.25.0-rc0/site/_docs/history.md
> > )
> > >
> > > - OK
> > > Spotted checked a few JARs in the Maven repository - OK
> > >
> > > Environment (OpenJDK:latest docker container):
> > > Gradle 6.3 (via gradlew)
> > > Oracle Linux Server 7.8
> > > openjdk 14.0.2 2020-07-14
> > > OpenJDK Runtime Environment (build 14.0.2+12-46)
> > > OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
> > >
> > > My vote is: +1 (binding)
> > >
> > > Francis
> > >
> > > On 10/08/2020 4:10 am, Rui Wang wrote:
> > > > +1 (non-binding)
> > > >
> > > > platform: ubuntu16 + java8
> > > >
> > > > run tests locally on: ok
> > > > verify the commit hash in git tag: ok
> > > > check sha512: ok
> > > > verify signature by gpg:  ok
> > > >
> > > >
> > > > one thing can be fixed after vote: CALCITE-4114 does not belong to
> > > > "breaking changes" in the release note. That change happened in
> > core/test
> > > > (a testing related change).
> > > >
> > > >
> > > > -Rui
> > > >
> > > >
> > > >
> > > > On Sun, Aug 9, 2020 at 2:02 AM Enrico Olivelli 
> > > wrote:
> > > >
> > > >> +1 (non binding)
> > > >> run tests locally on Fedora + JDK14
> > > >> run tests of HerdDB just by switching from 1.24 without any change
> > > >>
> > > >> Enrico
> > > >>
> > > >> Il giorno dom 9 ago 2020 alle ore 05:22 Andrei Sereda
> >  > > >
> > > >> ha
> > > >> scritto:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> I have created a build for Apache Calcite 1.25.0, release
> > > >>> candidate 0.
> > > >>>
> > > >>> Thanks to everyone who has contributed to this release.
> > > >>>
> > > >>> You can read the release notes here:
> > > >>> https://apache.github.io/calcite-site-preview/docs/history.html
> > > >>>
> > > >>> The commit to be voted upon:
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > >>>
> > > >>> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> > > >>>
> > > >>> Tag:
> > > >>> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> > > >>>
> > > >>> The artifacts to be voted on are located here:
> > > >>>
> > >
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> > > >>> (revision 40922)
> > > >>>
> > > >>> RAT report:
> > > >>> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> > > >>>
> > > >>> Site preview is here:
> > > >>> https://apache.github.io/calcite-site-preview/
> > > >>>
> > > >>> JavaDoc API preview is here:
> > > >>> https://apache.github.io/calcite-site-preview/api
> > > >>>
> > > >>> The hashes of the artifacts are as follows:
> > > >>>
> > > >>>
> > > >>
> > >
> >
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> > > >>> *apache-calcite-1.25.0-src.tar.gz
> > > >>>
> > > >>> A staged Maven repository is available for review at:
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
> > > >>>
> > > >>> Release artifacts are signed with the following key:
> > > >>> https://people.apache.org/keys/committer/sereda.asc
> > > >>> https://www.apache.org/dist/calcite/KEYS
> > > >>>
> > > >>> N.B.
> > > >>> To create the jars and test Apache Calcite: "./gradlew build".
> > > >>>
> > > >>> If you do not have a Java environment available, you can run the
> > tests
> > > >>> using docker. To do so, install docker and docker-compose, then run
> > > >>> "docker-compose run test" from the root of the directory.
> > > >>>
> > > >>> Please vote on releasing this package as Apache Calcite 1.25.0.
> > > >>>
> > > >>> The vote is open for the next 72 hours and passes if a majority of
> at
> > > >>> least three +1 PMC votes are cast.
> > > >>>
> > > >>> [ ] +1 Release this package as Apache Calcite 1.25.0
> > >