CassandraAdapterTest failure

2018-07-29 Thread Julian Hyde
I'm seeing the following error when I run the tests on bd0e14002
origin/master. Anyone else see it?

[INFO] Running org.apache.calcite.test.CassandraAdapterTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! - in org.apache.calcite.test.CassandraAdapterTest
[ERROR] org.apache.calcite.test.CassandraAdapterTest  Time elapsed: 0.001 s  <<< ERROR!
java.lang.ExceptionInInitializerError
  at org.apache.calcite.test.CassandraAdapterTest.initCassandraIfEnabled(CassandraAdapterTest.java:106)
  at org.apache.calcite.test.CassandraAdapterTest.<clinit>(CassandraAdapterTest.java:56)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 5
  at org.apache.calcite.test.CassandraAdapterTest.initCassandraIfEnabled(CassandraAdapterTest.java:106)
  at org.apache.calcite.test.CassandraAdapterTest.<clinit>(CassandraAdapterTest.java:56)


Re: CassandraAdapterTest failure

2018-07-29 Thread Julian Hyde
I ran on several JDK versions, all on Ubuntu Linux. The machine was
fairly heavily loaded (I was copying one filesystem to another at the
time).

Here's the error from OpenJDK10:

[INFO] Running org.apache.calcite.test.CassandraAdapterTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! - in org.apache.calcite.test.CassandraAdapterTest
[ERROR] org.apache.calcite.test.CassandraAdapterTest  Time elapsed: 0.001 s  <<< ERROR!
java.lang.ExceptionInInitializerError
  at org.apache.calcite.test.CassandraAdapterTest.initCassandraIfEnabled(CassandraAdapterTest.java:106)
  at org.apache.calcite.test.CassandraAdapterTest.<clinit>(CassandraAdapterTest.java:56)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 5
  at org.apache.calcite.test.CassandraAdapterTest.initCassandraIfEnabled(CassandraAdapterTest.java:106)
  at org.apache.calcite.test.CassandraAdapterTest.<clinit>(CassandraAdapterTest.java:56)

Here's the error in JDK 11:

[INFO] Running org.apache.calcite.test.CassandraAdapterTest
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 s <<< FAILURE! - in org.apache.calcite.test.CassandraAdapterTest
[ERROR] org.apache.calcite.test.CassandraAdapterTest  Time elapsed: 0 s  <<< ERROR!
java.lang.ExceptionInInitializerError
  at org.apache.calcite.test.CassandraAdapterTest.enabled(CassandraAdapterTest.java:81)
  at org.apache.calcite.test.CassandraAdapterTest.initCassandraIfEnabled(CassandraAdapterTest.java:88)
  at org.apache.calcite.test.CassandraAdapterTest.<clinit>(CassandraAdapterTest.java:56)

Here's the failure from JDK 9:

[INFO] Running org.apache.calcite.test.CassandraAdapterTest
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.001 s <<< FAILURE! - in org.apache.calcite.test.CassandraAdapterTest
[ERROR] org.apache.calcite.test.CassandraAdapterTest  Time elapsed: 0.001 s  <<< FAILURE!
java.lang.AssertionError: Cassandra daemon did not start within timeout

Here's the failure from JDK 10:

[INFO] Running org.apache.calcite.jdbc.CalciteRemoteDriverTest
[ERROR] Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.642 s <<< FAILURE! - in org.apache.calcite.jdbc.CalciteRemoteDriverTest
[ERROR] testRemoteExecuteQuery(org.apache.calcite.jdbc.CalciteRemoteDriverTest)  Time elapsed: 0.047 s  <<< FAILURE!
java.lang.AssertionError:

Expected: "EXPR$0=1; EXPR$1=a\nEXPR$0=null; EXPR$1=b"
     but: was ""
  at org.apache.calcite.jdbc.CalciteRemoteDriverTest.testRemoteExecuteQuery(CalciteRemoteDriverTest.java:280)

In conclusion: It's worrying that the suite shows 4 different cracks
under 4 different JDKs. Clearly the load on my machine was making
problems worse, and granted, the problems are just testing problems,
not real bugs. But flaky test suites waste time and effort. There are
indications that the new embedded Cassandra test is more flaky than
most.

Julian


On Sun, Jul 29, 2018 at 11:33 AM, Andrei Sereda  wrote:
> What versions of Java, OS and Maven do you have? What is your Maven
> command?
>
> Things pass for me on macOS with Java 8, 9 and 10.


Re: CassandraAdapterTest failure

2018-07-30 Thread Julian Hyde
On balance, I don’t think we should back out CassandraAdapterTest. But we do 
need to continue working to make it more resilient. Any test that generates too 
many false negatives over the long run should be disabled, and this is no 
exception.

And as the other test results show, it’s not the only flaky part of the test 
suite, when the machine is stressed. One thing we can do is to avoid timeouts 
whenever possible.

Julian
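
One way to read the suggestion about timeouts, sketched in Java: have the test block on an explicit readiness signal from the embedded service instead of sleeping for a fixed interval, and keep a generous upper bound only as a safety net. The names below (ReadinessGate, markReady, awaitReady) are hypothetical; they are not part of Calcite or Cassandra Unit, and this is not how CassandraAdapterTest is actually written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: lets a test wait for a "service is up" signal. */
class ReadinessGate {
  private final CountDownLatch ready = new CountDownLatch(1);

  /** Called by the thread that starts the service, once startup has finished. */
  void markReady() {
    ready.countDown();
  }

  /**
   * Blocks until markReady() has been called or the upper bound elapses.
   * Returns true if the service became ready, false otherwise.
   */
  boolean awaitReady(long maxWait, TimeUnit unit) {
    try {
      return ready.await(maxWait, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return false;
    }
  }

  public static void main(String[] args) {
    ReadinessGate gate = new ReadinessGate();
    // Stand-in for a slow daemon start; a real test would start the service here.
    new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      gate.markReady();
    }).start();
    // Returns as soon as the signal arrives; the bound is only a safety net.
    System.out.println("ready: " + gate.awaitReady(5, TimeUnit.MINUTES));
  }
}
```

The wait ends when the signal arrives, so a loaded machine makes the test slower rather than making it fail, and the large bound only guards against a start that never completes.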


> On Jul 30, 2018, at 9:30 AM, Andrei Sereda  wrote:
> 
> Pls check and confirm that the following PR fixes the issue:
> https://github.com/apache/calcite/pull/770
> It addresses build failures due to version parsing.
> 
> Stale folders / files will be addressed in a different PR.
> 
> 
> On Mon, Jul 30, 2018 at 10:10 AM Andrei Sereda  wrote:
> 
>> This must be something specific to Cassandra Unit. Will check
>> 
>> On Mon, Jul 30, 2018, 08:59 Sergey Nuyanzin  wrote:
>> 
>>> There is one more strange thing (at least on Windows): during the build, a
>>> file named ".toDelete" is generated under calcite\cassandra, and it is
>>> not removed by the end of the tests.
>>> Is there a way to make Cassandra generate these files in, e.g., the target
>>> directory?
>>> 
>>> 
>>> 
>>> On Mon, Jul 30, 2018 at 3:52 PM, Andrei Sereda  wrote:
>>> 
>>>> Most of the problems are during the test init phase. Most likely with the
>>>> version string (e.g. 11-ea for JDK 11).
>>>> I'll fix that.
>>>> 
>>>> 
>>>> On Mon, Jul 30, 2018 at 8:16 AM Michael Mior  wrote:
>>>> 
>>>>> I tested myself a fair bit under Ubuntu before pushing this and didn't see
>>>>> any of these issues myself. That said, I agree that it's important the test
>>>>> suite be stable. I'm fine with reverting for now or (more preferably IMO)
>>>>> just disabling these tests by default.
>>>>> 
>>>>> --
>>>>> Michael Mior
>>>>> mm...@apache.org

Re: CassandraAdapterTest failure

2018-07-30 Thread Julian Hyde
I’m running tests on the PR now.

Since commit comments are a soap-box of mine, I will remark that rather than

  [CALCITE-2428] Fix cassandra unit test initialization. (Andrei Sereda)

the commit comment should be

  [CALCITE-2428] Cassandra unit test fails to parse version string (Andrei Sereda)

Julian


> On Jul 30, 2018, at 1:13 PM, Michael Mior  wrote:
> 
> Thanks Andrei for digging into this! Since I haven't reproduced this
> failure myself, if someone else could check out the PR, that would be
> great.
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> On Mon, Jul 30, 2018 at 13:29, Andrei Sereda wrote:
> 
>> Agreed, flaky tests are pretty annoying. I'll try to watch more carefully
>> for new "embedded data-source" issues (fongo, ES, cassandra). They
>> introduced more "non-determinism" because they now run as part of the
>> regular build, which means they are executed much more often than the
>> integration tests (IT).
>> 
>> The last commit was for a more deterministic issue (incorrect parsing of the
>> new Java version format <http://openjdk.java.net/jeps/223>).
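
To illustrate the version-string problem discussed above, here is a minimal Java sketch of extracting the major version from a java.version value in both the old ("1.8.0_171") and the JEP 223 ("10", "11-ea") formats. The class and method names (JavaMajorVersion, parseMajor) are hypothetical, and this is not the code from the PR. As an aside, an expression of the form v.substring(0, v.indexOf('.')) would fail on JDK 9+ with "begin 0, end -1, length 5" for a five-character string with no dot, such as "11-ea"; whether the test did exactly that is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not taken from Calcite. */
class JavaMajorVersion {
  // Leading number, optionally followed by ".<number>"; any suffix is ignored.
  private static final Pattern LEADING =
      Pattern.compile("^(\\d+)(?:\\.(\\d+))?");

  /** Returns 8 for "1.8.0_171", 9 for "9.0.4", 10 for "10", 11 for "11-ea". */
  static int parseMajor(String version) {
    Matcher m = LEADING.matcher(version);
    if (!m.find()) {
      throw new IllegalArgumentException("Unrecognized Java version: " + version);
    }
    int first = Integer.parseInt(m.group(1));
    // Legacy scheme "1.x": the major version is the second component.
    if (first == 1 && m.group(2) != null) {
      return Integer.parseInt(m.group(2));
    }
    return first;
  }

  public static void main(String[] args) {
    for (String v : new String[] {"1.8.0_171", "9.0.4", "10", "11-ea"}) {
      System.out.println(v + " -> " + parseMajor(v));
    }
    // The running JVM's own value can be parsed the same way:
    System.out.println(parseMajor(System.getProperty("java.version")));
  }
}
```

Matching only the leading digits avoids any assumption about a dot, dash or underscore being present, which is what the early-access and single-number formats break.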

Re: Review request [CALCITE-2404]

2018-07-30 Thread Julian Hyde
I have reviewed. See my comments in the jira case.

> On Jul 26, 2018, at 11:22 PM, Stamatis Zampetakis  wrote:
> 
> Hello,
> 
> Can somebody have a look at
> [CALCITE-2404] Accessing structured-types is not implemented by the runtime
> https://issues.apache.org/jira/browse/CALCITE-2404
> 
> The pull request is here:
> https://github.com/apache/calcite/pull/762
> 
> Thanks,
> Stamatis



Re: MATCH_RECOGNIZE

2018-07-31 Thread Julian Hyde
I’m delighted that Flink is getting full SQL support for MATCH_RECOGNIZE.

Sounds like it might be challenging to share the implementation, but could we 
perhaps share the test suite? (I.e. a set of SQL queries and their expected 
results.)

I added a simple test in 
https://github.com/julianhyde/calcite/commit/ee460847643ec17544f310088affd99be4028bb6
that could be extended.

Julian
 

> On Jul 31, 2018, at 8:07 AM, Fabian Hueske  wrote:
> 
> Hi everyone,
> 
> I'd like to share the plans for MATCH_RECOGNIZE support in Flink.
> 
> Flink has featured a so-called CEP library for quite some time [1]. The CEP
> library is a popular and frequently used feature.
> In a nutshell, the library provides a domain-specific API to define event
> patterns. The patterns are translated into a state machine and evaluated in
> a streaming program.
> 
> Even before we learned about MATCH_RECOGNIZE, Till (another Flink
> committer) and I gave a few talks about unifying SQL and CEP [2].
> Hence, we were quite excited when we learned about MATCH_RECOGNIZE and even
> more when it was added to Calcite.
> Shortly after that, we got a PR [3] which translated the parsed
> MATCH_RECOGNIZE clause into patterns of our CEP library.
> However, we never really got to the point of merging that contribution,
> mainly because there were some inconsistencies in the semantics of
> MATCH_RECOGNIZE and Flink's CEP library.
> 
> Recently, a Flink committer picked up this feature again, validated the
> semantics, and made a few corrections [4].
> The CEP library is now ready to support a subset of the MATCH_RECOGNIZE
> features.
> Unfortunately, MATCH_RECOGNIZE support won't make it into the upcoming
> 1.6.0 release, but the plans are to add it for the 1.7.0 release.
> 
> Regarding the idea of sharing parts of the evaluation logic.
> Flink has runtime support for a subset of the MATCH_RECOGNIZE clause.
> Unfortunately, I am not familiar with the internals of Flink's CEP library
> and don't know how portable it is.
> 
> Best, Fabian
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html
> [2] https://www.slideshare.net/tillrohrmann/streaming-analytics-cep-two-sides-of-the-same-coin
> [3] https://github.com/apache/flink/pull/4502
> [4] https://issues.apache.org/jira/browse/FLINK-9593
> 
> 2018-07-23 21:03 GMT+02:00 Sergey Nuyanzin <snuyan...@gmail.com>:
> 
>> Looks exciting.
>> If possible, I would like to take part in it; however, I'm not sure
>> about this week (I could from August).
>> 
>> On Mon, Jul 23, 2018 at 9:10 PM, Michael Mior <mm...@apache.org> wrote:
>> 
>>> This does sound like my idea of fun, but unfortunately I won't have
>>> the time to contribute in the near future. I'll keep this on my radar
>>> though. I also shared this message with all the students in our
>>> research group and I wouldn't be surprised if there was someone
>>> willing to jump in. Thanks for keeping this moving Julian!
>>> 
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>> On Mon, Jul 23, 2018 at 13:54, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>> For quite a while we have had partial support for MATCH_RECOGNIZE. We
>>>> support it in the parser and validator, but there is no runtime
>>>> implementation. It’s a shame, because MATCH_RECOGNIZE is an incredibly
>>>> powerful SQL feature for both traditional SQL (it’s in Oracle 12c) and
>>>> for continuous query (aka complex event processing - CEP).
>>>> 
>>>> I figure it’s time to change that. My plan is to implement it
>>>> incrementally, getting simple queries working to start with, then allow
>>>> people to add more complex queries.
>>>> 
>>>> In a dev branch [1], I’ve added a method Enumerables.match [2]. The idea
>>>> is that if you supply an Enumerable of input data, a finite state machine
>>>> to figure out when a sequence of rows makes a match (represented by a
>>>> transition function: (state, row) -> state), and a function to convert a
>>>> matched set of rows to a set of output ro
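
The idea of driving row matching with a transition function (state, row) -> state can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions: it is not the signature or behavior of Enumerables.match, the names MatchSketch, match and step are hypothetical, a null state is used to mean "no transition", and there is no backtracking beyond retrying the current row from the start state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

/** Hypothetical sketch of FSM-driven row matching; not Calcite's API. */
class MatchSketch {
  /** Returns each run of rows that drives the machine from start to an accepting state. */
  static <S, R> List<List<R>> match(List<R> rows, S start,
      Predicate<S> accepting, BiFunction<S, R, S> transition) {
    List<List<R>> matches = new ArrayList<>();
    List<R> current = new ArrayList<>();
    S state = start;
    for (R row : rows) {
      S next = transition.apply(state, row);
      if (next == null) {
        // Dead end: drop the partial match and retry this row from the start state.
        current.clear();
        next = transition.apply(start, row);
        if (next == null) {
          state = start;
          continue;
        }
      }
      current.add(row);
      state = next;
      if (accepting.test(state)) {
        matches.add(new ArrayList<>(current));
        current.clear();
        state = start;
      }
    }
    return matches;
  }

  /** Example machine for the pattern "A followed by B": 0 -A-> 1 -B-> 2 (accepting). */
  static Integer step(Integer state, String row) {
    if (state == 0 && row.equals("A")) {
      return 1;
    }
    if (state == 1 && row.equals("B")) {
      return 2;
    }
    return null; // no transition defined
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("A", "B", "C", "A", "A", "B");
    // prints [[A, B], [A, B]]
    System.out.println(match(rows, 0, s -> s == 2, MatchSketch::step));
  }
}
```

A fuller implementation would also need the third ingredient mentioned above, a function that converts each matched set of rows into output rows; here the matched rows are returned as-is.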

Re: CassandraAdapterTest failure

2018-08-01 Thread Julian Hyde
The test is failing every time for me on JDK 10.  The command “mvn -Pit clean 
test” will probably reproduce it for most people.

Can other folks please try to reproduce this? I’m getting close to saying that 
we should back this change out even though apparently only I can reproduce the 
failure.

Julian

 




> On Jul 30, 2018, at 1:35 PM, Andrei Sereda  wrote:
> 
> Julian, I have amended the commit message in f0b00f0c

Re: CassandraAdapterTest failure

2018-08-01 Thread Julian Hyde
What about on JDK 10? Here’s my java version:

$ java -version
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)


> On Aug 1, 2018, at 12:58 PM, Igor Kryvenko  wrote:
> 
> "mvn -Pit clean test" works fine for me.
> Ubuntu 18.04
> java version "1.8.0_171"
> 
> Kind regards
> Igor Kryvenko

Re: MATCH_RECOGNIZE

2018-08-02 Thread Julian Hyde
Quidem can run on top of any JDBC data source (you just need to invoke it with a 
connection factory by implementing a simple SPI). But it requires queries to 
terminate (i.e. it can’t handle streaming queries). So, if Flink SQL were able 
to run queries on an EMP table, then I think it could be tested using Quidem. 

> On Aug 2, 2018, at 6:27 AM, Fabian Hueske  wrote:
> 
> Hi Julian,
> 
> It would be great to use the same test suite.
> 
> We have quite a few tests in Flink but they are not super well organized.
> I would love to have more structure for at least some of the tests.
> 
> I had a quick look at how Calcite runs its Quidem tests.
> Not sure if this is a format that we could easily adapt to, but maybe it's
> possible to put a test data set, queries, and results in a more portable
> format.
> 
> Best, Fabian

Re: problem with lateral join

2018-08-02 Thread Julian Hyde
Sounds like an interesting bug in view expansion. Can you log a bug, please? If 
possible, create a test case in ServerTest (since I presume that it needs CREATE 
VIEW followed by a query).

Julian


> On Aug 2, 2018, at 4:50 AM, ptr.bo...@gmail.com wrote:
> 
> Hello,
> 
> I am struggling with a strange case. The following query works for me:
> SELECT *
> FROM CORE.FILTERS F
> CROSS JOIN LATERAL TABLE(AUX.TBLFUNCTION('somestring', F.aCOLLUMN)) tblfn
> 
> But when it is defined as a view named EXAMPLECOLLATERAL under a schema
> EXAMPLES, the following query does not work:
> SELECT * FROM EXAMPLES.EXAMPLECOLLATERAL
> 
> It produces:
> 
>> org.apache.calcite.runtime.CalciteContextException: At line 3, column 51: Table 'F' not found
>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>  at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
>>  at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:783)
>>  at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:768)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:4779)
>>  at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:330)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5527)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5490)
>>  at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:334)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1627)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1612)
>>  at org.apache.calcite.sql.SqlOperator.constructArgTypeList(SqlOperator.java:584)
>>  at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:233)
>>  at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:215)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5503)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:5490)
>>  at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:138)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1627)
>>  at org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:53)
>>  at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:965)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:944)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3027)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3012)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:3064)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3021)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3271)
>>  at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
>>  at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:965)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:944)
>>  at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:225)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:919)
>>  at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:629)
>>  at org.apache.calcite.prepare.CalcitePrepareImpl.parse_(CalcitePrepareImpl.java:292)
> 
> 
> I am willing to patch this if this is a bug - just need a point to the
> right direction.
> 
> Thanx!
> -- 
> Piotr Bojko
> http://about.me/ptr.bojko



Re: RelBuilder API on the top of a view

2018-08-02 Thread Julian Hyde
There’s no reason why it shouldn’t work, except that it hasn’t been tested. It 
looks as if the implementation is trying to do the right thing — parse the 
view, expand it to relational algebra, and push that algebra onto RelBuilder’s 
stack — but it fails because the context is a dumb implementation that doesn’t 
know how to expand views.

Can you log a bug, please? I’m not sure whether the test case would be in 
RelBuilderTest (using a schema that contains a view) or ServerTest (where you 
could execute CREATE VIEW and then start a RelBuilder on the connection).

Julian




> On Aug 2, 2018, at 2:57 PM, Andrei Sereda  wrote:
> 
> Hello,
> 
> I was wondering if one can use the relational algebra API on top of an
> existing view (or are only tables supported)? Below is an example which
> fails for me:
> 
> // suppose one creates a view as follows
> CREATE VIEW view AS select * from elastic.docs;
> 
> @Test
> public void sql() {
>  // works when using SQL
>  CalciteAssert.that()
>  .with(newConnectionFactory())
>  .query("select * from view")
>  .returnsCount(1);
> }
> 
> 
> /**
> * Example when querying calcite view using {@link RelBuilder} api fails.
> * It is working fine when using SQL or when using RelBuilder on a
> * table (not view).
> */
> @Test
> public void relBuilder() throws Exception {
>  Connection connection = newConnectionFactory().createConnection();
>  SchemaPlus root = connection.unwrap(CalciteConnection.class).getRootSchema();
>  FrameworkConfig config =
> Frameworks.newConfigBuilder().defaultSchema(root).build();
>  // querying a view using RelBuilder fails
>  RelBuilder builder = RelBuilder.create(config).scan("VIEW");
>  // querying directly a table works
>  // RelBuilder builder = RelBuilder.create(config).scan("elastic", "docs");
> 
>  int count = 0;
>  try (PreparedStatement stm =
>           connection.unwrap(RelRunner.class).prepare(builder.build());
>       ResultSet rset = stm.executeQuery()) {
>    while (rset.next()) {
>      count++;
>    }
>  }
> 
>  assertEquals(1, count);
> }
> 
> Exception
> 
> Caused by: java.lang.UnsupportedOperationException
>   at org.apache.calcite.plan.RelOptUtil$4.expandView(RelOptUtil.java:2805)
>   at 
> org.apache.calcite.schema.impl.ViewTable.expandView(ViewTable.java:124)
>   ... 39 more
> 
> 
> Anything I'm doing wrong ?
> 
> Many thanks,
> Andrei.



Sqlline release

2018-08-06 Thread Julian Hyde
(Forgive the cross-posting. The sqlline dev list isn’t very active, and many of 
the sqlline community are in the calcite community. Please reply to calcite dev 
only.)

There have been a number of enhancements to sqlline recently[1] (thanks, 
Sergey!). Is it time for a release of sqlline? Or should we plan to have a 
release in say a month, to give people time to add more features?

Julian

[1] https://github.com/julianhyde/sqlline/commits/master 


Re: MATCH_RECOGNIZE

2018-08-06 Thread Julian Hyde
If a JDBC driver is a problem, it shouldn’t be hard to mock a connection that 
can create a statement that can describe itself and execute a query. Quidem 
makes light use of JDBC.

> On Aug 6, 2018, at 10:33 AM, Fabian Hueske  wrote:
> 
> OK, I see.
> Flink doesn't have support for JDBC yet.
> Would need to look into that.
> 
> 2018-08-02 21:35 GMT+02:00 Julian Hyde :
> 
>> Quidem can run on top of any JDBC data source (you just need to invoke
>> with a connection factory by implementing a simple SPI). But it requires
>> queries to terminate (i.e. can’t handle streaming queries). So, if Flink
>> SQL were able to run queries on an EMP table, then I think it could be
>> tested using Quidem.
>> 
>>> On Aug 2, 2018, at 6:27 AM, Fabian Hueske  wrote:
>>> 
>>> Hi Julian,
>>> 
>>> It would be great to use the same test suite.
>>> 
>>> We have quite a few tests in Flink but they are not super well organized.
>>> I would love to have more structure for at least some of the tests.
>>> 
>>> I had a quick look at how Calcite runs its Quidem tests.
>>> Not sure if this is a format that we could easily adapt to, but maybe it's
>>> possible to put a test data set, queries, and results in a more portable
>>> format.
>>> 
>>> Best, Fabian
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 2018-07-31 19:54 GMT+02:00 Julian Hyde :
>>> 
>>>> I’m delighted that Flink is getting full SQL support for MATCH_RECOGNIZE.
>>>> 
>>>> Sounds like it might be challenging to share the implementation, but could
>>>> we perhaps share the test suite? (I.e. a set of SQL queries and their
>>>> expected results.)
>>>> 
>>>> I added a simple test in
>>>> https://github.com/julianhyde/calcite/commit/ee460847643ec17544f310088affd99be4028bb6
>>>> that could be extended.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Jul 31, 2018, at 8:07 AM, Fabian Hueske  wrote:
>>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I'd like to share the plans for MATCH_RECOGNIZE support in Flink.
>>>>> 
>>>>> Flink has featured a so-called CEP library for quite some time [1]. CEP
>>>>> is a popular feature and is frequently used.
>>>>> In a nutshell, the library provides a domain-specific API to define event
>>>>> patterns. The patterns are translated into a state machine and evaluated
>>>>> in a streaming program.
>>>>> 
>>>>> Even before we learned about MATCH_RECOGNIZE, Till (another Flink
>>>>> committer) and I gave a few talks about unifying SQL and CEP [2].
>>>>> Hence, we were quite excited when we learned about MATCH_RECOGNIZE and
>>>>> even more when it was added to Calcite.
>>>>> Shortly after that, we got a PR [3] which translated the parsed
>>>>> MATCH_RECOGNIZE clause into patterns of our CEP library.
>>>>> However, we never really got to the point of merging that contribution,
>>>>> mainly because there were some inconsistencies in the semantics of
>>>>> MATCH_RECOGNIZE and Flink's CEP library.
>>>>> 
>>>>> Recently, a Flink committer picked up this feature again, validated the
>>>>> semantics, and made a few corrections [4].
>>>>> The CEP library is now ready to support a subset of the MATCH_RECOGNIZE
>>>>> features.
>>>>> Unfortunately, MATCH_RECOGNIZE support won't make it into the upcoming
>>>>> 1.6.0 release, but the plans are to add it for the 1.7.0 release.
>>>>> 
>>>>> Regarding the idea of sharing parts of the evaluation logic.
>>>>> Flink has runtime support for a subset of the MATCH_RECOGNIZE clause.
>>>>> Unfortunately, I am not familiar with the internals of Flink's CEP library
>>>>> and don't know how portable it is.
>>>>> 
>>>>> Best, Fabian
>>>>> 
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html

Re: SQL Query Set Analyzer

2018-08-06 Thread Julian Hyde
It’s hard to automatically recommend a set of MVs from past queries. The design 
space is just too large. But if you are designing MVs for interactive BI, you 
can use the “lattice” model. This works because many queries will be 
filter-join-aggregate queries on a star schema (i.e. a central fact table and 
dimension tables joined over many-to-one relationships). (Or perhaps a join 
between two or more such queries.) 

Do the queries you are trying to optimize have that pattern?

If so, you might start by creating a lattice for each such star schema. Then 
the lattice can suggest MVs that are summary tables.

(Lattice suggester is one step more meta - it recommends lattices - but given 
where you are, I would suggest hand-writing one or two lattices.)

Calcite is a framework, and this unfortunately means that you have to write 
Java code to use these features. It might be easier if you use the new “server” 
module, which supports CREATE MATERIALIZED VIEW as a DDL statement. Then you 
can create some demos for your colleagues that are wholly or mostly SQL.

The simplest way to populate a materialized view is the CREATE MATERIALIZED 
VIEW statement. It basically does the same as CREATE TABLE AS SELECT (executes 
a query, stores the results in a table) but it leaves behind the metadata about 
where that data came from.

Materialized views can in principle be maintained incrementally, but how you do 
it depends upon what changes are allowed (append only? Replace rows and write 
the old rows to an audit table?). We’ve not done a lot of work on it. I believe 
the Hive folks have given this more thought than I have.
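
As a rough illustration of the append-only case (a minimal sketch, not Calcite 
or Hive code; the class and method names below are hypothetical): a summary 
table that stores SUM and COUNT per group can be maintained by folding in each 
newly appended row, without re-reading the data already summarized.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory summary table: SUM(amount) and COUNT(*) per key. */
final class SummaryTable {

  /** One row of the summary: running sum and row count for a key. */
  static final class Agg {
    long sum;
    long count;
  }

  private final Map<String, Agg> rows = new HashMap<>();

  /** Folds one newly appended fact row into the summary; old rows are not rescanned. */
  void append(String key, long amount) {
    Agg agg = rows.computeIfAbsent(key, k -> new Agg());
    agg.sum += amount;
    agg.count++;
  }

  /** Returns SUM(amount) for the key, or 0 if the key has no rows. */
  long sum(String key) {
    Agg agg = rows.get(key);
    return agg == null ? 0 : agg.sum;
  }

  /** Returns COUNT(*) for the key, or 0 if the key has no rows. */
  long count(String key) {
    Agg agg = rows.get(key);
    return agg == null ? 0 : agg.count;
  }

  public static void main(String[] args) {
    SummaryTable summary = new SummaryTable();
    summary.append("east", 10);
    summary.append("east", 5);
    summary.append("west", 7);
    System.out.println("east: sum=" + summary.sum("east")
        + " count=" + summary.count("east"));
  }
}
```

Once rows can be replaced or deleted, SUM and COUNT can still be adjusted in 
place, but an aggregate such as MIN or MAX may need to go back to the base 
data, which is why the allowed changes matter.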

Julian


> On Aug 3, 2018, at 11:11 PM, James Taylor  wrote:
> 
> Both the Lattice Suggestor and Quark sound like what I need for an
> automated solution, but I have some more basic follow up questions first.
> Here's our basic use case (very similar to Zheng Shao's, I believe):
> - Our company has stood up Presto for data analysts
> - Nightly ETL jobs populate Hive tables with lots of data
> - Analysts run adhoc queries over data using Presto
> - The top CPU using queries are pretty complex (2-3 pages of complex SQL,
> lots of joins and aggregation)
> 
> There is some basic/obvious stuff that can be done manually first:
> - Provide better visibility into which queries are expensive
> - Ask query owners to produce their own materialized views and manually
> change their queries to use them (I believe there's some amount of this
> already)
> 
> Then there's kind of a middle ground:
> - Ask query owners to identify what they think are the top few materialized
> views to build
> - Manually build these materialized views in the daily ETL job.
> - Use Calcite to rewrite the query to use the materialized views. Can
> Calcite do this and would it be a problem if the queries are Presto
> queries? I'd need to make sure I provided Calcite with the cost information
> it needs, right?
> - Dark launch to test that the rewritten query returns the same results as
> the original query (and measure the perf improvement)
> 
> But the more interesting stuff is:
> - Automatically identifying the materialized views that should be built.
> Sounds like both the Lattice Suggestor and Quark are potentially a good
> fit. I'm not clear on what is output by the Suggestor. Would it spit out a
> CREATE VIEW statement (or could what it outputs produce that)? How does the
> Suggestor compare with Quark?
> - Automatically build the materialized views. Would the Lattice framework
> or Quark help me with that? Would it be possible to incrementally build the
> materialized views or would it be necessary to build the materialized views
> from the beginning of time again and again (clearly not feasible given the
> size of the tables)? Maybe it depends on the aggregation functions that are
> used?
> - And the nirvana is a kind of feedback loop - based on the top N expensive
> queries, identify and build the materialized views, use them transparently
> during querying, and then retire them when they're infrequently used.
> 
> Would it be a better choice to build the materialized views as Druid
> tables? That'd require a Druid connector to Presto, though. This reminds me
> of the work you already did, Julian, with Hive+Druid (i.e. CALCITE-1731)
> but for Presto instead of Hive. Do you think any of that would transfer
> over in some way?
> 
> WDYT? Huge amount of work? Any advice is much appreciated.
> 
> Thanks,
> James
> 
> On Thu, Jul 26, 2018 at 11:29 AM, Julian Hyde wrote:
> 
>> PS
>> 
>> +1 for Babel.
>> 
>> If you are analyzing a set of queries, it is very likely that these
>> queries were written to be executed against another database. Babel aims to

Re: SQL Query Set Analyzer

2018-08-06 Thread Julian Hyde
Regarding the SCOPE paper you reference. That was on my mind too (I went to the 
talk at SIGMOD). A materialized view is created only if the same query is used 
*textually identically* in different parts of the ETL process, so it is mainly 
for optimizing batch jobs that are largely the same night after night. Lattices 
are a better approach for optimizing interactive BI work-loads.

Julian


> On Aug 6, 2018, at 4:57 PM, Jesus Camacho Rodriguez 
>  wrote:
> 
> You can find an overview of the work that has been done in Hive for
> materialized view integration in the following link:
> https://cwiki.apache.org/confluence/display/Hive/Materialized+views
> Materialized views can be stored in external tables such as Druid-backed
> tables too. The Druid rules in Calcite are used to push computation
> to Druid from Hive.
> 
> The rewriting algorithm itself is in Calcite. The algorithm can take advantage
> of constraints (PK-FK relationship between tables) to produce additional
> correct rewritings, can execute rollups, etc. However, it does not assume any
> specific schema layout, which may make it useful for multiple ETL workloads.
> http://calcite.apache.org/docs/materialized_views#rewriting-using-plan-structural-information
> The most recent addition is the support for partitioned materialized views,
> including the extension in the cost model to take into account partition
> pruning during the planning phase.
> 
> Incremental maintenance is supported. Most of that code lives in Hive, but it
> relies on the rewriting algorithm too. It only works for materialized views
> that use Hive transactional tables, either full ACID or insert-only. Basically,
> Hive explicitly exposes the data contained in the materialization via a filter
> condition (e.g., mv1 contains data for transactions (x, y, z)), then lets the
> rewriting algorithm trigger a partial rewriting which reads new contents from
> the source tables and processed contents from mv1.
> Finally, an additional step transforms the rewritten expression into an INSERT
> or MERGE statement depending on the materialized view expression (MERGE for
> materialized views containing aggregations). Since not all tables in Hive
> support the UPDATE needed for MERGE, we were thinking about allowing some
> target materialized views with definitions that include aggregates to use
> INSERT and then force the rollup at runtime, e.g., for Druid.
> "Maybe it depends on the aggregation functions that are used?"
> The result of some aggregate functions (e.g., min and max) cannot always be
> incrementally maintained in the presence of UPDATE/DELETE operations on source
> tables, though some rewriting to minimize full rebuilds can be used if count
> is added as an additional column to the materialized view. Incremental
> maintenance in the presence of UPDATE/DELETE operations on source tables is
> not supported in Hive yet, hence this is not implemented.
> 
> 
> I would like to think that, of the problems described below, we are getting to
> the 'more interesting stuff' in the Hive project, though there is some
> consolidation needed for existing work too. That is why we are also interested
> in any effort related to materialization recommendation. I believe the most
> powerful abstraction to use would be RelNode, which can be useful for any
> system representing its queries internally using that representation, instead
> of relying on SQL nodes, which are more closely tied to the parser.
> 
> Concerning the 'feedback loop', this recent paper by MSFT describes a system
> that does something similar to what James was describing (for SCOPE):
> https://www.microsoft.com/en-us/research/uploads/prod/2018/03/cloudviews-sigmod2018.pdf
> 
> -Jesús
> 
> 
> 
> On 8/6/18, 3:32 PM, "Julian Hyde" wrote:
> 
>    It’s hard to automatically recommend a set of MVs from past queries. The
>    design space is just too large. But if you are designing MVs for
>    interactive BI, you can use the “lattice” model. This works because many
>    queries will be filter-join-aggregate queries on a star schema (i.e. a
>    central fact table and dimension tables joined over many-to-one
>    relationships). (Or perhaps a join between two or more such queries.)
> 
>    Do the queries you are trying to optimize have that pattern?
> 
>    If so, you might

Re-ordering AND clauses

2018-08-08 Thread Julian Hyde
We have never really made our policy clear about whether it is valid for the 
planner to re-order the clauses of an AND. I would like to have a discussion 
about that policy (also see https://issues.apache.org/jira/browse/CALCITE-2450). 
I propose that we can re-order the clauses of AND (and OR) but not CASE.

Consider the query Q1:

  SELECT *
  FROM t
  WHERE x > 0 AND y / x < 5

And the similar query Q2 that re-orders the AND clause:

  SELECT *
  FROM t
  WHERE y / x < 5 AND x > 0

If one of the rows has x = 0, we would expect Q2 to throw a divide-by-zero 
error. Is it allowed for Q1 to throw? Is it allowed for it NOT to throw?

We recognized that sometimes people want to write SQL to guard against bad 
values (like x = 0 above), and so we tacitly assumed that we would not re-order 
AND. Thus in current Calcite, Q1 would never throw, and Q2 would always throw.

I think that was a mistake. It ties our hands too much (we are not able to move 
highly selective predicates to the front of the list, for instance) and it is 
inconsistent with SQL semantics.

There is a way to achieve the “guarding” behavior: use CASE (whose clauses 
cannot be re-ordered), in Q3 as follows:

  SELECT *
  FROM t
  WHERE CASE WHEN x > 0 THEN y / x < 5 ELSE FALSE END
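
To make the evaluation-order point concrete outside of SQL, here is a small 
Java analogy (an illustration only, not Calcite code; the method names are 
made up). Java's && always evaluates left to right and short-circuits, which 
is exactly the guarantee that SQL's AND would lose under this proposal, while 
the if/else form corresponds to the CASE expression in Q3.

```java
/** Java analogy for Q1, Q2 and Q3; all names here are hypothetical. */
final class AndReorderDemo {

  /** Q1 order: the guard runs first, so y / x is never evaluated when x == 0. */
  static boolean guardFirst(int x, int y) {
    return x > 0 && y / x < 5;
  }

  /** Q2 order: the division runs first and throws ArithmeticException when x == 0. */
  static boolean divisionFirst(int x, int y) {
    return y / x < 5 && x > 0;
  }

  /** Q3: CASE WHEN x > 0 THEN y / x < 5 ELSE FALSE END. */
  static boolean caseGuard(int x, int y) {
    if (x > 0) {
      return y / x < 5;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(guardFirst(0, 10)); // false: the division is skipped
    System.out.println(caseGuard(0, 10));  // false: the ELSE branch is taken
    try {
      divisionFirst(0, 10);
    } catch (ArithmeticException e) {
      System.out.println("divisionFirst threw: " + e.getMessage());
    }
  }
}
```

If the planner is free to turn guardFirst into divisionFirst, only the 
caseGuard form keeps its behavior for rows where x = 0.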

Julian



Re: Sqlline release

2018-08-09 Thread Julian Hyde
Let’s wait about a month then. Target the first week in September. 

(Good to see all these new features Sergey... keep them coming!)

Julian

> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin  wrote:
> 
> +1 for release
> at the same time I am ok to wait about a month
> (I have a few ideas about some more improvements)
> 
>> On Tue, Aug 7, 2018 at 5:29 AM Julian Hyde  wrote:
>> 
>> (Forgive the cross-posting. The sqlline dev list isn’t very active, and
>> many of the sqlline community are in the calcite community. Please reply to
>> calcite dev only.)
>> 
>> There have been a number of enhancements to sqlline recently[1] (thanks,
>> Sergey!). Is it time for a release of sqlline? Or should we plan to have a
>> release in say a month, to give people time to add more features?
>> 
>> Julian
>> 
>> [1] https://github.com/julianhyde/sqlline/commits/master
>> 
> 
> 
> -- 
> Best regards,
> Sergey


Re: Avatica client can not talk to Multiple Avatica Servers issue

2018-08-09 Thread Julian Hyde
One important question is: what state is shared among servers? Some pieces of 
state: connection state, statement state, result set state (parameter values 
and position in a scroll). It is not unusual for a cluster of servers to use a 
shared cache for some of the larger / slower-changing pieces of state.

I could imagine a system where connection and statements are shared but result 
set state is not. I could imagine another system where nothing is shared. The 
solution would be different for those cases.

A possible technical solution in Avatica might be for Avatica to transmit all 
necessary state in the return from each RPC, and the next RPC to transmit that 
state. Thus all necessary state is on the client, and in each RPC call.

But such a solution would not be easy to implement, and would not perform as 
well as a system that makes some reasonable assumptions about what state can be 
left on the server.
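
To sketch what "all necessary state is on the client" could look like (a toy 
model under stated assumptions, not Avatica's protocol; Cursor, Page, Server 
and fetchAll are hypothetical names): each reply carries a cursor, the client 
sends that cursor back with its next call, and so any server can answer any 
call. The sketch assumes every server sees the same rows in the same order, 
which is precisely the part that may not hold for a real database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Avatica. */
final class StatelessFetchDemo {

  /** Everything a server needs to continue a fetch; it travels with every call. */
  static final class Cursor {
    final String sql;
    final int offset;

    Cursor(String sql, int offset) {
      this.sql = sql;
      this.offset = offset;
    }
  }

  /** One reply: a page of rows plus the cursor to send with the next call. */
  static final class Page {
    final List<String> rows;
    final Cursor next; // null once the result set is exhausted

    Page(List<String> rows, Cursor next) {
      this.rows = rows;
      this.next = next;
    }
  }

  /** A server that remembers nothing about clients between calls. */
  static final class Server {
    private final List<String> table; // stands in for the backing database

    Server(List<String> table) {
      this.table = table;
    }

    /** Returns up to maxRows rows starting at the position held in the cursor. */
    Page fetch(Cursor cursor, int maxRows) {
      if (maxRows <= 0) {
        throw new IllegalArgumentException("maxRows must be positive");
      }
      int end = Math.min(cursor.offset + maxRows, table.size());
      List<String> rows = new ArrayList<>(table.subList(cursor.offset, end));
      Cursor next = end < table.size() ? new Cursor(cursor.sql, end) : null;
      return new Page(rows, next);
    }
  }

  /** Reads a whole result set, sending each call to a different server in turn. */
  static List<String> fetchAll(List<Server> servers, String sql, int pageSize) {
    List<String> result = new ArrayList<>();
    Cursor cursor = new Cursor(sql, 0);
    int call = 0;
    while (cursor != null) {
      Server server = servers.get(call++ % servers.size()); // mimics a load balancer
      Page page = server.fetch(cursor, pageSize);
      result.addAll(page.rows);
      cursor = page.next;
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("r1", "r2", "r3", "r4", "r5");
    List<Server> servers = Arrays.asList(new Server(table), new Server(table));
    System.out.println(fetchAll(servers, "select * from t", 2));
  }
}
```

The cost shows up in fetch: with no server-side result set, each call has to 
re-derive its position from the cursor, which is one reason such a design 
would likely perform worse than keeping some state on the server.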

Julian




> On Aug 9, 2018, at 8:43 AM, Josh Elser  wrote:
> 
> Hi Jiandan,
> 
> Glad you found my write-up on this. One of the original design goals was to 
> *not* implement routing logic in the client. Sticky-sessions is by far the 
> easiest way to implement this.
> 
> There is some retry logic in the Avatica client to resubmit requests when a 
> server responds that it doesn't have a connection/statement cached that the 
> client thinks it should (e.g. the load balancer flipped the client to a newer 
> server). I'm still a little concerned about this level of "smarts" :)
> 
> I don't know if there is a fancier solution that we can do in Avatica. We 
> could consider sharing state between Avatica servers, but I think it is 
> database-dependent as to whether or not you could correctly reconstruct an 
> iteration through a result set.
> 
> I had talked with a dev on the Apache Hive project. He suggested that 
> HiveServer2 just fails the query when the client is mid-query and the server 
> dies (which is reasonable -- servers failing should be an infrequent 
> occurrence).
> 
> 
> On 8/8/18 8:09 PM, JD Zheng wrote:
>> Hi,
>> Our query engine is using calcite as parser/optimizer and enumerable as 
>> runtime if needed to federate different storage engines. We are trying to 
>> enable JDBC access to our query engine. Everything works smoothly when we 
>> only have one calcite/avatica server.
>> However, JDBC calls will fail if we run multiple instances of 
>> calcite/avatica servers behind a generic load-balancer. Given that JDBC 
>> server is not stateless, this problem was not a surprise. I searched around 
>> and here are the two options suggested by phoenix developers 
>> (https://community.hortonworks.com/articles/9377/deploying-the-phoenix-query-server-in-production-e.html):
>> 1. sticky sessions: make the router always route a client to a given 
>> server.
>> 2. client-driven routing: implementing Avatica’s protocol which passes an 
>> identifier to the load balancer to control how the request is routed to the 
>> backend servers.
>> Before we rush into any implementation, we would really appreciate it if 
>> anyone can share experience or thoughts regarding this issue. Thanks,
>> -Jiandan



[DISCUSS] Vetoing commits

2018-08-09 Thread Julian Hyde
Calcite PMC,

For the first time in this project (to my knowledge) a committer has vetoed the 
commit of another committer.

The details of this particular case are here: 
https://issues.apache.org/jira/browse/CALCITE-2438 


I would like the PMC to answer the following questions: (1) is the veto valid 
in this case? (2) when in general is it valid to veto a commit?

Here is the ASF policy document on the matter: 
https://www.apache.org/foundation/voting.html#Veto 


I was on the receiving end of this veto, so I am far from dispassionate about 
this matter. Therefore I intend to take a back seat in this discussion and I 
would appreciate it if another PMC member would drive it. I will state for the 
record my opinion that vetoes are an important right in the ASF, and that 
making a veto is not a light matter, nor is a PMC declaring a veto invalid.

Julian



Re: calcite git commit: tests: add TestUtilTest to CalciteSuite

2018-08-09 Thread Julian Hyde
Vladimir,

Please adhere to the convention for commit comments: start with a capital 
letter. I would only use a component prefix (“tests:”) if it clarifies.

Julian



> On Aug 9, 2018, at 12:19 PM, vladimirsitni...@apache.org wrote:
> 
> Repository: calcite
> Updated Branches:
>  refs/heads/master 0e6733bf8 -> 3c6b5ec75
> 
> 
> tests: add TestUtilTest to CalciteSuite
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/calcite/repo
> Commit: http://git-wip-us.apache.org/repos/asf/calcite/commit/3c6b5ec7
> Tree: http://git-wip-us.apache.org/repos/asf/calcite/tree/3c6b5ec7
> Diff: http://git-wip-us.apache.org/repos/asf/calcite/diff/3c6b5ec7
> 
> Branch: refs/heads/master
> Commit: 3c6b5ec759caadabb67f09d7a4963cc7d9386d0c
> Parents: 0e6733b
> Author: Vladimir Sitnikov 
> Authored: Thu Aug 9 22:19:27 2018 +0300
> Committer: Vladimir Sitnikov 
> Committed: Thu Aug 9 22:19:27 2018 +0300
> 
> --
> core/src/test/java/org/apache/calcite/test/CalciteSuite.java | 2 ++
> 1 file changed, 2 insertions(+)
> --
> 
> 
> http://git-wip-us.apache.org/repos/asf/calcite/blob/3c6b5ec7/core/src/test/java/org/apache/calcite/test/CalciteSuite.java
> --
> diff --git a/core/src/test/java/org/apache/calcite/test/CalciteSuite.java 
> b/core/src/test/java/org/apache/calcite/test/CalciteSuite.java
> index d87efb0..7042010 100644
> --- a/core/src/test/java/org/apache/calcite/test/CalciteSuite.java
> +++ b/core/src/test/java/org/apache/calcite/test/CalciteSuite.java
> @@ -60,6 +60,7 @@ import org.apache.calcite.util.PermutationTestCase;
> import org.apache.calcite.util.PrecedenceClimbingParserTest;
> import org.apache.calcite.util.ReflectVisitorTest;
> import org.apache.calcite.util.SourceTest;
> +import org.apache.calcite.util.TestUtilTest;
> import org.apache.calcite.util.UtilTest;
> import org.apache.calcite.util.graph.DirectedGraphTest;
> import org.apache.calcite.util.mapping.MappingTest;
> @@ -99,6 +100,7 @@ import org.junit.runners.Suite;
> SqlValidatorFeatureTest.class,
> VolcanoPlannerTraitTest.class,
> InterpreterTest.class,
> +TestUtilTest.class,
> VolcanoPlannerTest.class,
> HepPlannerTest.class,
> TraitPropagationTest.class,
> 



Re: JMH dependency vs licensing

2018-08-10 Thread Julian Hyde
That’s my understanding as well.

I thought we’d settled this a while ago. (I can’t find a URL to prove it.)

Julian


> On Aug 10, 2018, at 7:58 AM, Enrico Olivelli  wrote:
> 
> I think it is fine to use JMH, you are not "redistributing" it, it is here
> only to run local benchmarks.
> 
> We have the same in Apache BookKeeper codebase
> 
> just my 2 cents
> 
> Enrico
> 
> On Fri, Aug 10, 2018 at 4:56 PM, Michael Mior wrote:
> 
>> Perhaps we should just open up a JIRA case on legal for an official ruling.
>> It does seem like we should try to have ubenchmark excluded from releases.
>> Unless I'm mistaken, I don't believe it's required.
>> 
>> On Thu, Aug 9, 2018, 4:01 PM Vladimir Sitnikov <
>> sitnikov.vladi...@gmail.com>
>> wrote:
>> 
>>> There are two questions there:
>>> 1) Is it possible to use third party code with "forbidden" licenses?
>>> As you say, the answer is "it is OK for optional modules".
>>> 
>>> 2) What should be the license of `ubenchmark` module?
>>> It looks like `ubenchmark` code links to JMH in a way that we can't strip
>>> out JMH and replace it with another alternative.
>>> 
>>> Apparently calcite-ubenchmark is published to Maven Central, so it does not
>>> look like "a temporary use for tests", but it finds its way to the Apache
>>> Calcite release.
>>> 
>>> Vladimir
>>> 
>> 



Re: Review and update pull requests

2018-08-10 Thread Julian Hyde
I agree with most of what Vladimir said. But briefly:

* It isn’t often necessary for the contributor to rebase. The reviewer will ask 
for a rebase if it’s really needed.
* If you add a commit after an initial review, it’s really important that you 
do not squash (or amend). The reviewer wants to see the delta.
* Reviewers are (in my opinion) at liberty to perform “copy-editing” style 
tasks (squash, rebase, fix typos, add a couple of extra tests). We don’t want 
the process to provide friction to higher quality.

Julian

> On Aug 10, 2018, at 3:00 AM, Vladimir Sitnikov  
> wrote:
> 
> Stamatis>Personally, I always perform rebase followed by a forced push
> 
> I was inclined to use that policy in early days, yet I think it should not
> be the main way.
> 
> The below assumes GitHub. If we happen to use Gerrit things might shine with a
> different colour.
> 
> I suggest the following.
> 
> FAQ:
> Q: I want to rebase/squash to make the PR shiny. Should I?
> A: No. It would complicate the review.
> 
> Q: I'm afraid all those oops/fixup commits will clutter git history. Should
> I rebase?
> A: No. Rebase/squash can be performed by the committer if there are no other
> issues.
> 
> Q: Travis CI failed, but the failure is not caused by my changes (e.g.
> failed to download from Maven). Should I force-push to re-trigger the CI?
> A: No. Please create an empty commit (git commit --allow-empty) and push it.
> 
> Q: My PR is quite old, and I am afraid it is no longer valid. Should I
> rebase it?
> A: Yes.
> 
> Rules for contributor:
> R1) Use a feature branch when creating a PR. Do not use your master branch for
> PR.
> R2) Consider squashing the commits into meaningful ones before you create
> the PR. Do not expect "oops/fix/fixup" commits to land to Calcite master.
> R3) Feel free to force-push and squash commits during the first 10 minutes
> of PR lifetime
> R4) If PR was created more than 10 minutes ago, refrain from force-push
> R5) Do not force-push in case there's a pending discussion (in the PR
> and/or in JIRA) regarding the changes. Pending is vague, so I would suggest
> to consider the discussion to be in pending state if the latest comment is
> within 2 weeks
> R6) Consider using appropriate commit message for the first commit in
> series. Consider duplicating the message to JIRA/PR, so it gets clear what
> is the nature of the change
> R7) Consider rebasing the PR on top of master if there are lots of new
> commits there
> 
> Rules for committer/reviewer:
> R8) Consider squashing the commits manually rather than asking PR author to
> do that. If "commit is not squashed" is the sole comment, then both author
> and reviewer would have to spend time on one more review iteration with
> just mechanical changes. Note: committer cannot just use "squash and
> merge" button in the GitHub UI
> 
> Reasoning
> 1) Prefer non-rebase push, prefer regular commits on top of previously
> existing ones.
> It does make it easier to review. Review is async in its nature, and having
> a commit (or multiple of them) with new changes
> makes it possible to review the changes later.
> "rebase + squash" makes it very complicated to review, especially if the
> diff is very small.
> On top of that, if new commits are just added, then reviewer can just point
> which of the variations is better.
> 
> 2) I suppose "squash everything in single commit" can be performed by
> committer assuming the first commit has meaningful message.
> Squash is trivial, however crafting a message takes some time.
> 
> 3) Sometimes it makes sense to squash the PR into several commits (there
> might be several fixes that relate to the same JIRA ticket),
> and I suggest that to be made after there's a consensus in general, and
> after all the other bits are resolved.
> 
> 4) If the PR gets very old, it might make sense to rebase it on top of
> current master. That might be a very valid point at which to squash commits.
> 
> 5) Adding a dummy commit is the only option to re-launch Travis CI tests.
> Making a dummy commit is way better than force-pushing all the changes with
> just different commit date.
> 
> 
> Vladimir



Re: calcite git commit: tests: add TestUtilTest to CalciteSuite

2018-08-10 Thread Julian Hyde
I’ve sent one or two emails on the subject in the last few months.

But mostly it’s a convention, “a way in which something is usually done”; the
rule is to follow the same style as other commits.

Julian


> On Aug 9, 2018, at 1:42 PM, Vladimir Sitnikov wrote:
> 
> Julian>Please adhere to the convention for commit comments: start with a
> capital letter. I would only use a component prefix (“tests:”) if it
> clarifies.
> 
> Could you please clarify where the convention is listed?
> 
> https://calcite.apache.org/develop/#contributing does not clarify "capital
> letter", nor does it clarify "valid component prefixes".
> Apparently the convention is to use [CALCITE- prefix, however I have not
> created JIRA issue and there's no GitHub PR for this commit and alike (e.g.
> 6496cb76301e7191 "test: add testSqlAdvisorTableInSchema", 7088dc7261d2
> "SqlTestFactory: use lazy initialization of objects", etc).
> That was intentional since I am sure it is just an extra work with no real
> pay-off for such trivial changes.
> 
> The intention for "test:" was to clarify that the commit does not change
> production code, so everyone can ignore it.
> 
> Do you mean "Add TestUtilTest to CalciteSuite" is a way better commit
> message? I doubt it, since this variation of the message would require one
> to parse the message in order to understand that it is a test-only commit (it
> does not fix bugs, it does not add features, it just updates tests).
> 
> In other words, "tests: " (well, it should have been "test: ", but anyway)
> does summarize the nature of the commit in a single word. I'm sure that
> clarifies a lot, so I use that.
> 
> Could you please look at the following commit messages once again? Do you
> still think there's a better way to write them?
> If so, I'm all ears, no kidding.
> 
> tests: add TestUtilTest to CalciteSuite
> test: update test name current -> javaMajorVersionExceeds6
> test: add testSqlAdvisorTableInSchema
> 
> Vladimir



PRs must be made by code authors

2018-08-10 Thread Julian Hyde
Committers, here is something to watch out for.

I am reviewing a case (see
https://github.com/apache/calcite/pull/782/commits) where the pull
request (PR) was made by one github user and the git commit is made by
a different github user.

The PR is important to the integrity of the intellectual property that
we include in our releases. It indicates that the author of the code
intends to contribute. So, the PR must be made by the same github user
who made the commit(s) being contributed.

As a committer, you must reject PRs where the users do not match.

Julian


Re: nested structs. querying and building metadata in calcite

2018-08-11 Thread Julian Hyde
As I noted in https://issues.apache.org/jira/browse/CALCITE-2464, SQL struct
types do not behave exactly like Java classes (more like Java value types). If
the semantics are not as desired, maybe we’ll have to design a new type
constructor.

Since Calcite is grounded in SQL, I would encourage people to give examples in 
terms of SQL (DDL, queries, results), not just in terms of the Java APIs.

Lastly, I’ll draw your attention to Shuyi’s great work on “CREATE TYPE” (see
https://issues.apache.org/jira/browse/CALCITE-2045). He extended DDL in the
“server” module, so you can try out his examples.

Julian


> On Aug 11, 2018, at 9:03 AM, Vladimir Sitnikov wrote:
> 
> Just to clarify the use case: I'm building SQL plugin to analyze Java heap
> dumps.
> https://github.com/vlsi/mat-calcite-plugin
> 
> select * from "java.lang.String" s  produces a row for each String in the
> heap dump.
> 
> Then there might be a case like
> select u.path from  "java.net.URL" u;
> That is java.net.URL has "path" field which is of java.lang.String.
> 
> Of course Java classes can produce recursive types, so Node { Node next; }
> bothered me.
> 
> The relevant issue is https://issues.apache.org/jira/browse/CALCITE-207
> 
> I have asked once if RelDataTypeProxy is welcome in Calcite (
> https://issues.apache.org/jira/browse/CALCITE-207?focusedCommentId=14035245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14035245
> ), however it looks like I have to implement it and see what breaks.
> 
> The idea there was to use RelDataTypeProxy("Node") as a type for the "next"
> field. That should avoid "stackoverflow" on cyclic dependencies in types.
> I would love to know your opinion on that if you happen to have one.
> 
> It's great that you update executor to support nested structs.
> 
> PS. I've not had a chance to review it.
> 
> Vladimir
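
The proxy idea in the message above can be sketched in plain Java. This is only an illustration under assumptions: TypeRegistry, StructType and TypeRef are hypothetical names, not Calcite classes, and whatever RelDataTypeProxy would look like in CALCITE-207 may be quite different. Each field stores the name of its type and resolves it on demand, so a self-referencing class such as Node { Node next; } can be described without recursing forever.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these classes exist in Calcite.
final class TypeProxyDemo {
  /** Registry that maps a type name to its struct definition. */
  static final class TypeRegistry {
    private final Map<String, StructType> types = new LinkedHashMap<>();

    StructType define(String name) {
      StructType type = new StructType(name);
      types.put(name, type);
      return type;
    }

    StructType resolve(String name) {
      StructType type = types.get(name);
      if (type == null) {
        throw new IllegalArgumentException("Unknown type: " + name);
      }
      return type;
    }
  }

  /** Reference to a type by name; resolved only when asked. */
  static final class TypeRef {
    final String name;
    private final TypeRegistry registry;

    TypeRef(TypeRegistry registry, String name) {
      this.registry = registry;
      this.name = name;
    }

    StructType resolve() {
      return registry.resolve(name);
    }
  }

  /** Struct whose fields point at other types through TypeRef. */
  static final class StructType {
    final String name;
    final Map<String, TypeRef> fields = new LinkedHashMap<>();

    StructType(String name) {
      this.name = name;
    }

    StructType field(String fieldName, TypeRef type) {
      fields.put(fieldName, type);
      return this;
    }

    /** Prints field types by name only, so cycles do not recurse. */
    String describe() {
      StringBuilder sb = new StringBuilder(name).append(" {");
      String sep = " ";
      for (Map.Entry<String, TypeRef> e : fields.entrySet()) {
        sb.append(sep).append(e.getValue().name).append(' ').append(e.getKey());
        sep = "; ";
      }
      return sb.append(" }").toString();
    }
  }

  /** Builds the cyclic Node type from the message. */
  static StructType nodeType() {
    TypeRegistry registry = new TypeRegistry();
    StructType node = registry.define("Node");
    node.field("next", new TypeRef(registry, "Node")); // cyclic reference
    return node;
  }

  public static void main(String[] args) {
    StructType node = nodeType();
    System.out.println(node.describe());
    // Following the reference returns the same definition.
    System.out.println(node.fields.get("next").resolve() == node);
  }
}
```

Because describe() never follows a reference, the cycle costs nothing until some caller explicitly asks for the referenced type.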



Re: Problem with Calcite and SQuirrelSQL

2018-08-13 Thread Julian Hyde
Looks similar to this:
http://mail-archives.apache.org/mod_mbox/calcite-dev/201611.mbox/%3CCAGssvOYTS_tMGh=xASVqgC47DpBQQtYgc8r7BjNuvR+w=hh...@mail.gmail.com%3E
On Mon, Aug 13, 2018 at 5:37 AM Julian Feinauer wrote:
>
> Hi devs,
>
> First, a short disclaimer, I am cross-posting this question on the calcite 
> and on the SQuirrelSQL mailing list as I’m not really sure where the problem 
> comes from.
>
> I am using calcite with a custom Schema to read a specific file format as DB.
> It works when running the queries embedded in Test Code.
> When I link my jar into the sqlline script it also works flawlessly but when 
> I link the code to SQuirrelSQL [1] it gives me the following stacktrace:
>
>
> 2018-08-13 13:06:15,471 [Thread-3] DEBUG net.sourceforge.squirrel_sql.fw.util.DefaultExceptionFormatter  - Error
> java.sql.SQLException: Error while executing SQL "SELECT * FROM metadata.TABLES": Unable to instantiate java compiler
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>     at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>     at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>     at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:209)
>     at net.sourceforge.squirrel_sql.client.session.StatementWrapper.execute(StatementWrapper.java:165)
>     at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.processQuery(SQLExecuterTask.java:369)
>     at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.run(SQLExecuterTask.java:212)
>     at net.sourceforge.squirrel_sql.fw.util.TaskExecuter.run(TaskExecuter.java:82)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalStateException: Unable to instantiate java compiler
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile(JaninoRelMetadataProvider.java:433)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.load3(JaninoRelMetadataProvider.java:374)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.access$000(JaninoRelMetadataProvider.java:94)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider$1.load(JaninoRelMetadataProvider.java:113)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider$1.load(JaninoRelMetadataProvider.java:110)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.create(JaninoRelMetadataProvider.java:464)
>     at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.revise(JaninoRelMetadataProvider.java:477)
>     at org.apache.calcite.rel.metadata.RelMetadataQuery.revise(RelMetadataQuery.java:203)
>     at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:565)
>     at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:207)
>     at org.apache.calcite.rel.logical.LogicalProject$1.get(LogicalProject.java:117)
>     at org.apache.calcite.rel.logical.LogicalProject$1.get(LogicalProject.java:115)
>     at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:238)
>     at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:113)
>     at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:103)
>     at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:127)
>     at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:956)
>     at org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2952)
>     at org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2910)
>     at org.apache.calcite.plan.RelOptUtil.createProject(RelOptUtil.java:2854)
>     at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3734)
>     at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(

Re: Docker test environment

2018-08-13 Thread Julian Hyde
+1 what Michael says.

The less friction to running the tests, the more often they will get run, and 
the higher our quality will be. (Friction is some function of manual setup 
effort and the effort to debug/fix false positives when running the tests 
regularly. For example, if the framework is not re-entrant - i.e. doesn’t allow 
me to have two test runs running on different sandboxes on the same machine at 
the same time - that’s a mark against it.)



> On Aug 13, 2018, at 12:57 PM, Michael Mior  wrote:
> 
> Thanks for digging into this Igor! I'm fine with whatever approach others
> want to take. In general, I agree there are problems with the current
> integration test setup and whatever approach allows us to run these tests
> more frequently sounds good to me!
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le lun. 13 août 2018 à 06:46, Igor Kryvenko  a
> écrit :
> 
>> Hi all, for the last few months I have worked on moving the current test
>> environment to a Docker environment.
>> Thanks to Volodymyr Vysotskiy for the initial patch.
>> 
>> *Motivation*
>> 
>> I noticed that the current test environment has problems with updating
>> versions of databases and often runs out of memory (OOM).
>> I investigated previous tickets about moving to a Docker environment, and
>> the only problem was that there was no stable Docker for Mac OS and
>> Windows.
>> Now, as far as I know, it is stable on both, and we can use it.
>> 
>> Also, I noticed that Calcite integration tests are being moved to in-memory
>> databases (MongoDB, ElasticSearch, Cassandra). Why don't I like it?
>> In the case of MongoDB, we use the Fongo library, which does not fully
>> support all features of MongoDB, so it creates one more dependency for Calcite.
>> Before, we just needed to implement some feature in Calcite and use the latest
>> MongoDB with this feature. Now we use Fongo, and if we want to support the
>> latest features of MongoDB, Fongo has to implement them too.
>> In the case of ElasticSearch, I think it is a comfortable tradeoff
>> because we use the official ElasticSearch API to construct an in-memory
>> database.
>> 
>> Also, there is one more advantage of using Docker: if we only make
>> changes in some module (e.g. Cassandra) we can start Docker with only the
>> Cassandra image; we don't need to set up a whole virtual machine with all
>> databases.
>> Also, setting up all Docker images is faster than Vagrant, even when we
>> launch it for the first time.
>> Subsequent launches will be very fast, thanks to Docker's cache, until we
>> change the context of the Docker container.
>> 
>> 
>> I've already created the branch in calcite-test-dataset --
>> https://github.com/igorKryvenko/calcite-test-dataset/tree/docker-new
>> 
>> and a branch in the calcite project with corresponding changes (a few
>> changes, but I need someone to look at them) ---
>> https://github.com/igorKryvenko/calcite/tree/docker
>> 
>> 
>> I would appreciate it if someone would test my changes on Windows and Mac
>> OS.
>> 
>> *Please do not hesitate to post your questions and remarks.*
>> 
>> Kind regards
>> Igor Kryvenko
>> 



Re: Nested Struct Type inference error

2018-08-14 Thread Julian Hyde
Since the exception (IOOBE) is not a user error, this seems to be a bug.

Could you perhaps reproduce this as SQL (DDL plus a query) in the “server” 
module, say server/src/test/resources/sql/type.iq?


> On Aug 14, 2018, at 6:59 AM, Rong Rong  wrote:
> 
> Hi Devs,
> 
> I am trying to use Calcite type inference in Flink and realized that
> one of the situations causes an exception when trying to infer the operand
> type based on a StructType return type. [1]
> 
> It seems there's a requirement [2] that if the return type is a struct, the
> number of operands and the size of the return type's field list need to be
> the same. I was wondering if there is a specific design reason behind this,
> such as a requirement to flatten the nested struct fields?
> 
> I have attached a test to reproduce the exception when dealing with
> records that have only a single field in the field list [3].
> 
> I would much appreciate any pointers and suggestions.
> 
> --
> Rong
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10019
> [2]
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql/type/InferTypes.java#L68
> [3]
> https://github.com/apache/calcite/compare/master...walterddr:struct_type_inference_error
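
A simplified model of the failure, assuming the rule Rong describes in [2]: when the return type is a struct, operand i takes the type of field i of the return type. The class and method names below are hypothetical and types are plain strings, so this is not the code in InferTypes; it only shows why a struct with fewer fields than operands ends in an index-out-of-bounds exception.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the inference rule; not Calcite's InferTypes.
final class StructInferenceDemo {
  /**
   * Fills operand types from the return type. If the return type is a
   * struct (non-null field list), operand i gets the type of field i;
   * otherwise every operand gets the return type itself.
   */
  static String[] inferOperandTypes(String returnType,
      List<String> structFields, int operandCount) {
    String[] operandTypes = new String[operandCount];
    for (int i = 0; i < operandCount; i++) {
      operandTypes[i] = structFields != null
          ? structFields.get(i) // fails when i >= structFields.size()
          : returnType;
    }
    return operandTypes;
  }

  public static void main(String[] args) {
    // Two fields, two operands: works.
    System.out.println(Arrays.toString(
        inferOperandTypes("ROW", Arrays.asList("INTEGER", "VARCHAR"), 2)));
    // One field, two operands: index out of bounds.
    try {
      inferOperandTypes("ROW", Arrays.asList("INTEGER"), 2);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("IOOBE: " + e.getMessage());
    }
  }
}
```

In this model the exception comes from indexing the field list by operand position, which is why the two sizes have to match for the rule to apply.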



Re: nested structs. querying and building metadata in calcite

2018-08-14 Thread Julian Hyde
 (with declared schema).
>>>> 
>>>> Say I have the following type definition:
>>>> 
>>>> CREATE TYPE mytype AS (
>>>>a varchar(2) not null,
>>>>b varchar(2) NULL // optional (null?)
>>>> );
>>>> 
>>>> If my document has only a present ({ a:value }) what should select *
>>>> return? Map with single value ({a:value}) or a pair (value, null)
>>>> 
>>>> In other words should select * return raw document (as generic map) or
>>>> list of mytype?
>>>> If the latter, how to differentiate between a missing attribute (field)
>>>> and an attribute having a null value?
>>>> 
>>>> On Sat, Aug 11, 2018 at 2:51 PM Julian Hyde  wrote:
>>>> 
>>>>> As I noted in https://issues.apache.org/jira/browse/CALCITE-2464 <
>>>>> https://issues.apache.org/jira/browse/CALCITE-2464>, SQL struct
>>> types do
>>>>> not behave exactly like Java classes (more like Java value types). If
>>> the
>>>>> semantics are not as desired, maybe we’ll have to design a new type
>>>>> constructor.
>>>>> 
>>>>> Since Calcite is grounded in SQL, I would encourage people to give
>>>>> examples in terms of SQL (DDL, queries, results), not just in terms of
>>>> the
>>>>> Java APIs.
>>>>> 
>>>>> Lastly, I’ll draw your attention to Shuyi’s great work on “CREATE
>>> TYPE”
>>>>> (see https://issues.apache.org/jira/browse/CALCITE-2045 <
>>>>> https://issues.apache.org/jira/browse/CALCITE-2045>). He extended
>>> DDL in
>>>>> the “server” module, so you can try out his examples.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Aug 11, 2018, at 9:03 AM, Vladimir Sitnikov <
>>>>> sitnikov.vladi...@gmail.com> wrote:
>>>>>> 
>>>>>> Just to clarify the use case: I'm building SQL plugin to analyze
>>> Java
>>>>> heap
>>>>>> dumps.
>>>>>> https://github.com/vlsi/mat-calcite-plugin
>>>>>> 
>>>>>> select * from "java.lang.String" s  produces a row for each String
>>> in
>>>> the
>>>>>> heap dump.
>>>>>> 
>>>>>> Then might be a case like
>>>>>> select u.path from  "java.net.URL" u;
>>>>>> That is java.net.URL has "path" field which is of java.lang.String.
>>>>>> 
>>>>>> Of course Java classes can produce recursive types, so Node { Node
>>>> next;
>>>>> }
>>>>>> bothered me.
>>>>>> 
>>>>>> The relevant issue is
>>>> https://issues.apache.org/jira/browse/CALCITE-207
>>>>>> 
>>>>>> I have asked once if RelDataTypeProxy is welcome in Calcite (
>>>>>> 
>>>>> 
>>>> https://issues.apache.org/jira/browse/CALCITE-207?focusedCommentId=14035245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14035245
>>>>>> ), however it looks like I have to implement it and see what breaks.
>>>>>> 
>>>>>> The idea there was to use RelDataTypeProxy("Node") as a type for the
>>>>> "next"
>>>>>> field. That should avoid "stackoverflow" on cyclic dependencies in
>>>> types.
>>>>>> I would love to know your opinion on that if you happen to have one.
>>>>>> 
>>>>>> It's great that you update executor to support nested structs.
>>>>>> 
>>>>>> PS. I've not had a chance to review it.
>>>>>> 
>>>>>> Vladimir
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
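
The question quoted at the top of this message, whether a missing attribute can be told apart from an attribute that is present with a null value, can be illustrated in plain Java. This is a sketch under assumptions: a java.util.Map stands in for the raw document and an Object[] for a row of mytype (a, b); MissingVsNullDemo, state and toRow are hypothetical names and nothing here is a Calcite API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for the raw document.
final class MissingVsNullDemo {
  /** Returns "missing", "null" or "value" for an attribute of a document. */
  static String state(Map<String, Object> doc, String attribute) {
    if (!doc.containsKey(attribute)) {
      return "missing";
    }
    return doc.get(attribute) == null ? "null" : "value";
  }

  /** Projects a document onto the fixed columns (a, b). */
  static Object[] toRow(Map<String, Object> doc) {
    // A missing attribute and an explicit null both become null here.
    return new Object[] {doc.get("a"), doc.get("b")};
  }

  public static void main(String[] args) {
    Map<String, Object> missing = new HashMap<>();
    missing.put("a", "x");

    Map<String, Object> explicitNull = new HashMap<>();
    explicitNull.put("a", "x");
    explicitNull.put("b", null);

    System.out.println(state(missing, "b"));
    System.out.println(state(explicitNull, "b"));
    // As rows, the two documents can no longer be told apart.
    System.out.println(Arrays.equals(toRow(missing), toRow(explicitNull)));
  }
}
```

The map form keeps the distinction and the fixed-column row loses it, which is the trade-off the question is pointing at.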



Re: RelNode#getDescription and memory consumption

2018-08-15 Thread Julian Hyde
I thought the digest only included the IDs of the inputs, not the digest of the 
inputs. Am I mistaken?

Could you give an example of large description & digest?

> On Aug 15, 2018, at 1:46 PM, Laurent Goujon  wrote:
> 
> Hi folks,
> 
> I'm looking for some guidance here before opening JIRAs/pull requests.
> 
> I'm examining a memory dump during a planning operation and a significant
> amount of memory is strings used for RelNode digest and description (some
> strings being around 130kb). In that particular case, the relnode tree is
> particularly deep, and since the digest is basically done recursively, the
> deeper/wider the tree, the longer the digest.
> 
> The easy solution would be to not go deep when adding inputs to the digest,
> and instead of adding the input description to only add their type, id and
> traits (and also not recurse). Would this break parts of calcite, or cause
> other inconvenience because some use-cases rely on digest/description to be
> basically the whole tree in a textual form?
> 
> Laurent
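
To make the size difference concrete, here is a small sketch. Node, deepDigest and shallowDigest are hypothetical and much simpler than AbstractRelNode (traits and attributes are left out). The first method embeds each input's full digest; the second refers to inputs by type and id only, along the lines of the proposal quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Calcite RelNode implementation.
final class DigestDemo {
  static final class Node {
    final int id;
    final String type;
    final List<Node> inputs = new ArrayList<>();

    Node(int id, String type, Node... inputs) {
      this.id = id;
      this.type = type;
      for (Node input : inputs) {
        this.inputs.add(input);
      }
    }

    /** Embeds the whole digest of every input (recursive). */
    String deepDigest() {
      StringBuilder sb = new StringBuilder(type).append('(');
      for (int i = 0; i < inputs.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        Node input = inputs.get(i);
        sb.append("input#").append(i).append("=rel#").append(input.id)
            .append(':').append(input.deepDigest());
      }
      return sb.append(')').toString();
    }

    /** Refers to inputs by type and id only (no recursion). */
    String shallowDigest() {
      StringBuilder sb = new StringBuilder(type).append('(');
      for (int i = 0; i < inputs.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        Node input = inputs.get(i);
        sb.append("input#").append(i).append('=')
            .append(input.type).append('#').append(input.id);
      }
      return sb.append(')').toString();
    }
  }

  /** Builds a chain of Project nodes of the given depth over one Scan. */
  static Node chain(int depth) {
    Node node = new Node(0, "Scan");
    for (int i = 1; i <= depth; i++) {
      node = new Node(i, "Project", node);
    }
    return node;
  }

  public static void main(String[] args) {
    Node small = chain(2);
    System.out.println(small.deepDigest());
    System.out.println(small.shallowDigest());

    Node large = chain(200);
    System.out.println("deep length:    " + large.deepDigest().length());
    System.out.println("shallow length: " + large.shallowDigest().length());
  }
}
```

In this model the deep digest of the root grows with the size of the whole tree, while the shallow digest depends only on the number of direct inputs.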



Re: RelNode#getDescription and memory consumption

2018-08-15 Thread Julian Hyde
When I run that test I get

LogicalProject(input=HepRelVertex#10,$0=$9)

Have you screwed something up?

> On Aug 15, 2018, at 2:23 PM, Laurent Goujon  wrote:
> 
> Just ran RelOptRulesTest with a breakpoint in
> AbstractRelNode#computeDigest() and I'm able to observe this kind of
> digest:
> "LogicalProject(input=rel#6:LogicalWindow(input=rel#0:LogicalTableScan(table=[CATALOG,
> SALES, EMP]),window#0=window(partition {0} order by [0] range between
> UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])),$0=$9)"
> 
> On Wed, Aug 15, 2018 at 2:09 PM Laurent Goujon  wrote:
> 
>> Here's one (partial) example (truncated because it contains potentially
>> sensitive info, and I didn't obfuscate it or try to reproduce it locally
>> with non-sensitive data):
>> 
>> "rel#8643738:LogicalProject.NONE.ANY([]).[](input=rel#8643736:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643702:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643668:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643634:LogicalProject.NONE.ANY([]).[](input=rel#8643632:LogicalAggregate.NONE.ANY([]).[](input=rel#8643630:LogicalAggregate.NONE.ANY([]).[](input=rel#8643628:LogicalProject.NONE.ANY([]).[](input=rel#8643626:LogicalFilter.NONE.ANY([]).[](input=rel#8643624:LogicalProject.NONE.ANY([]).[](input=rel#8643622:LogicalProject.NONE.ANY([]).[](input=rel#8643842:MultiJoin.NONE.ANY([]).[](input#0=rel#8643838:LogicalProject.NONE.ANY([]).[](input=rel#8643615:MultiJoin.NONE.ANY([]).[](input#0=rel#8643603:LogicalProject.NONE.ANY([]).[](input=rel#8643601:SampleCrel.NONE.ANY([]).[](input=rel#8639853:ScanCrel.NONE.ANY([]).[](table="...
>> 
>> The Logical* relnodes don't override computeDigest method, so this is
>> basically whatever AbstractRelNode#computeDigest is doing:
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L415
>> 
>> Laurent
>> 
>> 
>> 
>> On Wed, Aug 15, 2018 at 1:57 PM Julian Hyde  wrote:
>> 
>>> I thought the digest only included the IDs of the inputs, not the digest
>>> of the inputs. Am I mistaken?
>>> 
>>> Could you give an example of large description & digest?
>>> 
>>>> On Aug 15, 2018, at 1:46 PM, Laurent Goujon  wrote:
>>>> 
>>>> Hi folks,
>>>> 
>>>> I'm looking for some guidance here before opening JIRAs/pull requests.
>>>> 
>>>> I'm examining a memory dump during a planning operation and a
>>> significant
>>>> amount of memory are strings used for RelNode digest and description
>>> (some
>>>> strings being around 130kb). In that particular case, the relnode tree
>>> is
>>>> particularly deep, and since the digest is basically done recursively,
>>> the
>>>> deepest/widest the tree, the longer the digest.
>>>> 
>>>> The easy solution would be to not go deep when adding inputs to the
>>> digest,
>>>> and instead of adding the input description to only add their type, id
>>> and
>>>> traits (and also not recurse). Would this break parts of calcite, or
>>> cause
>>>> other inconvenience because some use-cases rely on digest/description
>>> to be
>>>> basically the whole tree in a textual form?
>>>> 
>>>> Laurent
>>> 
>>> 



Re: RelNode#getDescription and memory consumption

2018-08-15 Thread Julian Hyde
I see now.

I think the problem only occurs when you call AbstractRelNode.recomputeDigest().

The first time the digest is computed, the input RelNodes have a digest (and 
desc) as it has been set in AbstractRelNode’s constructor: 

  this.digest = getRelTypeName() + "#" + id;
  this.desc = digest;

Explain writer uses the “desc” field to identify inputs, but maybe it should 
use id or type + id. Or maybe the “desc” field should be final.

By the way, the comment

  // Substring uses the same underlying array of chars, so saves a bit
  // of memory.

was true until JDK 1.6 but is no longer true.

Can you log a JIRA case, please?

Julian



> On Aug 15, 2018, at 2:37 PM, Laurent Goujon  wrote:
> 
> Sorry, I should have mentioned the method too: HepPlanner#buildFinalPlan
> (when running RelOptRulesTest#testWindowInParenthesis())
> 
> On Wed, Aug 15, 2018 at 2:36 PM Laurent Goujon  wrote:
> 
>> It appears to happen when building the final plan: the hep planner goes
>> recursively to each node to recompute the digest. In that relnode tree,
>> there's no more HepRelVertex nodes, and the digest now includes the whole
>> input(s) description.
>> 
>> On Wed, Aug 15, 2018 at 2:33 PM Julian Hyde  wrote:
>> 
>>> When I run that test I get
>>> 
>>> LogicalProject(input=HepRelVertex#10,$0=$9)
>>> 
>>> Have you screwed something up?
>>> 
>>>> On Aug 15, 2018, at 2:23 PM, Laurent Goujon  wrote:
>>>> 
>>>> Just ran RelOptRulesTest with a breakpoint in
>>>> AbstractRelNode#computeDigest() and I'm able to observe those kind of
>>>> digest:
>>>> 
>>> "LogicalProject(input=rel#6:LogicalWindow(input=rel#0:LogicalTableScan(table=[CATALOG,
>>>> SALES, EMP]),window#0=window(partition {0} order by [0] range between
>>>> UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])),$0=$9)"
>>>> 
>>>> On Wed, Aug 15, 2018 at 2:09 PM Laurent Goujon 
>>> wrote:
>>>> 
>>>>> Here's one (partial) example (truncated because it contains potential
>>>>> sensitive info, and didn't obfuscate or try to reproduce locally with
>>> non
>>>>> sensitive data):
>>>>> 
>>>>> 
>>> "rel#8643738:LogicalProject.NONE.ANY([]).[](input=rel#8643736:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643702:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643668:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643634:LogicalProject.NONE.ANY([]).[](input=rel#8643632:LogicalAggregate.NONE.ANY([]).[](input=rel#8643630:LogicalAggregate.NONE.ANY([]).[](input=rel#8643628:LogicalProject.NONE.ANY([]).[](input=rel#8643626:LogicalFilter.NONE.ANY([]).[](input=rel#8643624:LogicalProject.NONE.ANY([]).[](input=rel#8643622:LogicalProject.NONE.ANY([]).[](input=rel#8643842:MultiJoin.NONE.ANY([]).[](input#0=rel#8643838:LogicalProject.NONE.ANY([]).[](input=rel#8643615:MultiJoin.NONE.ANY([]).[](input#0=rel#8643603:LogicalProject.NONE.ANY([]).[](input=rel#8643601:SampleCrel.NONE.ANY([]).[](input=rel#8639853:ScanCrel.NONE.ANY([]).[](table="...
>>>>> 
>>>>> The Logical* relnodes don't override computeDigest method, so this is
>>>>> basically whatever AbstractRelNode#computeDigest is doing:
>>>>> 
>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L415
>>>>> 
>>>>> Laurent
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Aug 15, 2018 at 1:57 PM Julian Hyde  wrote:
>>>>> 
>>>>>> I thought the digest only included the IDs of the inputs, not the
>>> digest
>>>>>> of the inputs. Am I mistaken?
>>>>>> 
>>>>>> Could you give an example of large description & digest?
>>>>>> 
>>>>>>> On Aug 15, 2018, at 1:46 PM, Laurent Goujon 
>>> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> I'm looking for some guidance here before opening JIRAs/pull
>>> requests.
>>>>>>> 
>>>>>>> I'm examining a memory dump during a planning operation and a
>>>>>> significant
>>>>>>> amount of memory are strings used for RelNode digest and description
>>>>>> (some
>>>>>>> strings being around 130kb). In that particular case, the relnode
>>> tree
>>>>>> is
>>>>>>> particularly deep, and since the digest is basically done
>>> recursively,
>>>>>> the
>>>>>>> deepest/widest the tree, the longer the digest.
>>>>>>> 
>>>>>>> The easy solution would be to not go deep when adding inputs to the
>>>>>> digest,
>>>>>>> and instead of adding the input description to only add their type,
>>> id
>>>>>> and
>>>>>>> traits (and also not recurse). Would this break parts of calcite, or
>>>>>> cause
>>>>>>> other inconvenience because some use-cases rely on digest/description
>>>>>> to be
>>>>>>> basically the whole tree in a textual form?
>>>>>>> 
>>>>>>> Laurent
>>>>>> 
>>>>>> 
>>> 
>>> 



Arrow Flight

2018-08-16 Thread Julian Hyde
There's a discussion on the Apache Arrow list about a proposed RPC
mechanism called Flight.

One of its use cases is for executing SQL. I fear that for SQL they
will likely end up with a protocol similar to Avatica. I think that we
should extend Avatica to allow Arrow as its data format, with as
few copies on server and client side as possible. I would welcome
contributions from the Arrow community.

Here is my email:
https://lists.apache.org/thread.html/d02133aee410d68521165ee3fcfd8f395fb1e6ed630af8df96c15397@%3Cdev.arrow.apache.org%3E

Please join that thread if you are interested.

Julian


Re: Arrow Flight

2018-08-17 Thread Julian Hyde
> I don't really grok how Arrow would be used with Avatica now (I struggle
> with how Arrow would be used in place of Protobuf, period),

For the packets that carry rows I would envision a payload format
where the rows are just a byte-array. The default payload format
would remain protobuf. The rest of the packet would remain protobuf.
An arrow payload would still be, in a sense, protobuf, since one of
the value types in protobuf (I assume) is a byte array.

Other packet types would remain 100% protobuf. This only matters for
"large" data.

To achieve zero copy or single-copy, we'd have to devise a way to have
the producer write into the under-construction protobuf packet in
situ, or perhaps have the network adapter construct a packet from
fragments in several locations.

Julian
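
A rough sketch of the packet layout described above, with made-up names. Frame and PayloadFormat are hypothetical and do not correspond to Avatica's wire messages, and the envelope here is hand-encoded with ByteBuffer rather than protobuf. The point is only that the rows travel as one opaque byte array whose format is named by a tag, so the envelope stays the same when the payload format changes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not Avatica's wire format.
final class FrameDemo {
  /** How the row bytes inside a frame should be interpreted. */
  enum PayloadFormat { PROTOBUF, ARROW }

  /** Envelope: a format tag plus the rows as an opaque byte array. */
  static final class Frame {
    final PayloadFormat format;
    final byte[] rows;

    Frame(PayloadFormat format, byte[] rows) {
      this.format = format;
      this.rows = rows;
    }

    /** Layout: 1 byte format tag, 4 byte length, then the row bytes. */
    byte[] encode() {
      ByteBuffer buf = ByteBuffer.allocate(1 + 4 + rows.length);
      buf.put((byte) format.ordinal());
      buf.putInt(rows.length);
      buf.put(rows); // this is a copy, so the sketch is not zero-copy
      return buf.array();
    }

    static Frame decode(byte[] bytes) {
      ByteBuffer buf = ByteBuffer.wrap(bytes);
      PayloadFormat format = PayloadFormat.values()[buf.get()];
      byte[] rows = new byte[buf.getInt()];
      buf.get(rows); // second copy, on the receiving side
      return new Frame(format, rows);
    }
  }

  public static void main(String[] args) {
    // Placeholder bytes; a real Arrow record batch would go here.
    byte[] batch = "arrow-record-batch".getBytes(StandardCharsets.UTF_8);
    Frame sent = new Frame(PayloadFormat.ARROW, batch);
    Frame received = Frame.decode(sent.encode());
    System.out.println(received.format);
    System.out.println(Arrays.equals(received.rows, batch));
  }
}
```

This version copies the rows once on encode and once on decode; removing those copies is the harder problem the message raises in its last paragraph.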


On Fri, Aug 17, 2018 at 11:33 AM Enrico Olivelli  wrote:
>
> I am interested in this topic too.
> I  am not (yet?) using Avatica, but I am working a lot in order to save
> resources during data access and data transfer in JDBC driver in my project
> HerdDB (which is using Calcite SQLParser and planner).
>
> Currently the only way is to have a proprietary protocol and plumb the
> network/RPC layer deeply into the data access/internal representation layer.
>
> Joining the efforts and implementing an efficient stack would be great.
>
> I will follow the discussion, if possible I will try to contribute.
>
> Enrico
>
> Il ven 17 ago 2018, 18:05 Josh Elser  ha scritto:
>
> > Thanks for the heads-up, Julian. Subscribing.
> >
> > I don't really grok how Arrow would be used with Avatica now (I struggle
> > with how Arrow would be used in place of Protobuf, period), but I should
> > make the time to figure that out.
> >
> > On 8/16/18 1:27 PM, Julian Hyde wrote:
> > > There's a discussion on the Apache Arrow list about a proposed RPC
> > > mechanism called Flight.
> > >
> > > One of its use cases is for executing SQL. I fear that for SQL they
> > > will likely end up with a protocol similar to Avatica. I think that we
> > > should extend Avatica to allow Arrow as its data format, with as
> > > few copies on server and client side as possible. I would welcome
> > > contributions from the Arrow community.
> > >
> > > Here is my email:
> > >
> > https://lists.apache.org/thread.html/d02133aee410d68521165ee3fcfd8f395fb1e6ed630af8df96c15397@%3Cdev.arrow.apache.org%3E
> > >
> > > Please join that thread if you are interested.
> > >
> > > Julian
> > >
> >
> --
>
>
> -- Enrico Olivelli


Re: Sqlline release

2018-08-18 Thread Julian Hyde
Sergey,

I see a lot of pull requests for sqlline coming in from you… which is 
excellent! You have said in a previous thread that it would be helpful if some 
of them are merged to master because you want to build on that. So, could you 
give me an ordered list of PR numbers that are ready to merge? That would be 
helpful for me as I try to work through the backlog and merge them.

Sergey and others, is the first week of September still a good timeframe for a 
release?

Julian


> On Aug 9, 2018, at 1:35 AM, Julian Hyde  wrote:
> 
> Let’s wait about a month then. Target the first week in September. 
> 
> (Good to see all these new features Sergey... keep them coming!)
> 
> Julian
> 
>> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin  wrote:
>> 
>> +1 for release
>> at the same time I am ok to wait about a month
>> (I have a few ideas about some more improvements)
>> 
>>> On Tue, Aug 7, 2018 at 5:29 AM Julian Hyde  wrote:
>>> 
>>> (Forgive the cross-posting. The sqlline dev list isn’t very active, and
>>> many of the sqlline community are in the calcite community. Please reply to
>>> calcite dev only.)
>>> 
>>> There have been a number of enhancements to sqlline recently[1] (thanks,
>>> Sergey!). Is it time for a release of sqlline? Or should we plan to have a
>>> release in say a month, to give people time to add more features.
>>> 
>>> Julian
>>> 
>>> [1] https://github.com/julianhyde/sqlline/commits/master
>>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "sqlline-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to sqlline-dev+unsubscr...@googlegroups.com.
>>> To post to this group, send email to sqlline-...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/sqlline-dev/32DBFC47-F2D7-4DB4-9B32-3F36B54296C4%40gmail.com
>>> <https://groups.google.com/d/msgid/sqlline-dev/32DBFC47-F2D7-4DB4-9B32-3F36B54296C4%40gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>> 
>> 
>> -- 
>> Best regards,
>> Sergey



Re: Sqlline release

2018-08-19 Thread Julian Hyde
I’ve pushed PR #86... 
https://github.com/julianhyde/sqlline/commit/8e0061f113d89a11ca03f4b48eda3340fd00375c

I’ll try to get to your remaining PRs in the next few days. Keep up the good 
work!

Julian


> On Aug 18, 2018, at 11:04 AM, Sergey Nuyanzin  wrote:
> 
> Julian,
> thank you very much for merging
> 
> about PR on which I would like to build to add one more improvement:
> currently there is only one
> https://github.com/julianhyde/sqlline/pull/86 (commit 340c3b1 )
> 
>>> Sergey and others, Is first week of September still a good timeframe for
> release?
> for me yes it is good
> 
> 
> On Sat, Aug 18, 2018 at 8:14 PM Julian Hyde  wrote:
> 
>> Sergey,
>> 
>> I see a lot of pull requests for sqlline coming in from you… which is
>> excellent! You have said in a previous thread that it would be helpful if
>> some of them are merged to master because you want to build on that. So,
>> could you give me an ordered list of PR numbers that are ready to merge?
>> That would be helpful for me as I try to work through the backlog and merge
>> them.
>> 
>> Sergey and others, Is first week of September still a good timeframe for
>> release?
>> 
>> Julian
>> 
>> 
>>> On Aug 9, 2018, at 1:35 AM, Julian Hyde  wrote:
>>> 
>>> Let’s wait about a month then. Target the first week in September.
>>> 
>>> (Good to see all these new features Sergey... keep them coming!)
>>> 
>>> Julian
>>> 
>>>> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin 
>> wrote:
>>>> 
>>>> +1 for release
>>>> at the same time I am ok to wait about a month
>>>> (I have a few ideas about some more improvements)
>>>> 
>>>>> On Tue, Aug 7, 2018 at 5:29 AM Julian Hyde 
>> wrote:
>>>>> 
>>>>> (Forgive the cross-posting. The sqlline dev list isn’t very active, and
>>>>> many of the sqlline community are in the calcite community. Please
>> reply to
>>>>> calcite dev only.)
>>>>> 
>>>>> There have been a number of enhancements to sqlline recently[1]
>> (thanks,
>>>>> Sergey!). Is it time for a release of sqlline? Or should we plan to
>> have a
>>>>> release in say a month, to give people time to add more features.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1] https://github.com/julianhyde/sqlline/commits/master
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Sergey
>> 
>> 
> 
> -- 
> Best regards,
> Sergey



[DISCUSS] Avatica - how efficient is our protocol?

2018-08-23 Thread Julian Hyde
This is a paper in VLDB 2018, "Don’t Hold My Data Hostage – A Case For Client 
Protocol Redesign" by Mark Raasveldt and Hannes Mühleisen[1]. It claims that 
database client protocols (inside ODBC and JDBC drivers) are very inefficient, 
and has a compelling example where commercial drivers are 10x to 68x slower 
than netcat.

One of the goals of Avatica is to do better. How are we doing? Are there any 
ideas in the paper we could adopt? Would a closer partnership with Apache Arrow 
help us achieve those goals?

Julian

[1] https://hannes.muehleisen.org/p852-muehleisen.pdf 


Re: graphql query to sql query with Calcite ?

2018-08-23 Thread Julian Hyde
I’d call that a "GraphQL front-end for Calcite". (SQL is our main front end, 
but other front-ends include linq4j and I gather there are other query 
languages in commercial products, e.g. Stardog uses Calcite to translate SPARQL 
to SQL[1].)

I think this is a good fit for Calcite, and would support it. Should it be a 
module in Calcite, or a standalone project that uses Calcite? Both are 
reasonable options.

In case folks on the dev list are not familiar with GraphQL, I will point out 
that it is NOT a query language for graph databases (as are Cypher, SPARQL, 
Gremlin). But it is exceedingly good at running queries on data sources with 
nested data and producing results to power web applications. And it is becoming 
extremely popular. 

My thumbnail sketch of the architecture: write (or better, re-use) a GraphQL 
parser and semantic analyzer. Take the validated GraphQL AST and convert it 
into Calcite relational algebra (probably using RelBuilder[2]). Then use 
RelToSqlConverter to convert relational algebra to SQL. RelToSqlConverter 
handles differences in dialect, and is getting better all the time.

Julian

[1] https://www.stardog.com/blog/virtual-graphs-in-stardog-5/
[2] https://calcite.apache.org/docs/algebra.html#algebra-builder



> On Aug 21, 2018, at 10:21 PM, Eugen Stan  wrote:
> 
> Hello,
> 
> TLDR:
> 
> I'm wondering if I can integrate Calcite with [graphql-java] and use
> Calcite to transform a graphql query into an SQL query and send it
> directly to the database.
> 
> Furthermore, I'm curious if I can use Calcite's adapters to emulate an
> SQL layer on top of other remote services and leverage the query planner
> from Calcite to build smart/optimal queries.
> 
> There is prior art to this: a project called [join-monster] that does
> this for JS. See [join-monster-7-min] video for a short description.
> 
> The process to go from graphql query to SQL query is described in
> [join-monster-process] and it's quite short.
> 
> Longer version
> 
> I'm working on a GraphQL API for a SaaS platform. Right now we are
> facing a common problem in GraphQL: one query for a graph of
> objects will turn into N+1 queries on the back-end data-store. There is a
> lot of literature on this on the internet, and it is also described in
> [data-loader] and [join-monster].
> 
> Now, one solution for this problem is to use [data-loader]  - to cache
> objects. This works for some, and it is kind of the only solution for
> remote data stores (other http API endpoints).
> 
> My initial objective is to transform the AST that graphql-java builds
> into an AST for SQL and push this SQL to one database.
> 
> I believe Calcite can help with this and I'm reaching out to the
> community since I'm not familiar with the project and the features and
> limitations it has.
> 
> Can Calcite help me transform the GraphQL query AST to an SQL AST? 
> 
> Should I look into this or should I go straight to something like ANTLR.
> I know there is a definition for [graphql-java-antlr] . I'm asking this
> to know if it has features that could help me or could block me?
> 
> One feature that could help, I imagine, is the [sql-grammar]?
> 
> 
> Thank you,
> 
> Eugen
> 
> 
> [data-loader] https://github.com/facebook/dataloader
> 
> [graphql-java] https://github.com/graphql-java/graphql-java/ 
> 
> [join-monster] https://join-monster.readthedocs.io/en/latest/
> 
> [join-monster-7-min] https://www.youtube.com/watch?v=Y7AdMIuXOgs
> 
> [join-monster-process]
> https://github.com/acarl005/join-monster/tree/master/src
> 
> [graphql-java-antlr]
> https://github.com/graphql-java/graphql-java/tree/master/src/main/antlr 
> 
> [sql-grammar] https://calcite.apache.org/docs/reference.html
> 
> 



Re: Assign alias to a column in RelBuilder

2018-08-23 Thread Julian Hyde
As Andrei says, using "RelBuilder.alias(RexNode, String)" on an expression in 
RelBuilder.project works. But only most of the time: RelBuilder reserves the 
right to discard projects that do nothing more than rename columns.

Try using RelBuilder.rename. It will force creation of a Project even if it is 
the identity.

Ultimately there are no guarantees. If RelBuilder finds a way to simplify, it 
will simplify, even if that discards the Project that renamed fields.

Julian


> On Aug 23, 2018, at 3:43 PM, Andrei Sereda  wrote:
> 
> The following works for me
> 
> @Test public void relBuilderProjectAlias() throws Exception {
>  final RelBuilder builder = RelBuilder.create(config().build());
>  final RelNode root =
>  builder.scan("EMP")
>  .project(
>builder.alias(builder.field("EMPNO"), "aaa"),
>builder.alias(builder.field("ENAME"), "bbb")
>  )
>  .build();
> 
>  try (PreparedStatement stm = RelRunners.run(root);
>  ResultSet rset = stm.executeQuery();
>  ) {
>while (rset.next()) {
>  System.out.printf("aaa=%s bbb=%s\n",
>  rset.getString("aaa"),
>  rset.getString("bbb"));
>}
>  }
> }
> 
> 
> aaa=7369 bbb=SMITH
> aaa=7499 bbb=ALLEN
> aaa=7521 bbb=WARD
> 
> 
> On Thu, Aug 23, 2018 at 5:54 PM Anand Gupta  wrote:
> 
>> Hi All,
>> 
>> I am using RelBuilder to build relational expression and then converting it
>> to a SQL query using Rel2SqlConverter. I skimmed through the examples of
>> RelBuilder
>> <
>> https://github.com/apache/calcite/blob/master/core/src/test/java/org/apache/calcite/examples/RelBuilderExample.java
>>> 
>> but
>> couldn't figure out a way to assign an alias to a column name.
>> 
>> I am able to select columns from a table and create a query like
>> 
>> "select col1, col2, col3 from table1"
>> 
>> However, I want to do something like
>> 
>> "select col1 as t1, col2 as t2, col3 as n9 from table1"
>> 
>> Ideally, there should be a way to set an alias while projecting the
>> columns. However, I couldn't find it. I found two relevant methods "as" and
>> "alias" in the code. However, couldn't make them work - "as" only works on
>> table and "alias" refer to table name alias.
>> 
>> Any suggestion on how to make this work will be appreciated.
>> 
>> Thanks,
>> -A
>> 



Re: Casting from BIGINT to TIMESTAMP in Calcite?

2018-08-23 Thread Julian Hyde
I bet the semantics you have in mind makes some reference to the 1970-01-01 
UNIX epoch. The SQL standard makes no mention of that epoch.

Converting numbers to timestamps is “obvious” only in that frame.

I don’t think it’s “obvious” enough for it to be the behavior of CAST. If you 
want to convert, use a conversion function (I’m sure Postgres has one) or 
interval arithmetic:

  Ts = TIMESTAMP '1970-01-01 00:00:00' + m * INTERVAL '0.001' SECOND
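
To make the arithmetic concrete, here is a rough Java sketch of the same idea: start at the 1970-01-01 epoch and add m milliseconds. It uses only java.time; the class and method names (EpochMillis, toTimestamp) are made up for illustration and are not Calcite APIs.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class EpochMillis {
  // The UNIX epoch as a zone-less timestamp,
  // mirroring TIMESTAMP '1970-01-01 00:00:00'.
  static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0, 0);

  // Mirrors: EPOCH + m * INTERVAL '0.001' SECOND
  // (hypothetical helper, named only for this example).
  static LocalDateTime toTimestamp(long m) {
    return EPOCH.plus(m, ChronoUnit.MILLIS);
  }

  public static void main(String[] args) {
    // One day's worth of milliseconds lands on 1970-01-02.
    System.out.println(toTimestamp(86_400_000L));
  }
}
```

The point of the sketch is that the epoch is an explicit input to the conversion, which is why it reads better as a function or interval expression than as a CAST.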

Julian


> On Aug 23, 2018, at 5:50 PM, Shuyi Chen  wrote:
> 
> Currently, SqlValidator does not allow casting from BIGINT to TIMESTAMP. Is
> there any reason that we don't support such CAST rule, or it is fine that
> we add it in Calcite? Thanks a lot.
> 
> Shuyi
> -- 
> "So you have to trust that the dots will somehow connect in your future."



Re: Support UUID type in Calcite

2018-08-23 Thread Julian Hyde
It would be a lot of work to support it as a new physical type (e.g. we would 
have to modify Avatica). Easier to support as a logical type, similar to what 
we did with Geometry. Its physical type would be, say, BINARY(16).
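
As a rough illustration of what a BINARY(16) physical representation could look like, here is a Java sketch that packs a java.util.UUID into 16 bytes and back. The class and method names (UuidBinary, toBytes, fromBytes) are hypothetical, and the big-endian layout is an assumption for the example, not something Calcite or Avatica defines.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidBinary {
  // Packs the two 64-bit halves of the UUID into 16 bytes
  // (ByteBuffer is big-endian by default).
  static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // Rebuilds the UUID from exactly 16 bytes.
  static UUID fromBytes(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    long most = buf.getLong();
    long least = buf.getLong();
    return new UUID(most, least);
  }

  public static void main(String[] args) {
    UUID original = UUID.randomUUID();
    UUID roundTripped = fromBytes(toBytes(original));
    System.out.println(original.equals(roundTripped));
  }
}
```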

Julian


> On Aug 23, 2018, at 5:56 PM, Shuyi Chen  wrote:
> 
> Do we want to add support for a UUID type in Calcite? Databases like Vertica
> and Postgres already support the UUID type. Thanks.
> 
> Shuyi
> 
> -- 
> "So you have to trust that the dots will somehow connect in your future."



Re: LogicalAggregate REX pushdown optimization rule

2018-08-24 Thread Julian Hyde
In relational algebra

  select sum(x + y)
  from (select x, y from t)

is identical to

  select sum(xy)
  from (select x + y as xy from t)

because relational algebra does not have “sub-queries”. So, the one thing your 
rewrite rule is accomplishing is converting

  case when product = 'foo' then sum(units) else 0 end

into

  sum(case when product = 'foo' then units else 0 end)

That rewrite seems valid, as long as units is not null and groups are never 
empty. (If a group is empty, as can happen in windowed aggregates, or all of 
its units values are null, then sum would return null not 0.)
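
Here is a toy Java model of the two forms, just to show where they can part ways. It is a sketch, not Calcite code: sqlSum imitates SQL SUM (nulls are skipped, and the result is null when there is nothing to add), each group is assumed to have a single product value, and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CaseSumRewrite {
  // SQL-style SUM: skips nulls, returns null when there is nothing to add.
  static Integer sqlSum(List<Integer> values) {
    Integer total = null;
    for (Integer v : values) {
      if (v == null) {
        continue;
      }
      total = (total == null) ? v : Integer.valueOf(total + v);
    }
    return total;
  }

  // Original form: CASE WHEN product = 'foo' THEN SUM(units) ELSE 0 END
  static Integer caseOfSum(String product, List<Integer> units) {
    return "foo".equals(product) ? sqlSum(units) : Integer.valueOf(0);
  }

  // Rewritten form: SUM(CASE WHEN product = 'foo' THEN units ELSE 0 END)
  static Integer sumOfCase(String product, List<Integer> units) {
    List<Integer> projected = new ArrayList<>();
    for (Integer u : units) {
      projected.add("foo".equals(product) ? u : Integer.valueOf(0));
    }
    return sqlSum(projected);
  }

  public static void main(String[] args) {
    List<Integer> some = Arrays.asList(3, 4);
    List<Integer> empty = Collections.emptyList();
    System.out.println(caseOfSum("foo", some) + " vs " + sumOfCase("foo", some));   // 7 vs 7
    System.out.println(caseOfSum("bar", some) + " vs " + sumOfCase("bar", some));   // 0 vs 0
    System.out.println(caseOfSum("bar", empty) + " vs " + sumOfCase("bar", empty)); // 0 vs null
  }
}
```

In this model the forms agree on non-empty groups and differ on the empty group, which is the kind of case a test for the rule would need to cover.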

In general, the rewrite seems to be to convert f(agg(x)) into agg(g(x)).

For example if f is “2 *” and agg is “sum”, then g would be “2 *”. Thus “2 * 
sum(units)” becomes “sum(2 * units)”. But as you see, after the rewrite we are 
doing more work (because we are multiplying more values by 2). So it’s valid 
but probably not beneficial. We’d have to be careful how we apply this rule.
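
A small Java sketch of that cost argument, counting multiplications for each form. The names are invented for the example, and it ignores nulls and overflow.

```java
final class ScalarPushDownCost {
  // 2 * SUM(units): add everything up, then multiply once.
  // Returns {result, number of multiplications performed}.
  static long[] multiplyAfterSum(int[] units) {
    long sum = 0;
    for (int u : units) {
      sum += u;
    }
    return new long[] {2 * sum, 1};
  }

  // SUM(2 * units): multiply every input row, then add.
  // Returns {result, number of multiplications performed}.
  static long[] multiplyBeforeSum(int[] units) {
    long sum = 0;
    long multiplications = 0;
    for (int u : units) {
      sum += 2L * u;
      multiplications++;
    }
    return new long[] {sum, multiplications};
  }

  public static void main(String[] args) {
    int[] units = {3, 5, 7, 11};
    long[] after = multiplyAfterSum(units);
    long[] before = multiplyBeforeSum(units);
    System.out.println(after[0] + " using " + after[1] + " multiplication(s)");
    System.out.println(before[0] + " using " + before[1] + " multiplication(s)");
  }
}
```

Both forms give the same answer, but the pushed-down form does one multiplication per input row rather than one per group.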

I’d name this rule “ProjectAggregateTransposeRule”. To get it into 
Calcite, you would need to log a JIRA case, convert your code from Kotlin into 
Java, write some test cases in RelOptRulesTest (including a case where the rule 
does NOT fire because null values would make it invalid), and create a PR. I’d 
want the code to be able to handle a bit more than “CASE … ELSE 0 END” applied 
to “SUM”. We want the rule to be extensible to handle other patterns.

Julian


> On Aug 24, 2018, at 7:09 AM, Z. S.  wrote:
> 
> Hi,
> 
> I'm interested in writing a rule to push certain aggregate rexes to a lower
> subqueries to be able to remove not needed group by statements. In SQL this
> would look by transforming:
> 
> select t2.id, sum(case when product = 'foo' then sum_units else 0 end)
> from ( select id, product, sum(units) AS sum_units from orders group by id,
> product ) t2
> group by t2.id
> 
> Into:
> select t2.id, sum(sum_units)
> from ( select id, sum(case when product = 'foo' then sum_units else 0 end)
> as sum_units from orders group by id ) t2 group by t2.id
> 
> I've been able to implement a rule for this specific example so I know in
> theory it's possible to make the rule generic. You can see the rule here:
> https://pastebin.com/G3x9CdAW
> 
> My question is if there's a better way to solve this? Are there any
> existing calcite tools that could be used? I tried looking at PushProjector
> class but it doesn't seem to be for this purpose? Is it OK to traverse the
> tree like I'm doing it by calling the getInput method and casting nodes to
> HepRelVertex or is there a better way to traverse the tree? How would you
> suggest implementing such a rule?
> 
> Thanks



Requesting review for [CALCITE-2470] RelBuilder.project should combine expressions if underlying node is a Project

2018-08-26 Thread Julian Hyde
Can I have one or two reviews of
https://issues.apache.org/jira/browse/CALCITE-2470, please?

Having RelBuilder merge projects yields significant improvements
to plans - fewer RelNodes, which should increase planner performance,
and better field aliases. But the plan improvements will cause plan
changes in downstream projects.

I am not sure that I am handling dynamic star correctly. Can a Drill
committer please review and if necessary fix?

Julian


Re: Giving the Calcite logo some love

2018-08-26 Thread Julian Hyde
Thanks for kicking this off, Daniel. Let me say up front that I think
your proposed logo is much better than the current logo. Especially
the clarity of design, and the choice of font. If we can't come up
with a better alternative shortly (i.e. in this thread), I think we
should vote on adopting it. I would vote for it.

That said, if we're going to change the logo, let's give people a
couple of days to submit other options.

The reason I chose the name Calcite (and the previous name, Optiq) was
because refraction (specifically birefringence) suggests elegance and
economy of purpose. Query planning is about working smarter, not
harder, and the hammer in Daniel's proposed logo suggests brute force.

So, I refer you to the previous thread on the subject of logos, about
2 years ago [1]. And here is a nice picture of birefringence [2]; my
awful hand-drawn sketches in the logo thread were alluding to that. If
those sources inspire people to create some more logo options for us
to vote on, I would be delighted.

Julian

[1] 
https://lists.apache.org/thread.html/1e1cbac26e5f07a661b2c0e0d1ebe1ced0d105e19b82dfe72ee47fa9@1457378768@%3Cdev.calcite.apache.org%3E

[2] https://commons.wikimedia.org/wiki/File:Fluorescence_in_calcite.jpg

On Sun, Aug 26, 2018 at 2:26 PM Muhammad Gelbana  wrote:
>
> I don't know the tool name you've put there but I guess the message you're
> trying to deliver is: We dissect bulks of data to bring up useful
> information.
>
> I'm probably wrong. We also need to think about the reason behind choosing
> the name: Calcite. So the final logo would make sense.
>
> What do you think ?
>
> Thanks,
> Gelbana
>
>
> On Sun, Aug 26, 2018 at 10:56 PM Daniel Gruno  wrote:
>
> > Hello, awesome Calcite people!
> >
> > As some of you may know, I'm on an arduous mission of gathering and
> > touching up all logos we have at Apache. During this task, I realized
> > the calcite logo has some flaws that does not make it super fit for
> > print, so perhaps it's time to look for a new logo?
> >
> > I did a quick proposal, figuring "calcite...minerals...mining!" - if I'm
> > completely off track, let me know :D Anyway, the proposal is at:
> >
> > http://www.apache.org/logos/comdev-test/res/calcite/calcite-proposed.svg
> >
> > (current logo is found at
> > http://www.apache.org/logos/comdev-test/#calcite for reference)
> >
> > If you like it, it's hereby ALv2 so feel free to use it.
> > If you want changes, or maybe a logo contest, that's also a-okay! Just
> > let me know if you have feedback, and remember to CC me if so, as I'm
> > not on this mailing list.
> >
> > With warm regards,
> > Daniel.
> >


Re: Giving the Calcite logo some love

2018-08-27 Thread Julian Hyde
Everyone, please be sure to cc Daniel when you reply to this thread. (He is 
making the same offer to a lot of projects and understandably he does not want 
to subscribe to all of their dev lists.)

> On Aug 27, 2018, at 5:37 AM, Michael Mior  wrote:
> 
> Definitely agreed that the logo could use some love. I like what you've
> done with the type in the logo and I also like Julian's suggestion of
> refraction. I'll see what I can do over the next couple days about
> combining those two things.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le dim. 26 août 2018 à 16:56, Daniel Gruno  a écrit :
> 
>> Hello, awesome Calcite people!
>> 
>> As some of you may know, I'm on an arduous mission of gathering and
>> touching up all logos we have at Apache. During this task, I realized
>> the calcite logo has some flaws that does not make it super fit for
>> print, so perhaps it's time to look for a new logo?
>> 
>> I did a quick proposal, figuring "calcite...minerals...mining!" - if I'm
>> completely off track, let me know :D Anyway, the proposal is at:
>> 
>> http://www.apache.org/logos/comdev-test/res/calcite/calcite-proposed.svg
>> 
>> (current logo is found at
>> http://www.apache.org/logos/comdev-test/#calcite for reference)
>> 
>> If you like it, it's hereby ALv2 so feel free to use it.
>> If you want changes, or maybe a logo contest, that's also a-okay! Just
>> let me know if you have feedback, and remember to CC me if so, as I'm
>> not on this mailing list.
>> 
>> With warm regards,
>> Daniel.
>> 



Re: Maven wrapper

2018-08-27 Thread Julian Hyde
Re-opening a discussion from earlier this year (and logged as 
https://issues.apache.org/jira/browse/CALCITE-2112).

I have changed my mind on this issue. I encountered a user today for whom 
getting a valid version of maven was a significant obstacle. I am now +1 — I 
think it would be beneficial to include mvnw and mvnw.bat in the source 
distribution, and use them in our default build instructions.

I do not think it increases complexity. Advanced users can use “mvn” if they 
choose, but the default instructions would mention only “./mvnw”.

We do not need to include maven-wrapper.jar or MavenWrapperDownloader.jar. mvnw 
and mvnw.bat work fine without them. As they make release votes more 
complicated, I think we should exclude these.

There was a mixture of -1, -0 and +1 in the original thread. Has anyone else 
changed position?

Julian


> On Jan 2, 2018, at 5:01 PM, Julian Hyde  wrote:
> 
> We already have a tool that provides a container for the whole build process. 
> That tool is maven. I do not recall a time where someone had problems because 
> they had the wrong version of maven installed; so this is a non-problem.
> 
> I’ve written C/C++ projects before (using autoconf, libtool, mingw etc.), and 
> thank heavens we don’t have their problems.
> 
> If we were to use a wrapper, we’d either have to get the wrapper from an 
> external repo, or we’d have to distribute the wrapper. For the first, we’ve 
> just shifted the version-management problem; for the second, we’d be 
> distributing our own tool-chain, including binaries (non-source files), which 
> is problematic for an open source project. 
> 
> Julian
> 
> 
>> On Jan 2, 2018, at 3:31 PM, Josh Elser  wrote:
>> 
>> +1 to the simplicity.
>> 
>> But, to Vladimir's issues (thx btw), maybe we can solve some of those 
>> pain-points another way? I've seen some projects (notably, those with 
>> compilation of C/C++ code) provide a Dockerfile that can create an 
>> environment capable of building that native code.
>> 
>> It seems like a lot of the things Vladimir cites could be solved by a 
>> similar approach which would keep us on a single build tool (instead, 
>> providing a ready-made environment to build without polluting anything else).
>> 
>> I'd be OK with that approach if someone wanted to make that work.
>> 
>> On 1/2/18 5:03 PM, Julian Hyde wrote:
>>> Yes.
>>> But I claim that adding mvnw to the picture makes things more complicated 
>>> for the typical user, because there are now more options to understand.
>>> Julian
>>>> On Jan 2, 2018, at 2:00 PM, Michael Mior  wrote:
>>>> 
>>>> Even if we do include mvnw, isn't it still possible to use a compatible mvn
>>>> directly?
>>>> 
>>>> --
>>>> Michael Mior
>>>> mm...@apache.org
>>>> 
>>>> 2018-01-02 15:35 GMT-05:00 Julian Hyde :
>>>> 
>>>>> True, but for 2 and 3 it’s not much of a hardship to type
>>>>> 
>>>>> $ /usr/local/maven-x.y.z/bin/mvn -s my-settings.xml target
>>>>> 
>>>>> rather than
>>>>> 
>>>>> $ mvn target
>>>>> 
>>>>> And for 1, I claim that typing “mvn” is less surprising to most people
>>>>> than typing “mvnw”. Because most people who build java code these days are
>>>>> familiar with mvn.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jan 2, 2018, at 12:17 PM, Vladimir Sitnikov <
>>>>> sitnikov.vladi...@gmail.com> wrote:
>>>>>> 
>>>>>>> Multiple versions of Maven can be installed side-by-side (and we don't
>>>>>> have esoteric requirements). As such, I don't see the need for such a
>>>>>> change
>>>>>> 
>>>>>> The reasons could include:
>>>>>> 1) Simplified Apache Maven installation for those who have no experience
>>>>>> with it
>>>>>> 2) Having multiple settings.xml files (e.g. if corporate rules requires
>>>>>> certain settings.xml that is incompatible with Apache Calcite
>>>>> settings.xml)
>>>>>> 3) Simplified management of multiple Apache Maven versions. In the same
>>>>>> way, corporate rules might require specific mvn version (outdated due to
>>>>>> plugins, etc), so that version would likely be the default.
>>>>>> 
>>>>>> Vladimir
>>>>> 
>>>>> 
> 



Re: Maven wrapper

2018-08-28 Thread Julian Hyde
> On Aug 28, 2018, at 8:10 AM, Josh Elser  wrote:
> 
> Is it worthwhile to share the details of that situation with the community 
> (or are the specifics you provided all that's really relevant)? Asking to 
> better understand if there is some legitimate criticism of what Maven lets 
> you do, or if it's something we can make better in Calcite itself.

This particular case was a consultant for my company for whom I was building a 
custom version of Calcite. The consultant is technical and uses git all the 
time, has a JVM installed on his machine (mainly for JRuby), but does not do 
Java development, therefore does not have maven. 

Since his machine is macOS it was straightforward to do “brew install maven”. 
(Which took about 20 minutes, because he first had to upgrade Homebrew.)

Clearly it was not that hard for him to install maven, but if we used mvnw we 
could remove even that friction.

> As long as we don't create a schism where some things can only be done by 
> mvnw, I'm OK with this change.

I promise that won’t happen.

I believe that if you have mvn installed, mvnw will use it. Therefore most 
developers will continue to use the same path, regardless of whether they type 
“mvn” or “./mvnw”. I will continue to type “mvn”.

Julian

Re: JDK 8 syntax

2018-08-28 Thread Julian Hyde
Excellent. Can you commit the fix please, Vova?

> On Aug 28, 2018, at 10:34 AM, Vova Vysotskyi  wrote:
> 
> Hi all,
> 
> Janino issue <https://github.com/janino-compiler/janino/issues/47> with
> this "strange" default method was fixed, so now we can revert the temporary
> fix made in CALCITE-2261
> <https://issues.apache.org/jira/browse/CALCITE-2261> and update Janino
> version to 3.0.9.
> 
> Kind regards,
> Volodymyr Vysotskyi
> 
> 
> On Sat, Apr 21, 2018 at 11:27 AM Enrico Olivelli 
> wrote:
> 
>> Patch merged!
>> 
>> Welcome to java 8 !
>> 
>> Enrico
>> 
>> Il mar 17 apr 2018, 17:09 Enrico Olivelli  ha
>> scritto:
>> 
>>> Issue
>>> https://issues.apache.org/jira/browse/CALCITE-2261
>>> 
>>> Patch
>>> https://github.com/apache/calcite/pull/667
>>> 
>>> Cheers
>>> Enrico
>>> 
>>> 
>>> 2018-04-17 16:51 GMT+02:00 Enrico Olivelli :
>>> 
>>>> Vova,
>>>> I tried to add some "default" methods and all tests are passing (maybe
>>>> you already saw this).
>>>> Thank you !
>>>> 
>>>> I will be happy to contribute my patch as it is really simple and I have
>>>> it on my laptop
>>>> 
>>>> Enrico
>>>> 
>>>> 
>>>> 2018-04-17 15:49 GMT+02:00 Vova Vysotskyi :
>>>> 
>>>>> Taking a step to the side of a workaround, the current version of
>> Janino
>>>>> prefers default methods instead of "abstract", so we may declare
>>>>> *SchemaPlus.getSubSchema()* method as default and it will help to
>> choose
>>>>> this method instead of the method from the parent interface :)
>>>>> 
>>>>> Kind regards,
>>>>> Volodymyr Vysotskyi
>>>>> 
>>>>> 2018-04-17 15:10 GMT+03:00 Enrico Olivelli :
>>>>> 
>>>>>> I have tried to add an 'unwrap' method to Schema but then Janino
>> keeps
>>>>>> breaking for other similar reasons about method overriding with
>>>>> narrower
>>>>>> return types.
>>>>>> 
>>>>>> I guess it will be an hard task to adapt Calcite code.
>>>>>> The approach of working on Janino is better.
>>>>>> 
>>>>>> Enrico
>>>>>> 
>>>>>> Il dom 15 apr 2018, 14:43 Enrico Olivelli  ha
>>>>>> scritto:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Il dom 15 apr 2018, 14:22 Vova Vysotskyi  ha
>>>>> scritto:
>>>>>>> 
>>>>>>>> I have reproduced it in Janino only and created the issue:
>>>>>>>> https://github.com/janino-compiler/janino/issues/47
>>>>>>> 
>>>>>>> 
>>>>>>> Great work Vova,
>>>>>>> Thank you
>>>>>>> 
>>>>>>> Enrico
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> Volodymyr Vysotskyi
>>>>>>>> 
>>>>>>>> 2018-04-14 20:15 GMT+03:00 Vova Vysotskyi :
>>>>>>>> 
>>>>>>>>> Ok, I will try to prepare a test case and will log a bug on
>> Janino
>>>>>> soon.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Volodymyr Vysotskyi
>>>>>>>>> 
>>>>>>>>> 2018-04-14 20:02 GMT+03:00 Julian Hyde :
>>>>>>>>> 
>>>>>>>>>> Vova,
>>>>>>>>>> 
>>>>>>>>>> Thanks for doing the research. Your explanation sounds very
>>>>> plausible
>>>>>>>>>> (I suspected default methods, too). Can you please log a bug on
>>>>>>>>>> JANINO? https://github.com/janino-compiler/janino/issues Arno,
>>>>> the
>>>>>>>>>> project maintainer, has been very good to us over the years.
>>>>>>>>>> 
>>>>>>>>>> Julian
>>>>>>>>>> 

Re: Calcite contributions

2018-08-28 Thread Julian Hyde
Vladimir,

I think Michael is being a bit too nice. Your behavior was extremely rude.

It does seem that you were technically correct in this case. But we require 
civility in this community.

Julian


> On Aug 28, 2018, at 11:11 AM, Michael Mior  wrote:
> 
> I understand that when there are issues with trivial fixes, it's sometimes
> much easier to just fix them yourself. I think it's still beneficial if
> people are willing to work with the original contributor to fix things, but
> I understand not everyone will be able to invest that time. If an alternate
> fix is made that supersedes a patch which was contributed, I still think
> this should be communicated to the original contributor along with a link
> to the fix. This at least gives the person an opportunity to learn.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mar. 28 août 2018 à 09:56, Vladimir Sitnikov 
> a écrit :
> 
>> Michael> But the default in those cases should hopefully be to work with
>> the
>> Michael>original contributor to make any necessary changes or in some cases
>> explain
>> 
>> Please clarify if you mean that BOTH trivial and non-trivial cases to be
>> handled in exactly the same way.
>> 
>> Vladimir
>> 



Re: Giving the Calcite logo some love

2018-08-28 Thread Julian Hyde
The drop shadow is kind of the point (see birefringence[1]).

But I agree that it makes the logo difficult to read.

Julian

[1] https://en.wikipedia.org/wiki/Birefringence 


> On Aug 28, 2018, at 11:44 AM, Vladimir Sitnikov  
> wrote:
> 
> Stamatis>From the discussion so far, I came also with a few quick drafts
> 
> How about making text separate from the shape?
> E.g. shape on the left, text on the right.
> 
> Michael>without the drop shadow on the text
> 
> +1
> 
> Vladimir



Re: [DISCUSS] Towards Avatica-Go 3.1.0

2018-08-28 Thread Julian Hyde
Thanks for driving this, Francis.

Apache release policy and Calcite release practice have both changed recently:
* Calcite no longer releases a .zip, only a .tar.gz; and we no longer release 
an .md5[1]. If you want to drop the .zip feel free; if not, that’s fine also.
* Apache policy now says you SHOULD NOT release an .md5[2]; you must drop that 
from the release.

Julian

[1] https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.17.0-rc0/
[2] https://www.apache.org/dev/release-distribution.html#sigs-and-sums



> On Aug 28, 2018, at 4:05 PM, Francis Chuang  wrote:
> 
> Hi all,
> 
> I'd like to start a vote for Avatica-Go 3.1.0 over the next few days.
> 
> Some key things I'd like to address in this release:
> 
> - Go 1.11 was released a few days ago and now includes support for dependency 
> management using "Go modules" (done).
> 
> Some history:
> 
> The Go community released a package manager called Dep[1] in the middle of 
> 2017. Dep is designed to be very similar to npm, composer and cargo. 
> Initially, this was poised to be the official package manager for Go. At the 
> beginning of 2018, Russ Cox (a member of the Go team) announced vgo (aka Go 
> modules) which is a different approach to package management for Go. While 
> there was some push back from the community working on Dep, Go modules is now 
> officially in the Go tool chain and will be the package management solution 
> of choice for all Go projects.
> 
> Transition plan for Avatica-Go:
> 
> Avatica-Go currently uses Dep and has the Gopkg.toml and Gopkg.lock files 
> committed. In terms of the Go team, the current (1.11) and last (1.10) 
> versions of Go are the actively maintained versions. Since Go 1.10 does not 
> have support for Go modules (but there was a patch release to support Go 
> modules import paths to work with libraries using Go modules), we need to 
> keep Dep in place for now. I have added support for Go modules (go.mod and 
> go.sum files) to support people using Go modules for package management. When 
> Go 1.12 is released in early 2019, I will remove support for Dep and all 
> users will be required to use Go modules. This will allow us to simplify the 
> configuration for continuous integration and the documentation for using and 
> releasing Avatica-Go.
> 
> - Update dependencies (done).
> 
> - Test against Avatica 1.12 and Phoenix 5.0.0 (in progress).
> 
> - Update documentation (in progress).
> 
> This release should be pretty routine and there should be no significant 
> changes. I will send another email to start a vote in the next few days. If 
> you have any comments or questions, please let me know.
> 
> Francis
> 
> [1] https://github.com/golang/dep/releases
> 



Re: [DISCUSS] Towards Avatica-Go 3.1.0

2018-08-28 Thread Julian Hyde
Also, it’s helpful that you are publishing the transition plan up-front. Do you 
plan to include it in the release notes, or perhaps reference a JIRA case? 
Users of avatica-go may not be subscribed to this list, but they still need to 
be aware of the transition plan.

> On Aug 28, 2018, at 4:05 PM, Francis Chuang  wrote:
> 
> Hi all,
> 
> I'd like to start a vote for Avatica-Go 3.1.0 over the next few days.
> 
> Some key things I'd like to address in this release:
> 
> - Go 1.11 was released a few days ago and now includes support for dependency 
> management using "Go modules" (done).
> 
> Some history:
> 
> The Go community released a package manager called Dep[1] in the middle of 
> 2017. Dep is designed to be very similar to npm, composer and cargo. 
> Initially, this was poised to be the official package manager for Go. At the 
> beginning of 2018, Russ Cox (a member of the Go team) announced vgo (aka Go 
> modules) which is a different approach to package management for Go. While 
> there was some push back from the community working on Dep, Go modules is now 
> officially in the Go tool chain and will be the package management solution 
> of choice for all Go projects.
> 
> Transition plan for Avatica-Go:
> 
> Avatica-Go currently uses Dep and has the Gopkg.toml and Gopkg.lock files 
> committed. In terms of the Go team, the current (1.11) and last (1.10) 
> versions of Go are the actively maintained versions. Since Go 1.10 does not 
> have support for Go modules (but there was a patch release to support Go 
> modules import paths to work with libraries using Go modules), we need to 
> keep Dep in place for now. I have added support for Go modules (go.mod and 
> go.sum files) to support people using Go modules for package management. When 
> Go 1.12 is released in early 2019, I will remove support for Dep and all 
> users will be required to use Go modules. This will allow us to simplify the 
> configuration for continuous integration and the documentation for using and 
> releasing Avatica-Go.
> 
> - Update dependencies (done).
> 
> - Test against Avatica 1.12 and Phoenix 5.0.0 (in progress).
> 
> - Update documentation (in progress).
> 
> This release should be pretty routine and there should be no significant 
> changes. I will send another email to start a vote in the next few days. If 
> you have any comments or questions, please let me know.
> 
> Francis
> 
> [1] https://github.com/golang/dep/releases
> 



Re: [DISCUSS] Towards Avatica-Go 3.1.0

2018-08-28 Thread Julian Hyde
True. But you will run into grief from the Apache release police if you do. I 
would strongly recommend following the recommendation. :)


> On Aug 28, 2018, at 4:23 PM, Michael Mior  wrote:
> 
> Looks like a great plan. Thanks Francis :)
> 
> Also, while I think we should drop the .md5 (and I don't see a good reason
> not to), SHOULD NOT indicates that it's a recommendation, not a requirement.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mar. 28 août 2018 à 19:13, Julian Hyde  a écrit :
> 
>> Thanks for driving this, Francis.
>> 
>> Apache release policy and Calcite release practice have both changed
>> recently:
>> * Calcite no longer releases a .zip, only a .tar.gz; and we no longer
>> release an .md5[1]. If you want to drop the .zip feel free; if not, that’s
>> fine also.
>> * Apache policy now says you SHOULD NOT release an .md5[2]; you must drop
>> that from the release.
>> 
>> Julian
>> 
>> [1]
>> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.17.0-rc0/
>> <https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.17.0-rc0/
>>> 
>> 
>> [2] https://www.apache.org/dev/release-distribution.html#sigs-and-sums <
>> https://www.apache.org/dev/release-distribution.html#sigs-and-sums>
>> 
>> 
>>> On Aug 28, 2018, at 4:05 PM, Francis Chuang 
>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I'd like to start a vote for Avatica-Go 3.1.0 over the next few days.
>>> 
>>> Some key things I'd like to address in this release:
>>> 
>>> - Go 1.11 was released a few days ago and now includes support for
>> dependency management using "Go modules" (done).
>>> 
>>> Some history:
>>> 
>>> The Go community released a package manager called Dep[1] in the middle
>> of 2017. Dep is designed to be very similar to npm, composer and cargo.
>> Initially, this was poised to be the official package manager for Go. At
>> the beginning of 2018, Russ Cox (a member of the Go team) announced vgo
>> (aka Go modules) which is a different approach to package management for
>> Go. While there was some push back from the community working on Dep, Go
>> modules is now officially in the Go tool chain and will be the package
>> management solution of choice for all Go projects.
>>> 
>>> Transition plan for Avatica-Go:
>>> 
>>> Avatica-Go currently uses Dep and has the Gopkg.toml and Gopkg.lock
>> files committed. In terms of the Go team, the current (1.11) and last
>> (1.10) versions of Go are the actively maintained versions. Since Go 1.10
>> does not have support for Go modules (but there was a patch release to
>> support Go modules import paths to work with libraries using Go modules),
>> we need to keep Dep in place for now. I have added support for Go modules
>> (go.mod and go.sum files) to support people using Go modules for package
>> management. When Go 1.12 is released in early 2019, I will remove support
>> for Dep and all users will be required to use Go modules. This will allow
>> us to simplify the configuration for continuous integration and the
>> documentation for using and releasing Avatica-Go.
>>> 
>>> - Update dependencies (done).
>>> 
>>> - Test against Avatica 1.12 and Phoenix 5.0.0 (in progress).
>>> 
>>> - Update documentation (in progress).
>>> 
>>> This release should be pretty routine and there should be no significant
>> changes. I will send another email to start a vote in the next few days. If
>> you have any comments or questions, please let me know.
>>> 
>>> Francis
>>> 
>>> [1] https://github.com/golang/dep/releases
>>> 
>> 
>> 



Re: Calcite contributions

2018-08-28 Thread Julian Hyde
Again, I’m going to speak more strongly than Michael.

Vladimir,

Even if you are right, that doesn’t give you the right to be uncivil. 
Overriding Zoltan’s PR, for which he had asked for a review but had not yet 
received one, was not OK.

If you had just said “Hey Zoltan, I think I’ve come up with a better fix than 
your PR; do you mind if I commit it?” then Zoltan would have said “Sure”. 
Please do that next time.

If we don’t have civility and trust, this community will fall apart. 
Maintaining civility is more important than fixing a bug.

Julian


> On Aug 28, 2018, at 4:32 PM, Michael Mior  wrote:
> 
> Thanks for clarifying. It seems like this is a case where there's a broader
> discussion to be had about the merits of the optimization (completely
> separate from the issue at hand).
> 
> 1) I'm not opposed to developing a fix that's better than one which was
> already proposed although I think it would be good to run it by whoever
> originally filed the issue.
> 2) I think "backward" here is subjective. I'm not picking a side here, but
> certainly there are cases where disabling a buggy optimization is the right
> thing to do.
> 3) Developing a different fix may sometimes be the right thing to do, but I
> think other contributors would appreciate a discussion before their code is
> effectively ignored.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mar. 28 août 2018 à 17:01, Vladimir Sitnikov 
> a écrit :
> 
>> Michael>One of the other side effects in this case
>> Michael>seems to be (without having examined the technical merit of either
>> Michael>solution) that the fix which was ultimately committed still didn't
>> solve
>> Michael>the original issue.
>> 
>> I'm afraid you are wrong here.
>> My first commit implemented exactly one test. I removed @Ignored from the
>> test and implemented a fix.
>> 
>> It turned out Zoltan crafted more complex test that identified a bug in the
>> v1 of the implementation.
>> Note: that was a new test, and it was not included in PR707.
>> Note: there are millions of test cases missing, and I know the proper way
>> to cover it.
>> However, it looks like everybody here likes "one test per issue" approach
>> more, so I follow it somehow: I unlock a single test, so everybody is
>> happy.
>> 
>> Michael>In this case, it seems like Zoltan was still willing to help
>> provide a
>> solution
>> 
>> AFAIK, no-one (including Zoltan) cares to suggest a test case to defeat
>> current code in master.
>> I treat that as "the feature is good enough".
>> 
>> Michael>see the last activity on both of those before the period of
>> inactivity was
>> by Zoltan
>> 
>> My point here is
>> 1) It takes ~30 min to "develop+test" the fix
>> 2) PR707 goes in opposite direction: it disables the optimization instead
>> of just unlocking a single @Ignore test
>> 3) The bug does bother me
>> ==> I just fix it and merge it.
>> On top of that, I see nothing I could reuse from PR707, so I had to just
>> discard it.
>> 
>> Vladimir
>> 



Re: joins and low selectivity optimization

2018-08-28 Thread Julian Hyde
If I recall correctly, Hive does this kind of optimization. It’s pretty 
important if you have a date dimension table and your fact table is partitioned 
on date. Example:

  select *
  from sales
    join date_dim on sales.date_id = date_dim.id
  where sales.product_name = 'foo'
  and date_dim.quarter = '2018-Q2'

Hive would like to transform it to

  select *
  from sales
  where date_id in (20180401, 20180402, … , 20180630)
  and sales.product_name = 'foo'

by pre-evaluating the query on the date_dim table. It doesn’t do the 
optimization at logical planning time (where Calcite is involved) but at 
physical planning time (which occurs later). The list of date_id values allows 
it to scan a much more limited set of partitions of the sales fact table.
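
A minimal sketch of that pre-evaluation, written in Python and not taken from 
Hive's planner. The table contents and the names date_ids_for_quarter and 
rewrite_sales_query are hypothetical, made up for illustration; the only point 
is that the small dimension query runs first and its result becomes a literal 
IN list.

```python
# Hypothetical date dimension: one row per day, keyed by a yyyymmdd integer.
DATE_DIM = [
    {"id": 20180331, "quarter": "2018-Q1"},
    {"id": 20180401, "quarter": "2018-Q2"},
    {"id": 20180402, "quarter": "2018-Q2"},
    {"id": 20180630, "quarter": "2018-Q2"},
    {"id": 20180701, "quarter": "2018-Q3"},
]


def date_ids_for_quarter(date_dim, quarter):
    """Pre-evaluate the predicate on the small dimension table."""
    return sorted(row["id"] for row in date_dim if row["quarter"] == quarter)


def rewrite_sales_query(date_ids, product_name):
    """Build the rewritten query text, with the join replaced by an IN list.

    String formatting is used for illustration only; real code would use
    bind parameters rather than splicing values into SQL.
    """
    in_list = ", ".join(str(i) for i in date_ids)
    return (
        "select * from sales "
        f"where date_id in ({in_list}) "
        f"and sales.product_name = '{product_name}'"
    )


ids = date_ids_for_quarter(DATE_DIM, "2018-Q2")
print(rewrite_sales_query(ids, "foo"))
```

With the IN list in hand, an engine that partitions sales by date_id only has 
to open the partitions whose key appears in the list.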

Michael is correct that optimizers don’t usually have access to data. But if 
the date_dim table changes only slowly, you could set up a “tripwire” that will 
invalidate the plan if the date_dim table happens to change between planning 
and execution.
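
One way such a tripwire could look, as a sketch only. VersionedTable and 
CachedPlan are hypothetical classes, not Hive or Calcite APIs, and the version 
counter is an assumption about how changes might be detected.

```python
class VersionedTable:
    """Hypothetical table wrapper that bumps a version on every change."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.version = 0

    def insert(self, row):
        self.rows.append(row)
        self.version += 1


class CachedPlan:
    """A plan that remembers which table version it was built from."""

    def __init__(self, table, quarter):
        self.table = table
        self.planned_version = table.version
        # Values baked into the plan at planning time.
        self.date_ids = [r["id"] for r in table.rows if r["quarter"] == quarter]

    def is_valid(self):
        # The tripwire: any change since planning invalidates the plan.
        return self.table.version == self.planned_version


date_dim = VersionedTable([{"id": 20180401, "quarter": "2018-Q2"}])
plan = CachedPlan(date_dim, "2018-Q2")
valid_before = plan.is_valid()

date_dim.insert({"id": 20180402, "quarter": "2018-Q2"})
valid_after = plan.is_valid()

print(valid_before, valid_after)
```

An executor would check is_valid() just before running and re-plan if it 
returns False.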

Julian




> On Aug 28, 2018, at 6:04 PM, Michael Mior  wrote:
> 
> As far as I am aware, the optimizer has no access to data, only metadata.
> The traditional way to solve such problems would be to select among
> different join algorithms which perform better for varying cardinalities of
> each side of the join. Unfortunately, I think you're likely to have a tough
> time extracting the necessary data to do the rewrite you're aiming for.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mar. 28 août 2018 à 20:34, Andrei Sereda  a écrit :
> 
>> Hello,
>> 
>> I’m looking for a way to improve performance of a join query.
>> 
>> Suppose one joins two heterogeneous sources t1 and t2 with some predicates.
>> 
>> Further assume that the cardinality of one of the predicates is very low
>> (compared to the cardinality of the second one). (How) is it possible to
>> convert the second query (predicate) to include the results (primary keys)
>> of the first one (with low selectivity)?
>> Example
>> 
>> select * from
>>  t1 left join t2 on (t1.id = t2.id) where
>>  t1.attr = 'foo' and t2.attr = 'bar'
>> 
>> Let’s say that predicate t1.attr = 'foo' results in 3 rows (id=1, 2, 3).
>> Will it be possible to rewrite t2 query to :
>> 
>> select * from t2 where
>>   id in (1, 2, 3) and t2.attr = 'bar'
>> 
>> I’m aware of the existence of Metadata
>> <https://calcite.apache.org/apidocs/org/apache/calcite/rel/metadata/Metadata.html>
>> but not sure how to use it.
>> 
>> Any hints / directions are appreciated.
>> 
>> Thanks,
>> Andrei.
>> 



Re: joins and low selectivity optimization

2018-08-28 Thread Julian Hyde
Sure, Calcite makes use of stats in its cost formulas. And you are correct that 
“metadata” is what Calcite calls statistics.

But you have to be careful to only treat statistics as approximate. If the 
statistics were gathered using an “ANALYZE TABLE” command a month ago they may 
be out of date, so you cannot use them to, say, remove “WHERE x < 10” if a 
month ago x only had values 2, 4, and 6.
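
A small sketch of why that is unsafe. The functions analyze, unsafe_scan and 
safe_scan are hypothetical stand-ins, not Calcite code; the data mirrors the 
example above (x had values 2, 4 and 6 when the statistics were gathered).

```python
def analyze(rows):
    """Collect min/max statistics for column x (a stand-in for ANALYZE TABLE)."""
    xs = [r["x"] for r in rows]
    return {"min": min(xs), "max": max(xs)}


def unsafe_scan(rows, stats):
    """Drops the filter 'x < 10' when the stats claim every row satisfies it."""
    if stats["max"] < 10:
        return list(rows)  # filter removed: wrong if the stats are stale
    return [r for r in rows if r["x"] < 10]


def safe_scan(rows):
    """Always applies the filter; stats would only be used for costing."""
    return [r for r in rows if r["x"] < 10]


table = [{"x": 2}, {"x": 4}, {"x": 6}]
stats = analyze(table)       # gathered "a month ago"
table.append({"x": 50})      # the data changes after the stats were gathered

print(len(unsafe_scan(table, stats)), len(safe_scan(table)))
```

The unsafe version returns the row with x = 50 even though the query asked for 
x < 10; the safe version treats the statistics as an estimate and still 
evaluates the predicate.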

> On Aug 28, 2018, at 7:14 PM, Andrei Sereda  wrote:
> 
> Thank you, Michael and Julian, for your answers.
> 
> Even if optimizers don't have access to data can they have access to table
> statistics ? If I remember correctly Oracle CBO is estimating selectivity
> based on column distribution (histograms) and some formulas for density
> <https://gerardnico.com/db/oracle/statistics/density>. I realize these
> statistics are not available for all data stores but can calcite optimizer
> be "smarter" when this data is available ?
> 
> On Tue, Aug 28, 2018 at 9:46 PM Julian Hyde  wrote:
> 
>> If I recall correctly, Hive does this kind of optimization. It’s pretty
>> important you have a date dimension table and your fact table is
>> partitioned on date. Example:
>> 
>>  select *
>>  from sales
>>    join date_dim on sales.date_id = date_dim.id
>>  where sales.product_name = 'foo'
>>  and date_dim.quarter = '2018-Q2'
>> 
>> Hive would like to transform it to
>> 
>>  select *
>>  from sales
>>  where date_id in (20180401, 20180402, … , 20180630)
>>  and sales.product_name = 'foo'
>> 
>> by pre-evaluating the query on the date_dim table. It doesn’t do the
>> optimization at logical planning time (where Calcite is involved) but at
>> physical planning time (which occurs later). The list of date_id values
>> allows it to scan a much more limited set of partitions of the sales fact
>> table.
>> 
>> Michael is correct that optimizers don’t usually have access to data. But
>> if the date_dim table changes only slowly, you could set up a “tripwire”
>> that will invalidate the plan if the date_dim table happens to change
>> between planning and execution.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Aug 28, 2018, at 6:04 PM, Michael Mior  wrote:
>>> 
>>> As far as I am aware, the optimizer has no access to data, only metadata.
>>> The traditional way to solve such problems would be to select among
>>> different join algorithms which perform better for varying cardinalities
>> of
>>> each side of the join. Unfortunately, I think you're likely to have a
>> tough
>>> time extracting the necessary data to do the rewrite you're aiming for.
>>> 
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>> 
>>> 
>>> 
>>> Le mar. 28 août 2018 à 20:34, Andrei Sereda  a écrit :
>>> 
>>>> Hello,
>>>> 
>>>> I’m looking for a way to improve performance of a join query.
>>>> 
>>>> Suppose one joins two heterogeneous sources t1 and t2 with some
>> predicates.
>>>> 
>>>> Further assume that cardinality of one of the predicates is very low
>>>> (compared cardinality of the second one). (How) Is it possible to
>> convert
>>>> second query (predicate) to include results (primary keys) of the first
>> one
>>>> (with low selectivity) ?
>>>> Example
>>>> 
>>>> select * from
>>>> t1 left join t2 on (t1.id = t2.id) where
>>>> t1.attr = 'foo' and t2.attr = 'bar'
>>>> 
>>>> Let’s say that predicate t1.attr = 'foo' results in 3 rows (id=1, 2, 3).
>>>> Will it be possible to rewrite t2 query to :
>>>> 
>>>> select *from t2 where
>>>>  id in (1, 2, 3) and t2.attr = 'bar'
>>>> 
>>>> I’m aware of existence of Metadata
>>>> <
>>>> 
>> https://calcite.apache.org/apidocs/org/apache/calcite/rel/metadata/Metadata.html
>>>>> 
>>>> but not sure how to use it.
>>>> 
>>>> Any hints / directions are appreciated.
>>>> 
>>>> Thanks,
>>>> Andrei.
>>>> 
>> 
>> 



Re: Giving the Calcite logo some love

2018-08-29 Thread Julian Hyde
Yes indeed!

If someone feels inspired to produce a logo, here’s my suggestion of a 
theme/image: a spider, specifically a Barn Spider (Araneus Cavaticus)[1]. It 
was the origin of the name “avatica”, connects and spins webs, and the 
eponymous individual in Charlotte’s Web had rather exceptional communication 
skills. 

Julian

[1] https://en.m.wikipedia.org/wiki/Barn_spider

> On Aug 28, 2018, at 9:49 PM, Francis Chuang  wrote:
> 
> The designs I have seen so far look really good! Would it also make sense to 
> design a variant for Avatica as well? This is what the current Avatica logo 
> looks like: https://calcite.apache.org/avatica/img/logo.png
> 
> Francis
> 
>> On 29/08/2018 7:08 AM, Vladimir Sitnikov wrote:
>> Stamatis>How about something like the following:
>> 
>> There's left-to-right vs right-to-left issue, however I would claim that
>> the direction of improvement is right+up.
>> For instance: BTC price is good when plots go to the right and go upward.
>> 
>> https://svgur.com/s/83y is slanted backward.
>> That creates perception of "Calcite holding back the progress" or "Apache
>> pushing C away" or something like that.
>> Could you flip rhombus so it goes right-up?
>> 
>> 
>> Vladimir
>> 
> 


Re: CALCITE-2463 Silence ERROR logs from CalciteException, SqlValidatorException

2018-08-29 Thread Julian Hyde
A SQL validation error when a statement is being prepared (say by a JDBC 
server) is really an error, and I think it should be logged.

However when such exceptions occur during tests such as SqlValidatorTest they 
are clearly not errors - they are expected behavior. So ideally they would not 
be logged or printed to the screen.

Is there a way to achieve both of the above? I agree with your instinct that 
the tests are currently much too verbose. 
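
One possible shape for "both", sketched in Python's logging module as a 
language-neutral illustration and not as a proposal for Calcite's Java code. 
The logger name, validate and expecting_errors are all hypothetical. The idea 
is that the validator keeps logging at ERROR, and only tests that expect the 
failure raise the threshold for the duration of the call.

```python
import logging
from contextlib import contextmanager

LOGGER = logging.getLogger("sketch.validator")  # hypothetical logger name
LOGGER.propagate = False


class ListHandler(logging.Handler):
    """Collects messages so we can see what would have reached the log."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class ValidationError(Exception):
    pass


def validate(sql):
    """Stand-in validator: logs at ERROR, then raises."""
    if "_SUGGEST_" in sql:
        LOGGER.error("Object '_SUGGEST_' not found")
        raise ValidationError(sql)


@contextmanager
def expecting_errors(logger):
    """For tests: temporarily raise the threshold so ERROR is not emitted."""
    previous = logger.level
    logger.setLevel(logging.CRITICAL)
    try:
        yield
    finally:
        logger.setLevel(previous)


handler = ListHandler()
LOGGER.addHandler(handler)

# Server-style call: the failure is logged.
try:
    validate("select _SUGGEST_ from sales")
except ValidationError:
    pass
logged_in_server = len(handler.messages)

# Test-style call: the failure is expected, so nothing new is logged.
with expecting_errors(LOGGER):
    try:
        validate("select _SUGGEST_ from sales")
    except ValidationError:
        pass
logged_in_test = len(handler.messages) - logged_in_server

print(logged_in_server, logged_in_test)
```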

Julian

> On Aug 29, 2018, at 12:01 AM, Vladimir Sitnikov  
> wrote:
> 
> Hi,
> 
> I would like to hear if there are reasons not to merge CALCITE-2463/PR797
> It is a trivial change, the build is green, and I see no comments (==> that
> sounds like an approval).
> 
> The issue is CalciteException and SqlValidatorException always log, and it
> clutters build logs for no reason.
> 
> The solution is to print the errors in calcite.debug=true mode only.
> There's alternative option to eliminate LOGGER.error(toString());
> altogether, and I'm +0.5 for that.
> 
> I am sure ERROR logs should be dedicated to true errors. Fake errors like
> the ones Calcite produces now do not belong in error logs.
> 
> The issue does impact me:
> 1) SqlAdvisor always logs "errors", however it is SqlAdvisor that crafts
> invalid SQL.
>  2018-08-28 18:32:12,082 [main] ERROR -
> org.apache.calcite.sql.validate.SqlValidatorException: Object '_SUGGEST_'
> not found within 'SALES'
> 2) Travis build logs are huge, and it is hard to analyze failures. I ignore
> "SqlValidatorException" in 100% of the cases, however it does take
> noticeable time to load and scroll.
> 
> Vladimir


Re: joins and low selectivity optimization

2018-08-29 Thread Julian Hyde
Regarding Vladimir’s ideas of Bloom filters and nested loop joins. Both are 
excellent if you can do them. They are fairly easy in single-node architectures 
(especially single-threaded) but get harder in distributed architectures. Bloom 
filters (also magic sets) require data to be pushed “up stream”, and may 
require re-starting sub-graphs.
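
For the single-node case, the Bloom filter idea can be sketched as below. This 
is an illustration under stated assumptions: BloomFilter and bloom_join are 
hypothetical names, the filter sizes are arbitrary, and nothing here reflects 
how Hive or Calcite implement it.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_join(small_rows, big_rows):
    """Join on 'id', using a Bloom filter built from the small side."""
    bloom = BloomFilter()
    by_id = {}
    for row in small_rows:
        bloom.add(row["id"])
        by_id.setdefault(row["id"], []).append(row)

    # In a distributed engine this filter would be shipped "up stream" to
    # the scan of the big side; here it is simply applied in place.
    survivors = [r for r in big_rows if bloom.might_contain(r["id"])]

    # The exact join removes any false positives the filter let through.
    joined = [(s, b) for b in survivors for s in by_id.get(b["id"], [])]
    return survivors, joined


t1 = [{"id": 1}, {"id": 2}, {"id": 3}]
t2 = [{"id": i} for i in range(1, 101)]
survivors, joined = bloom_join(t1, t2)
print(len(survivors), len(joined))
```

The hard part in a distributed engine is the step marked in the comment: the 
filter only exists after the small side has run, so the big-side scan either 
waits for it or has to be restarted.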

So, you have to devise query processing algorithms that are appropriate for 
your architecture.

Hive is an example of a highly distributed, parallel engine. Hive would like to 
do Bloom filters but has still not gotten around to it. Nested loops are never 
likely to happen. But Hive uses other techniques.
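
For reference, the batched nested loop join from Vladimir's message quoted 
below (fetch a batch from t1, fetch "from t2 where id in (...)", repeat) can be 
sketched for a single node as follows. fetch_t2_where_id_in is a hypothetical 
stand-in for a remote query against the second source.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_t2_where_id_in(t2, ids):
    """Stand-in for a remote 'select * from t2 where id in (...)' call."""
    wanted = set(ids)
    return [row for row in t2 if row["id"] in wanted]


def batched_nested_loop_join(t1, t2, batch_size=10):
    joined = []
    lookups = 0
    for batch in batches(t1, batch_size):          # 1. fetch next rows from t1
        ids = [row["id"] for row in batch]
        matches = fetch_t2_where_id_in(t2, ids)    # 2. fetch matching t2 rows
        lookups += 1
        by_id = {}
        for m in matches:
            by_id.setdefault(m["id"], []).append(m)
        for left in batch:
            for right in by_id.get(left["id"], []):
                joined.append((left, right))
    return joined, lookups                         # 3. loop ends when t1 does


t1 = [{"id": i} for i in range(25)]
t2 = [{"id": i} for i in range(0, 100, 2)]
joined, lookups = batched_nested_loop_join(t1, t2)
print(len(joined), lookups)
```

Each batch costs one round trip to the second source, which is why the batch 
size and the cost of an IN-list lookup matter so much to the planner.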

Julian


> On Aug 29, 2018, at 8:37 AM, Andrei Sereda  wrote:
> 
> Hi Vladimir,
> 
> Thanks for follow-up and explanation. I wanted to make sure I'm not missing
> (mis-understanding) anything.
> 
> Andrei.
> 
> 
> 
> On Wed, Aug 29, 2018 at 11:01 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> 
>> One of the approaches to such queries is to throw Bloom filters all over
>> the place.
>> 
>> That is it could execute "small side" of the join, collect the ids (or a
>> lossy version of it in a form of Bloom filters),
>> and it could propagate that Bloom filter to the second source to reduce the
>> set of rows produced by the second row source.
>> Then the join would be easier to do since the second row source is reduced.
>> 
>> The sad thing is not all systems support propagation of bloom filters.
>> 
>>> select * from
>>> t1 join t2 on (t1.id = t2.id) where
>>> t2.id in (select id from t1) -- force sub select
>> 
>> What if Calcite did just a regular batched nested loop join?
>> That is:
>> 1. Fetch next 10 rows from t1
>> 2. Fetch "from t2 where id in (...)"
>> 3. goto 1
>> 
>> It can be expressed via correlated subqueries, however:
>> a) I'm not sure correlated subqueries work great at the moment
>> b) Support for "batched" correlated execution is likely not there
>> c) Calcite should somehow know the true cost of "from t2 where id in (1,2)"
>> vs "from t2 where id in (1,2,3,4)". In other words, current costing model
>> does not take into account if the table has index or not. One can code such
>> costing rules, however I think it is not there yet.
>> 
>> Vladimir
>> 



Re: Calcite contributions

2018-08-29 Thread Julian Hyde
It’s never going to be black and white. We’re never going to have a code of 
conduct that covers all cases. (But please, everyone go read the CoC if you 
have not read it recently: 
https://www.apache.org/foundation/policies/conduct.html.)

But we all know civility when we see it.

In this case, Vladimir was a bit hasty in committing a fix, and in so doing 
trod on Zoltan’s toes. Zoltan was a bit upset, and started this thread to voice 
his feelings.

If I literally stepped on someone’s toes while walking along the street, I’d 
say “Sorry, dude! I didn’t mean to tread on your toes!” and that would probably 
be the end of the matter. We don’t need to start a debate about whether Zoltan 
was walking on the right part of the sidewalk. I think an apology is often all 
that is needed.

Also, everyone should assume good faith in everyone else. Vladimir isn’t trying 
to diminish Zoltan, he just wants to make the product better. Zoltan isn’t 
trying to make a fuss, and would be happy to see his PR overridden if he saw a 
better solution.

Julian





> On Aug 29, 2018, at 10:13 AM, Michael Mior  wrote:
> 
>> I bet it is impossible to set "code of conduct" that makes everybody happy
> Agreed, although we may be able to agree on a minimum standard.
> 
>> Would you call me violent if I just commit the proper fix and ignore
> PR802?
> I don't think anyone was suggesting violence.
> 
>> What if I have committed the fix yesterday?
>> What if I have committed the fix a couple of days ago?
> I don't think the issue here is timing so much as that in the case of
> CALCITE-2327, there was no effort made to run the fix past Zoltan before
> committing (please correct me if I'm wrong). In general, I think waiting a
> day or two is reasonable. Even if someone isn't able to respond in that
> window, I think people will appreciate that a heads up was given.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mer. 29 août 2018 à 04:00, Vladimir Sitnikov 
> a écrit :
> 
>> Julian>If you had just said “Hey Zoltan, I think I’ve come up with a better
>> fix than your PR; do you mind if I commit it?” then Zoltan would have said
>> “Sure”.
>> 
>> While I agree with the general points (though I bet it is impossible to set
>> a "code of conduct" that makes everybody happy), reality is not
>> black and white.
>> 
>> What is the timeout for the answer?
>> Does "absence of answer within 2 hours" count as "sure"?
>> Does "absence of answer within 24 hours" count as "sure"?
>> Does "absence of answer within 48 hours" count as "sure"?
>> ...
>> 
>> Here's an (on-going!) example:
>> https://issues.apache.org/jira/browse/CALCITE-2484 (Dynamic table tests
>> give wrong results when running tests concurrently)
>> There's a bug, there's PR.
>> 
>> I have reviewed the PR and suggested changes. JIRA reads that my review was
>> "4 days ago".
>> Would you call me violent if I just commit the proper fix and ignore PR802?
>> What if I have committed the fix yesterday?
>> What if I have committed the fix a couple of days ago?
>> 
>> In both cases, PR/Issue author puts no warnings to the issue/pr that
>> suggest if (s)he is actively working on the problem.
>> 
>> I do agree it feels bad when your work (issue comments, code changes) is
>> discarded. However, people make mistakes, so it might happen they raise
>> tickets/do code changes that should never be made in the first place
>> (==>those changes are doomed to be discarded).
>> 
>> Vladimir
>> 



Re: CALCITE-2463 Silence ERROR logs from CalciteException, SqlValidatorException

2018-08-29 Thread Julian Hyde
Are you perhaps conflating two similar but different problems:
1. as a developer, it’s irritating how much stuff appears on the screen when I 
run tests;
2. as an administrator there is too much / too little / too much useless stuff 
in my logs.

I think we should just solve problem 1 for now.

For 2, if we continue to write "No match found for function signature 
ABCDE(, , )” to the log it’s not perfect but it’s 
better than nothing.

Julian




> On Aug 29, 2018, at 1:24 AM, Vladimir Sitnikov  
> wrote:
> 
>> A sql validation error when a statement is being prepared (say by a jdbc
> server) is really an error, and I think it should be logged.
> 
> Suppose there's an end-to-end data management system that happens to use
> Calcite.
> 
> What action should system administrator take when the following error is
> printed in the log?
> org.apache.calcite.runtime.CalciteContextException: From line 1, column 17
> to line 1, column 40: No match found for function signature
> ABCDE(, , )
> 
> Note: those error messages provide NO context on the stack trace. SQL is
> missing as well. Business level info (user name, transaction name, business
> action name) is missing as well.
> 
> Note2: the application might HANDLE the error and issue an alternative
> query.
> For instance: if query fails, it might use different syntax. If query
> timeouts, the application might use alternative query.
> When that happens, the original "exception" is not an error. It must not be
> printed as error.
> How Calcite knows if application handles the error?
> 
> Julian>Is there a way to achieve both of the above? I agree with your
> instinct that the tests are currently much too verbose.
> 
> The point is it is not Calcite's business to identify if the error is
> SEVERE for application or not.
> Calcite throws the errors, and it makes very little reason to log "just
> messages".
> 
> I have gone through the same "ignorance-denying-despair-acceptance" route
> with silencing exceptions in https://github.com/pgjdbc/pgjdbc/pull/1187
> 
> The errors (and warnings) should be actionable. In other words, system
> administrator should be able to perform corrective actions.
> Logging all the CalciteExceptions provides no way to distinguish truly
> important stuff from "dev tracing".
> 
> Note: there's always an option to enable TRACE logging level for
> CalciteException which will activate "automatic printing of all the errors
> with stacktraces".
> That might be useful for debugging/analyzing of bad system behavior,
> however it should be disabled by default.
> 
> Vladimir



JSON support

2018-08-29 Thread Julian Hyde
Somehow I missed this… we have a pull request for JSON support. It’s a big 
change (both in terms of importance and the amount of effort).

https://github.com/apache/calcite/pull/785

https://issues.apache.org/jira/browse/CALCITE-2266

I made a quick review, and it looks good. I see Vladimir has reviewed. Can 
someone else (not necessarily a committer) take a look?

Julian




Re: CALCITE-2463 Silence ERROR logs from CalciteException, SqlValidatorException

2018-08-29 Thread Julian Hyde
I haven’t read the full PR. (Sorry, finite time due to $dayjob.) If the PR only 
removes output from the screen or log during tests, I’m fine.

> On Aug 29, 2018, at 11:55 AM, Vladimir Sitnikov  
> wrote:
> 
> Julian>I think we should just solve problem 1 for now.
> 
> Does that mean you agree to merge the PR?
> 
> Julian>For 2, if we continue to write "No match found for function
> signature ABCDE(, , )” to the log it’s not perfect
> but it’s better than nothing.
> 
> Why do you think this is the only logged message?
> 
> Does that exception die in Calcite sources? I hope it does not.
> Does it propagate to the client API?
> Vladimir



Re: JSON support

2018-08-29 Thread Julian Hyde
The rules for Parser.jj are a little different. JavaCC needs to parse the Java 
fragments inside rules and sometimes it can’t handle modern Java syntax such as 
“new ArrayList<>()”.

> On Aug 29, 2018, at 1:16 PM, Michael Mior  wrote:
> 
> One thing that jumped out to me was the use of an explicit type when
> calling the ArrayList constructor in the parser. I initially thought this
> was something that Julian had gotten rid of in his big update to support
> Java 8 syntax. However, I see this still persists in Parser.jj . Is this
> intentional?
> --
> Michael Mior
> mm...@apache.org
> 
> 
> 
> Le mer. 29 août 2018 à 16:11, Julian Hyde  a écrit :
> 
>> Somehow I missed this… we have a pull request for JSON support. It’s a big
>> change (both in terms of importance and the amount of effort).
>> 
>> https://github.com/apache/calcite/pull/785 <
>> https://github.com/apache/calcite/pull/785>
>> 
>> https://issues.apache.org/jira/browse/CALCITE-2266 <
>> https://issues.apache.org/jira/browse/CALCITE-2266>
>> 
>> I made a quick review, and it looks good. I see Vladimir has reviewed. Can
>> someone else (not necessarily a committer) take a look?
>> 
>> Julian
>> 
>> 
>> 



Re: Maven wrapper

2018-08-30 Thread Julian Hyde
Please review https://github.com/julianhyde/calcite/tree/2112-mvnw 
<https://github.com/julianhyde/calcite/tree/2112-mvnw>, and give it a try in 
your own sandbox.

I have built on the original patch. We no longer need to include a .jar or 
.java. And I’ve updated the documentation to use ‘./mvnw’ rather than ‘mvn’.

Julian


> On Aug 28, 2018, at 10:35 AM, Julian Hyde  wrote:
> 
>> On Aug 28, 2018, at 8:10 AM, Josh Elser  wrote:
>> 
>> Is it worthwhile to share the details of that situation with the community 
>> (or are the specifics you provided all that's really relevant)? Asking to 
>> better understand if there is some legitimate criticism of what Maven lets 
>> you do, or if it's something we can make better in Calcite itself.
> 
> This particular case was a consultant for my company for whom I was building 
> a custom version of Calcite. The consultant is technical and uses git all the 
> time, has a JVM installed on his machine (mainly for JRuby), but does not do 
> Java development, therefore does not have maven. 
> 
> Since his machine is macOS it was straightforward to do “brew install maven”. 
> (Which took about 20 minutes, because he first had to upgrade home-brew.)
> 
> Clearly it was not that hard for him to install maven, but if we used mvnw we 
> could remove even that friction.
> 
>> As long as we don't create a schism where some things can only be done by 
>> mvnw, I'm OK with this change.
> 
> I promise that won’t happen.
> 
> I believe that if you have mvn installed, mvnw will use it. Therefore most 
> developers will continue to use the same path, regardless of whether they 
> type “mvn” or “./mvnw”. I will continue to type “mvn”.
> 
> Julian



Re: Maven wrapper

2018-08-31 Thread Julian Hyde
Thanks Sergey. I contributed back to the maven-wrapper project:
https://github.com/takari/maven-wrapper/pull/89.

On Fri, Aug 31, 2018 at 8:05 AM Michael Mior  wrote:
>
> Works for me on Ubuntu 18.04. Skimmed the doc changes as well and looks
> good to me.
> --
> Michael Mior
> mm...@apache.org
>
>
>
> Le jeu. 30 août 2018 à 19:57, Julian Hyde  a écrit :
>
> > Please review https://github.com/julianhyde/calcite/tree/2112-mvnw <
> > https://github.com/julianhyde/calcite/tree/2112-mvnw>, and give it a try
> > in your own sandbox.
> >
> > I have built on the original patch. We no longer need to include a .jar or
> > .java. And I’ve updated the documentation to use ‘./mvnw’ rather than ‘mvn’.
> >
> > Julian
> >
> >
> > > On Aug 28, 2018, at 10:35 AM, Julian Hyde  wrote:
> > >
> > >> On Aug 28, 2018, at 8:10 AM, Josh Elser  wrote:
> > >>
> > >> Is it worthwhile to share the details of that situation with the
> > community (or are the specifics you provided all that's really relevant)?
> > Asking to better understand if there is some legitimate criticism of what
> > Maven lets you do, or if it's something we can make better in Calcite
> > itself.
> > >
> > > This particular case was a consultant for my company for whom I was
> > building a custom version of Calcite. The consultant is technical and uses
> > git all the time, has a JVM installed on his machine (mainly for JRuby),
> > but does not do Java development, therefore does not have maven.
> > >
> > > Since his machine is macOS it was straightforward to do “brew install
> > maven”. (Which took about 20 minutes, because he first had to upgrade
> > home-brew.)
> > >
> > > Clearly it was not that hard for him to install maven, but if we used
> > mvnw we could remove even that friction.
> > >
> > >> As long as we don't create a schism where some things can only be done
> > by mvnw, I'm OK with this change.
> > >
> > > I promise that won’t happen.
> > >
> > > I believe that if you have mvn installed, mvnw will use it. Therefore
> > most developers will continue to use the same path, regardless of whether
> > they type “mvn” or “./mvnw”. I will continue to type “mvn”.
> > >
> > > Julian
> >
> >


Re: Had some exceptions when testing Row type

2018-08-31 Thread Julian Hyde
They all look plausible.

Calcite uses a “flattener” to convert structs into regular columns and if it 
doesn’t do it right, you tend to get off-by-one errors.
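
To illustrate where the off-by-one errors come from, here is a minimal sketch of flattening. It is not Calcite's RelStructuredTypeFlattener; the TypeField and StructFlattening classes are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy type model, not Calcite's RelDataType.
final class TypeField {
  final String name;
  final List<TypeField> children;   // empty for a scalar column

  TypeField(String name, TypeField... children) {
    this.name = name;
    this.children = Arrays.asList(children);
  }
}

final class StructFlattening {
  /** Expands each struct field into its leaf columns, depth first. */
  static List<String> flatten(List<TypeField> fields) {
    List<String> out = new ArrayList<>();
    for (TypeField f : fields) {
      collect("", f, out);
    }
    return out;
  }

  private static void collect(String prefix, TypeField f, List<String> out) {
    String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
    if (f.children.isEmpty()) {
      out.add(path);                // scalar: becomes one flat column
    } else {
      for (TypeField child : f.children) {
        collect(path, child, out);  // struct: replaced by its members
      }
    }
  }
}
```

For a row (id, addr(street, city), age), the flat columns are id, addr.street, addr.city, age. The field age is at index 2 in the nested type but at index 3 after flattening, so any code that mixes the two numbering schemes ends up pointing at the wrong column.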

> On Aug 31, 2018, at 11:03 AM, Rui Wang  wrote:
> 
> Hi community,
> 
> I am testing ROW type on top of Apache Beam. I am trying to make these cases
> work:
> 
> INSERT INTO table_with_row_column
> SELECT row from another_table_with_row_column
> 
> or
> 
> INSERT INTO table_with_row_column
> SELECT row FROM (
>  SELECT ... FROM (
> SELECT ...
>  )
> )
> 
> I tested a bunch of queries and hit some exceptions that prevented the queries
> from finishing. Here are the JIRA cases I created, each with the query I
> tested, the Java stack trace, and a link to the source code where execution
> failed:
> 
> https://issues.apache.org/jira/browse/CALCITE-2515
> https://issues.apache.org/jira/browse/CALCITE-2516
> https://issues.apache.org/jira/browse/CALCITE-2517
> 
> 
> I am wondering whether these JIRAs are valid?
> 
> Thanks,
> Rui



Re: identifying original SQL functions after rewrites (in adapters)

2018-08-31 Thread Julian Hyde
Search for uses of SqlKind.COALESCE. I initially thought that COALESCE was 
being expanded by a convertlet, but it seems that there isn’t such a convertlet. 
More likely it is being simplified by RexSimplify.simplifyCoalesce.

If the result is significantly simpler, then I don’t think you should worry 
about sending the exact same expression to ES.

Julian
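
For the future-proofing concern in the quoted message below, one option is to recognize every equivalent shape in one place and translate from a single normalized form. The following is only a sketch under that assumption: Expr, Ref, Lit, Call and CoalesceMatcher are hypothetical toy classes, not Calcite's RexNode API, and the operator names are plain strings for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy expression model; hypothetical stand-ins for RexNode, RexInputRef, RexLiteral, RexCall.
abstract class Expr {}

final class Ref extends Expr {
  final String name;
  Ref(String name) { this.name = name; }
}

final class Lit extends Expr {
  final Object value;
  Lit(Object value) { this.value = value; }
}

final class Call extends Expr {
  final String op;              // e.g. "CASE", "IS NULL", "IS NOT NULL", "COALESCE"
  final List<Expr> operands;
  Call(String op, Expr... operands) {
    this.op = op;
    this.operands = Arrays.asList(operands);
  }
}

final class CoalesceMatcher {
  /** Field name and default value recovered from a COALESCE-like expression. */
  static final class Match {
    final String field;
    final Object defaultValue;
    Match(String field, Object defaultValue) {
      this.field = field;
      this.defaultValue = defaultValue;
    }
  }

  /** Recognizes COALESCE(f, lit) and the two CASE forms it may be rewritten to. */
  static Optional<Match> match(Expr e) {
    if (!(e instanceof Call)) {
      return Optional.empty();
    }
    Call call = (Call) e;
    if (call.op.equals("COALESCE") && call.operands.size() == 2) {
      return of(call.operands.get(0), call.operands.get(1));
    }
    if (call.op.equals("CASE") && call.operands.size() == 3
        && call.operands.get(0) instanceof Call) {
      Call test = (Call) call.operands.get(0);
      if (test.operands.size() != 1 || !(test.operands.get(0) instanceof Ref)) {
        return Optional.empty();
      }
      Ref tested = (Ref) test.operands.get(0);
      Expr then = call.operands.get(1);
      Expr otherwise = call.operands.get(2);
      if (test.op.equals("IS NOT NULL") && sameRef(tested, then)) {
        return of(then, otherwise);     // CASE(IS NOT NULL(f), f, lit)
      }
      if (test.op.equals("IS NULL") && sameRef(tested, otherwise)) {
        return of(otherwise, then);     // CASE(IS NULL(f), lit, f)
      }
    }
    return Optional.empty();
  }

  private static boolean sameRef(Ref a, Expr b) {
    return b instanceof Ref && ((Ref) b).name.equals(a.name);
  }

  private static Optional<Match> of(Expr field, Expr dflt) {
    if (field instanceof Ref && dflt instanceof Lit) {
      return Optional.of(new Match(((Ref) field).name, ((Lit) dflt).value));
    }
    return Optional.empty();
  }
}
```

With a matcher like this, the adapter translates only the Match, so a change in how COALESCE is rewritten would mean adding one more recognized shape rather than touching the translation code.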


> On Aug 30, 2018, at 2:42 PM, Andrei Sereda  wrote:
> 
> Hello,
> 
> I’m trying to implement COALESCE for elastic adapter and would like to know
> what is recommended approach to identify original function (after possible
> rewrites).
> 
> Let me give an example.
> 
> Currently calcite converts COALESCE(attr, 'foo') into CASE(IS NOT
> NULL(attr), attr, 'foo'). So in adapter I have to traverse RexCall and
> check kind=CASE and operators[0] == IS NOT NULL and operators[1] ==
> INPUT_REF and operators[2] is literal to properly convert this call into
> native elastic operation. Elastic has something similar to
> coalesce (missing value) but not a generic case statement.
> 
> What if tomorrow calcite decides to keep coalesce as is or rewrite to CASE(IS
> NULL(attr),'foo', attr) ? It seems to me that trying all possible
> combinations is a wrong thing to do (correct me if I’m wrong).
> 
> How can I make this logic future-proof ?
> 
> Thanks,
> 
> Andrei.



Re: Test class and package name conventions

2018-08-31 Thread Julian Hyde
It seems easier to run all of the tests in a debugger if they are organized 
into a suite.

Also, in the suite they are ordered. The quick ones that are more likely to 
fail are towards the front of the list.

It is true that people forget to add tests to CalciteSuite sometimes.

> On Aug 31, 2018, at 2:04 PM, Vladimir Sitnikov  
> wrote:
> 
> Julian>Add the test class to the annotation in CalciteSuite.java
> 
> Just wondering: what is the purpose of CalciteSuite at all?
> It happens that a new test is not added to CalciteSuite, and it is
> invisible to CI.
> The test does not get executed, and everybody just assumes the test is fine.
> 
> Could we drop CalciteSuite, and just use **Test.java for execution?
> 
> Vladimir



Babel and Reserved keywords

2018-08-31 Thread Julian Hyde
As you may know, we have a ‘Babel’ SQL parser whose mission is to accept SQL of 
the widest possible set of SQL dialects.

One of the things we can improve is to take reserved keywords and make them not 
reserved. They would then have a meaning if used in a specific context, but 
could otherwise be used as identifiers (table names, column names, aliases) 
without being quoted.

I tried making “YEAR” non-reserved, and the experiment was successful. However, 
when I tried making everything non-reserved, by adding all of the reserved 
keywords[1] to the non-reserved list in Babel[2], it didn’t work too well.

Can anyone devise a way to figure out the minimal set of reserved keywords?

Julian

[1] https://github.com/apache/calcite/blob/master/core/src/test/java/org/apache/calcite/sql/parser/SqlParserTest.java#L94

[2] https://github.com/apache/calcite/blob/master/babel/src/main/codegen/config.fmpp#L32


Re: calcite git commit: [CALCITE-2498] fix bug when geode adapter quotes booleans as strings (Andrei Sereda)

2018-09-01 Thread Julian Hyde
When you commit, please make sure the commit message is high quality. A commit 
message should never start with “fix bug when...” and should always start with 
a capital letter. 

Julian
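
The two rules above could be checked mechanically. This is a minimal sketch, not an existing Calcite tool: the CommitMessageCheck class is hypothetical, and it assumes the subject line optionally starts with a "[CALCITE-nnnn] " prefix followed by the summary.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommitMessageCheck {
  // Assumed subject format: optional "[CALCITE-nnnn] " prefix, then the summary.
  private static final Pattern SUBJECT =
      Pattern.compile("^(\\[CALCITE-\\d+\\]\\s+)?(.*)$");

  /** Returns whether the first line of a commit message follows the two rules. */
  static boolean isAcceptable(String message) {
    String subject = message.split("\\R", 2)[0];
    Matcher m = SUBJECT.matcher(subject);
    if (!m.matches()) {
      return false;
    }
    String summary = m.group(2);
    // Rule: the summary must start with a capital letter.
    if (summary.isEmpty() || !Character.isUpperCase(summary.charAt(0))) {
      return false;
    }
    // Rule: the summary must not start with "fix bug when", in any letter case.
    return !summary.toLowerCase(Locale.ROOT).startsWith("fix bug when");
  }
}
```

The subject quoted below, "[CALCITE-2498] fix bug when geode adapter quotes booleans as strings (Andrei Sereda)", fails both checks. A rewording that describes the problem and starts with a capital letter would pass.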

> On Aug 31, 2018, at 6:55 PM, vladimirsitni...@apache.org wrote:
> 
> Repository: calcite
> Updated Branches:
>  refs/heads/master 2817bda61 -> 9589a3606
> 
> 
> [CALCITE-2498] fix bug when geode adapter quotes booleans as strings (Andrei 
> Sereda)
> 
> GeodeFilter was incorrectly quoting boolean literals as SQL strings ('true' 
> instead of true)
> 
> fixes #809
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/calcite/repo
> Commit: http://git-wip-us.apache.org/repos/asf/calcite/commit/9589a360
> Tree: http://git-wip-us.apache.org/repos/asf/calcite/tree/9589a360
> Diff: http://git-wip-us.apache.org/repos/asf/calcite/diff/9589a360
> 
> Branch: refs/heads/master
> Commit: 9589a360656a752be73fb27ce285cd32b22bc0e0
> Parents: 2817bda
> Author: Andrei Sereda <25229979+asereda...@users.noreply.github.com>
> Authored: Tue Aug 28 18:08:09 2018 -0400
> Committer: Vladimir Sitnikov 
> Committed: Sat Sep 1 04:55:29 2018 +0300
> 
> --
> .../java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java| 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> --
> 
> 
> http://git-wip-us.apache.org/repos/asf/calcite/blob/9589a360/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java
> --
> diff --git 
> a/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java 
> b/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java
> index e4e5ac9..ca0b482 100644
> --- 
> a/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java
> +++ 
> b/geode/src/main/java/org/apache/calcite/adapter/geode/rel/GeodeFilter.java
> @@ -36,6 +36,7 @@ import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> 
> +import static org.apache.calcite.sql.type.SqlTypeName.BOOLEAN_TYPES;
> import static org.apache.calcite.sql.type.SqlTypeName.CHAR;
> import static org.apache.calcite.sql.type.SqlTypeName.NUMERIC_TYPES;
> 
> @@ -225,7 +226,7 @@ public class GeodeFilter extends Filter implements 
> GeodeRel {
> private String translateOp2(String op, String name, RexLiteral right) {
>   String valueString = literalValue(right);
>   SqlTypeName typeName = rowType.getField(name, true, 
> false).getType().getSqlTypeName();
> -  if (NUMERIC_TYPES.contains(typeName)) {
> +  if (NUMERIC_TYPES.contains(typeName) || 
> BOOLEAN_TYPES.contains(typeName)) {
> // leave the value as it is
>   } else if (typeName != SqlTypeName.CHAR) {
> valueString = "'" + valueString + "'";
> 
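
The effect of the quoted change can be shown in isolation. The sketch below mirrors the branch structure of translateOp2 in the diff above, but LiteralQuoting and its reduced TypeName enum are hypothetical stand-ins for the adapter code and Calcite's SqlTypeName.

```java
import java.util.EnumSet;
import java.util.Set;

final class LiteralQuoting {
  // Hypothetical, reduced type list; the real code uses Calcite's SqlTypeName.
  enum TypeName { INTEGER, DOUBLE, BOOLEAN, CHAR, VARCHAR, DATE }

  private static final Set<TypeName> NUMERIC_TYPES =
      EnumSet.of(TypeName.INTEGER, TypeName.DOUBLE);
  private static final Set<TypeName> BOOLEAN_TYPES =
      EnumSet.of(TypeName.BOOLEAN);

  /** Renders a literal value for a query string, quoting only where needed. */
  static String render(TypeName type, String value) {
    if (NUMERIC_TYPES.contains(type) || BOOLEAN_TYPES.contains(type)) {
      return value;                 // leave the value as it is
    } else if (type != TypeName.CHAR) {
      return "'" + value + "'";     // wrap other types in single quotes
    }
    return value;                   // CHAR is left untouched, as in the diff
  }
}
```

Before the change, a BOOLEAN value fell into the second branch and came out as 'true' rather than true.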


Re: Ordering By Projection Alias in RelToSqlConverter

2018-09-03 Thread Julian Hyde
It sounds like a bug. Can you please log a JIRA case?

We should consult the target dialect and generate SQL according to the rules 
for what is allowed in ORDER BY. 

In Calcite, and probably also in Hive, “ORDER BY emp.first_name” would be 
valid. But probably “ORDER BY ” is the best solution on dialects that 
support it (and most do). 

Julian

> On Sep 3, 2018, at 09:41, Krishnakant Agrawal  wrote:
> 
> Hi All,
> 
> I am trying to convert a Simple RelNode to SQL Text with Hive as Dialect
> using RelToSqlConverter.
> 
> Problem is if the Order By Key is a Projection which was Aliased, the
> Output query contains the original column name instead of the Alias, which
> is not allowed in Hive as valid Order By keys are projections only.
> 
> For Instance,
> Select first_name as n1 from emp order by first_name; (Failing in hive!)
> 
> Expected SQL,
> Select first_name as n1 from emp order by n1;
> 
> I create the TableScan, Projection(with the alias) & Sort in the mentioned
> order.
> 
> Any leads would be greatly appreciated.
> 
> Thanks & Regards,
> Krishnakant


Re: Maven wrapper

2018-09-04 Thread Julian Hyde
Agreed. Done in the latest https://github.com/julianhyde/calcite/tree/2112-mvnw.

It seems that there is consensus that the wrapper is a good thing. I’ll merge 
in the next day or two.



> On Sep 2, 2018, at 3:21 PM, Michael Mior  wrote:
> 
> One thing I didn't initially notice is that using mvnw created a directory
> .mvn in the root of the project. This should probably be added to
> .gitignore.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> Le ven. 31 août 2018 à 11:05, Michael Mior  a écrit :
> 
>> Works for me on Ubuntu 18.04. Skimmed the doc changes as well and looks
>> good to me.
>> --
>> Michael Mior
>> mm...@apache.org
>> 
>> 
>> 
>> Le jeu. 30 août 2018 à 19:57, Julian Hyde  a écrit :
>> 
>>> Please review https://github.com/julianhyde/calcite/tree/2112-mvnw, and
>>> give it a try in your own sandbox.
>>> 
>>> I have built on the original patch. We no longer need to include a .jar
>>> or .java. And I’ve updated the documentation to use ‘./mvnw’ rather than
>>> ‘mvn’.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Aug 28, 2018, at 10:35 AM, Julian Hyde  wrote:
>>>> 
>>>>> On Aug 28, 2018, at 8:10 AM, Josh Elser  wrote:
>>>>> 
>>>>> Is it worthwhile to share the details of that situation with the
>>> community (or are the specifics you provided all that's really relevant)?
>>> Asking to better understand if there is some legitimate criticism of what
>>> Maven lets you do, or if it's something we can make better in Calcite
>>> itself.
>>>> 
>>>> This particular case was a consultant for my company for whom I was
>>> building a custom version of Calcite. The consultant is technical and uses
>>> git all the time, has a JVM installed on his machine (mainly for JRuby),
>>> but does not do Java development, therefore does not have maven.
>>>> 
>>>> Since his machine is macOS it was straightforward to do “brew install
>>> maven”. (Which took about 20 minutes, because he first had to upgrade
>>> home-brew.)
>>>> 
>>>> Clearly it was not that hard for him to install maven, but if we used
>>> mvnw we could remove even that friction.
>>>> 
>>>>> As long as we don't create a schism where some things can only be done
>>> by mvnw, I'm OK with this change.
>>>> 
>>>> I promise that won’t happen.
>>>> 
>>>> I believe that if you have mvn installed, mvnw will use it. Therefore
>>> most developers will continue to use the same path, regardless of whether
>>> they type “mvn” or “./mvnw”. I will continue to type “mvn”.
>>>> 
>>>> Julian
>>> 
>>> 



Re: Sqlline release

2018-09-04 Thread Julian Hyde
We said first week of September. Now it’s the first week of September. I 
think I can make a release candidate for sqlline-1.5.0 in the next few days. 
(As a non-Apache project, there is no formal vote for releases, but I welcome 
feedback.)

There are a few issues/PRs[1][2][3] that require jline3, and jline3 only works 
on JDK 8 and higher. I propose to take those PRs immediately AFTER this release. 
Thus, sqlline-1.5 will be the last sqlline release that supports JDK 1.6 or 1.7.

Julian

[1] https://github.com/julianhyde/sqlline/issues/105

[2] https://github.com/julianhyde/sqlline/pull/115

[3] https://github.com/julianhyde/sqlline/issues/73


> On Aug 19, 2018, at 10:22 AM, Sergey Nuyanzin  wrote:
> 
> Thank you for the good news.
> I've added a new PR based on #86, and I expect to add one more in a day or
> two.
> 
> On Sun, Aug 19, 2018 at 7:06 PM Julian Hyde wrote:
> 
>> I’ve pushed PR #86...
>> https://github.com/julianhyde/sqlline/commit/8e0061f113d89a11ca03f4b48eda3340fd00375c
>> 
>> 
>> I’ll try to get to your remaining PRs in the next few days. Keep up the
>> good work!
>> 
>> Julian
>> 
>> 
>>> On Aug 18, 2018, at 11:04 AM, Sergey Nuyanzin 
>> wrote:
>>> 
>>> Julian,
>>> thank you very much for merging
>>> 
>>> about PR on which I would like to build to add one more improvement:
>>> currently there is only one
>>> https://github.com/julianhyde/sqlline/pull/86 (commit 340c3b1 )
>>> 
>>>>> Sergey and others, Is first week of September still a good timeframe
>> for
>>> release?
>>> for me yes it is good
>>> 
>>> 
>>> On Sat, Aug 18, 2018 at 8:14 PM Julian Hyde  wrote:
>>> 
>>>> Sergey,
>>>> 
>>>> I see a lot of pull requests for sqlline coming in from you… which is
>>>> excellent! You have said in a previous thread that it would be helpful
>> if
>>>> some of them are merged to master because you want to build on that. So,
>>>> could you give me an ordered list of PR numbers that are ready to merge?
>>>> That would be helpful for me as I try to work through the backlog and
>> merge
>>>> them.
>>>> 
>>>> Sergey and others, Is first week of September still a good timeframe for
>>>> release?
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Aug 9, 2018, at 1:35 AM, Julian Hyde 
>> wrote:
>>>>> 
>>>>> Let’s wait about a month then. Target the first week in September.
>>>>> 
>>>>> (Good to see all these new features Sergey... keep them coming!)
>>>>> 
>>>>> Julian
>>>>> 
>>>>>> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin 
>>>> wrote:
>>>>>> 
>>>>>> +1 for release
>>>>>> at the same time I am ok to wait about a month
>>>>>> (I have a few ideas about some more improvements)
>>>>>> 
>>>>>>> On Tue, Aug 7, 2018 at 5:29 AM Julian Hyde 
>>>> wrote:
>>>>>>> 
>>>>>>> (Forgive the cross-posting. The sqlline dev list isn’t very active,
>> and
>>>>>>> many of the sqlline community are in the calcite community. Please
>>>> reply to
>>>>>>> calcite dev only.)
>>>>>>> 
>>>>>>> There have been a number of enhancements to sqlline recently[1]
>>>> (thanks,
>>>>>>> Sergey!). Is it time for a release of sqlline? Or should we plan to
>>>> have a
>>>>>>> release in say a month, to give people time to add more features.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>> [1] https://github.com/julianhyde/sqlline/commits/master
>>>>>>> 
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>> Groups
>>>>>>> "sqlline-dev" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>> send
>>>> an
>>>>>>> email to sqlline-dev+unsubscr...@googlegroups.com.
>>>>>>> To post to this group, send email to sqlline-...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> 
>>>> 
>> https://groups.google.com/d/msgid/sqlline-dev/32DBFC47-F2D7-4DB4-9B32-3F36B54296C4%40gmail.com
>>>>>>> <
>>>> 
>> https://groups.google.com/d/msgid/sqlline-dev/32DBFC47-F2D7-4DB4-9B32-3F36B54296C4%40gmail.com?utm_medium=email&utm_source=footer
>>>>> 
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards,
>>>>>> Sergey
>>>> 
>>>> 
>>> 
>>> --
>>> Best regards,
>>> Sergey
>> 
>> 
> 
> -- 
> Best regards,
> Sergey



Re: [Discuss] Make flattening on Struct/Row optional

2018-09-05 Thread Julian Hyde
Flattening was introduced mainly because the original engine used flat
column-oriented storage. Now we have several ways of executing,
including generating Java code.

Adding a mode to disable flattening might make sense.
On Tue, Sep 4, 2018 at 12:52 PM Rui Wang  wrote:
>
> Hi Community,
>
> While trying to support Row type in Apache Beam SQL on top of Calcite, I
> realized flattening Row logic will make structure information of Row lost
> after Projections. There is a use case where users want to mix Beam
> programming model with Beam SQL together to process a dataset. The
> following is an example of the use case:
>
> dataset.apply(something user defined)
> .apply(SELECT ...)
> .apply(something user defined)
>
> As you can see, after the SQL statement is applied, the data structure
> should be preserved for further processing.
>
> The most straightforward way to me is to make Struct flattening optional so
> I could choose to disable it and the Row structure is preserved. Can I ask
> if it is feasible to make it happen? What could happen if Calcite just
> doesn't flatten Struct in flattener? (I tried to disable it but had
> exceptions in optimizer. I wasn't sure if that were some minor thing to fix
> or Struct flattening was a design choice so the impact of change was huge)
>
> Additionally, if there is a way to keep the information that I can use to
> reconstruct the Row after projections, it might be ok as well. Does this
> idea exist in Calcite? If it does not exist, how is this idea compared with
> disabling Struct flattening?
>
> Thanks,
> Rui


Re: [VOTE] Release calcite-avatica-go-3.1.0 (release candidate 1)

2018-09-05 Thread Julian Hyde
The .tar.gz contains a .idea directory that is not in source-control:

.idea/
.idea/go.imports.xml
.idea/inspectionProfiles
.idea/workspace.xml
.idea/vcs.xml
.idea/dataSources
.idea/dataSources/8818a189-beb8-4519-a1af-aec6a92afd20.xml
.idea/dataSources/ed276130-2ebd-463d-b1a0-78e39ca4d29f.xml
.idea/calcite-avatica-go.iml
.idea/watcherTasks.xml
.idea/dataSources.local.xml
.idea/usage.statistics.xml
.idea/misc.xml
.idea/dataSources.xml
.idea/modules.xml

In NOTICE, copyright should be 2012-2018 not 2012-2017.

Due to these issues, my vote is -1.

Checked LICENSE, README.md, checksums/signatures, file headers.

I did not build or run tests, or check that dependencies have suitable
licenses. I don't know enough about Go. Can someone else please do
that?

Julian

On Tue, Sep 4, 2018 at 3:38 PM Francis Chuang  wrote:
>
> Hi All,
>
> Just a reminder that Apache Calcite Avatica Go 3.1.0-rc1 is available
> for voting. If you have a chance, please try to run the tests, verify
> the signature and vote. Your vote will be extremely useful to determine
> if this release candidate is suitable to release.
>
> Francis
>
> On 31/08/2018 10:35 AM, Francis Chuang wrote:
> > Hi all,
> >
> > I have created a release for Apache Calcite Avatica Go 3.1.0, release
> > candidate 1.
> >
> > The release notes are available here:
> > https://github.com/apache/calcite-avatica-go/blob/master/site/_docs/go_history.md
> >
> > The commit to be voted on:
> > http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/commit/f564cfa42948b9a5c4c7f98f3e43ab5971bcaeda
> >
> > The hash is f564cfa42948b9a5c4c7f98f3e43ab5971bcaeda
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-go-3.1.0-rc1/
> >
> > The hashes of the artifacts are as follows:
> >
> > src.tar.gz.sha256 64BC4D9D 197457DC B0573F12 4A75A298 CB5207D6 FDC14DCB 
> > 675FF731 0555F5A3
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/francischuang.asc
> >
> > Instructions for running the test suite are located here:
> > https://github.com/apache/calcite-avatica-go/blob/master/site/develop/avatica-go.md#testing
> >
> > Please vote on releasing this package as Apache Calcite Avatica Go 3.1.0.
> >
> > To run the tests without a Go environment, install docker and 
> > docker-compose. Then, in the root of the release's directory, run:
> > docker-compose up --build
> >
> > The vote is open for the next 72 hours and passes if a majority of
> > at least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Calcite Avatica Go 3.1.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> > Francis
> >
>


Re: [VOTE] Release calcite-avatica-go-3.1.0 (release candidate 1)

2018-09-05 Thread Julian Hyde
You could use

 tar ... $(git ls-files)

Easier than using .gitignore I think.  

> On Sep 5, 2018, at 2:28 AM, Francis Chuang  wrote:
> 
> Thanks for voting, Julian.
> 
> I will update the release script to use the .gitignore file to exclude 
> folders when building the tar.gz.
> 
> I will also add a check to check if the year == current year when building a 
> release.
> 
> If you have docker on your machine, you can run "docker-compose up --build" 
> from the root directory to run tests. I recommend having at least 8GB of RAM, 
> as Phoenix is quite heavy.
> 
> I will cancel this vote and start a new one tomorrow.
> 
>> On 5/09/2018 6:21 PM, Julian Hyde wrote:
>> The .tar.gz contains a .idea directory that is not in source-control:
>> 
>> .idea/
>> .idea/go.imports.xml
>> .idea/inspectionProfiles
>> .idea/workspace.xml
>> .idea/vcs.xml
>> .idea/dataSources
>> .idea/dataSources/8818a189-beb8-4519-a1af-aec6a92afd20.xml
>> .idea/dataSources/ed276130-2ebd-463d-b1a0-78e39ca4d29f.xml
>> .idea/calcite-avatica-go.iml
>> .idea/watcherTasks.xml
>> .idea/dataSources.local.xml
>> .idea/usage.statistics.xml
>> .idea/misc.xml
>> .idea/dataSources.xml
>> .idea/modules.xml
>> 
>> In NOTICE, copyright should be 2012-2018 not 2012-2017.
>> 
>> Due to these issues, my vote is -1.
>> 
>> Checked LICENSE, README.md, checksums/signatures, file headers.
>> 
>> I did not build or run tests, or check that dependencies have suitable
>> licenses. I don't know enough about Go. Can someone else please do
>> that?
>> 
>> Julian
>> 
>>> On Tue, Sep 4, 2018 at 3:38 PM Francis Chuang  
>>> wrote:
>>> Hi All,
>>> 
>>> Just a reminder that Apache Calcite Avatica Go 3.1.0-rc1 is available
>>> for voting. If you have a chance, please try to run the tests, verify
>>> the signature and vote. Your vote will be extremely useful to determine
>>> if this release candidate is suitable to release.
>>> 
>>> Francis
>>> 
>>>> On 31/08/2018 10:35 AM, Francis Chuang wrote:
>>>> Hi all,
>>>> 
>>>> I have created a release for Apache Calcite Avatica Go 3.1.0, release
>>>> candidate 1.
>>>> 
>>>> The release notes are available here:
>>>> https://github.com/apache/calcite-avatica-go/blob/master/site/_docs/go_history.md
>>>> 
>>>> The commit to be voted on:
>>>> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/commit/f564cfa42948b9a5c4c7f98f3e43ab5971bcaeda
>>>> 
>>>> The hash is f564cfa42948b9a5c4c7f98f3e43ab5971bcaeda
>>>> 
>>>> The artifacts to be voted on are located here:
>>>> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-go-3.1.0-rc1/
>>>> 
>>>> The hashes of the artifacts are as follows:
>>>> 
>>>> src.tar.gz.sha256 64BC4D9D 197457DC B0573F12 4A75A298 CB5207D6 FDC14DCB 
>>>> 675FF731 0555F5A3
>>>> 
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/francischuang.asc
>>>> 
>>>> Instructions for running the test suite are located here:
>>>> https://github.com/apache/calcite-avatica-go/blob/master/site/develop/avatica-go.md#testing
>>>> 
>>>> Please vote on releasing this package as Apache Calcite Avatica Go 3.1.0.
>>>> 
>>>> To run the tests without a Go environment, install docker and 
>>>> docker-compose. Then, in the root of the release's directory, run:
>>>> docker-compose up --build
>>>> 
>>>> The vote is open for the next 72 hours and passes if a majority of
>>>> at least three +1 PMC votes are cast.
>>>> 
>>>> [ ] +1 Release this package as Apache Calcite Avatica Go 3.1.0
>>>> [ ]  0 I don't feel strongly about it, but I'm okay with the release
>>>> [ ] -1 Do not release this package because...
>>>> 
>>>> 
>>>> Here is my vote:
>>>> 
>>>> +1 (binding)
>>>> 
>>>> Francis
>>>> 
> 


Re: [Discuss] Make flattening on Struct/Row optional

2018-09-05 Thread Julian Hyde
It might not be minor, but it’s worth a try. At optimization time we treat all 
fields as fields, regardless of whether they have complex types (maps, arrays, 
multisets, records) so there should not be too many problems. The flattening 
was mainly for the benefit of the runtime.


> On Sep 5, 2018, at 11:32 AM, Rui Wang  wrote:
> 
> Thanks for your helpful response! It seems like disabling the flattening
> will at least affect some rules in optimization. It might not be a minor
> change.
> 
> 
> -Rui
> 
> On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis 
> wrote:
> 
>> Hi Rui,
>> 
>> Disabling flattening in some cases seems reasonable.
>> 
>> If I am not mistaken, even in the existing code it is not used all the time
>> so it makes sense to become configurable.
>> For example, Calcite prepared statements (CalcitePrepareImpl) are using the
>> flattener only for DDL operations that create materialized views (and this
>> is because this code at some point passes from the PlannerImpl).
>> On the other hand, any query that is using the Planner will also pass from
>> the flattener.
>> 
>> Disabling the flattener does not mean that all rules will work without
>> problems. The Javadoc of the RelStructuredTypeFlattener at some point says
>> "This approach has the benefit that real optimizer and codegen rules never
>> have to deal with structured types.". Due to this, it is very likely that
>> some rules were written based on the fact that there are no structured
>> types.
>> 
>> Best,
>> Stamatis
>> 
>> 
>> Στις Τετ, 5 Σεπ 2018 στις 9:48 π.μ., ο/η Julian Hyde 
>> έγραψε:
>> 
>>> Flattening was introduced mainly because the original engine used flat
>>> column-oriented storage. Now we have several ways of executing,
>>> including generating Java code.
>>> 
>>> Adding a mode to disable flattening might make sense.
>>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang 
>>> wrote:
>>>> 
>>>> Hi Community,
>>>> 
>>>> While trying to support Row type in Apache Beam SQL on top of Calcite,
>> I
>>>> realized flattening Row logic will make structure information of Row
>> lost
>>>> after Projections. There is a use case where users want to mix Beam
>>>> programming model with Beam SQL together to process a dataset. The
>>>> following is an example of the use case:
>>>> 
>>>> dataset.apply(something user defined)
>>>>.apply(SELECT ...)
>>>>.apply(something user defined)
>>>> 
>>>> As you can see, after the SQL statement is applied, the data structure
>>>> should be preserved for further processing.
>>>> 
>>>> The most straightforward way to me is to make Struct flattening optional
>>> so
>>>> I could choose to disable it and the Row structure is preserved. Can I
>>> ask
>>>> if it is feasible to make it happen? What could happen if Calcite just
>>>> doesn't flatten Struct in flattener? (I tried to disable it but had
>>>> exceptions in optimizer. I wasn't sure if that were some minor thing to
>>> fix
>>>> or Struct flattening was a design choice so the impact of change was
>>> huge)
>>>> 
>>>> Additionally, if there is a way to keep the information that I can use
>> to
>>>> reconstruct the Row after projections, it might be ok as well. Does
>> this
>>>> idea exist in Calcite? If it does not exist, how is this idea compared
>>> with
>>>> disabling Struct flattening?
>>>> 
>>>> Thanks,
>>>> Rui
>>> 
>> 



Re: [2/2] calcite-avatica-go git commit: Update hash in release announcement

2018-09-05 Thread Julian Hyde
My two cents on the last two changes:
 * File names with spaces are so toxic (at least to those of us who use Stone 
Age tools such as ‘sed’) that I’d be tempted to do a force-push;
 * I don’t add a release announcement until after the vote has passed.

But then I’m pedantic about maintaining a clean commit history that (on 
occasion) covers up my past mistakes.

Julian

> On Sep 5, 2018, at 3:56 PM, francischu...@apache.org wrote:
> 
> Update hash in release announcement
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/repo
> Commit: 
> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/commit/7348c3bf
> Tree: http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/tree/7348c3bf
> Diff: http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/diff/7348c3bf
> 
> Branch: refs/heads/master
> Commit: 7348c3bf1d76b32fd795857259c18718860d61c8
> Parents: 0e1ae23
> Author: Francis Chuang 
> Authored: Thu Sep 6 08:56:06 2018 +1000
> Committer: Francis Chuang 
> Committed: Thu Sep 6 08:56:06 2018 +1000
> 
> --
> site/_posts/2018-09-10-release-avatica-go-3.1.0.md | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> --
> 
> 
> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/blob/7348c3bf/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
> --
> diff --git a/site/_posts/2018-09-10-release-avatica-go-3.1.0.md 
> b/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
> index 45fd7bd..e232f79 100644
> --- a/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
> +++ b/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
> @@ -5,7 +5,7 @@ author: francischuang
> version: 3.1.0
> categories: [release]
> tag: v3-1-0
> -sha: d328101
> +sha: 0e1ae23
> component: avatica-go
> ---
> 

Re: [2/2] calcite-avatica-go git commit: Update hash in release announcement

2018-09-05 Thread Julian Hyde
Force-pushing is considered anti-social most of the time, but it is allowed. 
Use your discretion.

> On Sep 5, 2018, at 4:08 PM, Francis Chuang  wrote:
> 
> I wasn't sure that force pushing to master would work with the Apache git 
> repo. I'll keep this in mind next time (along with posting the announcement 
> after release).
> 
> On 6/09/2018 9:05 AM, Julian Hyde wrote:
>> My two cents on the last two changes:
>>  * File names with spaces are so toxic (at least to those of us who use 
>> Stone Age tools such as ‘sed’) that I’d be tempted to do a force-push;
>>  * I don’t add a release announcement until after the vote has passed.
>> 
>> But then I’m pedantic about maintaining a clean commit history that (on 
>> occasion) covers up my past mistakes.
>> 
>> Julian
>> 
>>> On Sep 5, 2018, at 3:56 PM, francischu...@apache.org wrote:
>>> 
>>> Update hash in release announcement
>>> 
>>> 
>>> Project: http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/repo
>>> Commit: 
>>> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/commit/7348c3bf
>>> Tree: 
>>> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/tree/7348c3bf
>>> Diff: 
>>> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/diff/7348c3bf
>>> 
>>> Branch: refs/heads/master
>>> Commit: 7348c3bf1d76b32fd795857259c18718860d61c8
>>> Parents: 0e1ae23
>>> Author: Francis Chuang 
>>> Authored: Thu Sep 6 08:56:06 2018 +1000
>>> Committer: Francis Chuang 
>>> Committed: Thu Sep 6 08:56:06 2018 +1000
>>> 
>>> --
>>> site/_posts/2018-09-10-release-avatica-go-3.1.0.md | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> --
>>> 
>>> 
>>> http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/blob/7348c3bf/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
>>> --
>>> diff --git a/site/_posts/2018-09-10-release-avatica-go-3.1.0.md 
>>> b/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
>>> index 45fd7bd..e232f79 100644
>>> --- a/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
>>> +++ b/site/_posts/2018-09-10-release-avatica-go-3.1.0.md
>>> @@ -5,7 +5,7 @@ author: francischuang
>>> version: 3.1.0
>>> categories: [release]
>>> tag: v3-1-0
>>> -sha: d328101
>>> +sha: 0e1ae23
>>> component: avatica-go
>>> ---
>>> 

Re: calcite git commit: Reduce HepPlannerTest#testRuleApplyCount complexity

2018-09-06 Thread Julian Hyde
Dammit, Vladimir. Stop removing tests!!

We need that complexity to make things break.

You seem to have a personal agenda to make the test suite run in under a 
minute. That goal cannot be met. Calcite is a complex piece of software. 

Julian



> On Sep 6, 2018, at 8:54 AM, vladimirsitni...@apache.org wrote:
> 
> Repository: calcite
> Updated Branches:
>  refs/heads/master 3df638c9d -> 88f125541
> 
> 
> Reduce HepPlannerTest#testRuleApplyCount complexity
> 
> 
> Project: http://git-wip-us.apache.org/repos/asf/calcite/repo
> Commit: http://git-wip-us.apache.org/repos/asf/calcite/commit/88f12554
> Tree: http://git-wip-us.apache.org/repos/asf/calcite/tree/88f12554
> Diff: http://git-wip-us.apache.org/repos/asf/calcite/diff/88f12554
> 
> Branch: refs/heads/master
> Commit: 88f125541a2875f693a02dbbd12ad5184124bafa
> Parents: 3df638c
> Author: Vladimir Sitnikov 
> Authored: Thu Sep 6 18:53:47 2018 +0300
> Committer: Vladimir Sitnikov 
> Committed: Thu Sep 6 18:53:47 2018 +0300
> 
> --
> .../org/apache/calcite/test/HepPlannerTest.java | 86 +---
> 1 file changed, 2 insertions(+), 84 deletions(-)
> --
> 
> 
> http://git-wip-us.apache.org/repos/asf/calcite/blob/88f12554/core/src/test/java/org/apache/calcite/test/HepPlannerTest.java
> --
> diff --git a/core/src/test/java/org/apache/calcite/test/HepPlannerTest.java 
> b/core/src/test/java/org/apache/calcite/test/HepPlannerTest.java
> index dea4c16..8eb78d5 100644
> --- a/core/src/test/java/org/apache/calcite/test/HepPlannerTest.java
> +++ b/core/src/test/java/org/apache/calcite/test/HepPlannerTest.java
> @@ -77,85 +77,6 @@ public class HepPlannerTest extends RelOptTestBase {
>   + "  select ENAME, 350401 as cat_id, '18' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 350401 union all\n"
>   + "  select ENAME, 50015560 as cat_id, '19' as cat_name, 0 as 
> require_free_postage, 0 as require_15return, 0 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50015560 union all\n"
>   + "  select ENAME, 122658003 as cat_id, '20' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 122658003 union all\n"
> -  + "  select ENAME, 122716008 as cat_id, '21' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 122716008 union all\n"
> -  + "  select ENAME, 50018406 as cat_id, '22' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50018406 union all\n"
> -  + "  select ENAME, 50018407 as cat_id, '23' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50018407 union all\n"
> -  + "  select ENAME, 50024678 as cat_id, '24' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50024678 union all\n"
> -  + "  select ENAME, 50022290 as cat_id, '25' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50022290 union all\n"
> -  + "  select ENAME, 50020072 as cat_id, '26' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50020072 union all\n"
> -  + "  select ENAME, 50024679 as cat_id, '27' as cat_name, 1 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50024679 union all\n"
> -  + "  select ENAME, 50013326 as cat_id, '28' as cat_name, 1 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50013326 union all\n"
> -  + "  select ENAME, 50020032 as cat_id, '19' as cat_name, 0 as 
> require_free_postage, 1 as require_15return, 1 as require_48hour,0 as 
> require_insurance from emp where EMPNO = 20171216 and MGR = 0 and ENAME = 'Y' 
> and SAL = 50020032 union all\n"
> -  + "  selec

Re: Correctness of SortJoinTransposeRule

2018-09-06 Thread Julian Hyde
Yes, it depends very much on the operator. Some examples:
 * Merge join typically requires inputs to be sorted, and preserves that order.
   (But some outer joins may throw in null values out of order.)
 * Map join typically preserves the order of the probing side, not the build side.
 * Hash join typically destroys the order of both sides.

Use the rule with caution.
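
To make the difference concrete, here is a minimal sketch of the map-join and hash-join cases. It is not Calcite code: the class and method names are made up for illustration, the "hash join" is assumed to be a partitioned one that buckets probe rows by key, and the build side is kept as a single map for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; none of these names come from Calcite.
class JoinOrderSketch {
  /** Map join: hash the build side, then stream the probe side in its input order. */
  static List<String> mapJoin(List<Integer> probe, Map<Integer, String> build) {
    List<String> out = new ArrayList<String>();
    for (int key : probe) {
      String name = build.get(key);
      if (name != null) {
        out.add(key + "=" + name);
      }
    }
    return out;
  }

  /** Partitioned hash join (assumed variant): bucket probe rows by key, join bucket by bucket. */
  static List<String> partitionedHashJoin(
      List<Integer> probe, Map<Integer, String> build, int partitions) {
    List<List<Integer>> buckets = new ArrayList<List<Integer>>();
    for (int i = 0; i < partitions; i++) {
      buckets.add(new ArrayList<Integer>());
    }
    for (int key : probe) {
      buckets.get(Math.floorMod(key, partitions)).add(key);
    }
    List<String> out = new ArrayList<String>();
    for (List<Integer> bucket : buckets) {
      out.addAll(mapJoin(bucket, build));
    }
    return out;
  }

  static List<Integer> sampleProbe() {
    return Arrays.asList(3, 2, 1, 4);
  }

  static Map<Integer, String> sampleBuild() {
    Map<Integer, String> build = new HashMap<Integer, String>();
    build.put(1, "a");
    build.put(2, "b");
    build.put(3, "c");
    build.put(4, "d");
    return build;
  }

  public static void main(String[] args) {
    // Probe order 3, 2, 1, 4 survives the map join.
    System.out.println(mapJoin(sampleProbe(), sampleBuild()));
    // With two partitions the output is grouped by bucket instead.
    System.out.println(partitionedHashJoin(sampleProbe(), sampleBuild(), 2));
  }
}
```

The map join emits rows in probe order, while the partitioned variant emits all of bucket 0 before bucket 1, so the probe order is lost.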

Julian


> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis  wrote:
> 
> Hello,
> 
> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
> right outer.
> 
> Who guarantees that the left and right outer joins are preserving the order
> of the inputs?
> Does the SQL standard require that these types of joins are order
> preserving?
> 
> Since we are working with logical operators, I would tend to think that we
> cannot assume anything about the physical equivalent.
> 
> Best,
> Stamatis



Re: calcite git commit: Reduce HepPlannerTest#testRuleApplyCount complexity

2018-09-06 Thread Julian Hyde
You are incorrectly assuming that the sole purpose of that test is to test that 
particular fix. Yes, it was introduced in that change. But it tests a whole lot 
else, including things that are currently working but may stop working in 
future. Or things that work with reasonable performance now and may get slower 
in future.

Complex tests are hard to find, and I was delighted when someone gave us that 
huge query. (Have you seen real-world SQL queries? Some are many thousands of 
lines long. But people rarely contribute them because they depend on 
proprietary models or data.)

CPU time is cheap. Developer time is expensive.

Julian


> On Sep 6, 2018, at 10:12 AM, Vladimir Sitnikov  
> wrote:
> 
> Julian>Dammit, Vladimir. Stop removing tests!!
> 
> The test is there, and it still verifies that DEPTH_FIRST approach is
> better. I did not remove the test.
> 
> Julian>You seem to have a personal agenda to make the test suite run in
> under a minute
> 
> The agenda is to reduce test suite execution time while still keeping test
> quality.
> 
> Vladimir



Re: Correctness of SortJoinTransposeRule

2018-09-06 Thread Julian Hyde
Ah, that makes sense.

Reading the code, I couldn’t figure out why it applies to LEFT and RIGHT but 
not to INNER. (For some kinds of join, for example inner merge join, it could 
push the sort to both sides, as long as the sort was compatible with what is 
needed to ensure that the keys arrive at the right time.)

If needed, we could have a variant of the rule that omits the Sort after the 
Join. Or perhaps we leave the Sort and have a rule that notices the output 
order of the Join and, based on that, weakens[1] or removes the Sort.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-2540



> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez 
>  wrote:
> 
> If I remember correctly, the rule pushes the Sort through the Join (if 
> possible), but it also preserves the Sort on top of the Join to ensure 
> correctness.
> 
> -Jesús
> 
> 
> On 9/6/18, 9:57 AM, "Julian Hyde"  wrote:
> 
>Yes, it depends very much on the operator. Some examples:
>Merge join typically requires inputs to be sorted, and preserves that 
> order. (But some outer joins may throw in null values out of order.)
>Map join typically preserves the order of the probing side, not the build 
> side.
>Hash join typically destroys the order of both sides.
>Use the rule with caution.
> 
>Julian
> 
> 
>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis  wrote:
>> 
>> Hello,
>> 
>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
>> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
>> right outer.
>> 
>> Who guarantees that the left and right outer joins are preserving the order
>> of the inputs?
>> Does the SQL standard require that these types of joins are order
>> preserving?
>> 
>> Since we are working with logical operators, I would tend to think that we
>> cannot assume anything about the physical equivalent.
>> 
>> Best,
>> Stamatis
> 
> 
> 



Re: calcite git commit: Reduce HepPlannerTest#testRuleApplyCount complexity

2018-09-06 Thread Julian Hyde
You don’t have to sit and stare at your computer while the suite is running. Go 
and have a cup of coffee. Work on another bug.

Or re-organize the suite so that the slow parts get run nightly. But don’t do 
what you just did, which is throw stuff away.
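
A minimal sketch of that kind of re-organization, assuming a system-property switch. The property name "example.test.slow" and the test names are hypothetical and not taken from the Calcite build; a nightly job would set the property, a default run would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical gate; the property name is an assumption for illustration.
class SlowTestGate {
  static final String SLOW_PROPERTY = "example.test.slow";

  /** True only when the property is explicitly set to "true". */
  static boolean slowEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(SLOW_PROPERTY, "false"));
  }

  /** Returns the names of the tests that would run under the given settings. */
  static List<String> selectTests(Properties props) {
    List<String> selected = new ArrayList<String>();
    selected.add("smallUnionQuery");
    if (slowEnabled(props)) {
      // The expensive case is kept in the suite, just not run by default.
      selected.add("hugeUnionQuery");
    }
    return selected;
  }

  static Properties withSlow(boolean enabled) {
    Properties props = new Properties();
    props.setProperty(SLOW_PROPERTY, String.valueOf(enabled));
    return props;
  }

  public static void main(String[] args) {
    System.out.println(selectTests(withSlow(false)));
    System.out.println(selectTests(withSlow(true)));
  }
}
```

The point of the design is that nothing is deleted: the slow case stays in the source tree and runs whenever the switch is on.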

Julian


> On Sep 6, 2018, at 10:23 AM, Vladimir Sitnikov  
> wrote:
> 
>> CPU time is cheap. Developer time is expensive
> 
> That is why tests known to be slow must not be executed in the default test
> suite.
> 
> Vladimir



Re: VARCHAR literals

2018-09-06 Thread Julian Hyde
Does https://issues.apache.org/jira/browse/CALCITE-2321 help?


> On Sep 6, 2018, at 4:03 AM, Piotr Nowojski  wrote:
> 
> Hi,
> 
> We have a small problem with the CHAR type in Flink. Officially we do not
> support it, and all input/output columns are of type VARCHAR. Because of that,
> nobody has ever thought about CHAR semantics (for example, correctly handling
> padding in comparisons or other functions). However, this collides with a teeny
> tiny problem: in Calcite, string literals are of type CHAR. This leads to
> not-so-funny inconsistencies in queries and incorrect results.
> 
> I wonder if we could provide a switch in Calcite to change the type of string
> literals to VARCHAR? I know that this is against the SQL standard; however,
> quite a few databases do so for various reasons. One of them is that
> providing proper CHAR support can be tricky and nowadays it doesn’t provide
> much value to the user. I have quite often seen the pattern where some new db
> starts without CHAR support at all and adds it only later (if ever). Providing
> such a switch in Calcite would allow Calcite users to do the same thing.
> 
> I was thinking about an alternative workaround: rewriting the query plan and
> changing all of the CHAR types to VARCHAR on our side, but this does not seem
> like an easy thing to do. But maybe I’m wrong and there is an easy way to do
> so on our side?
> 
> Thanks, Piotrek



Re: Correctness of SortJoinTransposeRule

2018-09-06 Thread Julian Hyde
I’d forgotten about LIMIT. 

The rule could be extended to push limit through inner join if there is a 
foreign key (e.g. we know that the join is a lookup that does not 
increase/decrease the number of rows).

And the rule could also be extended to deal with Sort operators that do not 
have a limit. Limits seriously constrain what the rule can safely do, and if 
there is no limit, we can safely push through inner join.
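
Here is a small sketch of why the limit matters. It is plain Java over in-memory lists, not the rule's implementation; the data, the class name, and the assumption that the sort key comes only from the left input are all mine. It compares "sort and limit on top of the join" with "sort and limit pushed to the left input, then join, then sort and limit again".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not taken from SortJoinTransposeRule.
class SortLimitPushDown {
  // Rows are {empno, deptno}; -1 in the second column of a result stands for NULL.
  static final List<int[]> EMPS = Arrays.asList(
      new int[] {3, 10}, new int[] {1, 99}, new int[] {2, 20}, new int[] {4, 20});
  static final List<Integer> DEPTS = Arrays.asList(10, 20);

  /** Nested-loop join of emps to depts on deptno. */
  static List<int[]> join(List<int[]> emps, List<Integer> depts, boolean leftOuter) {
    List<int[]> out = new ArrayList<int[]>();
    for (int[] emp : emps) {
      boolean matched = false;
      for (int dept : depts) {
        if (emp[1] == dept) {
          out.add(new int[] {emp[0], dept});
          matched = true;
        }
      }
      if (!matched && leftOuter) {
        out.add(new int[] {emp[0], -1});
      }
    }
    return out;
  }

  /** ORDER BY the first column, then keep the first n rows. */
  static List<int[]> sortLimit(List<int[]> rows, int n) {
    List<int[]> sorted = new ArrayList<int[]>(rows);
    sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
    return new ArrayList<int[]>(sorted.subList(0, Math.min(n, sorted.size())));
  }

  static String render(List<int[]> rows) {
    StringBuilder sb = new StringBuilder();
    for (int[] row : rows) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(row[0]).append(':').append(row[1]);
    }
    return sb.toString();
  }

  /** Sort and limit applied only on top of the join. */
  static String original(boolean leftOuter, int n) {
    return render(sortLimit(join(EMPS, DEPTS, leftOuter), n));
  }

  /** Sort and limit pushed to the left input, with the top sort and limit kept. */
  static String pushed(boolean leftOuter, int n) {
    return render(sortLimit(join(sortLimit(EMPS, n), DEPTS, leftOuter), n));
  }

  public static void main(String[] args) {
    System.out.println(original(true, 2) + " vs " + pushed(true, 2));
    System.out.println(original(false, 2) + " vs " + pushed(false, 2));
  }
}
```

With the left outer join both plans agree, because every left row yields at least one output row. With the inner join, employee 1 has no matching department, so the pushed plan returns one row where the original returns two. A foreign key that guarantees a match for every left row would remove that difference.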

Julian


> On Sep 6, 2018, at 10:39 AM, Jesus Camacho Rodriguez 
>  wrote:
> 
> The idea for that rule was to be able to exploit the limit/fetch spec of the 
> Sort operator to reduce the number of rows that needed to be joined, that is 
> why it was only applied to LEFT/RIGHT outer join.
> 
> I think option 2 below sounds better than creating a new rule variant.
> 
> Thanks,
> Jesús
> 
> 
> On 9/6/18, 10:28 AM, "Julian Hyde"  wrote:
> 
>Ah, that makes sense.
> 
>Reading the code, I couldn’t figure out why it applies to LEFT and RIGHT 
> but not to INNER. (For some kinds of join, for example inner merge join, it 
> could push the sort to both sides, as long as the sort was compatible with 
> what is needed to ensure that the keys arrive at the right time.)
> 
>If needed, we could have a variant of the rule that omits the Sort after 
> the Join. Or perhaps we leave the Sort and have a rule that notices the 
> output order of the Join and, based on that, weakens[1] or removes the Sort.
> 
>Julian
> 
>[1] https://issues.apache.org/jira/browse/CALCITE-2540
> 
> 
> 
>> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez 
>>  wrote:
>> 
>> If I remember correctly, the rule pushes the Sort through the Join (if 
>> possible), but it also preserves the Sort on top of the Join to ensure 
>> correctness.
>> 
>> -Jesús
>> 
>> 
>> On 9/6/18, 9:57 AM, "Julian Hyde"  wrote:
>> 
>>   Yes, it depends very much on the operator. Some examples:
>>   Merge join typically requires inputs to be sorted, and preserves that 
>> order. (But some outer joins may throw in null values out of order.)
>>   Map join typically preserves the order of the probing side, not the build 
>> side.
>>   Hash join typically destroys the order of both sides.
>>   Use the rule with caution.
>> 
>>   Julian
>> 
>> 
>>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis  wrote:
>>> 
>>> Hello,
>>> 
>>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) that
>>> pushes a LogicalSort past a LogicalJoin if the join is either left outer or
>>> right outer.
>>> 
>>> Who guarantees that the left and right outer joins are preserving the order
>>> of the inputs?
>>> Does the SQL standard require that these types of joins are order
>>> preserving?
>>> 
>>> Since we are working with logical operators, I would tend to think that we
>>> cannot assume anything about the physical equivalent.
>>> 
>>> Best,
>>> Stamatis
>> 
>> 
>> 
> 
> 
> 



Re: Sqlline release

2018-09-06 Thread Julian Hyde
We’re very close to an RC for sqlline-1.5.0. Arina is working to finalize
https://github.com/julianhyde/sqlline/issues/106.

This is going to be the biggest sqlline release for a long time. (Special 
thanks to Sergey Nuyanzin for many, many high-quality PRs.)

If you’re in the Calcite community and would like to help, please give the new 
sqlline a try and log any bugs you see. Get the latest sqlline master, run ‘mvn 
install’, then in your Calcite pom.xml change sqlline.version from 1.4.0 to 
1.5.0-SNAPSHOT. You may need to ‘rm target/fullclasspath.txt’ before you run 
‘./sqlline’.

Julian


> On Sep 4, 2018, at 2:00 PM, Julian Hyde  wrote:
> 
> We said first week of September. Now it’s the first week of September. I
> think I can make a release candidate for sqlline-1.5.0 in the next few days.
> (As a non-Apache project, there is no formal vote for releases, but I welcome
> feedback.)
> 
> There are a few issues/PRs[1][2][3] that require jline3, and jline3 only works
> on JDK 8 and higher. I propose to take those PRs immediately AFTER this 
> release. Thus, sqlline-1.5 will be the last sqlline release that supports JDK 
> 1.6 or 1.7.
> 
> Julian
> 
> [1] https://github.com/julianhyde/sqlline/issues/105
> 
> [2] https://github.com/julianhyde/sqlline/pull/115
> 
> [3] https://github.com/julianhyde/sqlline/issues/73
> 
> 
>> On Aug 19, 2018, at 10:22 AM, Sergey Nuyanzin wrote:
>> 
>> Thank you for the good news.
>> I've added a new PR based on this #86 and I guess in one or two days will
>> add one more
>> 
>> On Sun, Aug 19, 2018 at 7:06 PM Julian Hyde wrote:
>> 
>>> I’ve pushed PR #86...
>>> https://github.com/julianhyde/sqlline/commit/8e0061f113d89a11ca03f4b48eda3340fd00375c
>>> 
>>> 
>>> I’ll try to get to your remaining PRs in the next few days. Keep up the
>>> good work!
>>> 
>>> Julian
>>> 
>>> 
>>>> On Aug 18, 2018, at 11:04 AM, Sergey Nuyanzin wrote:
>>>> 
>>>> Julian,
>>>> thank you very much for merging
>>>> 
>>>> about PR on which I would like to build to add one more improvement:
>>>> currently there is only one
>>>> https://github.com/julianhyde/sqlline/pull/86 (commit 340c3b1)
>>>> 
>>>>>> Sergey and others, Is first week of September still a good timeframe for
>>>>>> release?
>>>> for me yes it is good
>>>> 
>>>> 
>>>> On Sat, Aug 18, 2018 at 8:14 PM Julian Hyde wrote:
>>>> 
>>>>> Sergey,
>>>>> 
>>>>> I see a lot of pull requests for sqlline coming in from you… which is
>>>>> excellent! You have said in a previous thread that it would be helpful if
>>>>> some of them are merged to master because you want to build on that. So,
>>>>> could you give me an ordered list of PR numbers that are ready to merge?
>>>>> That would be helpful for me as I try to work through the backlog and merge
>>>>> them.
>>>>> 
>>>>> Sergey and others, Is first week of September still a good timeframe for
>>>>> release?
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Aug 9, 2018, at 1:35 AM, Julian Hyde wrote:
>>>>>> 
>>>>>> Let’s wait about a month then. Target the first week in September.
>>>>>> 
>>>>>> (Good to see all these new features Sergey... keep them coming!)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>>> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin >

Re: Sqlline release

2018-09-07 Thread Julian Hyde
We have a release candidate for sqlline-1.5.0, based on
https://github.com/julianhyde/sqlline/commit/73d5974b6e694154bc47172f65d56ae47aa536aa.

This is not an Apache release, so there is no .tar.gz to look at, and
no formal vote, but I'd appreciate it if one or two people look it over.
The jar is staged at maven central, so you can try it out by changing
1.4.0 to 1.5.0 in your project's pom.xml.

If there are no objections I'll make the release official tomorrow.

Julian

On Thu, Sep 6, 2018 at 11:36 AM Julian Hyde  wrote:
>
> We’re very close to an RC for sqlline-1.5.0. Arina is working to finalize 
> https://github.com/julianhyde/sqlline/issues/106.
>
> This is going to be the biggest sqlline release for a long time. (Special 
> thanks to Sergey Nuyanzin for many, many high-quality PRs.)
>
> If you’re in the Calcite community and would like to help, please give the 
> new sqlline a try and log any bugs you see. Get the latest sqlline master, 
> run ‘mvn install’, then in your Calcite pom.xml change sqlline.version from 
> 1.4.0 to 1.5.0-SNAPSHOT. You may need to ‘rm target/fullclasspath.txt’ before 
> you run ‘./sqlline’.
>
> Julian
>
>
> On Sep 4, 2018, at 2:00 PM, Julian Hyde  wrote:
>
> We said first week of September. Now it’s the first week of September. I
> think I can make a release candidate for sqlline-1.5.0 in the next few days.
> (As a non-Apache project, there is no formal vote for releases, but I welcome
> feedback.)
>
> There are a few issues/PRs[1][2][3] that require jline3, and jline3 only works
> on JDK 8 and higher. I propose to take those PRs immediately AFTER this 
> release. Thus, sqlline-1.5 will be the last sqlline release that supports JDK 
> 1.6 or 1.7.
>
> Julian
>
> [1] https://github.com/julianhyde/sqlline/issues/105
>
> [2] https://github.com/julianhyde/sqlline/pull/115
>
> [3] https://github.com/julianhyde/sqlline/issues/73
>
>
> On Aug 19, 2018, at 10:22 AM, Sergey Nuyanzin  wrote:
>
> Thank you for the good news.
> I've added a new PR based on this #86 and I guess in one or two days will
> add one more
>
> On Sun, Aug 19, 2018 at 7:06 PM Julian Hyde  wrote:
>
> I’ve pushed PR #86...
> https://github.com/julianhyde/sqlline/commit/8e0061f113d89a11ca03f4b48eda3340fd00375c
>
>
> I’ll try to get to your remaining PRs in the next few days. Keep up the
> good work!
>
> Julian
>
>
> On Aug 18, 2018, at 11:04 AM, Sergey Nuyanzin wrote:
>
>
> Julian,
> thank you very much for merging
>
> about PR on which I would like to build to add one more improvement:
> currently there is only one
> https://github.com/julianhyde/sqlline/pull/86 (commit 340c3b1)
>
> Sergey and others, Is first week of September still a good timeframe for
> release?
> for me yes it is good
>
>
> On Sat, Aug 18, 2018 at 8:14 PM Julian Hyde  wrote:
>
> Sergey,
>
> I see a lot of pull requests for sqlline coming in from you… which is
> excellent! You have said in a previous thread that it would be helpful if
> some of them are merged to master because you want to build on that. So,
> could you give me an ordered list of PR numbers that are ready to merge?
> That would be helpful for me as I try to work through the backlog and merge
> them.
>
> Sergey and others, Is first week of September still a good timeframe for
> release?
>
> Julian
>
>
> On Aug 9, 2018, at 1:35 AM, Julian Hyde wrote:
>
>
> Let’s wait about a month then. Target the first week in September.
>
> (Good to see all these new features Sergey... keep them coming!)
>
> Julian
>
> On Aug 7, 2018, at 11:44 AM, Sergey Nuyanzin wrote:
>
>
> +1 for release
> at the same time I am ok to wait about a month
> (I have a few ideas about some more improvements)
>
> On Tue, Aug 7, 2018 at 5:29 AM Julian Hyde wrote:
>
>
> (Forgive the cross-posting. The sqlline dev list isn’t very active, and
> many of the sqlline community are in the calcite community. Please reply to
> calcite dev only.)
>
> There have been a number of enhancements to sqlline recently[1] (thanks,
> Sergey!). Is it time for a release of sqlline? Or should we plan to have a
> release in say a month, to give people time to add more features.
>
> Julian
>
> [1] https://github.com/julianhyde/sqlline/commits/master
>

Re: [VOTE] Release calcite-avatica-go-3.1.0 (release candidate 2)

2018-09-07 Thread Julian Hyde
+1 (binding)

Checked checksums/signatures, L&N, README, file headers.
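
For anyone repeating the checksum part of this check, a minimal sketch follows. The vote mail publishes the digest as eight space-separated groups of eight hex digits, which is the length of a SHA-256 digest; that the digest is SHA-256 is my inference, and the class and method names are hypothetical. Signature verification is a separate step that this sketch does not cover.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper for comparing a computed digest with a published one.
class ChecksumCheck {
  /** Lower-case hex SHA-256 of the given bytes. */
  static String sha256Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be present on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /** Strips the grouping spaces and upper case used in the announcement. */
  static String normalize(String published) {
    return published.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
  }

  static boolean matches(byte[] data, String published) {
    return sha256Hex(data).equals(normalize(published));
  }

  public static void main(String[] args) {
    // Well-known SHA-256 test vector for the three bytes "abc".
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    String published = "BA7816BF 8F01CFEA 414140DE 5DAE2223 "
        + "B00361A3 96177A9C B410FF61 F20015AD";
    System.out.println(matches(data, published));
  }
}
```

For a real artifact, the byte array would come from the downloaded file, for example via java.nio.file.Files.readAllBytes, and the published string from the vote mail.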

Checked that contents of .tar.gz are consistent with git commit.

The issues noted last time (copyright date and .idea directory) are
resolved. Thanks, Francis!

Julian

On Fri, Sep 7, 2018 at 5:35 PM Josh Elser  wrote:
>
> +1 (binding)
>
> * xsums/sigs OK
> * Can "build" from source
> * Ran tests successfully
> * L&N look OK
> * Checked packages in Gopkg.toml are permissively licensed
> * Spot-checked files for license headers (this would be nice to automate
> in a script)
>
> On 9/5/18 4:05 PM, Francis Chuang wrote:
> > Hi all,
> >
> > I have created a release for Apache Calcite Avatica Go 3.1.0, release
> > candidate 2.
> >
> > The release notes are available here:
> > https://github.com/apache/calcite-avatica-go/blob/master/site/_docs/go_history.md
> >
> >
> > The commit to be voted on:
> > http://git-wip-us.apache.org/repos/asf/calcite-avatica-go/commit/0e1ae23d79c9f6d92337a13c9c15318a1c7570dc
> >
> >
> > The hash is 0e1ae23d79c9f6d92337a13c9c15318a1c7570dc
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-go-3.1.0-rc2/
> >
> >
> > The hashes of the artifacts are as follows:
> >
> > src.tar.gz 0B396E74 3FD68D9D DD311A81 B0441401 DDB7E414 955A4D78
> > 761747E3 D68C8DEA
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/francischuang.asc
> >
> > Instructions for running the test suite are located here:
> > https://github.com/apache/calcite-avatica-go/blob/master/site/develop/avatica-go.md#testing
> >
> >
> > Please vote on releasing this package as Apache Calcite Avatica Go 3.1.0.
> >
> > To run the tests without a Go environment, install docker and
> > docker-compose. Then, in the root of the release's directory, run:
> > docker-compose up --build
> >
> > The vote is open for the next 72 hours and passes if a majority of
> > at least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Calcite Avatica Go 3.1.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> > Francis
> >
> >


Re: Sqlline release

2018-09-08 Thread Julian Hyde
Thanks, Arina.

I knew about the JDK 1.7 issue. For JDK 1.7 and 1.6, I ran with
"-Ddocbkx.skip=true -Dhsqldb.version=2.3.4" and the build succeeded.

I saw your two PRs; thanks for these; I will merge them in and cut a
new release candidate.
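
For background on the JDK 1.7 failure: "Unsupported major.minor version 52.0" means the class file was compiled for Java 8, and a Java 7 runtime only accepts class files up to major version 51. The sketch below reads the major version from a class-file header and maps it to a Java release. The class and method names are hypothetical, and the header bytes are synthetic rather than taken from the hsqldb jar.

```java
// Hypothetical helper; reads only the fixed eight-byte class-file header.
class ClassFileVersion {
  /** Returns the major version stored in bytes 6 and 7, after magic and minor. */
  static int majorVersion(byte[] header) {
    if (header.length < 8
        || (header[0] & 0xff) != 0xCA || (header[1] & 0xff) != 0xFE
        || (header[2] & 0xff) != 0xBA || (header[3] & 0xff) != 0xBE) {
      throw new IllegalArgumentException("not a class file");
    }
    return ((header[6] & 0xff) << 8) | (header[7] & 0xff);
  }

  /** Major 50 is Java 6, 51 is Java 7, 52 is Java 8. */
  static int javaRelease(int major) {
    return major - 44;
  }

  /** A runtime can load class files compiled for its own release or older. */
  static boolean loadableOn(int runtimeRelease, int major) {
    return javaRelease(major) <= runtimeRelease;
  }

  /** Synthetic header: magic 0xCAFEBABE, minor 0, major 52. */
  static byte[] sampleHeader() {
    return new byte[] {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
  }

  public static void main(String[] args) {
    int major = majorVersion(sampleHeader());
    System.out.println("major " + major + " needs Java " + javaRelease(major));
    System.out.println("loadable on Java 7: " + loadableOn(7, major));
  }
}
```

This is why pinning hsqldb.version to an older release, as in the command-line flags above, lets the build run on the older JDKs.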

Julian

On Sat, Sep 8, 2018 at 7:03 AM Arina Yelchiyeva
 wrote:
>
> Checked from the scratch branch:
> 1. Ran unit tests:
> - JDK8 - all unit tests pass.
> - JDK7 - unable to run unit tests, fails due to upgrade to
> 2.4.1.
> [ERROR] java.lang.UnsupportedClassVersionError:
> org/hsqldb/jdbc/JDBCDatabaseMetaData : Unsupported major.minor version 52.0
>
> 2. Connected to Apache Drill:
> - custom application config (custom info message, excluded commands,
> session options etc).
> - checked custom application reset
> - checked connection / re-connection / close
> - checked proper exceptions are displayed when using incorrect url / driver
> - ran random queries
> - checked randomly commands / session options
>
> Kind regards,
> Arina
>
>
>
> On Sat, Sep 8, 2018 at 8:25 AM Julian Hyde  wrote:
>
> > We have a release candidate for sqlline-1.5.0, based on
> >
> > https://github.com/julianhyde/sqlline/commit/73d5974b6e694154bc47172f65d56ae47aa536aa
> > .
> >
> > This is not an Apache release, so there is no .tar.gz to look at, and
> > no formal vote, but I'd appreciate it if one or two people look it over.
> > The jar is staged at maven central, so you can try it out by changing
> > 1.4.0 to 1.5.0 in your project's pom.xml.
> >
> > If there are no objections I'll make the release official tomorrow.
> >
> > Julian
> >
> > On Thu, Sep 6, 2018 at 11:36 AM Julian Hyde  wrote:
> > >
> > > We’re very close to an RC for sqlline-1.5.0. Arina is working to
> > > finalize https://github.com/julianhyde/sqlline/issues/106.
> > >
> > > This is going to be the biggest sqlline release for a long time.
> > > (Special thanks to Sergey Nuyanzin for many, many high-quality PRs.)
> > >
> > > If you’re in the Calcite community and would like to help, please give
> > > the new sqlline a try and log any bugs you see. Get the latest sqlline
> > > master, run ‘mvn install’, then in your Calcite pom.xml change
> > > sqlline.version from 1.4.0 to 1.5.0-SNAPSHOT. You may need to ‘rm
> > > target/fullclasspath.txt’ before you run ‘./sqlline’.
> > >
> > > Julian
> > >
> > >
> > > On Sep 4, 2018, at 2:00 PM, Julian Hyde  wrote:
> > >
> > > We said first week of September. Now it’s the first week of
> > > September. I think I can make a release candidate for sqlline-1.5.0 in the
> > > next few days. (As a non-Apache project, there is no formal vote for
> > > releases, but I welcome feedback.)
> > >
> > > There are a few issues/PRs[1][2][3] that require jline3, and jline3 only
> > > works on JDK 8 and higher. I propose to take those PRs immediately AFTER
> > > this release. Thus, sqlline-1.5 will be the last sqlline release that
> > > supports JDK 1.6 or 1.7.
> > >
> > > Julian
> > >
> > > [1] https://github.com/julianhyde/sqlline/issues/105
> > >
> > > [2] https://github.com/julianhyde/sqlline/pull/115
> > >
> > > [3] https://github.com/julianhyde/sqlline/issues/73
> > >
> > >
> > > On Aug 19, 2018, at 10:22 AM, Sergey Nuyanzin wrote:
> > >
> > > Thank you for the good news.
> > > I've added a new PR based on this #86 and I guess in one or two days will
> > > add one more
> > >
> > > On Sun, Aug 19, 2018 at 7:06 PM Julian Hyde  wrote:
> > >
> > > I’ve pushed PR #86...
> > > https://github.com/julianhyde/sqlline/commit/8e0061f113d89a11ca03f4b48eda3340fd00375c
> > >
> > >
> > > I’ll try to get to your remaining PRs in the next few days. Keep up the
> > > good work!
> > >
> > > Julian
> > >
> > >
> > > On Aug 18, 2018, at 11:04 AM, Sergey Nuyanzin wrote:
> > >
> > >
> > > Julian,
> > > thank you very much for merging
> > >
> > > about PR on which I would like to build to add one more improvement:
> > > currently there is only one
> > > https://github.com/julianhyde/sqlline/pull/86 (commit 340c3b1 )
> > >
> > > Sergey and others, Is first week of September still a good timeframe
> > >
> > 

Re: Sqlline release

2018-09-08 Thread Julian Hyde
Turns out that Arina made *FOUR* PRs last night. (So much for the
weekend!) Thank you - I put them all in.

Release sqlline-1.5.0 is now final[1], and the binaries should be
appearing on Maven Central in an hour or two.

A big thanks to all of the contributors to this release but especially
to Sergey Nuyanzin (25 contributions) and Arina Ielchiieva (6
contributions). In recognition of your efforts, I have made you two
committers; see [2].

Julian

[1] https://github.com/julianhyde/sqlline/releases/tag/sqlline-1.5.0

[2] https://github.com/julianhyde/sqlline#committers

On Sat, Sep 8, 2018 at 3:23 PM Julian Hyde  wrote:
>
> Thanks, Arina.
>
> I knew about the JDK 1.7 issue. For JDK 1.7 and 1.6, I ran with
> "-Ddocbkx.skip=true -Dhsqldb.version=2.3.4" and the build succeeded.
>
> I saw your two PRs; thanks for these; I will merge them in and cut a
> new release candidate.
>
> Julian
>
> On Sat, Sep 8, 2018 at 7:03 AM Arina Yelchiyeva
>  wrote:
> >
> > Checked from the scratch branch:
> > 1. Ran unit tests:
> > - JDK8 - all unit tests pass.
> > - JDK7 - unable to run unit tests, fails due to upgrade to
> > 2.4.1.
> > [ERROR] java.lang.UnsupportedClassVersionError:
> > org/hsqldb/jdbc/JDBCDatabaseMetaData : Unsupported major.minor version 52.0
> >
> > 2. Connected to Apache Drill:
> > - custom application config (custom info message, excluded commands,
> > session options etc).
> > - checked custom application reset
> > - checked connection / re-connection / close
> > - checked proper exceptions are displayed when using incorrect url / driver
> > - ran random queries
> > - checked randomly commands / session options
> >
> > Kind regards,
> > Arina
> >
> >
> >
> > On Sat, Sep 8, 2018 at 8:25 AM Julian Hyde  wrote:
> >
> > > We have a release candidate for sqlline-1.5.0, based on
> > >
> > > https://github.com/julianhyde/sqlline/commit/73d5974b6e694154bc47172f65d56ae47aa536aa
> > > .
> > >
> > > This is not an Apache release, so there is no .tar.gz to look at, and
> > > > no formal vote, but I'd appreciate it if one or two people look it over.
> > > The jar is staged at maven central, so you can try it out by changing
> > > 1.4.0 to 1.5.0 in your project's pom.xml.
> > >
> > > If there are no objections I'll make the release official tomorrow.
> > >
> > > Julian
> > >
> > > On Thu, Sep 6, 2018 at 11:36 AM Julian Hyde  wrote:
> > > >
> > > > We’re very close to an RC for sqlline-1.5.0. Arina is working to
> > > finalize https://github.com/julianhyde/sqlline/issues/106.
> > > >
> > > > This is going to be the biggest sqlline release for a long time.
> > > (Special thanks to Sergey Nuyanzin for many, many high-quality PRs.)
> > > >
> > > > If you’re in the Calcite community and would like to help, please give
> > > the new sqlline a try and log any bugs you see. Get the latest sqlline
> > > master, run ‘mvn install’, then in your Calcite pom.xml change
> > > sqlline.version from 1.4.0 to 1.5.0-SNAPSHOT. You may need to ‘rm
> > > target/fullclasspath.txt’ before you run ‘./sqlline’.
> > > >
> > > > Julian
> > > >
> > > >
> > > > On Sep 4, 2018, at 2:00 PM, Julian Hyde  wrote:
> > > >
> > > > > We said first week of September. Now it’s the first week of
> > > September. I think I can make a release candidate for sqlline-1.5.0 in the
> > > next few days. (As a non-Apache project, there is no formal vote for
> > > releases, but I welcome feedback.)
> > > >
> > > > > There are a few issues/PRs[1][2][3] that require jline3, and jline3 only
> > > works on JDK 8 and higher. I propose to take those PRs immediately AFTER
> > > this release. Thus, sqlline-1.5 will be the last sqlline release that
> > > supports JDK 1.6 or 1.7.
> > > >
> > > > Julian
> > > >
> > > > [1] https://github.com/julianhyde/sqlline/issues/105
> > > >
> > > > [2] https://github.com/julianhyde/sqlline/pull/115
> > > >
> > > > [3] https://github.com/julianhyde/sqlline/issues/73
> > > >
> > > >
> > > > On Aug 19, 2018, at 10:22 AM, Sergey Nuyanzin 
> > > wrote:
> > > >
> > > > > Thank you for the good news.
> > > > > I've added a new PR based on this #86 and I guess in one or two days
> > > > > I will add one more
> > > >

Commit 88f125541, "Reduce HepPlannerTest#testRuleApplyCount complexity"

2018-09-09 Thread Julian Hyde
Vladimir,

I strongly disagree with your commit 88f125541. You have removed test
code that is useful in creating a high-quality product. I presume that
your goal is to make the test suite run a little faster.

I think you should back out your commit.

I sent a previous email on the subject but you did not respond.

Julian


Re: Removing o.a.c.u.Compatible and o.a.c.u.CompatibleGuava11

2018-09-09 Thread Julian Hyde
Kevin,

I think we should remove those classes.

We may run into compatibility issues in future — or, as in this case, want to 
use features that are in a version of a library or the JDK that not all of our 
users are happy to upgrade to — and if so, we can always resurrect the files 
from git history.

Julian


> On Sep 9, 2018, at 9:16 AM, Vladimir Sitnikov  
> wrote:
> 
> Kevin>I think that means that both Compatible and CompatibleGuava11 can be
> removed since they should no longer be used.
> 
> Could we keep the files as a monument to Guava's version policy?
> 
> Vladimir



Re: Commit 88f125541, "Reduce HepPlannerTest#testRuleApplyCount complexity"

2018-09-09 Thread Julian Hyde
You seem to believe that the only purpose of a test is to test the specific bug 
or change for which it was introduced. That is indeed the philosophy of 
test-driven development, but there are more things in heaven and earth than TDD.

Calcite is a complex project, and needs complex tests to shake out complex 
issues. We do not have a QA team to write tests. Therefore the tests will be 
written during the course of fixing bugs.

Julian


> On Sep 9, 2018, at 1:48 PM, Vladimir Sitnikov  
> wrote:
> 
> Julian>I strongly disagree with your commit 88f125541. You have removed test
> code that is useful in creating a high-quality product.
> 
> Could you please provide technical justification?
> 
> Note: the test is still there.
> Note2: the test still runs in both Travis and Apache Jenkins. The test
> still runs from mvn test by default.
> 
> I'm inclined to suppose your "removed test code" wording is not quite right.
> 
> Note3: it still verifies that ARBITRARY order results in more rule
> applications than DEPTH_FIRST.
> Note4: original test code did not verify that the result of Hep planning
> matched the expected one.
> 
> Julian> I think you should back out your commit.
> 
> I think I should not.
> 
> Julian>I sent a previous email on the subject but you did not respond.
> 
> I'm afraid I fail to see which response you expect.
> My opinion on the testRuleApplyCount test is as follows:
> a) It is useful to compare ARBITRARY and DEPTH_FIRST implementations.
> DEPTH_FIRST wins there, and it does NOT require 1,000,000 iterations
> to prove that.
> That is why I reduced test complexity so a unit test does not exceed 5 seconds.
> 
> b) The SQL+Hep program results in bad scalability for Calcite. In other
> words, as the number of unions grows, planning becomes very slow. While it
> is interesting, and it might be even a bug in Calcite, I do not see how
> executing such a test (on every Travis build, on every Jenkins build, and
> on every dev build) helps Calcite development.
> 
> I can add a benchmark that would run that test with various numbers of
> unions and print the response time.
> However, we should use Apache and Travis' resources wisely and we should
> refrain from burning CPU for little reason.
> 
> I see your passion to build stable and robust software (that is great!),
> however I'm quite sure HepPlannerTest#testRuleApplyCount is not something
> that provides extra safety.
> 
> Vladimir



Re: Commit 88f125541, "Reduce HepPlannerTest#testRuleApplyCount complexity"

2018-09-09 Thread Julian Hyde
The technical justification is that the code — yes, sql is code — that you 
removed might have found a bug someday. A deep relational algebra tree places 
stresses on hep planner (and other parts of the system) that we cannot predict. 
The (implicit) expected behavior is that it doesn’t crash, produces the same 
result as yesterday, doesn’t use excessive memory, and finishes in a reasonable 
time. We test all of those things every time we run the test.

You are absolutely right that Calcite needs more testing, in many different 
categories. But we have no resources to write and maintain tests, so we have to 
do our best.

For example, I would LOVE to do performance regression tests. I know that the 
right way to do performance regression tests is to run the same test every day 
in the same controlled environment, measure its performance, and detect 
deviations over time. But we don’t have the resources to do that. All we have 
is a few tests that work on inputs that are so big and complex that if something 
breaks the delta will be so large that someone will be bound to notice it. One 
of these tests you just gutted.

I didn’t read the rest of your email.

Julian


> On Sep 9, 2018, at 2:57 PM, Vladimir Sitnikov  
> wrote:
> 
> Julian>You seem to believe that the only purpose of a test is to test the
> specific bug or change for which it was introduced
> 
> I'm afraid you seem to put words in my mouth.
> 
> Note: so far I have technical reasons to keep the test with reduced number
> of lines while you don't.
> 
> Note2: original test included SQL with 100 or so lines. Somehow that
> pleases you. I've reduced the test to 10 or so, and it pisses you off.
> Could you please provide a technical clarification/justification on what
> number of SQL lines is enough in that test?
> I'm sure you can't, so could we just agree that you over-reacted on "commit
> 88f125541"?
> 
> You might ignore the part below, however you might learn a bit about
> property-based testing and/or fuzz testing and/or performance testing.
> 
> 
> Of course a test method can test zillions of scenarios "by accident".
> 
> However, each test should have some EXPECTED behaviour.
> For instance:
> a) "the number of HEP applied rules equals 42 for union all union all query"
> b) "the number of HEP applied rules is less when DEPTH_FIRST order is used
> for union all union all query"
> b*) "the number of HEP applied rules is less when DEPTH_FIRST order is used
> for ANY query"
> c) "the number of HEP applied rules grows linearly as the number of unions
> grows"
> d) "planning time grows linearly as the number of unions grows"
> e) "planning time fits in X seconds for query of size Y at CPU of Z GHz"
> f) "Calcite does not format hard drive when executing union all union all
> query"
> g) "Calcite is able to sustain load for at least 1 week and avoid out of
> memory, performance degradations, etc, etc"
> h) "your pick"
> 
> I find risks a..g quite important for Calcite.
> 
> I'm sure testRuleApplyCount can cover cases a and b only, and for those
> cases it does not need to process 1000 unions.
> 
> a is straightforward. It does not require 1,000,000 iterations in
> order to ensure "a".
> b is straightforward as well, as long as we are dealing with a single SQL query.
> 
> b* and c are not that trivial to prove since queries might have different sizes
> and shapes.
> Property-based testing suits here: one can generate a query, apply two
> different HEP programs and compare if DEPTH_FIRST produces result faster.
> Current test verifies just a single SQL shape, however property-based test
> could pick random shape and size, and it could validate new SQL every time.
> Note: that randomization could still fit in 5 seconds, however those 5
> seconds would be multiplied by the number of test-suite executions, so it
> would result in quite good coverage.
> The old code of testRuleApplyCount massaged just a single SQL query again and again,
> so it is very unlikely to find a true bug.
> 
> CALCITE-2506 is an example of a bug identified by fuzzing + property-based
> testing.
> 
> d and e
> 
> performance tests (e.g. measurement of planning duration) require more than
> a single point.
> You can't measure performance by executing a single test on a random
> hardware.
> One would likely require isolated hardware as there's no way to measure
> response time in a shared environment.
> If a planning sequence like the one in testRuleApplyCount is truly important, it is a
> good candidate for inclusion in the ubenchmark suite. There we could
> use @Param({1, 10, 100, 1000}) numberOfUnions and see how response time
> grows as number of unions is changed.
> 
> 
> f) Calcite does not format hard drive...
> The tests for unexpected behaviour are better done via fuzzing.
> That is, one could craft a randomized SQL, pass it to Calcite (with a
> randomized set of activated rules) and see what happens.
> OutOfMemory and/or StackOverflow w
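
The property-based idea described in the message above (generate a query of random size, then check a property on it) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name RandomUnionQuery and the method unionAll are hypothetical, the generated SQL shape is invented for the example, and no Calcite API is called. A real test would plan each generated query with both HEP match orders and compare the rule-application counts at the marked spot.

```java
import java.util.Random;
import java.util.StringJoiner;

/** Hypothetical helper: builds random "union all" queries for a property-style test. */
public class RandomUnionQuery {

    /** Returns a query made of the given number of "select ... as x" branches. */
    public static String unionAll(int branches, Random random) {
        if (branches < 1) {
            throw new IllegalArgumentException("branches must be >= 1");
        }
        StringJoiner sql = new StringJoiner(" union all ");
        for (int i = 0; i < branches; i++) {
            // Each branch selects a random small integer literal.
            sql.add("select " + random.nextInt(100) + " as x");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Print the seed so that a failing run can be reproduced exactly.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        Random random = new Random(seed);
        System.out.println("seed=" + seed);
        for (int i = 0; i < 5; i++) {
            int branches = 1 + random.nextInt(20);
            String sql = unionAll(branches, random);
            // Assumption: this is where the planner would be invoked twice
            // (ARBITRARY vs DEPTH_FIRST) and the two rule counts compared.
            System.out.println(branches + " branches, " + sql.length() + " chars");
        }
    }
}
```

Because the seed is printed, each run covers a different query while staying reproducible, which is the trade-off the message argues for: short individual runs whose coverage accumulates over many test-suite executions.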

Re: VARCHAR literals

2018-09-10 Thread Julian Hyde
Yes, as long as it is fully tested and documented. This is not a one line fix. 
It’s quite a major feature, and if it’s not done properly Calcite will be 
picking up the pieces for years.

Julian


> On Sep 10, 2018, at 1:26 AM, Piotr Nowojski  wrote:
> 
> Julian, would you be ok if we provided a contribution containing a 
> configurable switch for string literal type?
> 
> Piotrek 
> 
>> On 6 Sep 2018, at 20:47, Piotr Nowojski  wrote:
>> 
>> We know about this switch. Unfortunately it only solves/hides one of the 
>> problems. 
>> 
>>> On 6 Sep 2018, at 20:02, Julian Hyde  wrote:
>>> 
>>> Does https://issues.apache.org/jira/browse/CALCITE-2321 help?
>>> 
>>> 
>>>> On Sep 6, 2018, at 4:03 AM, Piotr Nowojski  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> We have a small problem with the CHAR type in Flink. Officially we do not 
>>>> support it, and all input/output columns are of type VARCHAR. Because of 
>>>> that, nobody has ever thought about CHAR semantics (for example, correctly 
>>>> handling padding in comparisons or other functions). However, this collides 
>>>> with a teeny tiny problem: in Calcite, string literals are of type 
>>>> CHAR. This leads to not-so-funny inconsistencies in queries and incorrect 
>>>> results.
>>>> 
>>>> I wonder if we could provide a switch in Calcite to change the type of 
>>>> string literals to VARCHAR? I know that this is against the SQL standard; 
>>>> however, quite a few databases do so for various reasons. One of 
>>>> them is that providing proper CHAR support can be tricky and nowadays it 
>>>> doesn’t provide much value to the user. I have quite often seen the 
>>>> pattern where a new database starts without CHAR support at all and adds it 
>>>> only later (if ever). Providing such a switch in Calcite would allow Calcite 
>>>> users to do the same thing.
>>>> 
>>>> I was thinking about an alternative workaround: rewriting the query plan and 
>>>> changing all of the CHAR types to VARCHAR on our side, but this does not seem 
>>>> like an easy thing to do. But maybe I’m wrong and there is an easy way to 
>>>> do so on our side?
>>>> 
>>>> Thanks, Piotrek
>>> 
>> 
> 
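
The padding problem raised at the start of this thread can be illustrated with a few lines of Java. This is a sketch, not Calcite or Flink code: the class CharPadding and the method toChar are hypothetical, and the only assumption is that a fixed-width CHAR(n) value is right-padded with spaces to n characters, while the engine compares strings exactly as a VARCHAR-only system would.

```java
/** Hypothetical illustration of fixed-width CHAR(n) padding. */
public class CharPadding {

    /** Right-pads s with spaces to width n, as a CHAR(n) value would be. */
    public static String toChar(String s, int n) {
        if (s.length() > n) {
            throw new IllegalArgumentException("value longer than CHAR(" + n + ")");
        }
        StringBuilder padded = new StringBuilder(s);
        while (padded.length() < n) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        String literal = "ab"; // typed CHAR(2) when literals are CHAR
        // If the literal is widened to CHAR(3), for example to match a
        // longer literal in the same expression, it gains a trailing space.
        String widened = toChar(literal, 3);
        System.out.println("[" + widened + "]");            // prints "[ab ]"
        // An engine that ignores CHAR semantics now sees a different string.
        System.out.println(widened.equals(literal));         // prints "false"
    }
}
```

An engine with full CHAR support would account for the padding when comparing; one that treats every string as VARCHAR sees the extra space as data, which is the kind of inconsistency the original message describes.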



Re: [ANNOUNCE] Apache Calcite Avatica Go 3.1.0 released

2018-09-10 Thread Julian Hyde
Well done, and thank you for driving this, Francis.

Did you send this to annou...@apache.org? If you 
didn’t, you should; quite a few people get their news that way. If you did, I 
didn’t see a message on announce, but maybe it’s still in moderation.

Julian


> On Sep 9, 2018, at 7:51 PM, Francis Chuang  wrote:
> 
> The Apache Calcite team is pleased to announce the release of
> Apache Calcite Avatica Go 3.1.0.
> 
> Avatica is a framework for building database drivers. Avatica
> defines a wire API and serialization mechanism for clients to
> communicate with a server as a proxy to a database. The reference
> Avatica client and server are implemented in Java and communicate
> over HTTP. Avatica is a sub-project of Apache Calcite.
> 
> The Avatica Go client is a Go database/sql driver that enables Go
> programs to communicate with the Avatica server.
> 
> Apache Calcite Avatica Go 3.1.0 is a minor release of the Avatica Go
> client to bring in support for Go modules. This release includes
> updated dependencies, testing against more targets and support
> for Go Modules as described in the release notes:
> 
> https://calcite.apache.org/avatica/docs/go_history.html#v3-1-0
> 
> The release is available here:
> 
>
> https://www.apache.org/dyn/closer.cgi/calcite/apache-calcite-avatica-go-3.1.0/
> 
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> 
> https://calcite.apache.org/avatica
> 
> Francis Chuang, on behalf of the Apache Calcite Team
> 



Re: [DISCUSS] reasonable duration for tests in the Calcite codebase

2018-09-10 Thread Julian Hyde
Calcite is a DBMS. Most DBMSs I know have test suites that take several hours 
to run. If our test suite took 50x longer I would start to get worried.

There are separate questions about whether developers should be required to run 
the full suite every commit, whether the CI system should run all tests every 
commit, and whether we should continue to use a free Travis instance that has 
strict time limits.

Julian


> On Sep 10, 2018, at 9:56 AM, Andrew Pilloud  
> wrote:
> 
> I would expect acceptable test run time to be somewhat bimodal: maximum
> around 100ms for unit tests (which should be the majority of tests) and
> minutes for functional and integration tests. It would be good for Travis
> to run all these tests on every PR, but it would be nice if I could limit
> my local test runs to only the unit tests.
> 
> My biggest pain point when working with calcite is the size of the core
> project. Individual tests are mostly fast, but just building core takes 100
> seconds. It would also be easier to run just the relevant tests if there
> were several smaller maven projects.
> 
> Andrew
> 
> On Mon, Sep 10, 2018 at 4:53 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> 
>> Michael>just because a test is slow of course doesn't mean it's not useful
>> 
>> Does "slow test" impact your development experience in a bad way?
>> Does "slow test" impact contributor's experience in a bad way?
>> Do things like "MaterializationTest 160sec" and "LatticeTest 212sec" make
>> materialization/lattice much stronger?
>> 
>> Spoiler: yes, yes, no. In the latter case, most of the time is spent not in
>> the materialization/lattice engine but in data
>> loading.
>> That is, the tests cover very few cases, and they spend too much time
>> testing HSQLDB (or whatever DB is used to load the data).
>> In other words, the current MaterializationTest/LatticeTest are much less
>> related to Calcite than they are to H*DB.
>> 
>> Michael>I'm not sure it makes sense to have one single duration that's
>> acceptable
>> for all tests
>> 
>> Of course there are always exceptions. Like the one I have listed with
>> "Cassandra startup".
>> 
>> Let me rephrase: "what test duration would make you raise an eyebrow?"
>> What test duration would raise both of your eyebrows?
>> 
>> Michael>We could also put some slower tests behind a define since
>> just because a test is slow of course doesn't mean it's not useful.
>> 
>> Calcite is an intermediate framework. It is not a database. It is not a
>> training framework for machine learning.
>> Thus Calcite should not take much time to perform its duties.
>> 
>> For instance, if a simple query that fetches 100 rows by primary key literals
>> takes 80 seconds to "prepare" in Calcite, I would definitely call it a bug.
>> It does not mean that the case is "useful to run in a day-to-day"
>> codebase.
>> We can have it with @Ignore("the test is too slow, see CALCITE-"),
>> however it makes absolutely no sense to run the test on each test
>> execution.
>> 
>> Michael>If we are going to pick a single duration,
>> 
>> Note: I'm not suggesting to veto tests based on their duration.
>> 
>> Suppose I'm reviewing a PR, and I see that the added test there takes 95
>> seconds on my machine.
>> Is it reasonable to ask the author to reduce test duration?
>> 
>> Michael>then I think it should be much higher than 5 seconds
>> 
>> What's your idea on the reasonable duration for a single test?
>> 
>> Just in case: https://issues.apache.org/jira/browse/CALCITE-2509 shows that
>> a simple
>> coalesce(vInt(0), vInt(1), vInt(2), vInt(3), vInt(4), vInt(5), vInt(6),
>> vInt(7), vInt(8), vInt(9), vIntNotNull(0))
>> takes 30+ seconds in RexSimplifyTest.
>> 
>> Would you mind if I add that kind of enabled-by-default test?
>> 
>> Vladimir
>> 



Re: [ANNOUNCE] Apache Calcite Avatica Go 3.1.0 released

2018-09-10 Thread Julian Hyde
I think you should make the announcement anyway. It will help debug the process 
of making an announcement, and will alert people to the good work you have put 
into the release, and give people the impression of project momentum. (I don’t 
feel terribly strongly, so if you decide to defer until 3.2.0 that’s fine too.)

> On Sep 10, 2018, at 3:23 PM, Francis Chuang  wrote:
> 
> I did forward a copy to anno...@apache.org, but my email client has been 
> really slow with my apache email account over the last few days. I just got a 
> rejection from anno...@apache.org saying that the announcement was rejected, 
> because the email was corrupt.
> 
> Since there is currently discussion for releasing Avatica Go 3.2.0 due to an 
> import paths issue with Avatica Go 3.1.0, should we still post the 
> announcement to anno...@apache.org? For the 3.2.0 release, the recommendation 
> will be for all users to drop 3.1.0 and move to 3.2.0 directly.
> 
> Francis
> 
> On 11/09/2018 3:30 AM, Volodymyr Vysotskyi wrote:
>> Hi Francis,
>> 
>> Thanks for releasing Avatica Go!
>> 
>> Before sending the email to annou...@apache.org, please replace the download
>> link with this one:
>> https://calcite.apache.org/avatica/downloads/avatica-go.html,
>> since links to https://www.apache.org/dyn/closer.cgi are not allowed (the
>> initial Calcite 1.17.0 release announcement wasn't accepted because of
>> this).
>> 
>> For more details please see
>> http://www.apache.org/legal/release-policy.html#release-announcements.
>> 
>> Kind regards,
>> Volodymyr Vysotskyi
>> 
>> 
>> On Mon, Sep 10, 2018 at 8:20 PM Julian Hyde  wrote:
>> 
>>> Well done, and thank you for driving this, Francis.
>>> 
>>> Did you send this to annou...@apache.org? If
>>> you didn’t, you should; quite a few people get their news that way. If you
>>> did, I didn’t see a message on announce, but maybe it’s still in moderation.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Sep 9, 2018, at 7:51 PM, Francis Chuang wrote:
>>>> 
>>>> The Apache Calcite team is pleased to announce the release of
>>>> Apache Calcite Avatica Go 3.1.0.
>>>> 
>>>> Avatica is a framework for building database drivers. Avatica
>>>> defines a wire API and serialization mechanism for clients to
>>>> communicate with a server as a proxy to a database. The reference
>>>> Avatica client and server are implemented in Java and communicate
>>>> over HTTP. Avatica is a sub-project of Apache Calcite.
>>>> 
>>>> The Avatica Go client is a Go database/sql driver that enables Go
>>>> programs to communicate with the Avatica server.
>>>> 
>>>> Apache Calcite Avatica Go 3.1.0 is a minor release of the Avatica Go
>>>> client to bring in support for Go modules. This release includes
>>>> updated dependencies, testing against more targets and support
>>>> for Go Modules as described in the release notes:
>>>> 
>>>> https://calcite.apache.org/avatica/docs/go_history.html#v3-1-0
>>>> 
>>>> The release is available here:
>>>> 
>>>> 
>>>> https://www.apache.org/dyn/closer.cgi/calcite/apache-calcite-avatica-go-3.1.0/
>>>> 
>>>> We welcome your help and feedback. For more information on how to
>>>> report problems, and to get involved, visit the project website at
>>>> 
>>>> https://calcite.apache.org/avatica
>>>> 
>>>> Francis Chuang, on behalf of the Apache Calcite Team
>>>> 
>>> 
> 


