CI passed error tests

2019-12-12 Thread XING JIN
Hi guys,
I made a PR and ran the continuous integration tests. [1]
The PR contains a deliberately failing test tagged with @slowTest.
The test should have failed, but CI passed by mistake.
I suspect our current CI is not running with the 'testSlow' configuration. Is
that right?
I'm not sure whether I should create a JIRA.
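
For reference, a minimal sketch of the tagging mechanism in question, assuming
JUnit 5 @Tag (the class name, tag value and Gradle task name here are
illustrative, not necessarily Calcite's actual setup):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class ExampleSlowTest {
  // A test tagged "slow" is skipped unless the build runs a task that
  // includes the tag, e.g. a hypothetical 'testSlow' task configured with
  // useJUnitPlatform { includeTags("slow") }.
  @Tag("slow")
  @Test void failsOnPurpose() {
    throw new AssertionError("CI running slow tests should report this");
  }
}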

Best,
Jin

[1] https://github.com/apache/calcite/pull/1653


[jira] [Created] (CALCITE-3600) Rule to solve the filter partially by end application and remaining by calcite

2019-12-12 Thread anjali shrishrimal (Jira)
anjali shrishrimal created CALCITE-3600:
---

 Summary: Rule to solve the filter partially by end application and 
remaining by calcite
 Key: CALCITE-3600
 URL: https://issues.apache.org/jira/browse/CALCITE-3600
 Project: Calcite
  Issue Type: Wish
Reporter: anjali shrishrimal


Add a rule to check whether a filter condition is solvable by the end 
application. If part of the filter condition can be solved by the end 
application, that part should be pushed to the end application, and the 
remaining part, which the end application cannot solve, should be solved by 
Calcite afterwards (i.e., upon fetch, remove unwanted data according to the 
filter condition).


Consider an application which supports only a limited set of operators while 
filtering, say "=, <, >", and cannot solve the 'LIKE' operator.


For example, suppose the filter condition is "id > 1000 AND name LIKE '%an%'".


We would like to restrict the condition passed to the application to "id > 1000", 
and the remaining part, "name LIKE '%an%'", should be solved by Calcite (the way 
it is done for the CSV adapter).
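
A minimal sketch of the intended splitting step, assuming the end application 
handles only "=, <, >"; splitFilter is a hypothetical helper, not an existing 
Calcite API:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;

/** Splits a condition into [0] a part pushable to the end application
 * and [1] a remainder for Calcite to evaluate after the fetch. */
static RexNode[] splitFilter(RexBuilder rexBuilder, RexNode condition) {
  List<RexNode> pushable = new ArrayList<>();
  List<RexNode> remainder = new ArrayList<>();
  for (RexNode conjunct : RelOptUtil.conjunctions(condition)) {
    switch (conjunct.getKind()) {
    case EQUALS:
    case LESS_THAN:
    case GREATER_THAN:
      pushable.add(conjunct); // e.g. id > 1000
      break;
    default:
      remainder.add(conjunct); // e.g. name LIKE '%an%'
    }
  }
  return new RexNode[] {
      RexUtil.composeConjunction(rexBuilder, pushable, true),
      RexUtil.composeConjunction(rexBuilder, remainder, true)};
}
{code}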


To replicate the situation, consider the test case testFilter in MongoAdapterTest 
(org.apache.calcite.adapter.mongodb.MongoAdapterTest) of the mongo adapter, and 
modify it as below:


@Test public void testFilter() {
  assertModel(MODEL)
      .query("select state, city from zips where state = 'CA'"
          + " AND city LIKE '%E%'")
      .returnsUnordered("STATE=CA; CITY=LOS ANGELES",
          "STATE=CA; CITY=BELL GARDENS");
}


Expected output of the above query:

STATE=CA; CITY=LOS ANGELES,

STATE=CA; CITY=BELL GARDENS


Expected plan:

EnumerableFilter(condition=[LIKE(CAST(ITEM($0, 'city')):VARCHAR(20), '%E%')])
  MongoToEnumerableConverter
    MongoProject(STATE=[CAST(ITEM($0, 'state')):VARCHAR(2)], CITY=[CAST(ITEM($0, 'city')):VARCHAR(20)])
      MongoFilter(condition=[=(CAST(ITEM($0, 'state')):VARCHAR(2), 'CA')])
        MongoTableScan(table=[[mongo_raw, zips]])



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Line endings for source files on Windows

2019-12-12 Thread Julian Hyde
No, I’m going to ignore this trolling.

> On Dec 12, 2019, at 11:57 AM, Vladimir Sitnikov  
> wrote:
> 
> Julian>For all other purposes, just use git.
> 
> "use git" contradicts with
> http://www.apache.org/legal/release-policy.html#publication
> 
> legal/release-policy> Projects MUST direct outsiders towards official
> releases rather than raw source repositories, nightly builds, snapshots,
> release candidates, or any other similar packages. The only people who are
> supposed to know about such developer resources are individuals actively
> participating in development or following the dev list and thus aware of
> the conditions placed on unreleased materials.
> 
> Julian>for no reason
> 
> Ok. You have chosen to ignore Windows users.
> 
> Vladimir



Re: Line endings for source files on Windows

2019-12-12 Thread Vladimir Sitnikov
Julian>For all other purposes, just use git.

"use git" contradicts with
http://www.apache.org/legal/release-policy.html#publication

legal/release-policy> Projects MUST direct outsiders towards official
releases rather than raw source repositories, nightly builds, snapshots,
release candidates, or any other similar packages. The only people who are
supposed to know about such developer resources are individuals actively
participating in development or following the dev list and thus aware of
the conditions placed on unreleased materials.

Julian>for no reason

Ok. You have chosen to ignore Windows users.

Vladimir


Re: Line endings for source files on Windows

2019-12-12 Thread Julian Hyde
The main purpose of the source distribution is to have a legal record of the 
release, and something from which people could re-create the release if GitHub 
and all mirrors thereof were to disappear. 

For all other purposes, just use git.

So, I see no reason to create different source distributions for different 
platforms. It is a complication for no reason.

Julian


> On Dec 12, 2019, at 11:00 AM, Vladimir Sitnikov  
> wrote:
> 
> Julian>The git repo, at a particular commit, has objective contents
> 
> That is true for binary blobs.
> However, text files are converted on checkout as per core.autocrlf and
> core.eol settings.
> 
> Julian>I suspect that when you use ‘git checkout’ with particular options
> 
> I suspect you are not very well aware of typical recommendations for Git
> for Windows.
> 
> Here's what GitHub recommends:
> https://help.github.com/en/github/using-git/configuring-git-to-handle-line-endings#global-settings-for-line-endings
> 
> "core.autocrlf true" means Git will convert text files to CRLF line endings
> when "simple"  "git clone https://...; is used.
> 
> Vladimir



Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 1)

2019-12-12 Thread Stamatis Zampetakis
Ubuntu 18.04.3 LTS, jdk1.8.0_202, Gradle 6.0.1
 * Checked signatures and checksums OK
 * Went over release note OK
 * Ran build and tests (./gradlew clean build) on git repo OK
 * Ran build and tests (./gradlew clean build) on staged sources OK
 * Ran build and Calcite tests on Calcite current master (./gradlew clean
build) with Avatica 1.16.0 OK
 * Ran diff between staged sources and git commit (diff
-qr apache-calcite-avatica-1.16.0-src ~/git/Apache/Avatica) ?
Files apache-calcite-avatica-1.16.0-src/LICENSE and
/home/zabetak/git/Apache/Avatica/LICENSE differ
Only in apache-calcite-avatica-1.16.0-src: licenses

Minor remarks:
There seem to be some differences regarding the licenses between the git
repo and the staged sources. It seems that this is intended, but I thought it
was worth mentioning.

The release notes have the full history of commits as usual, but there are
quite a few messages that are not self-contained and easy to understand.
In general, maybe it would be better if we removed these commits from the
release notes.


+1 (binding)

On Wed, Dec 11, 2019 at 11:20 PM Francis Chuang 
wrote:

> Hi all,
>
> I have created a build for Apache Calcite Avatica 1.16.0, release
> candidate 1.
>
> Thanks to everyone who has contributed to this release.
>
> You can read the release notes here:
>
> https://github.com/apache/calcite-avatica/blob/512bbee4aa24ef9fb8106d0286d1243679dce2d0/site/_docs/history.md
>
> The commit to be voted upon:
>
> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=commit;h=512bbee4aa24ef9fb8106d0286d1243679dce2d0
>
> Its hash is 512bbee4aa24ef9fb8106d0286d1243679dce2d0
>
> Tag:
>
> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=tag;h=refs/tags/avatica-1.16.0-rc1
>
> The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc1
> (revision 37181)
>
> The hashes of the artifacts are as follows:
>
> 102d3ab0e90dd1db5e012a966d265bdfa8a0f24f9016a4187a6e5f0135a14770da124493dd2c7a18c9d8d8b9af5ecf4f5aceb90d48421251f38bc6ce6f5be697
> *apache-calcite-avatica-1.16.0-src.tar.gz
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecalcite-1071/org/apache/calcite/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/francischuang.asc
> https://www.apache.org/dist/calcite/KEYS
>
> N.B.
> To create the jars and test Apache Calcite Avatica: "./gradlew build
> -PskipSigning".
>
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
>
> Please vote on releasing this package as Apache Calcite Avatica 1.16.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite Avatica 1.16.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Francis
>


Re: Line endings for source files on Windows

2019-12-12 Thread Vladimir Sitnikov
Julian>The git repo, at a particular commit, has objective contents

That is true for binary blobs.
However, text files are converted on checkout as per core.autocrlf and
core.eol settings.

Julian>I suspect that when you use ‘git checkout’ with particular options

I suspect you are not very well aware of typical recommendations for Git
for Windows.

Here's what GitHub recommends:
https://help.github.com/en/github/using-git/configuring-git-to-handle-line-endings#global-settings-for-line-endings

"core.autocrlf true" means Git will convert text files to CRLF line endings
when "simple"  "git clone https://...; is used.

Vladimir


Re: Line endings for source files on Windows

2019-12-12 Thread Julian Hyde
The git repo, at a particular commit, has objective contents. I suspect that 
when you use ‘git checkout’ with particular options, you don’t get those 
objective contents, you get something customized for your line-ending 
preference.


> On Dec 12, 2019, at 7:46 AM, Vladimir Sitnikov  
> wrote:
> 
>> I think a source distribution should contain the raw, unprocessed source
> files.
> 
> What do you mean by "raw"?
> 
> If I check out the same repository on Windows and macOS, I would get
> **different** file contents for *.java files.
> A Windows machine would check out files as CRLF, and macOS would check out
> the same files as LF.
> 
> It is something that is controlled with
> https://github.com/apache/calcite-avatica/blob/512bbee4aa24ef9fb8106d0286d1243679dce2d0/.gitattributes#L2
> 
> Vladimir



Re: [Discuss] Make flattening on Struct/Row optional

2019-12-12 Thread Rui Wang
Absolutely. Thanks Igor for the contribution! :)


-Rui

On Wed, Dec 11, 2019 at 10:54 PM Stamatis Zampetakis 
wrote:

> So basically thanks to Igor :)
>
> On Wed, Dec 11, 2019 at 9:56 PM Rui Wang  wrote:
>
> > Thanks for Stamatis's suggestion. Indeed a recent effort in [1] enhanced the
> > support that reconstructs ROW in the top SELECT, which is supposed to
> solve
> > the problem.
> >
> >
> >
> > [1]: https://jira.apache.org/jira/browse/CALCITE-3138
> >
> > On Mon, Dec 9, 2019 at 3:21 PM Rui Wang  wrote:
> >
> > > Hello,
> > >
> > > Sorry for the long delay on this thread. Recently I heard about
> requests
> > > on how to deal with STRUCT without flattening it again in BeamSQL.
> Also I
> > > realized Flink has already disabled it in their codebase[1]. I did try
> to
> > > remove STRUCT flattening and run unit tests of calcite core to see how
> > many
> > > tests breaks: it was 25, which wasn't that bad. So I would like to pick
> > up
> > > this effort again.
> > >
> > > Before I do it, I just want to ask if Calcite community supports this
> > > effort (or think if it is a good idea)?
> > >
> > > My current execution plan will be the following:
> > > 1. Add a new flag to FrameworkConfig to specify whether to flatten
> > STRUCT.
> > > By default, it is yes.
> > > 2. When disabling the struct flattener, add more tests to test STRUCT
> > support
> > > in general. For example, test STRUCT support on projection, join
> > condition,
> > > filtering, etc. If something breaks, try to fix it.
> > > 3. Check the 25 failed tests above and see why they have failed if
> struct
> > > flattener is gone. Duplicate those failed tests but have necessary
> fixes
> > to
> > > make sure they can pass without STRUCT flattening.
> > >
> > >
> > > [1]:
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166
> > >
> > >
> > > -Rui
> > >
> > > On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde  wrote:
> > >
> > >> It might not be minor, but it’s worth a try. At optimization time we
> > >> treat all fields as fields, regardless of whether they have complex
> > types
> > >> (maps, arrays, multisets, records) so there should not be too many
> > >> problems. The flattening was mainly for the benefit of the runtime.
> > >>
> > >>
> > >> > On Sep 5, 2018, at 11:32 AM, Rui Wang 
> > >> wrote:
> > >> >
> > >> > Thanks for your helpful response! It seems like disabling the
> > flattening
> > >> > will at least affect some rules in optimization. It might not be a
> > minor
> > >> > change.
> > >> >
> > >> >
> > >> > -Rui
> > >> >
> > >> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <
> zabe...@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> >> Hi Rui,
> > >> >>
> > >> >> Disabling flattening in some cases seems reasonable.
> > >> >>
> > >> >> If I am not mistaken, even in the existing code it is not used all
> > the
> > >> time
> > >> >> so it makes sense to become configurable.
> > >> >> For example, Calcite prepared statements (CalcitePrepareImpl) are
> > >> using the
> > >> >> flattener only for DDL operations that create materialized views
> (and
> > >> this
> > >> >> is because this code at some point passes from the PlannerImpl).
> > >> >> On the other hand, any query that is using the Planner will also
> pass
> > >> from
> > >> >> the flattener.
> > >> >>
> > >> >> Disabling the flattener does not mean that all rules will work
> > without
> > >> >> problems. The Javadoc of the RelStructuredTypeFlattener at some
> point
> > >> says
> > >> >> "This approach has the benefit that real optimizer and codegen
> rules
> > >> never
> > >> >> have to deal with structured types.". Due to this, it is very
> likely
> > >> that
> > >> >> some rules were written based on the fact that there are no
> > structured
> > >> >> types.
> > >> >>
> > >> >> Best,
> > >> >> Stamatis
> > >> >>
> > >> >>
> > >> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <
> > jh...@apache.org
> > >> >
> > >> >> wrote:
> > >> >>
> > >> >>> Flattening was introduced mainly because the original engine used
> > flat
> > >> >>> column-oriented storage. Now we have several ways to executing,
> > >> >>> including generating java code.
> > >> >>>
> > >> >>> Adding a mode to disable flattening might make sense.
> > >> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang
>  > >
> > >> >>> wrote:
> > >> 
> > >>  Hi Community,
> > >> 
> > >>  While trying to support Row type in Apache Beam SQL on top of
> > >> Calcite,
> > >> >> I
> > >>  realized flattening Row logic will make structure information of
> > Row
> > >> >> lost
> > >>  after Projections. There is a use case where users want to mix
> Beam
> > >>  programming model with Beam SQL together to process a dataset.
> The
> > >>  following is an example of the use case:
> > >> 
> > >>  dataset.apply(something user defined)
> > >> .apply(SELECT ...)
> > >> 

Re: Line endings for source files on Windows

2019-12-12 Thread Vladimir Sitnikov
Michael>The source code should be left unprocessed,
Michael>but since Windows

I do not get what you mean by "unprocessed".
Even the Maven build had lots of exclude/include patterns, so it did "process"
the sources.

Then, the current source release contains a LICENSE file that involves some
processing (because it needs to gather third-party licenses).

Michael>I would be
Michael>interested in exploring producing reproducible builds of
Michael>Calcite/Avatica

You must have missed that, but it is already implemented.
Feel free to try.

Note: one of the points of using the `-Prelease` flag in the "RC-VOTE" mail is
to verify build reproducibility.
In other words, if people build from the same Git commit (or from source
release), they should end up with exactly the same SHAs provided they build
the same version (release vs snapshot) and they use similar javac (because
different javac versions might produce slightly different bytecode).
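
As a rough illustration of the comparison, a self-contained Java sketch that
prints the SHA-512 of an artifact (the path is illustrative):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Sha512 {
  public static void main(String[] args) throws Exception {
    byte[] bytes = Files.readAllBytes(Paths.get("build/libs/avatica.jar"));
    byte[] digest = MessageDigest.getInstance("SHA-512").digest(bytes);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b)); // lower-case hex, as in the vote mails
    }
    System.out.println(hex);
  }
}

If two builds of the same commit print the same value, the artifact is
byte-for-byte reproducible.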

Vladimir


Re: Line endings for source files on Windows

2019-12-12 Thread Michael Mior
I agree with both points. The source code should be left unprocessed,
but since Windows and Linux/macOS users get different source code
anyway, I'm not opposed to having two archives with different line
endings. This is mostly unrelated, but at some point, I would be
interested in exploring producing reproducible builds of
Calcite/Avatica. If we did this, hopefully we could have identical
JARs produced regardless of source line endings.
--
Michael Mior
mm...@apache.org


On Thu, Dec 12, 2019 at 10:47, Vladimir Sitnikov
 wrote:
>
> >I think a source distribution should contain the raw, unprocessed source
> files.
>
> What do you mean by "raw"?
>
> If I check out the same repository on Windows and macOS, I would get
> **different** file contents for *.java files.
> A Windows machine would check out files as CRLF, and macOS would check out the
> same files as LF.
>
> It is something that is controlled with
> https://github.com/apache/calcite-avatica/blob/512bbee4aa24ef9fb8106d0286d1243679dce2d0/.gitattributes#L2
>
> Vladimir


[jira] [Created] (CALCITE-3599) Initialize the digest of RexRangeRef to avoid null string

2019-12-12 Thread Chunwei Lei (Jira)
Chunwei Lei created CALCITE-3599:


 Summary: Initialize the digest of RexRangeRef to avoid null string
 Key: CALCITE-3599
 URL: https://issues.apache.org/jira/browse/CALCITE-3599
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Chunwei Lei
Assignee: Chunwei Lei
 Attachments: image-2019-12-12-23-49-18-977.png

Currently, the digest of {{RexRangeRef}} is always {{null}}, which is confusing 
when we try to debug the code. I suggest changing it to a more meaningful 
string, such as {{offset(0)}}.

!image-2019-12-12-23-49-18-977.png|width=529,height=234!
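
A minimal sketch of the suggested digest, assuming {{RexRangeRef}} exposes its 
starting field offset:

{code}
// Hypothetical initialization: yields e.g. "offset(0)" instead of null.
String digest = "offset(" + rangeRef.getOffset() + ")";
{code}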




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Line endings for source files on Windows

2019-12-12 Thread Vladimir Sitnikov
>I think a source distribution should contain the raw, unprocessed source
files.

What do you mean by "raw"?

If I check out the same repository on Windows and macOS, I would get
**different** file contents for *.java files.
A Windows machine would check out files as CRLF, and macOS would check out the
same files as LF.

It is something that is controlled with
https://github.com/apache/calcite-avatica/blob/512bbee4aa24ef9fb8106d0286d1243679dce2d0/.gitattributes#L2
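
For illustration, a .gitattributes rule of the following shape pins text files
to LF in the working tree regardless of core.autocrlf (an example of the
mechanism, not necessarily the exact rule in that file):

*.java text eol=lf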

Vladimir


Re: Line endings for source files on Windows

2019-12-12 Thread Julian Hyde
I think a source distribution should contain the raw, unprocessed source files. 
 The contents of the .zip and .tar.gz should be identical. 

If people want to change line endings they can do it for themselves. Or use an 
appropriate git setting. 

> On Dec 12, 2019, at 1:03 AM, Francis Chuang  wrote:
> 
> In this commit, Vladimir brought to my attention that editors on Windows 
> will complain about line endings if there isn't a zip source release with 
> Windows line endings: 
> https://github.com/apache/calcite-avatica/commit/34bbcb63f9216d3a5bc29dae1981a55e335d30df#commitcomment-36393594
> 
> I don't really work on the Java source directly, so I do not have personal 
> experience with this. In CALCITE-2333[1], we stopped releasing zip archives 
> and have stuck to only producing a tar.gz source release.
> 
> Should we re-introduce a Zip archive with all files converted to Windows line 
> endings?
> 
> I use Windows as my main operating system and Goland (IntelliJ) and Notepad++ 
> as my editors and I exclusively used Linux line endings for my source files. 
> For Go source files, I've not had any issues with the compilation and for the 
> Java source files I open in IntelliJ or Notepad++, I've not run into any 
> issues.
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-2333


Re: [CALCITE-3589]

2019-12-12 Thread 过 冰峰
Thank you so much, I am very excited now. Thanks to the Apache Calcite community.

On 2019/12/12 at 7:30 PM, "Francis Chuang" wrote:

Hey,

I've added you as a contributor to the project and assigned you to the 
issue. Please go ahead and open a PR on Github for review.

Francis

On 12/12/2019 9:43 pm, 过 冰峰 wrote:
> Dear calcite developer community:
> 
> https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-3589. I am 
a member of Apache Kylin and a user of Apache Calcite. This issue was 
reported by me; can I fix it myself?
> 
> thanks
> 




Re: [CALCITE-3589]

2019-12-12 Thread Francis Chuang

Hey,

I've added you as a contributor to the project and assigned you to the 
issue. Please go ahead and open a PR on Github for review.


Francis

On 12/12/2019 9:43 pm, 过 冰峰 wrote:

Dear calcite developer community:

https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-3589. I am a 
member of Apache Kylin and a user of Apache Calcite. This issue was reported 
by me; can I fix it myself?

thanks



[jira] [Created] (CALCITE-3598) ClassCastException in MaterializationTest testJoinMaterialization8 and testJoinMaterialization9

2019-12-12 Thread Ruben Q L (Jira)
Ruben Q L created CALCITE-3598:
--

 Summary: ClassCastException in MaterializationTest 
testJoinMaterialization8 and testJoinMaterialization9
 Key: CALCITE-3598
 URL: https://issues.apache.org/jira/browse/CALCITE-3598
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.21.0
Reporter: Ruben Q L


Problem unveiled by CALCITE-3535, and also separately by CALCITE-3576.
When CALCITE-3535 was committed, it made 
MaterializationTest#testJoinMaterialization8 and 
MaterializationTest#testJoinMaterialization9 change their execution plans from 
hash join to nested-loop join. This caused an exception
{code}
java.lang.ClassCastException: java.lang.String$CaseInsensitiveComparator cannot 
be cast to java.lang.String
{code}
which seems unrelated to CALCITE-3535 (or CALCITE-3576), so the tests were 
temporarily disabled.
The goal of this ticket is to investigate the root cause of this issue and 
re-activate both tests.
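
For context, a minimal standalone reproduction of an exception with this shape 
(an assumption about the mechanism, not the actual Calcite code path):

{code}
// String.CASE_INSENSITIVE_ORDER is an instance of the private class
// java.lang.String$CaseInsensitiveComparator.
Object comparator = String.CASE_INSENSITIVE_ORDER;
String s = (String) comparator; // throws the ClassCastException quoted above
{code}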



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[CALCITE-3589]

2019-12-12 Thread 过 冰峰
Dear calcite developer community:

https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-3589. I am a 
member of Apache Kylin and a user of Apache Calcite. This issue was reported 
by me; can I fix it myself?

thanks


Re: Quicksql

2019-12-12 Thread Alessandro Solimando
Adapters would still be needed for data sources that do not support SQL; I
think this is what Juan Pan was asking about.
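
For a data source without a SQL interface, a Calcite adapter can be as small as
a scannable table. A minimal sketch, with hard-coded rows standing in for the
external system:

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

/** Table over a non-SQL source; Calcite plans and executes SQL against it. */
public class SimpleTable extends AbstractTable implements ScannableTable {
  private final Object[][] rows = {{1, "a"}, {2, "b"}};

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("ID", SqlTypeName.INTEGER)
        .add("NAME", SqlTypeName.VARCHAR)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root) {
    return Linq4j.asEnumerable(rows);
  }
}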

On Thu, 12 Dec 2019 at 04:05, Haisheng Yuan  wrote:

> Nope, it doesn't use any adapters. It just submits partial SQL queries to
> different engines.
>
> If query contains table from single source, e.g.
> select count(*) from hive_table1, hive_table2 where a=b;
> then the whole query will be submitted to hive.
>
> Otherwise, e.g.
> select distinct a,b from hive_table union select distinct a,b from
> mysql_table;
>
> The following query will be submitted to Spark and executed by Spark:
> select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;
>
> spark_tmp_table1: select distinct a,b from hive_table
> spark_tmp_table2: select distinct a,b from mysql_table
>
> On 2019/12/11 04:27:07, "Juan Pan"  wrote:
> > Hi Haisheng,
> >
> >
> > > The query on different data sources will then be registered as temp
> spark tables (with filter or join pushed in), the whole query is rewritten
> as SQL text over these temp tables and submitted to Spark.
> >
> >
> > Does it mean QuickSQL also needs adapters to make queries execute on
> different data sources?
> >
> >
> > > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
> >
> >
> > Basically, I like and agree with Julian’s statement. It is a great idea
> which I personally hope Calcite moves towards.
> >
> >
> > Give my best wishes to Calcite community.
> >
> >
> > Thanks,
> > Trista
> >
> >
> >  Juan Pan
> >
> >
> > panj...@apache.org
> > Juan Pan(Trista), Apache ShardingSphere
> >
> >
> > On 12/11/2019 10:53,Haisheng Yuan wrote:
> > As far as I know, users still need to register tables from other data
> sources before querying it. QuickSQL uses Calcite for parsing queries and
> optimizing logical expressions with several transformation rules. The query
> on different data source will then be registered as temp spark tables (with
> filter or join pushed in), the whole query is rewritten as SQL text over
> these temp tables and submitted to Spark.
> >
> > - Haisheng
> >
> > --
> > 发件人:Rui Wang
> > 日 期:2019年12月11日 06:24:45
> > 收件人:
> > 主 题:Re: Quicksql
> >
> > The co-routine model sounds fitting into Streaming cases well.
> >
> > I was thinking about how the Enumerable interface should work with
> > streaming cases, but now I should also check the Interpreter.
> >
> >
> > -Rui
> >
> > On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde  wrote:
> >
> > The goal (or rather my goal) for the interpreter is to replace
> > Enumerable as the quick, easy default convention.
> >
> > Enumerable is efficient but not that efficient (compared to engines
> > that work on off-heap data representing batches of records). And
> > because it generates java byte code there is a certain latency to
> > getting a query prepared and ready to run.
> >
> > It basically implements the old Volcano query evaluation model. It is
> > single-threaded (because all work happens as a result of a call to
> > 'next()' on the root node) and cannot handle branching data-flow
> > graphs (DAGs).
> >
> > The Interpreter uses a co-routine model (reading from queues,
> > writing to queues, and yielding when there is no work to be done) and
> > therefore could be more efficient than enumerable in a single-node
> > multi-core system. Also, there is little start-up time, which is
> > important for small queries.
> >
> > I would love to add another built-in convention that uses Arrow as
> > data format and generates co-routines for each operator. Those
> > co-routines could be deployed in a parallel and/or distributed data
> > engine.
> >
> > Julian
> >
> > On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
> >  wrote:
> >
> > What is the ultimate goal of the Calcite Interpreter?
> >
> > To provide some context, I have been playing around with calcite + REST
> > (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest
> <
> > https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for
> > detail of my experiments)
> >
> >
> > —Z
> >
> > On Dec 9, 2019, at 9:05 PM, Julian Hyde  wrote:
> >
> > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> > Calcite I was thinking about virtualization + in-memory materialized
> views.
> > Not only the Spark convention but any of the “engine” conventions (Drill,
> > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> >
> > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite)
> >
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> > <
> >
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> > .
> >
> > Julian
> >
> >
> >
> > On Dec 9, 2019, 

Line endings for source files on Windows

2019-12-12 Thread Francis Chuang
In this commit, Vladimir brought to my attention that editors on Windows 
will complain about line endings if there isn't a zip source release 
with Windows line endings: 
https://github.com/apache/calcite-avatica/commit/34bbcb63f9216d3a5bc29dae1981a55e335d30df#commitcomment-36393594


I don't really work on the Java source directly, so I do not have personal 
experience with this. In CALCITE-2333[1], we stopped releasing zip 
archives and have stuck to only producing a tar.gz source release.


Should we re-introduce a Zip archive with all files converted to Windows 
line endings?


I use Windows as my main operating system and Goland (IntelliJ) and 
Notepad++ as my editors and I exclusively used Linux line endings for my 
source files. For Go source files, I've not had any issues with the 
compilation and for the Java source files I open in IntelliJ or 
Notepad++, I've not run into any issues.


[1] https://issues.apache.org/jira/browse/CALCITE-2333


Re: Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-12 Thread Vladimir Ozerov
Hi Haisheng,

I am trying to model the proposal based on ideas from Cascades. This
assumes top-down control of the optimization process, i.e. the parent
drives optimization of the child. But as a result of this top-down
propagation of optimization requests, the implementation rules are applied
bottom-up in a depth-first manner (aka "backward chaining"), resembling
HepMatchOrder.DEPTH_FIRST. This is perhaps the most profound difference w.r.t.
VolcanoPlanner, where the optimization process is not driven by parents. In
this case, we will never end up in a situation where the child nodes are
not implemented.

Let me try showing another example of pseudo-code, demonstrating the idea.

1) The optimization process starts with the initial request to the planner.
We generate initial equivalence sets for all the nodes.
Then we create the optimization request. Note that optimization requests
may be more complex, than "satisfy the given traits". E.g. for merge join
it could be "give me nodes with non-empty collation". Hence the need for a
class OptimizationRequest. Then we start the optimization from the root,
with an infinite cost.

RelNode optimize(RelNode root, RelTraitSet traits) {
    RelSet rootSet = register(root);

    OptimizationRequest req = OptimizationRequest.satisfies(traits);

    List<RelNode> nodes = optimizeSet(rootSet, req, Cost.INFINITE);

    return minimalCost(nodes);
}

2) This is the entry point for the optimization of a single
equivalence set. Returns the best plans for different trait sets.
First, we get the cached result if possible. The optimization is performed only
if the result is absent for the given request or if the cached result
exceeds the maxCost. Then we generate the list of logical alternatives. For
most nodes, there will be only one alternative. The main exception to the
rule is join optimization. Note that we pass optimization req and cost even
for logical optimization, because logical optimization may require physical
implementations! An example is a bushy join planning where we consider a
small fraction of bushy plans based on input distributions, which requires
physical inputs (see MemSQL and SQL Server PDW papers). Next, we perform
physical optimization of logical alternatives, registering good plans in
the equivalence set along the way. Finally, we get the best plans for the
given request, one per trait set.

List<RelNode> optimizeSet(RelSet equivalenceSet, OptimizationRequest req,
        Cost maxCost) {
    List<RelNode> cachedResult = cachedResults.get(req);

    if (cachedResult != null && cachedResult.getCost() <= maxCost) {
        return cachedResult;
    }

    Set<RelNode> logicalNodes = optimizeLogical(equivalenceSet, req, maxCost);

    for (RelNode logicalNode : logicalNodes) {
        optimizePhysical(logicalNode, req, maxCost);
    }

    List<RelNode> result = equivalenceSet.getBestResults(req, maxCost);

    if (result != null) {
        cachedResults.put(req, result);
    }

    return result;
}

3) The logical optimization process creates the list of optimization rules
and fires them one by one. Aggressive caching and pruning are used here
(omitted for brevity). Rule execution knows the optimization context (req,
maxCost), so it could plan the optimization flow accordingly. Finally,
qualifying logical nodes are returned. Note that we do not use any global
"rule queue" here. The optimization process is fully under our control, and
every rule has a well-defined optimization context in which it is called.

Set<RelNode> optimizeLogical(RelSet equivalenceSet, OptimizationRequest req,
        Cost maxCost) {
    List<RelOptRule> rules = createLogicalRules(equivalenceSet);

    for (RelOptRule rule : rules) {
        rule.fire(req, maxCost);
    }

    return equivalenceSet.getLogicalRels(req, maxCost);
}

4) Physical optimization. Caching is omitted for brevity. Here we invoke
the implementation rules to produce physical nodes. This may include
enforcers, which are essentially a special flavor of implementation rule.

void optimizePhysical(RelNode logicalNode, OptimizationRequest req,
        Cost maxCost) {
    List<ImplementationRule> rules = createPhysicalRules(logicalNode);

    for (ImplementationRule rule : rules) {
        rule.fire(logicalNode, req, maxCost);
    }
}

5) An example of a HashJoin rule which accepts the optimization request.
Comments are inlined.

class HashJoinRule implements PhysicalRule {
    @Override
    void fire(LogicalNode logicalJoin, OptimizationRequest req, Cost maxCost) {
        // Get the minimal self cost of all physical joins.
        Cost logicalCost = logicalJoin.getCost();

        // Prepare optimization requests for the left and right parts based
        // on the parent request.
        OptimizationRequest leftReq = splitOptimizationRequest(req, true /* left */);
        OptimizationRequest rightReq = splitOptimizationRequest(req, false /* right */);

        // Recursive call to the function from p.2, exploring the left
        // child. The cost is adjusted.
        List<RelNode> leftNodes = optimizeSet(logicalJoin.getInput(0), leftReq,
            maxCost - logicalCost);


Re: Updating the Website

2019-12-12 Thread Francis Chuang
My plan is to get automated site builds up and running first, which 
should get rid of the most difficult/troublesome steps for updating the 
site.


We can then evolve/experiment with the site to improve the process further.

On 12/12/2019 6:28 pm, Stamatis Zampetakis wrote:

I guess it will require some effort to set up and automate the process for
supporting multiple versions, but afterwards it may be easier to maintain.
If the only thing that a committer has to do to update the site is
committing to master, then there is not even a need for a particular
workflow.

On Mon, Dec 9, 2019 at 10:31 PM Julian Hyde  wrote:


We might be inventing requirements here, in order to justify a “cool”
technical change.

I don’t think there is a strong requirement for multiple versions of the
site. (Sure, it would be nice.)

This thread started with Stamatis pointing out that it was complicated to
update the site. If we support multiple versions, will this actually make
things less complicated?

Julian




On Dec 9, 2019, at 1:23 PM, Stamatis Zampetakis 

wrote:


In the short term we should try to do our best to follow the existing
workflow.

In the medium term we shall hope that things will be easier with the
automated build of the website.

In the longer term, I would really prefer to migrate towards a solution
like the one proposed by Vladimir.
As I also mentioned in a previous email, there are many projects that
publish multiple versions of the doc, and I find this very helpful.
People usually wait some time before updating their libraries to the latest
release; in this and other cases it is helpful to have a couple of versions of
the doc available online.


On Sun, Dec 8, 2019 at 11:02 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:


Francis>There are also links to Avatica docs in
Francis>the side bar and it would be a bit strange to have them always
Francis>point to
Francis>the master version of Avatica.

gradle.properties references the Avatica version, so we could print the
appropriate links.

Michael>that need to be made that are independent of a particular release
Michael>(e.g. adding a committer)?
Michael>Would I go back and edit the previous
Michael>release branch?

No. You update committers on a master branch

Michael>Do we somehow label parts of the site as being
Michael>release-independent?

It makes little sense to discuss. The answer will be obvious once someone
tries.

Michael>Even if this is the case, consider when we might
Michael>have to correct documentation errors from a previous release

The current ASF rule is to have a rel/... tag for each release.
That is, the site build script could use rel/vX.Y tags to get "released
versions".

Then there are at least two strategies.
a) If we want to update documentation for calcite-1.10.0, then we could
release calcite-v1.10.1.
b) If a "silent" update is required (e.g. fix typo), then we could

invent

"support/vX.Y" branches, and commit the fix to that branch.

Note: the current release process does not require a "release branch".
The build script does NOT create new commits to the source repository.
However, we could create one on-demand (e.g. in case we really need to
patch the old site version or back-port a fix)

Vladimir