[jira] [Created] (CALCITE-3583) Support serialized to json and deserialized from json for Exchange relation operator

2019-12-09 Thread Wang Yanlin (Jira)
Wang Yanlin created CALCITE-3583:


 Summary: Support serialized to json and deserialized from json for 
Exchange relation operator
 Key: CALCITE-3583
 URL: https://issues.apache.org/jira/browse/CALCITE-3583
 Project: Calcite
  Issue Type: Improvement
Reporter: Wang Yanlin


Currently, serializing an Exchange RelNode to JSON causes an exception:

{code:java}
// RelWriterTest
@Test public void testExchange() {
  final FrameworkConfig config = RelBuilderTest.config().build();
  final RelBuilder builder = RelBuilder.create(config);
  final RelNode rel = builder
      .scan("EMP")
      .exchange(RelDistributions.hash(ImmutableList.of(0, 1)))
      .build();
  String relJson = RelOptUtil.dumpPlan("", rel,
      SqlExplainFormat.JSON, SqlExplainLevel.EXPPLAN_ATTRIBUTES);
  String s = deserializeAndDumpToTextFormat(getSchema(rel), relJson);
  final String expected = ""
      + "LogicalExchange(distribution=[hash[0, 1]])\n"
      + "  LogicalTableScan(table=[[scott, EMP]])\n";
  assertThat(s, isLinux(expected));
}
{code}

This fails with:


{code:java}
java.lang.UnsupportedOperationException: type not serializable: hash[0, 1] (type org.apache.calcite.rel.RelDistributions.RelDistributionImpl)

at org.apache.calcite.rel.externalize.RelJson.toJson(RelJson.java:290)
at org.apache.calcite.rel.externalize.RelJsonWriter.put(RelJsonWriter.java:83)
at org.apache.calcite.rel.externalize.RelJsonWriter.explain_(RelJsonWriter.java:66)
at org.apache.calcite.rel.externalize.RelJsonWriter.done(RelJsonWriter.java:128)
at org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:299)
at org.apache.calcite.plan.RelOptUtil.dumpPlan(RelOptUtil.java:1981)
at org.apache.calcite.plan.RelWriterTest.testExchange(RelWriterTest.java:772)
{code}
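For context, RelJson.toJson has no case for RelDistribution, hence the exception. Conceptually, the fix is to serialize the distribution's type and keys, and rebuild the trait on the way back in. The sketch below illustrates that round trip with plain collections; the names (DistSketch, toJson, fromJson) are illustrative stand-ins, not Calcite's actual RelJson API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of round-tripping a distribution like hash[0, 1] through a
// JSON-style map. All names here are illustrative, not Calcite's API.
public class DistSketch {
  final String type;        // e.g. "HASH_DISTRIBUTED"
  final List<Integer> keys; // distribution keys, e.g. [0, 1]

  DistSketch(String type, List<Integer> keys) {
    this.type = type;
    this.keys = keys;
  }

  // Serialize: emit the two attributes a reader needs to rebuild the trait.
  static Map<String, Object> toJson(DistSketch d) {
    Map<String, Object> map = new LinkedHashMap<>();
    map.put("type", d.type);
    map.put("keys", d.keys);
    return map;
  }

  // Deserialize: reconstruct the distribution from the map.
  @SuppressWarnings("unchecked")
  static DistSketch fromJson(Map<String, Object> map) {
    return new DistSketch((String) map.get("type"),
        new ArrayList<>((List<Integer>) map.get("keys")));
  }

  public static void main(String[] args) {
    DistSketch hash = new DistSketch("HASH_DISTRIBUTED", List.of(0, 1));
    DistSketch back = fromJson(toJson(hash));
    System.out.println(back.type + back.keys); // HASH_DISTRIBUTED[0, 1]
  }
}
```

The real fix would live in RelJson, dispatching on the trait's runtime type the same way it already does for collations.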





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Vladimir Sitnikov
Francis>In past releases (1.15.0-rc0), we also provided a hash of the
signature

The hash of the signature is the same as a 'hash of the hash'.
It adds no safety, so we should not publish a 'hash of the signature'.
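To make the redundancy concrete: verifying the artifact's published SHA-512 plus its GPG signature is already sufficient; hashing the .asc file only re-hashes material that is already covered. The file names below are stand-ins, not the actual release artifacts.

```shell
# Stand-in artifact; a real check would use the actual release file and
# would also run: gpg --verify apache-calcite-avatica-1.16.0-src.tar.gz.asc
printf 'release bytes' > artifact.tar.gz

# Publisher side: hash of the artifact itself.
sha512sum artifact.tar.gz > artifact.tar.gz.sha512

# Verifier side: this check plus the signature check is sufficient.
# An artifact.tar.gz.asc.sha512 would only re-hash the signature file,
# i.e. a "hash of the hash" chain that adds no safety.
sha512sum -c artifact.tar.gz.sha512   # prints: artifact.tar.gz: OK
```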

Vladimir


Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Vladimir Sitnikov
Julian>I’m curious why the instructions include ‘-Prelease’.

The instructions in the vote mail provide a way to reproduce the archive.
In other words, the build would produce exactly the same file names and
contents (with the same checksums).

Julian>People should be able to unpack and build the distribution and do a
’normal’ build

The instructions do not limit what the user could do with the sources.
They could test whatever they want.

Vladimir


Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Vladimir Sitnikov
Kevin>fails on the zip when extracted.

The zip has CRLF line endings and the tar.gz has LF line endings, so you
should select the archive based on your needs and environment setup.

The archives are not identical on purpose.
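One quick way to confirm which line endings an extracted tree uses (the files below are stand-ins for sources extracted from each archive):

```shell
# Stand-ins for a source file extracted from the zip (CRLF) and from the
# tar.gz (LF); a real check would point at the extracted source trees.
printf 'line one\r\nline two\r\n' > from-zip.txt
printf 'line one\nline two\n' > from-targz.txt

# Count lines containing a carriage return: nonzero means CRLF endings.
cr=$(printf '\r')
echo "zip-style file: $(grep -c "$cr" from-zip.txt) CRLF lines"
echo "tar.gz-style file: $(grep -c "$cr" from-targz.txt || true) CRLF lines"
```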

Vladimir


Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Francis Chuang
I believe we stopped releasing zips a while back in CALCITE-2333 [1], so
the Gradle task needs to be updated to not build any zip files.


In past releases (1.15.0-rc0), we also provided a hash of the signature: 
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.15.0-rc0/apache-calcite-avatica-1.15.0-src.tar.gz.asc.sha512. 
It appears this is missing as well.


According to 
https://github.com/vlsi/vlsi-release-plugins/tree/master/plugins/stage-vote-release-plugin, 
the -Prelease flag specifies whether the build is a release or a snapshot. I
am not sure what it does behind the scenes, but I have always used
"gradle test" to run the tests in Docker.


In the past, I've included a note for running tests in Docker so that 
it's not necessary to dig through the docs and howtos [2]. Should this 
be removed as well?


Francis

[1]https://issues.apache.org/jira/browse/CALCITE-2333
[2]https://lists.apache.org/thread.html/5d2403063ddc52697e9df5c7fd80f004972fcb487bbf76c2759eccaf%40%3Cdev.calcite.apache.org%3E

On 10/12/2019 12:26 pm, Julian Hyde wrote:

+0

I’m curious why the instructions include ‘-Prelease’. People should be able to 
unpack and build the distribution and do a ’normal’ build.

Sure, it is useful to be able to reproduce the release build, but it’s much 
more important that a normal build works.

And, related, I would not include build instructions in the vote email. The 
distribution should be self-describing.

Julian



On Dec 9, 2019, at 5:23 PM, Kevin Risden  wrote:

-1

"./gradlew build -Prelease -PskipSigning" fails on the zip when extracted.
Looks like it has windows line endings and doesn't pass checks.

Looks like we are publishing both tar.gz and zip now?
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0/
We
didn't do that in the past just the .tar.gz.

I'd prefer if we removed the zip publishing and went back to just tar.gz
which would alleviate the additional publishing and failing tests.

Checked the following:
* Commit hash passes tests with Docker and ./gradlew build -Prelease
-PskipSigning
* Checked signatures and hashes against tar.gz and zip
* Checked passes tests from tar.gz - ./gradlew build -Prelease -PskipSigning
* Checked tests in zip - this failed see above
* Checked staged Maven repo is complete

Kevin Risden


On Sun, Dec 8, 2019 at 5:17 PM Francis Chuang 
wrote:


Hi all,

I have created a build for Apache Calcite Avatica 1.16.0, release
candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:

https://github.com/apache/calcite-avatica/blob/204d58849ecdf2ef639308edba74f416311f7d88/site/_docs/history.md

The commit to be voted upon:

https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=commit;h=204d58849ecdf2ef639308edba74f416311f7d88

Its hash is 204d58849ecdf2ef639308edba74f416311f7d88

Tag:

https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=tag;h=refs/tags/avatica-1.16.0-rc0

The artifacts to be voted on are located here:

https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0
(revision 37139)

The hashes of the artifacts are as follows:

b54066d3b67e1f47d8f3af74466155350bfa92e938f0f442383efd8abb49993c8aee3aca258a9cc2ebb347a6b2f9473c05221da52dd56971478e7989952a7393
*apache-calcite-avatica-1.16.0-src.tar.gz

0739d77ad6bfebd903ddd9fb72d03540f91676ec967ea0a0941e7b428f4045b9d7dab8803c499d3d681fd9c28a79a5feeb850eadcda46055174efc5e459b3661
*apache-calcite-avatica-1.16.0-src.zip

A staged Maven repository is available for review at:

https://repository.apache.org/content/repositories/orgapachecalcite-1070/org/apache/calcite/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/francischuang.asc
https://www.apache.org/dist/calcite/KEYS

N.B.
To create the jars and test Apache Calcite Avatica: "./gradlew build
-Prelease -PskipSigning".

If you do not have a Java environment available, you can run the tests
using docker. To do so, install docker and docker-compose, then run
"docker-compose run test" from the root of the directory.

Please vote on releasing this package as Apache Calcite Avatica 1.16.0.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Calcite 1.16.0
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...


Here is my vote:

+1 (binding)

Francis





[CANCEL] [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Francis Chuang
The vote has been cancelled due to issues with the release artifacts. We 
will fix this and make rc1 available for voting.


Re: Quicksql

2019-12-09 Thread Julian Hyde
Yes, virtualization is one of Calcite’s goals. In fact, when I created Calcite 
I was thinking about virtualization + in-memory materialized views. Not only 
the Spark convention but any of the “engine” conventions (Drill, Flink, Beam, 
Enumerable) could be used to create a virtual query engine.

See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite) 
https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework 
.

Julian



> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana  wrote:
> 
> I recently contacted one of the active contributors asking about the
> purpose of the project and here's his reply:
> 
> From my understanding, Quicksql is a data virtualization platform. It can
>> query multiple data sources altogether and in a distributed way; Say, you
>> can write a SQL with a MySql table join with an Elasticsearch table.
>> Quicksql can recognize that, and then generate Spark code, in which it will
>> fetch the MySQL/ES data as a temporary table separately, and then join them
>> in Spark. The execution is in Spark so it is totally distributed. The user
>> doesn't need to be aware of where the table is from.
>> 
> 
> I understand that Calcite's Spark convention attempts to achieve the
> same goal, but it isn't fully implemented yet.
> 
> 
> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde  wrote:
> 
>> Anyone know anything about Quicksql? It seems to be quite a popular
>> project, and they have an internal fork of Calcite.
>> 
>> https://github.com/Qihoo360/ 
>> 
>> 
>> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
>> <
>> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
>>> 
>> 
>> Julian
>> 
>> 



Re: Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-09 Thread Haisheng Yuan
Hi Vladimir,

Sorry for my late reply.
WRT join planning, it is not required to put the join reordering rule into the HEP
planner. It can also be put into the Volcano planner. Indeed, it is not ideal for
the join reordering rule to generate a single plan. We can create another rule
to generate multiple alternatives and put that rule into the Volcano planner. This
way you can get what you want.

The pull-up trait is not the essence of the on-demand trait request; the main idea
is in the linked thread [1].

>> 4.1) If the logical cost exceeds "maxCost", we stop and return. The whole 
>> logical subspace is pruned even before exploration.
In many cases, the search space you prune is just the specific operator,
because the child operator belongs to a MEMO group that other parent operators might
still need to explore, especially when the JoinReorderingRule only generates
a single optimal logical join order.

>> 4.2) Returned physical children are already registered in proper set/subset, 
>> but are not used for any pattern-matching, and doesn't trigger more rule 
>> calls!
That is a problem of Calcite's default behaviour. Most of the rules' default
INSTANCEs provided by Calcite match not only logical operators but also physical
operators. I am against that. I am not sure whether you have created your own rule
instances or not.

>> 4.3) Implementation rule checks the cost of the physical child.
During an implementation rule, it is possible that we are not able to calculate
the cost yet. It depends on the rule match order: with top-down rule
matching, the child operators are still logical; with bottom-up rule
matching, the child operators are not yet enforced. Say we generate a
MergeJoin with 2 children not sorted yet; how do we estimate the cost?

>> If it is greater than any other already observed child with the same traits
How can we observe it inside the implementation rule?

[1] 
http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e

- Haisheng

--
From: Vladimir Ozerov
Date: 2019-12-06 18:00:01
To: Haisheng Yuan
Cc: dev@calcite.apache.org (dev@calcite.apache.org)
Subject: Re: Re: Re: Volcano's problem with trait propagation: current state and
future

"all we know is their collations" -> "all we know is their traits"

Fri, Dec 6, 2019 at 12:57, Vladimir Ozerov :

Hi Haisheng,

Thank you for your response. Let me elaborate my note on join planning first - 
what I was trying to say is not that rules on their own have some deficiencies. 
What I meant is that with current planner implementation, users tend to 
separate join planning from the core optimization process like this in the 
pseudo-code below. As a result, only one join permutation is considered during 
physical planning, even though join rule may potentially generate multiple 
plans worth exploring:

RelNode optimizedLogicalNode = doJoinPlanning(logicalNode);
RelNode physicalNode = doPhysicalPlanning(optimizedLogicalNode);

Now back to the main question. I re-read your thread about on-demand trait 
propagation [1] carefully. I'd like to admit that when I was reading it for the 
first time about a month ago, I failed to understand some details due to poor 
knowledge of different optimizer architectures. Now I understand it much 
better, and we are definitely concerned with exactly the same problem. I feel that
trait pull-up might be a step in the right direction, however, it seems to me 
that it is not the complete solution. Let me try to explain why I think so.

The efficient optimizer should try to save CPU as much as possible because it 
allows us to explore more plans in a sensible amount of time. To achieve that 
we should avoid redundant operations, and detect and prune inefficient paths 
aggressively. As far as I understand the idea of trait pull-up, we essentially 
explore the space of possible physical properties of children nodes without 
forcing their implementation. But after that, Calcite will explore those
nodes again, this time to execute implementation rules. That is, we will do two
dives - one to enumerate the nodes (trait pull-up API), and the other one to 
implement them (implementation rules), while in Cascades one dive should be 
sufficient since exploration invokes the implementation rules as it goes. This 
is the first issue I see.

The second one is more important - how to prune inefficient plans? Currently, 
nodes are implemented independently and lack of context doesn't allow us to 
estimate children's costs when implementing the parent, hence branch-and-bound
is not possible. Can trait pull-up API "List 
deriveTraitSets(RelNode, RelMetadataQuery)" help us with this? If the children 
nodes are not implemented before the pull-up, all we know is their collations, 
but not their costs. And without costs, pruning is not possible. Please let me 
know if I missed something from the proposal.
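For what it's worth, the branch-and-bound pruning being discussed can be illustrated abstractly. The sketch below is generic and assumes nothing about Calcite's APIs: each operator has several candidate implementations with costs, and a partial plan is abandoned as soon as its accumulated cost reaches the best complete plan seen so far.

```java
import java.util.List;

// Toy illustration of branch-and-bound pruning over plan alternatives.
// This is a generic sketch, not Calcite's VolcanoPlanner API.
public class BranchAndBoundSketch {
  static double best = Double.POSITIVE_INFINITY;

  // alternatives.get(i) holds the candidate costs for operator i.
  static void search(List<List<Double>> alternatives, int op, double costSoFar) {
    if (costSoFar >= best) {
      return; // prune: this partial plan can never beat the best one
    }
    if (op == alternatives.size()) {
      best = costSoFar; // complete plan, cheaper than any seen before
      return;
    }
    for (double c : alternatives.get(op)) {
      search(alternatives, op + 1, costSoFar + c);
    }
  }

  public static void main(String[] args) {
    // Two operators, each with two physical alternatives.
    List<List<Double>> alts = List.of(List.of(4.0, 1.0), List.of(3.0, 2.0));
    search(alts, 0, 0.0);
    System.out.println(best); // 3.0
  }
}
```

The point of contention in the thread is exactly the precondition of that `costSoFar >= best` check: it only works if children's costs are known while the parent is being implemented.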

The possible architect

Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Julian Hyde
+0 

I’m curious why the instructions include ‘-Prelease’. People should be able to 
unpack and build the distribution and do a ’normal’ build.

Sure, it is useful to be able to reproduce the release build, but it’s much 
more important that a normal build works.

And, related, I would not include build instructions in the vote email. The 
distribution should be self-describing.

Julian


> On Dec 9, 2019, at 5:23 PM, Kevin Risden  wrote:
> 
> -1
> 
> "./gradlew build -Prelease -PskipSigning" fails on the zip when extracted.
> Looks like it has windows line endings and doesn't pass checks.
> 
> Looks like we are publishing both tar.gz and zip now?
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0/
> We
> didn't do that in the past just the .tar.gz.
> 
> I'd prefer if we removed the zip publishing and went back to just tar.gz
> which would alleviate the additional publishing and failing tests.
> 
> Checked the following:
> * Commit hash passes tests with Docker and ./gradlew build -Prelease
> -PskipSigning
> * Checked signatures and hashes against tar.gz and zip
> * Checked passes tests from tar.gz - ./gradlew build -Prelease -PskipSigning
> * Checked tests in zip - this failed see above
> * Checked staged Maven repo is complete
> 
> Kevin Risden
> 
> 
> On Sun, Dec 8, 2019 at 5:17 PM Francis Chuang 
> wrote:
> 
>> Hi all,
>> 
>> I have created a build for Apache Calcite Avatica 1.16.0, release
>> candidate 0.
>> 
>> Thanks to everyone who has contributed to this release.
>> 
>> You can read the release notes here:
>> 
>> https://github.com/apache/calcite-avatica/blob/204d58849ecdf2ef639308edba74f416311f7d88/site/_docs/history.md
>> 
>> The commit to be voted upon:
>> 
>> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=commit;h=204d58849ecdf2ef639308edba74f416311f7d88
>> 
>> Its hash is 204d58849ecdf2ef639308edba74f416311f7d88
>> 
>> Tag:
>> 
>> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=tag;h=refs/tags/avatica-1.16.0-rc0
>> 
>> The artifacts to be voted on are located here:
>> 
>> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0
>> (revision 37139)
>> 
>> The hashes of the artifacts are as follows:
>> 
>> b54066d3b67e1f47d8f3af74466155350bfa92e938f0f442383efd8abb49993c8aee3aca258a9cc2ebb347a6b2f9473c05221da52dd56971478e7989952a7393
>> *apache-calcite-avatica-1.16.0-src.tar.gz
>> 
>> 0739d77ad6bfebd903ddd9fb72d03540f91676ec967ea0a0941e7b428f4045b9d7dab8803c499d3d681fd9c28a79a5feeb850eadcda46055174efc5e459b3661
>> *apache-calcite-avatica-1.16.0-src.zip
>> 
>> A staged Maven repository is available for review at:
>> 
>> https://repository.apache.org/content/repositories/orgapachecalcite-1070/org/apache/calcite/
>> 
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/francischuang.asc
>> https://www.apache.org/dist/calcite/KEYS
>> 
>> N.B.
>> To create the jars and test Apache Calcite Avatica: "./gradlew build
>> -Prelease -PskipSigning".
>> 
>> If you do not have a Java environment available, you can run the tests
>> using docker. To do so, install docker and docker-compose, then run
>> "docker-compose run test" from the root of the directory.
>> 
>> Please vote on releasing this package as Apache Calcite Avatica 1.16.0.
>> 
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Calcite 1.16.0
>> [ ]  0 I don't feel strongly about it, but I'm okay with the release
>> [ ] -1 Do not release this package because...
>> 
>> 
>> Here is my vote:
>> 
>> +1 (binding)
>> 
>> Francis
>> 



Re: [VOTE] Release apache-calcite-avatica-1.16.0 (release candidate 0)

2019-12-09 Thread Kevin Risden
-1

"./gradlew build -Prelease -PskipSigning" fails on the zip when extracted.
It looks like it has Windows line endings and doesn't pass checks.

It looks like we are publishing both tar.gz and zip now?
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0/
We didn't do that in the past, just the .tar.gz.

I'd prefer if we removed the zip publishing and went back to just tar.gz
which would alleviate the additional publishing and failing tests.

Checked the following:
* Commit hash passes tests with Docker and ./gradlew build -Prelease
-PskipSigning
* Checked signatures and hashes against tar.gz and zip
* Checked passes tests from tar.gz - ./gradlew build -Prelease -PskipSigning
* Checked tests in zip - this failed see above
* Checked staged Maven repo is complete

Kevin Risden


On Sun, Dec 8, 2019 at 5:17 PM Francis Chuang 
wrote:

> Hi all,
>
> I have created a build for Apache Calcite Avatica 1.16.0, release
> candidate 0.
>
> Thanks to everyone who has contributed to this release.
>
> You can read the release notes here:
>
> https://github.com/apache/calcite-avatica/blob/204d58849ecdf2ef639308edba74f416311f7d88/site/_docs/history.md
>
> The commit to be voted upon:
>
> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=commit;h=204d58849ecdf2ef639308edba74f416311f7d88
>
> Its hash is 204d58849ecdf2ef639308edba74f416311f7d88
>
> Tag:
>
> https://gitbox.apache.org/repos/asf?p=calcite-avatica.git;a=tag;h=refs/tags/avatica-1.16.0-rc0
>
> The artifacts to be voted on are located here:
>
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-avatica-1.16.0-rc0
> (revision 37139)
>
> The hashes of the artifacts are as follows:
>
> b54066d3b67e1f47d8f3af74466155350bfa92e938f0f442383efd8abb49993c8aee3aca258a9cc2ebb347a6b2f9473c05221da52dd56971478e7989952a7393
> *apache-calcite-avatica-1.16.0-src.tar.gz
>
> 0739d77ad6bfebd903ddd9fb72d03540f91676ec967ea0a0941e7b428f4045b9d7dab8803c499d3d681fd9c28a79a5feeb850eadcda46055174efc5e459b3661
> *apache-calcite-avatica-1.16.0-src.zip
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecalcite-1070/org/apache/calcite/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/francischuang.asc
> https://www.apache.org/dist/calcite/KEYS
>
> N.B.
> To create the jars and test Apache Calcite Avatica: "./gradlew build
> -Prelease -PskipSigning".
>
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
>
> Please vote on releasing this package as Apache Calcite Avatica 1.16.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite 1.16.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Francis
>


[jira] [Created] (CALCITE-3582) Make StructFlatterner configurable in PlannerImpl

2019-12-09 Thread Rui Wang (Jira)
Rui Wang created CALCITE-3582:
-

 Summary: Make StructFlatterner configurable in PlannerImpl
 Key: CALCITE-3582
 URL: https://issues.apache.org/jira/browse/CALCITE-3582
 Project: Calcite
  Issue Type: Task
Reporter: Rui Wang
Assignee: Rui Wang


There is a use case where users want to mix Beam programming model with Beam 
SQL together to process a dataset. The following is an example of the use case:

dataset.apply(something user defined)
.apply(SELECT ...)
.apply(something user defined)

As you can see, after the SQL statement is applied, the data structure should 
be preserved for further processing.


Making the struct flattener configurable in PlannerImpl, to allow disabling it
when not needed, could be a solution.





Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-09 Thread Stamatis Zampetakis
It's been some time since I last looked into the code, but the most recent
Hive paper [1] mostly talks about Calcite in its query optimization section,
so I have to say I am a bit surprised.

[1] https://arxiv.org/pdf/1903.10970.pdf

On Mon, Dec 9, 2019 at 6:21 PM Vladimir Ozerov  wrote:

> After looking at the Hive implementation I have the impression that it doesn't
> use Apache Calcite for physical planning, hence it doesn't have the
> problems mentioned in this topic.
>
> Sun, Dec 8, 2019 at 18:55, Vladimir Ozerov :
>
> > Hi Stamatis,
> >
> > Thank you for the idea about Hive. I looked at it some time ago and the
> > codebase was substantially more complex for me to understand than in
> > other projects, so I gave up. I'll try to do the analysis again.
> > I'd like to mention that I also had a thought that maybe the
> > implementation of a top-down optimization is not a concern of
> > VolcanoPlanner, and the brand new planner may play well here. But from a
> > practical perspective, of course, I keep a hope that we will find a less
> > intrusive way to introduce efficient physical optimization into
> > VolcanoPlanner :-)
> >
> > Regards,
> > Vladimir.
> >
> > Sun, Dec 8, 2019 at 12:42, Stamatis Zampetakis :
> >
> >> Thanks Vladimir for this great summary. It is really helpful to know how
> >> the different projects use the optimizer and it certainly helps to
> >> identify
> >> limitations on our implementation.
> >>
> >> I cannot provide any valuable feedback at the moment since I have to
> >> find some time to read your analysis more carefully.
> >>
> >> In the meantime, I know that Hive is also using Calcite for quite some
> >> time
> >> now so maybe you can get some new ideas (or complete your background
> >> study)
> >> by looking in their code.
> >>
> >> @Haisheng: I think many people did appreciate the discussion for pull up
> >> traits so I wouldn't say that we abandoned it. I had the impression that
> >> we were waiting for a design doc.
> >>
> >> In general it may not be feasible to cover all use cases with a single
> >> optimizer. I wouldn't find it bad to introduce another planner if there
> >> are
> >> enough reasons to do so.
> >>
> >> Best,
> >> Stamatis
> >>
> >>
> >> On Fri, Dec 6, 2019, 11:00 AM Vladimir Ozerov 
> wrote:
> >>
> >> > "all we know is their *collations*" -> "all we know is their *traits*"
> >> >
> >> > Fri, Dec 6, 2019 at 12:57, Vladimir Ozerov :
> >> >
> >> > > Hi Haisheng,
> >> > >
> >> > > Thank you for your response. Let me elaborate my note on join
> planning
> >> > > first - what I was trying to say is not that rules on their own have
> >> some
> >> > > deficiencies. What I meant is that with current planner
> >> implementation,
> >> > > users tend to separate join planning from the core optimization
> >> process
> >> > > like this in the pseudo-code below. As a result, only one join
> >> > permutation
> >> > > is considered during physical planning, even though join rule may
> >> > > potentially generate multiple plans worth exploring:
> >> > >
> >> > > RelNode optimizedLogicalNode = doJoinPlanning(logicalNode);
> >> > > RelNode physicalNode = doPhysicalPlanning(optimizedLogicalNode);
> >> > >
> >> > > Now back to the main question. I re-read your thread about on-demand
> >> > trait
> >> > > propagation [1] carefully. I'd like to admit that when I was reading
> >> it
> >> > for
> >> > > the first time about a month ago, I failed to understand some
> details
> >> due
> >> > > to poor knowledge of different optimizer architectures. Now I
> >> understand
> >> > it
> >> > > much better, and we definitely concerned with exactly the same
> >> problem. I
> >> > > feel that trait pull-up might be a step in the right direction,
> >> however,
> >> > it
> >> > > seems to me that it is not the complete solution. Let me try to
> >> explain
> >> > why
> >> > > I think so.
> >> > >
> >> > > The efficient optimizer should try to save CPU as much as possible
> >> > because
> >> > > it allows us to explore more plans in a sensible amount of time. To
> >> > achieve
> >> > > that we should avoid redundant operations, and detect and prune
> >> > inefficient
> >> > > paths aggressively. As far as I understand the idea of trait
> pull-up,
> >> we
> >> > > essentially explore the space of possible physical properties of
> >> children
> >> > > nodes without forcing their implementation. But after that, the
> >> Calcite
> >> > > will explore that nodes again, now in order to execute
> implementation
> >> > > rules. I.e. we will do two dives - one to enumerate the nodes (trait
> >> > > pull-up API), and the other one to implement them (implementation
> >> rules),
> >> > > while in Cascades one dive should be sufficient since exploration
> >> invokes
> >> > > the implementation rules as it goes. This is the first issue I see.
> >> > >
> >> > > The second one is more important - how to prune inefficient plans?
> >> > > Currently, nodes are implemented independently and lack of context
> >> > 

Re: [Discuss] Make flattening on Struct/Row optional

2019-12-09 Thread Rui Wang
Hello,

Sorry for the long delay on this thread. Recently I heard about requests on
how to deal with STRUCT without flattening it again in BeamSQL. Also I
realized Flink has already disabled it in their codebase[1]. I did try to
remove STRUCT flattening and run the unit tests of Calcite core to see how many
tests break: it was 25, which wasn't that bad. So I would like to pick up
this effort again.

Before I do it, I just want to ask whether the Calcite community supports this
effort (or thinks it is a good idea).

My current execution plan will be the following:
1. Add a new flag to FrameworkConfig to specify whether to flatten STRUCTs.
By default, it is enabled.
2. When the struct flattener is disabled, add more tests to test STRUCT support
in general. For example, test STRUCT support on projection, join conditions,
filtering, etc. If something breaks, try to fix it.
3. Check the 25 failed tests above and see why they fail when the struct
flattener is gone. Duplicate those failed tests with the necessary fixes to
make sure they can pass without STRUCT flattening.
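Step 1 of the plan could follow the usual immutable-builder pattern. The sketch below is illustrative only; the class and method names (PlannerConfig, structFlattening) are assumptions, not Calcite's actual FrameworkConfig API.

```java
// Sketch of a config flag for struct flattening, defaulting to true so
// existing behaviour is preserved. Names are illustrative, not Calcite's.
public class PlannerConfig {
  private final boolean structFlattening;

  private PlannerConfig(boolean structFlattening) {
    this.structFlattening = structFlattening;
  }

  public boolean isStructFlattening() {
    return structFlattening;
  }

  public static Builder builder() {
    return new Builder();
  }

  public static class Builder {
    private boolean structFlattening = true; // preserve current behaviour

    public Builder structFlattening(boolean enabled) {
      this.structFlattening = enabled;
      return this;
    }

    public PlannerConfig build() {
      return new PlannerConfig(structFlattening);
    }
  }

  public static void main(String[] args) {
    // Beam-style callers opt out so the Row structure survives projection.
    PlannerConfig cfg = builder().structFlattening(false).build();
    System.out.println(cfg.isStructFlattening()); // false
  }
}
```

Defaulting the flag to true keeps the change backward-compatible: only callers that explicitly opt out skip the flattener.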


[1]:
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkPlannerImpl.scala#L166


-Rui

On Wed, Sep 5, 2018 at 11:59 AM Julian Hyde  wrote:

> It might not be minor, but it’s worth a try. At optimization time we treat
> all fields as fields, regardless of whether they have complex types (maps,
> arrays, multisets, records) so there should not be too many problems. The
> flattening was mainly for the benefit of the runtime.
>
>
> > On Sep 5, 2018, at 11:32 AM, Rui Wang  wrote:
> >
> > Thanks for your helpful response! It seems like disabling the flattening
> > will at least affect some rules in optimization. It might not be a minor
> > change.
> >
> >
> > -Rui
> >
> > On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis 
> > wrote:
> >
> >> Hi Rui,
> >>
> >> Disabling flattening in some cases seems reasonable.
> >>
> >> If I am not mistaken, even in the existing code it is not used all the
> >> time, so it makes sense to make it configurable.
> >> For example, Calcite prepared statements (CalcitePrepareImpl) are using the
> >> flattener only for DDL operations that create materialized views (and this
> >> is because that code at some point passes through the PlannerImpl).
> >> On the other hand, any query that is using the Planner will also pass
> >> through the flattener.
> >>
> >> Disabling the flattener does not mean that all rules will work without
> >> problems. The Javadoc of the RelStructuredTypeFlattener at some point says
> >> "This approach has the benefit that real optimizer and codegen rules never
> >> have to deal with structured types." Due to this, it is very likely that
> >> some rules were written based on the fact that there are no structured
> >> types.
> >>
> >> Best,
> >> Stamatis
> >>
> >>
> >> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde wrote:
> >>
> >>> Flattening was introduced mainly because the original engine used flat
> >>> column-oriented storage. Now we have several ways to executing,
> >>> including generating java code.
> >>>
> >>> Adding a mode to disable flattening might make sense.
> >>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang 
> >>> wrote:
> 
>  Hi Community,
> 
>  While trying to support Row type in Apache Beam SQL on top of Calcite, I
>  realized that flattening Row logic will make the structure information of Row
>  lost after Projections. There is a use case where users want to mix Beam
>  programming model with Beam SQL together to process a dataset. The
>  following is an example of the use case:
> 
>  dataset.apply(something user defined)
> .apply(SELECT ...)
> .apply(something user defined)
> 
>  As you can see, after the SQL statement is applied, the data structure
>  should be preserved for further processing.
> 
>  The most straightforward way to me is to make Struct flattening optional so
>  I could choose to disable it and the Row structure is preserved. Can I ask
>  if it is feasible to make it happen? What could happen if Calcite just
>  doesn't flatten Struct in the flattener? (I tried to disable it but got
>  exceptions in the optimizer. I wasn't sure whether that was a minor thing
>  to fix or whether Struct flattening was a design choice, so the impact of
>  the change would be huge.)
> 
>  Additionally, if there is a way to keep the information that I can use
> >> to
>  reconstruct the Row after projections, it might be ok as well. Does
> >> this
>  idea exist in Calcite? If it does not exist, how does this idea compare
> >>> with
>  disabling Struct flattening?
> 
>  Thanks,
>  Rui
> >>>
> >>
>
>
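The structure loss Rui describes can be shown with a toy sketch. This is NOT Calcite's RelStructuredTypeFlattener, just a minimal stdlib-only illustration: once a nested row is flattened into a flat field list, the grouping of fields is gone and cannot be recovered without extra metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration (NOT Calcite's RelStructuredTypeFlattener): flattening a
// nested row into a flat field list drops the grouping of fields, so the
// original structure cannot be rebuilt afterwards without extra metadata.
public class FlattenDemo {
    // Row(empno, Row(city, zip)) represented as nested lists.
    static List<Object> nestedRow() {
        return Arrays.asList(7369, Arrays.asList("DALLAS", "75201"));
    }

    // Flattening yields [7369, DALLAS, 75201]; the fact that the last two
    // fields belonged to one struct is gone.
    @SuppressWarnings("unchecked")
    static List<Object> flatten(List<Object> row) {
        List<Object> out = new ArrayList<>();
        for (Object field : row) {
            if (field instanceof List) {
                out.addAll(flatten((List<Object>) field));
            } else {
                out.add(field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatten(nestedRow())); // [7369, DALLAS, 75201]
    }
}
```

Reconstructing the Row after a projection would need exactly the kind of metadata (field grouping) that this flattening step discards.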


Re: [VOTE] [CALCITE-3559] Drop HydromaticFileSetCheck, upgrade Checkstyle

2019-12-09 Thread Stamatis Zampetakis
I mean that there are a few people who are a bit skeptical about the change,
so it seems that more convincing arguments are needed.

From my side, I would like to note that the Checkstyle-related problems
that I encountered are reproducible (see CALCITE-3581 [1]).
I cannot yet explain why but with the PR proposed by Vladimir the problem
seems to disappear.

[1] https://issues.apache.org/jira/browse/CALCITE-3581


On Sat, Dec 7, 2019 at 8:00 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Stamatis> I still think we can solve this here by discussing a bit more
>
> What do you mean by that?
>
> Stamatis> Actually, I think that it got stuck every single time that there
> was an
> error.
>
> No idea. It does not get stuck every time, and I have seen a lot of
> checkstyle-triggered build failures.
>


[jira] [Created] (CALCITE-3581) Gradle build hangs if there are Checkstyle violations

2019-12-09 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created CALCITE-3581:


 Summary: Gradle build hangs if there are Checkstyle violations
 Key: CALCITE-3581
 URL: https://issues.apache.org/jira/browse/CALCITE-3581
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.22.0
 Environment: 

Gradle 6.0.1


Build time:   2019-11-18 20:25:01 UTC
Revision: fad121066a68c4701acd362daf4287a7c309a0f5

Kotlin:   1.3.50
Groovy:   2.5.8
Ant:  Apache Ant(TM) version 1.10.7 compiled on September 1 2019
JVM:  1.8.0_212 (Oracle Corporation 25.212-b10)
OS:   Linux 5.0.0-36-generic amd64

Reporter: Stamatis Zampetakis


The problem is reproducible in the environment mentioned above, and it suffices
to create a simple Checkstyle violation and then launch the following
command:

{noformat}
./gradlew clean style
{noformat}

The stacktrace of GradleWrapperMain is shown in the attached file. 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Quicksql

2019-12-09 Thread Muhammad Gelbana
I recently contacted one of the active contributors asking about the
purpose of the project and here's his reply:

> From my understanding, Quicksql is a data virtualization platform. It can
> query multiple data sources altogether and in a distributed way; Say, you
> can write a SQL with a MySql table join with an Elasticsearch table.
> Quicksql can recognize that, and then generate Spark code, in which it will
> fetch the MySQL/ES data as a temporary table separately, and then join them
> in Spark. The execution is in Spark so it is totally distributed. The user
> doesn't need to be aware of where the table is from.
>

I understand that Calcite's Spark convention attempts to achieve the same
goal, but it isn't fully implemented yet.
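The federation pattern described above can be sketched in miniature. This is NOT Quicksql or Spark code, and all table and field names are made up: each "source" is fetched separately, then the rows are joined in a common engine, which is roughly what the generated Spark job reportedly does with the MySQL and Elasticsearch temporary tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the federation idea (NOT Quicksql or Spark code; all names
// are hypothetical): fetch rows from two independent "sources" standing in
// for MySQL and Elasticsearch, then join them locally, standing in for Spark.
public class FederatedJoinDemo {
    static Map<Integer, String> mysqlUsers() {   // id -> name
        Map<Integer, String> m = new LinkedHashMap<>();
        m.put(1, "alice");
        m.put(2, "bob");
        return m;
    }

    static Map<Integer, String> esEvents() {     // userId -> event
        Map<Integer, String> m = new LinkedHashMap<>();
        m.put(2, "login");
        m.put(3, "logout");
        return m;
    }

    // Hash join on the shared key; only matching keys survive.
    static List<String> join() {
        List<String> out = new ArrayList<>();
        Map<Integer, String> events = esEvents();
        for (Map.Entry<Integer, String> user : mysqlUsers().entrySet()) {
            String event = events.get(user.getKey());
            if (event != null) {
                out.add(user.getValue() + ":" + event);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join()); // [bob:login]
    }
}
```

The point of the design is that the user writes one SQL statement; the engine decides which fragments run at each source and where the join happens.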


On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde  wrote:

> Anyone know anything about Quicksql? It seems to be quite a popular
> project, and they have an internal fork of Calcite.
>
> https://github.com/Qihoo360/ 
>
>
> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
>
> Julian
>
>


Re: Updating the Website

2019-12-09 Thread Julian Hyde
We might be inventing requirements here, in order to justify a “cool” technical 
change.

I don’t think there is a strong requirement for multiple versions of the site. 
(Sure, it would be nice.)

This thread started with Stamatis pointing out that it was complicated to 
update the site. If we support multiple versions, will this actually make 
things less complicated?

Julian



> On Dec 9, 2019, at 1:23 PM, Stamatis Zampetakis  wrote:
> 
> In the short term we should try to do our best to follow the existing
> workflow.
> 
> In the medium term we shall hope that things will be easier with the
> automated build of the website.
> 
> In the longer term, I would really prefer to migrate towards a solution
> like the one proposed by Vladimir.
> As I also mentioned in a previous email, there are many projects who
> publish multiple versions of the doc and I find this very helpful.
> People usually wait some time before updating their libraries to the latest
> release; in this and other cases it is helpful to have a couple versions of
> the doc available online.
> 
> 
> On Sun, Dec 8, 2019 at 11:02 PM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> 
>> Francis>There are also links to Avatica docs in
>> Francis>the side bar and it would be a bit strange to have them always
>> point to
>> Francis>the master version of Avatica.
>> 
>> gradle.properties references the Avatica version, so we could print the
>> appropriate links.
>> 
>> Michael>that need to be made that are independent of a particular release
>> Michael>(e.g. adding a committer)?
>> Michael>Would I go back and edit the previous
>> Michael>release branch?
>> 
>> No. You update committers on a master branch
>> 
>> Michael>Do we somehow label parts of the site as being
>> Michael>release-independent?
>> 
>> It makes little sense to discuss. The answer will be obvious once someone
>> tries.
>> 
>> Michael>Even if this is the case, consider when we might
>> Michael>have to correct documentation errors from a previous release
>> 
>> The current ASF rule is to have rel/... tag for each release.
>> That is the site build script could use rel/vX.Y tags to get "released
>> versions".
>> 
>> Then there are at least two strategies.
>> a) If we want to update documentation for calcite-1.10.0, then we could
>> release calcite-v1.10.1.
>> b) If a "silent" update is required (e.g. fix typo), then we could invent
>> "support/vX.Y" branches, and commit the fix to that branch.
>> 
>> Note: the current release process does not require a "release branch".
>> The build script does NOT create new commits to the source repository.
>> However, we could create one on-demand (e.g. in case we really need to
>> patch the old site version or back-port a fix)
>> 
>> Vladimir
>> 



Re: Updating the Website

2019-12-09 Thread Stamatis Zampetakis
In the short term we should try to do our best to follow the existing
workflow.

In the medium term we shall hope that things will be easier with the
automated build of the website.

In the longer term, I would really prefer to migrate towards a solution
like the one proposed by Vladimir.
As I also mentioned in a previous email, there are many projects who
publish multiple versions of the doc and I find this very helpful.
People usually wait some time before updating their libraries to the latest
release; in this and other cases it is helpful to have a couple versions of
the doc available online.


On Sun, Dec 8, 2019 at 11:02 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Francis>There are also links to Avatica docs in
> Francis>the side bar and it would be a bit strange to have them always
> point to
> Francis>the master version of Avatica.
>
> gradle.properties references the Avatica version, so we could print the
> appropriate links.
>
> Michael>that need to be made that are independent of a particular release
> Michael>(e.g. adding a committer)?
> Michael>Would I go back and edit the previous
> Michael>release branch?
>
> No. You update committers on a master branch
>
> Michael>Do we somehow label parts of the site as being
> Michael>release-independent?
>
> It makes little sense to discuss. The answer will be obvious once someone
> tries.
>
> Michael>Even if this is the case, consider when we might
> Michael>have to correct documentation errors from a previous release
>
> The current ASF rule is to have rel/... tag for each release.
> That is the site build script could use rel/vX.Y tags to get "released
> versions".
>
> Then there are at least two strategies.
> a) If we want to update documentation for calcite-1.10.0, then we could
> release calcite-v1.10.1.
> b) If a "silent" update is required (e.g. fix typo), then we could invent
> "support/vX.Y" branches, and commit the fix to that branch.
>
> Note: the current release process does not require a "release branch".
> The build script does NOT create new commits to the source repository.
> However, we could create one on-demand (e.g. in case we really need to
> patch the old site version or back-port a fix)
>
> Vladimir
>


Re: License header vs package-info.java vs whitespace

2019-12-09 Thread Stamatis Zampetakis
In package-info.java files we need to add the Javadoc right above the
package declaration, which is not the case for normal files.
I think that having an extra line between the header and the Javadoc
slightly improves the readability of the class (by distinguishing the
header from the beginning of the documentation).
The same pattern (an extra line in package-info.java) appears in many other
projects, so I always thought it was expected.

Having said that, we can also live without it so I'm fine with whatever
happens in the end.

Best,
Stamatis

On Mon, Dec 9, 2019 at 7:58 PM Rui Wang  wrote:

> >The inconsistency with style across different files is sad
>
> Yep. And to be clear I am not against an effort to unify the file style.
>
>
> The two tests are attached below:
>
> (this is the test that replaces the blank line with a random string, and
> Spotless just throws an exception)
> $git diff
> diff --git
>
> a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
>
> b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> index 29e0d30112..d46742bf45 100644
> ---
>
> a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> +++
>
> b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> @@ -15,6 +15,6 @@
>   * See the License for the specific language governing permissions and
>   * limitations under the License.
>   */
> -
> +dsfdsfdsafds
>  /** BeamSQL provides a new interface to run a SQL statement with Beam. */
>  package org.apache.beam.sdk.extensions.sql;
>
> $./gradlew :sdks:java:extensions:sql:spotlessCheck
> > Task :sdks:java:extensions:sql:spotlessJava FAILED
> Step 'google-java-format' found problem in
>
> 'sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java':
> null
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
>
>
>
> (this is the test that inserts one more blank line so in total there are two
> blank lines)
> $git diff
> diff --git
>
> a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
>
> b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> index 29e0d30112..c0049ac2cb 100644
> ---
>
> a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> +++
>
> b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
> @@ -16,5 +16,6 @@
>   * limitations under the License.
>   */
>
> +
>  /** BeamSQL provides a new interface to run
>  package org.apache.beam.sdk.extensions.sql;
>
> $./gradlew :sdks:java:extensions:sql:spotlessCheck
> > Task :sdks:java:extensions:sql:spotlessJava FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':sdks:java:extensions:sql:spotlessJava'.
> > The following files had format violations:
>
>
> sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
>   @@ -16,6 +16,5 @@
>·*·limitations·under·the·License.
>·*/
>
>   -
>
>  /**·BeamSQL·provides·a·new·interface·to·run·a·SQL·statement·with·Beam.·*/
>package·org.apache.beam.sdk.extensions.sql;
>   Run 'gradlew spotlessApply' to fix these violations.
>
>
> -Rui
>
> On Mon, Dec 9, 2019 at 10:44 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
> > >has one blank line after copyright
> >
> > Beam seems to have mixed style as well:
> > blank line before "package":
> >
> >
> https://github.com/apache/beam/blob/166c6de33f2491c4c9bd27511cc71e33f8d2a894/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring.groovy#L18
> > no blank before "package":
> >
> >
> https://github.com/apache/beam/blob/a2b0ad14f1525d1a645cb26f5b8ec45692d9d54e/examples/java/src/main/java/org/apache/beam/examples/subprocess/configuration/SubProcessConfiguration.java#L17-L18
> >
> > The inconsistency with style across different files is sad.
> >
> > >I did a test.
> >
> > Did you use package-info.java for the test?
> >
> > Here's my test:
> > $ git diff
> > diff --git
> > a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> > b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> > index 8805de348..77f66752d 100644
> > ---
> > a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> > +++
> > b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> > @@ -1,7 +1,6 @@
> >  /*
> >   * Licensed to the Apache Software Foundation (ASF) under one or more
> > - * contributor license agreements.  See the NOTICE file distributed with
> > - * this work for additional information regarding copyright ownership.

Re: License header vs package-info.java vs whitespace

2019-12-09 Thread Rui Wang
>The inconsistency with style across different files is sad

Yep. And to be clear I am not against an effort to unify the file style.


The two tests are attached below:

(this is the test that replaces the blank line with a random string, and
Spotless just throws an exception)
$git diff
diff --git
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
index 29e0d30112..d46742bf45 100644
---
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
+++
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
@@ -15,6 +15,6 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-
+dsfdsfdsafds
 /** BeamSQL provides a new interface to run a SQL statement with Beam. */
 package org.apache.beam.sdk.extensions.sql;

$./gradlew :sdks:java:extensions:sql:spotlessCheck
> Task :sdks:java:extensions:sql:spotlessJava FAILED
Step 'google-java-format' found problem in
'sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java':
null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav



(this is the test that inserts one more blank line so in total there are two
blank lines)
$git diff
diff --git
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
index 29e0d30112..c0049ac2cb 100644
---
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
+++
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
@@ -16,5 +16,6 @@
  * limitations under the License.
  */

+
 /** BeamSQL provides a new interface to run
 package org.apache.beam.sdk.extensions.sql;

$./gradlew :sdks:java:extensions:sql:spotlessCheck
> Task :sdks:java:extensions:sql:spotlessJava FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':sdks:java:extensions:sql:spotlessJava'.
> The following files had format violations:

sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/package-info.java
  @@ -16,6 +16,5 @@
   ·*·limitations·under·the·License.
   ·*/

  -

 /**·BeamSQL·provides·a·new·interface·to·run·a·SQL·statement·with·Beam.·*/
   package·org.apache.beam.sdk.extensions.sql;
  Run 'gradlew spotlessApply' to fix these violations.


-Rui

On Mon, Dec 9, 2019 at 10:44 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> >has one blank line after copyright
>
> Beam seems to have mixed style as well:
> blank line before "package":
>
> https://github.com/apache/beam/blob/166c6de33f2491c4c9bd27511cc71e33f8d2a894/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring.groovy#L18
> no blank before "package":
>
> https://github.com/apache/beam/blob/a2b0ad14f1525d1a645cb26f5b8ec45692d9d54e/examples/java/src/main/java/org/apache/beam/examples/subprocess/configuration/SubProcessConfiguration.java#L17-L18
>
> The inconsistency with style across different files is sad.
>
> >I did a test.
>
> Did you use package-info.java for the test?
>
> Here's my test:
> $ git diff
> diff --git
> a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> index 8805de348..77f66752d 100644
> ---
> a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> +++
> b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
> @@ -1,7 +1,6 @@
>  /*
>   * Licensed to the Apache Software Foundation (ASF) under one or more
> - * contributor license agreements.  See the NOTICE file distributed with
> - * this work for additional information regarding copyright ownership.
> + asdf lkasjhdf lkasjdhf lkasjhdlk h
>   * The ASF licenses this file to you under the Apache License, Version 2.0
>   * (the "License"); you may not use this file except in compliance with
>   * the License.  You may obtain a copy of the License at
>
> $ gw :kafka:spotlessCheck
> Building Apache Calcite 1.22.0-SNAPSHOT
> ...
> BUILD SUCCESSFUL in 1s
> 2 actionable tasks: 1 executed, 1 up-to-date
>
> Vladimir
>


Re: License header vs package-info.java vs whitespace

2019-12-09 Thread Vladimir Sitnikov
>has one blank line after copyright

Beam seems to have mixed style as well:
blank line before "package":
https://github.com/apache/beam/blob/166c6de33f2491c4c9bd27511cc71e33f8d2a894/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring.groovy#L18
no blank before "package":
https://github.com/apache/beam/blob/a2b0ad14f1525d1a645cb26f5b8ec45692d9d54e/examples/java/src/main/java/org/apache/beam/examples/subprocess/configuration/SubProcessConfiguration.java#L17-L18

The inconsistency with style across different files is sad.

>I did a test.

Did you use package-info.java for the test?

Here's my test:
$ git diff
diff --git
a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
index 8805de348..77f66752d 100644
--- a/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
+++ b/kafka/src/main/java/org/apache/calcite/adapter/kafka/package-info.java
@@ -1,7 +1,6 @@
 /*
  * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
+ asdf lkasjhdf lkasjdhf lkasjhdlk h
  * The ASF licenses this file to you under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at

$ gw :kafka:spotlessCheck
Building Apache Calcite 1.22.0-SNAPSHOT
...
BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date

Vladimir


Re: License header vs package-info.java vs whitespace

2019-12-09 Thread Rui Wang
In Apache Beam we actually using the style you mentioned: package-info.java
has one blank line after copyright.

I did a test. It turns out Spotless thinks it's a violation if there is
more than one blank line. Spotless can also fix it by deleting the extra
blank lines and keeping only one. Another interesting finding is that
Spotless accepts no blank line at all without complaining.
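The behavior described above can be sketched as a tiny normalizer. This is a toy under stated assumptions, NOT Spotless's actual implementation: runs of two or more blank lines right after the closing `*/` of the license header collapse to a single blank line, while zero or one blank line is left alone.

```java
// Toy version of the check described above (NOT Spotless itself): after the
// closing "*/" of the license header, zero or one blank line is accepted,
// but runs of two or more blank lines are collapsed to one.
public class HeaderBlankLines {
    static String normalize(String source) {
        // "(\\*/\n)" captures the header's closing line; "\n{2,}" matches
        // the extra blank lines, which are replaced by a single one.
        return source.replaceFirst("(\\*/\n)\n{2,}", "$1\n");
    }

    public static void main(String[] args) {
        String src = "/* license */\n\n\n\npackage org.example;\n";
        System.out.println(normalize(src)); // header, one blank line, package
    }
}
```

A one-blank-line or zero-blank-line input passes through unchanged, matching the observation that Spotless accepts both.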


-Rui

On Mon, Dec 9, 2019 at 3:37 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Hi,
>
> I've noticed that package-info.java files have different copyright
> "formatting" than all the other Java files.
>
> The difference is as follows: package-info.java has a blank line after the
> copyright, and all the other files do not have that blank line.
>
> Sample blank line:
>
> https://github.com/apache/calcite/blob/c7c8ab17c3f4785f81dc02184034ebeb5e1d4ae1/core/src/main/java/org/apache/calcite/schema/impl/package-info.java#L17
> Typical Java file (no blank line):
>
> https://github.com/apache/calcite/blob/c7c8ab17c3f4785f81dc02184034ebeb5e1d4ae1/core/src/main/java/org/apache/calcite/schema/impl/StarTable.java#L16-L17
>
> I'm inclined to remove the blank lines to align package-info.java with the
> other .java files. Hopefully, it does not hurt aesthetically.
>
> Note that the motivation here is not nit-picking; I discovered this while I
> was implementing an automated check for the headers.
>
> It turned out that the current Spotless does not support `package-info.java`
> files (it excludes those files no matter what), which is why we see no
> violations so far.
> I'm planning to replace Spotless with
> https://github.com/autostyle/autostyle to
> make the verifier and formatter more robust.
>
> Vladimir
>


Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-09 Thread Vladimir Ozerov
After looking at the Hive implementation, I have the impression that it doesn't
use Apache Calcite for physical planning, hence it doesn't have the
problems mentioned in this topic.

On Sun, Dec 8, 2019 at 18:55, Vladimir Ozerov :

> Hi Stamatis,
>
> Thank you for the idea about Hive. I looked at it some time ago and the
> codebase was substantially more complex to understand for me than in other
> projects, so I gave up. I'll try to do the analysis again.
> I'd like to mention that I also had a thought that maybe the
> implementation of a top-down optimization is not a concern of
> VolcanoPlanner, and the brand new planner may play well here. But from a
> practical perspective, of course, I keep a hope that we will find a less
> intrusive way to introduce efficient physical optimization into
> VolcanoPlanner :-)
>
> Regards,
> Vladimir.
>
> On Sun, Dec 8, 2019 at 12:42, Stamatis Zampetakis :
>
>> Thanks Vladimir for this great summary. It is really helpful to know how
>> the different projects use the optimizer and it certainly helps to
>> identify
>> limitations on our implementation.
>>
>> I cannot provide any valuable feedback at the moment since I have to find
>> some time to read more carefully your analysis.
>>
>> In the meantime, I know that Hive is also using Calcite for quite some
>> time
>> now so maybe you can get some new ideas (or complete your background
>> study)
>> by looking in their code.
>>
>> @Haisheng: I think many people did appreciate the discussion for pull up
>> traits so I wouldn't say that we abandoned it. I had the impression that
>> we
>> were waiting a design doc.
>>
>> In general it may not be feasible to cover all use cases with a single
>> optimizer. I wouldn't find it bad to introduce another planner if there
>> are
>> enough reasons to do so.
>>
>> Best,
>> Stamatis
>>
>>
>> On Fri, Dec 6, 2019, 11:00 AM Vladimir Ozerov  wrote:
>>
>> > "all we know is their *collations*" -> "all we know is their *traits*"
>> >
> >> > On Fri, Dec 6, 2019 at 12:57, Vladimir Ozerov :
>> >
>> > > Hi Haisheng,
>> > >
>> > > Thank you for your response. Let me elaborate my note on join planning
>> > > first - what I was trying to say is not that rules on their own have
>> some
>> > > deficiencies. What I meant is that with the current planner
>> implementation,
>> > > users tend to separate join planning from the core optimization
>> process
>> > > like this in the pseudo-code below. As a result, only one join
>> > permutation
>> > > is considered during physical planning, even though join rule may
>> > > potentially generate multiple plans worth exploring:
>> > >
>> > > RelNode optimizedLogicalNode = doJoinPlanning(logicalNode);
>> > > RelNode physicalNode = doPhysicalPlanning(optimizedLogicalNode);
>> > >
>> > > Now back to the main question. I re-read your thread about on-demand
>> > trait
>> > > propagation [1] carefully. I'd like to admit that when I was reading
>> it
>> > for
>> > > the first time about a month ago, I failed to understand some details
>> due
>> > > to poor knowledge of different optimizer architectures. Now I
>> understand
>> > it
>> > > much better, and we definitely concerned with exactly the same
>> problem. I
>> > > feel that trait pull-up might be a step in the right direction,
>> however,
>> > it
>> > > seems to me that it is not the complete solution. Let me try to
>> explain
>> > why
>> > > I think so.
>> > >
>> > > An efficient optimizer should try to save CPU as much as possible
>> > because
>> > > it allows us to explore more plans in a sensible amount of time. To
>> > achieve
>> > > that we should avoid redundant operations, and detect and prune
>> > inefficient
>> > > paths aggressively. As far as I understand the idea of trait pull-up,
>> we
>> > > essentially explore the space of possible physical properties of
>> children
>> > > nodes without forcing their implementation. But after that, the
>> Calcite
>> > > will explore that nodes again, now in order to execute implementation
>> > > rules. I.e. we will do two dives - one to enumerate the nodes (trait
>> > > pull-up API), and the other one to implement them (implementation
>> rules),
>> > > while in Cascades one dive should be sufficient since exploration
>> invokes
>> > > the implementation rules as it goes. This is the first issue I see.
>> > >
>> > > The second one is more important - how to prune inefficient plans?
>> > > Currently, nodes are implemented independently and lack of context
>> > doesn't
>> > > allow us to estimate children's costs when implementing the parent,
>> > hence
>> > > branch-and-bound is not possible. Can trait pull-up API
>> > "List
>> > > deriveTraitSets(RelNode, RelMetadataQuery)" help us with this? If the
>> > > children nodes are not implemented before the pull-up, all we know is
>> > their
>> > > collations, but not their costs. And without costs, pruning is not
>> > > possible. Please let me know if I missed something from the proposal.
>> > >
>> > > The possible architecture I had in
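The branch-and-bound pruning argued for in this thread can be shown in miniature. This is a toy illustration, NOT VolcanoPlanner or Cascades code, and the cost model is made up: once one complete plan provides an upper bound, any partial combination whose accumulated cost already reaches that bound is abandoned without implementing its remaining children.

```java
import java.util.Arrays;
import java.util.List;

// Toy branch-and-bound sketch (NOT VolcanoPlanner code): pick one candidate
// implementation per "node" so that total cost is minimal, abandoning any
// partial combination whose cost already reaches the best complete plan.
public class BranchAndBoundDemo {
    static int best = Integer.MAX_VALUE; // cost of cheapest complete plan
    static int pruned = 0;               // partial plans cut off early

    // alternatives.get(i) holds the candidate costs for node i
    static void search(List<int[]> alternatives, int node, int costSoFar) {
        if (costSoFar >= best) {   // bound: no child can make this cheaper
            pruned++;
            return;
        }
        if (node == alternatives.size()) {
            best = costSoFar;      // complete plan found: new upper bound
            return;
        }
        for (int cost : alternatives.get(node)) {
            search(alternatives, node + 1, costSoFar + cost);
        }
    }

    public static void main(String[] args) {
        List<int[]> alts = Arrays.asList(
            new int[]{3, 1}, new int[]{5, 2}, new int[]{4, 9});
        search(alts, 0, 0);
        System.out.println(best);  // 7 (= 1 + 2 + 4)
    }
}
```

The key requirement, as the thread points out, is that children's costs are known while the parent is still being implemented; without them the bound check has nothing to compare against and exhaustive enumeration is the only option.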

License header vs package-info.java vs whitespace

2019-12-09 Thread Vladimir Sitnikov
Hi,

I've noticed that package-info.java files have different copyright
"formatting" than all the other Java files.

The difference is as follows: package-info.java has a blank line after the
copyright, and all the other files do not have that blank line.

Sample blank line:
https://github.com/apache/calcite/blob/c7c8ab17c3f4785f81dc02184034ebeb5e1d4ae1/core/src/main/java/org/apache/calcite/schema/impl/package-info.java#L17
Typical Java file (no blank line):
https://github.com/apache/calcite/blob/c7c8ab17c3f4785f81dc02184034ebeb5e1d4ae1/core/src/main/java/org/apache/calcite/schema/impl/StarTable.java#L16-L17

I'm inclined to remove the blank lines to align package-info.java with the
other .java files. Hopefully, it does not hurt aesthetically.

Note that the motivation here is not nit-picking; I discovered this while I
was implementing an automated check for the headers.

It turned out that the current Spotless does not support `package-info.java`
files (it excludes those files no matter what), which is why we see no
violations so far.
I'm planning to replace Spotless with https://github.com/autostyle/autostyle to
make the verifier and formatter more robust.

Vladimir