Re: Offset and fetch

2022-11-06 Thread Haisheng Yuan
There is a similar story behind.
https://issues.apache.org/jira/browse/CALCITE-4058

It appears difficult to achieve consensus in the community. However, I don't 
think there is any harm to do it in your own project.

On 2022/11/07 06:19:21 Sasha Syrotenko wrote:
> Hi, Calcite community!
> I have a story offset and fetch fields in the Sort operator - I want to
> extract them into
> a separate Limit rel (in my project), but I wonder -- does it make sense.
> Will I make a mistake, if I do this extraction, in your opinion? Is there
> any story behind,
> which may refrain me to do that?
> Thank you!
> 
> Sincerely, Sasha S.
> 


Re: Returning the Intermediate Representation of the rewrite rules

2022-11-06 Thread Haisheng Yuan
No, there is no API to return rule output.
But there is a class RuleEventLogger, which can log the input and output of the 
rule.

On 2022/11/02 16:53:53 "G.O. Barbulescu" wrote:
> Dear Apache Calcite development team,
> 
> I am currently working on a research project in which I am considering Apache 
> Calcite for the parsing stage. I was wondering if there exists an API to 
> return the intermediate representation (relational tree) for the pre-defined 
> SQL rewrite rules?
> For instance, a relational tree for push-filter-though-join rule in format 
> (LHS relational tree => RHS relational tree).
> The motivation behind this is to have a set of rewrite rules in the same 
> format (IR) as the output of opTree.explain() to build new optimizers.
> 
> Thanks for your time.
> 
> Many thanks,
> George Barbulescu
> 
> 
> 


Re: Question Regarding Volcano Planner

2022-08-28 Thread Haisheng Yuan
You need to first investigate whether the transformation rules or operators are 
correctly implemented or not. Typically there is a bug if you see the planner 
fails to terminate.

There is cancelFlag variable and checkCancel() method in VolcanoPlanner.
You can pass your own CancelFlag implementation to Context when constructing 
planner [1] so that you can interrupt the planning when it times out or the 
cost the greater or lower than some threshold.

[1] 
https://github.com/apache/calcite/blob/e2f949d5d6cff79cbe8565bc3e85f437dee9fd7e/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L102

On 2022/08/26 20:29:15 Pranav Deshpande wrote:
> Dear Apache Calcite Dev Team,
> Sometimes my volcano planner fails to terminate because there are many
> plans to search for (millions of them).
> 
> Is there a way to prune the search if a specific amount of time has elapsed
> or if the same net cost is obtained for the Xth time, etc.
> 
> Requesting the community's guidance on this.
> 
> Thanks & Regards,
> Pranav
> 


Re: [ANNOUNCE] Andrei Sereda joins Calcite PMC

2022-08-12 Thread Haisheng Yuan
Congratulations, Andrei!

On 2022/08/12 19:29:56 Michael Mior wrote:
> Congratulations and welcome Andrei!
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Fri, Aug 12, 2022 at 1:48 PM Ruben Q L  wrote:
> 
> > I am pleased to announce that Andrei has accepted an invitation to join
> > the Calcite PMC. Andrei has been a consistent and helpful figure in
> > the Calcite community for which we are very grateful. We look forward to
> > the continued contributions and support.
> >
> > Please join me in congratulating Andrei!
> >
> > Ruben (on behalf of the Calcite PMC)
> >
> 


[ANNOUNCE] New committer: Jing Zhang

2022-07-03 Thread Haisheng Yuan
Apache Calcite's Project Management Committee (PMC) has invited Jing Zhang
to
become a committer, and we are pleased to announce that she has accepted.

Since Dec 2017, Jing has been an active and continuous contributor to the
Apache
Calcite project. She has pushed high quality patches, fixing and improving
code
around streaming plan.

Jing, welcome, thank you for your contributions, and we look forward to your
further interactions with the community! If you wish, please feel free to
tell
us more about yourself and what you are working on.

Haisheng (on behalf of the Apache Calcite PMC)


[ANNOUNCE] New committer: Benchao Li

2022-07-03 Thread Haisheng Yuan
Apache Calcite's Project Management Committee (PMC) has invited Benchao Li
to
become a committer, and we are pleased to announce that he has accepted.

Benchao has pushed a lot of high quality patches, fixing and improving code
around plan simplification and rules. Apart from code contributions, he has
been regularly involved in the discussion and helping out others in the
mailing
list.

Benchao, welcome, thank you for your contributions, and we look forward to
your
further interactions with the community! If you wish, please feel free to
tell
us more about yourself and what you are working on.

Haisheng (on behalf of the Apache Calcite PMC)


Re: [DISCUSS] How we choose a PMC chair

2022-07-03 Thread Haisheng Yuan
+1 to the improvement.

On 2022/07/04 00:13:58 Francis Chuang wrote:
> +1 I think this is a good idea. We have a lot of capable PMC members and 
> it would be of great benefit to the project if all of them were 
> considered during the PMC chair selection process.
> 
> On 4/07/2022 9:46 am, Julian Hyde wrote:
> > As you know, Calcite has a tradition of choosing a new PMC chair (VP)
> > each year, around the anniversary of the project's graduation[1][2]. I
> > think this is a great tradition, but I'd like to discuss an
> > improvement to that process.
> > 
> > (I'm starting the conversation now - several months after the previous
> > vote, and several months before the next - so that it's clear that I
> > am not criticizing the process or the outcome or previous votes.)
> > 
> > I've noticed that the outgoing chair sends an email on dev@ saying
> > words to the following effect:
> > 
> >I think Xyz would be a great person to succeed me.
> >What do you all think?
> > 
> > (I fear that I may have started this tradition when, at the end of my
> > tenure as first chair, I approached Jesus and asked him whether he'd
> > be prepared to do the job[3]. Mea culpa.)
> > 
> > After such an outright endorsement, especially on a public list, it
> > would be churlish for someone to reply "Actually, I think Abc would be
> > better." As a result, it's rather difficult to have an open debate,
> > and the candidate selected by the outgoing chair tends to win
> > unopposed.
> > 
> > I suggest that the outgoing chair says something like
> > 
> >It's time to change the PMC chair.
> >Please send nominations to private@ and the PMC will discuss and vote.
> > 
> > That would allow for several nominations, allow people to give reasons
> > why they prefer a candidate (without disparaging other candidates),
> > and lead to a more informed outcome.
> > 
> > What do you think? Are there any other aspects of the election process
> > we should change?
> > 
> > Julian
> > 
> > [1] https://lists.apache.org/thread/rmj9qm9wlol3nb7z4phddoljbgvypkrt
> > [2] https://lists.apache.org/thread/5tzb8w655pj2vo9omz20th5jnbn9zww7
> > [3] https://lists.apache.org/thread/y4wjdj5h1y3sypnlmhpoz9r6bkk3cv6o
> 


Re: [ANNOUNCE] Vladimir Ozerov joins Calcite PMC

2022-05-25 Thread Haisheng Yuan
Congratulations Vladimir!

On 2022/05/25 06:27:16 Michael Mior wrote:
> Congratulations Vladimir!
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> Le mar. 24 mai 2022 à 16:47, Ruben Q L  a écrit :
> 
> > I am pleased to announce that Vladimir has accepted an invitation to join
> > the Calcite PMC. Vladimir has been a consistent and helpful figure in the
> > Calcite community for which we are very grateful. We look forward to the
> > continued contributions and support.
> >
> > Please join me in congratulating Vladimir!
> >
> > - Ruben (on behalf of the Calcite PMC)
> >
> 


Re: [ANNOUNCE] Chunwei Lei joins Calcite PMC

2022-05-25 Thread Haisheng Yuan
Congratulations, Chunwei!

On 2022/05/25 06:26:35 Michael Mior wrote:
> Congratulations and thank you Chunwei!
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> Le mar. 24 mai 2022 à 16:47, Ruben Q L  a écrit :
> 
> > I am pleased to announce that Chunwei has accepted an invitation to join
> > the
> >  Calcite PMC. Chunwei has been a consistent and helpful figure in the
> > Calcite community for which we are very grateful. We look forward to the
> > continued contributions and support.
> >
> > Please join me in congratulating Chunwei!
> >
> > - Ruben (on behalf of the Calcite PMC)
> >
> 


Re: Draft: board report for 2022 Q1

2022-04-01 Thread Haisheng Yuan
It looks good to me, thanks for preparing the draft, Ruben.

Haisheng

On 2022/04/01 11:17:19 Ruben Q L wrote:
> Hello,
> 
> Below these lines you can find a draft of this quarter's board report. I
> plan to submit it
> at the beginning of next week.
> Please let me know if you have any additions or corrections.
> 
> 
> ## Description:
> Apache Calcite is a highly customizable framework for parsing and planning
> queries on data in a wide variety of formats. It allows database-like
> access,
> and in particular a SQL interface and advanced query optimization, for data
> not residing in a traditional database.
> 
> Avatica is a sub-project within Calcite and provides a framework for
> building
> local and remote JDBC and ODBC database drivers. Avatica has an independent
> release schedule and its own repository.
> 
> ## Issues:
> There are no issues requiring board attention.
> 
> ## Membership Data:
> Apache Calcite was founded 2015-10-22 (6 years ago).
> There are currently 56 committers and 23 PMC members in this project.
> The Committer-to-PMC ratio is roughly 7:3.
> 
> Community changes, past quarter:
> - No new PMC members. Last addition was Ruben Q L on 2020-08-10.
> - No new committers. Last addition was Alessandro Solimando on 2021-12-17.
> 
> ## Project Activity:
> Calcite 1.30.0 was released on 2022-03-20, it contains contributions from 29
> authors, and resolves 36 issues. Among others, it is worth highlighting the
> following: Babel parser support MySQL NULL-safe equal operator <=>; support
> SQL hints for temporal table join; fluent test fixtures so that dependent
> projects can write parser, validator and rules tests; and fix for
> vulnerability issue CVE-2021-27568.
> 
> Apache Calcite Avatica Go 5.1.0 was released on 2022-03-27, it is a
> maintenance release of Avatica Go with dependency updates and some minor bug
> fixes.
> 
> ## Community Health:
> The overall activity in the community has increased significantly on the dev
> email list, which had a 137% increase in traffic in the past quarter (841
> emails compared to 354). Many of these threads were new or recently joined
> community members asking technical questions or requesting code reviews for
> their first contributions, which suggests that Apache Calcite is being used
> more broadly by increasingly more projects. On the contrary, the activity on
> Jira and Github has decreased (-36% / -34% issues opened / closed in the
> former; -24% / -23% PRs opened / closed in the latter). This could be
> explained in part by the development pause during winter vacation and/or a
> particularly long code freeze period that was required in order to produce
> several RCs before Calcite 1.30.0 was released.
> 
> Apart from that, there was a
> small virtual key signing party; and it is being discussed to organize these
> events on a more regular basis to expand the web of trust within the
> project.
> 
> The number of non-committer (contributor) commits per month:
> +-+-+-+
> |year |month| contributor_commits |
> +-+-+-+
> | 2022| 1   | 16  |
> | 2022| 2   | 10  |
> | 2022| 3   | 4   |
> +-+-+-+
> 
> The number of active reviewers per month:
> +-+-+-+
> |year |month|  active_reviewers   |
> +-+-+-+
> | 2022| 1   | 7   |
> | 2022| 2   | 5   |
> | 2022| 3   | 3   |
> +-+-+-+
> 
> Top reviewers in the last 3 months:
> +--+-+
> |  committer   | reviews |
> +--+-+
> | Julian Hyde| 8   |
> | Stamatis Zampetakis   | 8   |
> | NobiGo  | 6   |
> | Alessandro Solimando | 2   |
> | rubenada  | 1   |
> | Feng Zhu  | 1   |
> | XuQianJin-Stars| 1   |
> | yuzhao.cyz | 1   |
> | Andrei Sereda <25229979+asereda...@users.noreply.github.com> | 1   |
> | Jacques Nadeau   | 1   |
> +--+-+
> 


Re: [DISCUSS] Github PR link to JIRA issue

2022-04-01 Thread Haisheng Yuan
Yes, that is what I meant.

I was expecting to see the link to JIRA in PR's commit page:
https://github.com/apache/calcite/pull/2752/commits

I am lazy to navigate to JIRA manually. :)

Haisheng

On 2022/04/01 01:04:19 Julian Hyde wrote:
> Ah, I see. In the page for a commit (e.g. [1]) the text “[CALCITE-5064]” 
> appears as a blue hyperlink whereas the page for the corresponding pull 
> request (e.g. [2]) the “[CALCITE-5064]” text is not a hyperlink.
> 
> Is that what you meant, Haisheng?
> 
> Julian
> 
> 
> [1] 
> https://github.com/apache/calcite/commit/d85b2a602a547290bd5be0bba68092b702400731
>  
> <https://github.com/apache/calcite/commit/d85b2a602a547290bd5be0bba68092b702400731>
> 
> [2] https://github.com/apache/calcite/pull/2752 
> <https://github.com/apache/calcite/pull/2752> 
> 
> > On Mar 31, 2022, at 11:05 AM, Michael Mior  wrote:
> > 
> > Stamatis,
> > 
> > Unless I'm misunderstanding, Haisheng was referring to links in the
> > opposite direction. That is, linking to JIRA issues from GitHub. The
> > setting you reference is for creating a link from JIRA to GitHub.
> > 
> > --
> > Michael Mior
> > mm...@apache.org
> > 
> > 
> > Le jeu. 31 mars 2022 à 13:53, Stamatis Zampetakis  a
> > écrit :
> > 
> >> It's already done via .asf.yaml [1].
> >> 
> >> Best,
> >> Stamatis
> >> 
> >> [1]
> >> 
> >> https://github.com/apache/calcite/blob/88cc385f98c551c1aca7ffab101934f1c34fdffd/.asf.yaml#L35
> >> 
> >> On Thu, Mar 31, 2022 at 7:19 PM Michael Mior  wrote:
> >> 
> >>> Not sure what might have changed, but here's the GitHub documentation on
> >>> the feature. If this isn't working as expected, I would contact INFRA to
> >>> make sure things are correctly configured. (Apparently in the future,
> >> this
> >>> may be done via .asf.yaml)
> >>> 
> >>> 
> >>> 
> >> https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/configuring-autolinks-to-reference-external-resources
> >>> 
> >>> 
> >>> --
> >>> Michael Mior
> >>> mm...@apache.org
> >>> 
> >>> 
> >>> Le mer. 30 mars 2022 à 13:46, Haisheng Yuan  a écrit :
> >>> 
> >>>> Hi all,
> >>>> 
> >>>> Previously, the JIRA issue e.g. [CALCITE-6789] in Calcite github PR and
> >>>> commit message will be automatically linked to the JIRA site. Now there
> >>> is
> >>>> no link anymore.
> >>>> 
> >>>> Does anyone know what happened? What can we do to add the link back?
> >>>> 
> >>>> Thanks,
> >>>> Haisheng Yuan
> >>>> 
> >>> 
> >> 
> 
> 


Re: [DISCUSS] Github PR link to JIRA issue

2022-03-30 Thread Haisheng Yuan
Hi Thomas,

I didn't make myself clear. 6789 is just a random number, should be 
[CALCITE-], as Julian mentioned.

The problem happens on github PR. e.g. 
in https://github.com/apache/calcite/pull/2757, the commit message title is
[CALCITE-4401] SqlJoin toString throws RuntimeException

Previously "[CALCITE-4401]" was linked to Caclite JIRA website automatically. 
Now it is not linked anymore.
I have to manually visit the JIRA website.
 
Anyway, hope it is just a sporadic issue.

On 2022/03/30 18:04:00 Thomas Rebele wrote:
> When I visit https://issues.apache.org/jira/browse/CALCITE-6789, I get an
> error message:
> 
> You can't view this issue
> It may have been deleted or you don't have permission to view it.
> 
> I can view other tickets, though (e.g.,
> https://issues.apache.org/jira/browse/CALCITE-5008).
> Maybe the link has been removed because the ticket is not available?
> 
> Cordialement / Best Regards,
> *Thomas Rebele, PhD* | R Developer | Germany | www.tibco.com
> 
> 
> On Wed, Mar 30, 2022 at 7:57 PM Julian Hyde  wrote:
> 
> > It seems to happen sporadically. Maybe the commit message of the PR needs
> > to start with exactly “[CALCITE-]”.
> >
> > > On Mar 30, 2022, at 10:45 AM, Haisheng Yuan  wrote:
> > >
> > > Hi all,
> > >
> > > Previously, the JIRA issue e.g. [CALCITE-6789] in Calcite github PR and
> > > commit message will be automatically linked to the JIRA site. Now there
> > is
> > > no link anymore.
> > >
> > > Does anyone know what happened? What can we do to add the link back?
> > >
> > > Thanks,
> > > Haisheng Yuan
> >
> >
> 


[DISCUSS] Github PR link to JIRA issue

2022-03-30 Thread Haisheng Yuan
Hi all,

Previously, the JIRA issue e.g. [CALCITE-6789] in Calcite github PR and
commit message will be automatically linked to the JIRA site. Now there is
no link anymore.

Does anyone know what happened? What can we do to add the link back?

Thanks,
Haisheng Yuan


Re: [VOTE] Release Apache Calcite 1.30.0 (release candidate 3)

2022-03-17 Thread Haisheng Yuan
Thanks Liya for being release manager. This is a lot of work.

- Checksum and signature: OK
- Checked release note: OK
- Gradle Test: OK

+1 (binding)

Best,
Haisheng

On 2022/03/16 03:36:10 Fan Liya wrote:
> Hi all,
> 
> I have created a build for Apache Calcite 1.30.0, release
> candidate 3.
> 
> Thanks to everyone who has contributed to this release.
> 
> You can read the release notes here:
> https://github.com/apache/calcite/blob/calcite-1.30.0-rc3/site/_docs/history.md
> 
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=cd5229b9a371e20de207231d55bcbddf1ff39ec2
> 
> Its hash is cd5229b9a371e20de207231d55bcbddf1ff39ec2
> 
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.30.0-rc3
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.30.0-rc3
> (revision 53122)
> 
> The hashes of the artifacts are as follows:
> 6c9f65bcc75a05c226dbb0a74495e76761d738f6923086e2943fface1b8864697dff6d0de7aa0cb9b06f4f6aebe1322b5ecb829428f84d08c51dfb5773034040
> *apache-calcite-1.30.0-src.tar.gz
> 
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1153/org/apache/calcite/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/COMMITTER_ID.asc
> https://www.apache.org/dist/calcite/KEYS
> 
> To create the jars and test Apache Calcite: "gradle build"
> (requires an appropriate Gradle/JDK installation)
> 
> Please vote on releasing this package as Apache Calcite 1.30.0.
> 
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Calcite 1.30.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 
> Here is my vote:
> 
> +1 (non-binding)
> 
> Best,
> Liya Fan
> 


Re: [Help] UNION queries with one or more SELECTs containing a JOIN are significantly slower after 1.24

2022-03-11 Thread Haisheng Yuan
Hi Zack,

Looks like it is a regression.
Are you able to provide a reproducible test case? You can log a JIRA along with 
the test case, so people can do the root cause analysis.

Thanks,
Haisheng Yuan

On 2022/03/12 01:26:53 "Gramana, Zachary (GE Digital)" wrote:
> Hello all!
> 
> I would appreciate some help/guidance with troubleshooting a big performance 
> regression we saw on our UNION queries after updating our application from 
> 1.24 to 1.29. Many of the tables in the queries are wide (> 50 columns) so we 
> use a custom push-down Project class. These custom Projects are being 
> eliminated in 1.29 UNION queries--though not when replacing UNION with UNION 
> ALL.
> 
> The examples below illustrate minimal queries that reproduce the issue our 
> test database.
> 
> When compared to the UNION query times using 1.24, the prepared UNION 
> statements take 2-2.5 times longer in 1.29. The prepare time is around 2 
> times longer as well.
> 
> In the examples below, the UNION and UNION ALL queries generate the same 
> result set in our test database. All Calcite tables have default collations 
> ("ISO-8859-1$en_US"). Several are rather wide, with many > 50 columns.
> 
> In 1.29 for a simple query, both UNION and UNION ALL produce the same query 
> plan, with the exception of EnumerableUnion's `all` parameter:
> 
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT Id FROM t2
> -
> EnumerableUnion(all=[true])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t1]])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t2]])
> 
> 1.29
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT Id FROM t2
> -
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t1]])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t2]])
> 
> The next examples add a JOIN to one of the tables in order to make this a 
> slightly more complicated query and trigger the issue. Unlike in 1.24, the 
> plans in 1.29 for UNION and UNION ALL now vary significantly:
> 
> 1.24 (like the above queries,  the UNION and UNION ALL plans differ only by 
> the `all` parameter, so the UNION ALL plan for 1.24 is omitted)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>   EnumerableSort(sort0=[$0], dir0=[ASC])
> EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT 
> NOT NULL], t3_Id0=[$t263])
>   MyEnumerableConverter
> MyTableScan(table=[[MySchema, t2]])
>   EnumerableSort(sort0=[$0], dir0=[ASC])
> MyEnumerableConverter
>   MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t3]])
> 
> 1.29 (note this UNION ALL plan is nearly identical plan to the above `UNION` 
> plan in 1.24)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION ALL
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> -
> EnumerableUnion(all=[true])
>   MyEnumerableConverter
> MyProject(Id=[$0])
>   MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
> EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>   EnumerableSort(sort0=[$0], dir0=[ASC])
> EnumerableCalc(expr#0..262=[{inputs}], expr#263=[CAST($t178):BIGINT 
> NOT NULL], t3_Id0=[$t263])
>   MyEnumerableConverter
> MyTableScan(table=[[MySchema, t2]])
>   EnumerableSort(sort0=[$0], dir0=[ASC])
> MyEnumerableConverter
>   MyProject(Id=[$0])
> MyTableScan(table=[[MySchema, t3]])
> 
> 1.29 (note the loss of the `MyProject` node above t3's MyTableScan, and now 
> the inputs for the post-MergeJoin EnumerableCalc node are very wide compared 
> to before)
> EXPLAIN PLAN FOR
> SELECT Id FROM t1
> UNION
> SELECT t3.Id FROM t2 JOIN t3 ON (t3.Id = t2.t3_Id)
> ---

Re: Allow Cascades driver invoking "derive" on the nodes produced by "passThrough"

2022-03-07 Thread Haisheng Yuan
Hi Vladimir,

If I understand it correctly, your concern is still about [1], right?
If it is, I think I have answered it in [2], perhaps it didn't jump into your 
inbox.

I am still not convinced that enforcing derivation on all rels is the only way 
to solve your concern.

If it is the other problem, can you give us a concrete example? So we can 
discuss and solve it.

Thanks,
Haisheng Yuan

[1] https://lists.apache.org/thread/4rcvlk1oprbnbgbnwl2s735p99h4vj80
[2] https://lists.apache.org/thread/s17tsj3pmccfrfvydnmv76btc3voxohj

On 2022/02/13 09:40:05 Roman Kondakov wrote:
> Hi Alessandro,
> 
> this problem was already discussed on dev-list [1] and we have a ticket 
> for this [2].
> 
> My concern is that many projects use Calcite as a Lego kit: they took 
> internal components of Calcite and combine them for building a custom 
> planning and execution pipeline. And sometimes downstream projects need 
> to change the default behavior of internal components to fit their 
> requirements or overcome the bug. So the idea of keeping even internal 
> components of Calcite "more public" is rather a good thing than the bad 
> one from my point of view.
> 
> Thank you.
> 
> [1] https://lists.apache.org/thread/cykl74dcphgow4790fwoc8frsjglz7n1
> 
> [2] https://issues.apache.org/jira/browse/CALCITE-4542
> 
> --
> Roman Kondakov
> 
> 
> On 11.02.2022 19:15, Alessandro Solimando wrote:
> > Hello everyone,
> > @Vladimir, +1 on the change introducing "enforceDerive()".
> >
> > @Roman, could you walk us through the limitations you found that forced you
> > to copy-paste the whole class?
> >
> > Maybe there is some middle ground for your problem(s) too, similar in
> > spirit to what Vladimir proposed for the other limitation.
> >
> > I am not against making the class more public if necessary, but it would be
> > nice to have a discussion here before going down that path.
> > If the discussion leads to a better design of the original class, all
> > projects would benefit from that.
> >
> > Best regards,
> > Alessandro
> >
> > On Fri, 11 Feb 2022 at 04:14, Roman Kondakov 
> > wrote:
> >
> >> Hi Vladimir,
> >>
> >> +1 for making the rule driver more public. We've faced similar problems
> >> in the downstream project. The solution was to copy and paste the
> >> TopDownRuleDrive code with small fixes since it was not possible to
> >> override the default behavior.
> >>
> >> --
> >> Roman Kondakov
> >>
> >>
> >> On 11.02.2022 02:50, Vladimir Ozerov wrote:
> >>> Hi,
> >>>
> >>> In the Cascades driver, it is possible to propagate the requests top-down
> >>> using the "passThrough", method and then notify parents bottom-up about
> >> the
> >>> concrete physical implementations of inputs using the "derive" method.
> >>>
> >>> In some optimizers, the valid parent node cannot be created before the
> >>> trait sets of inputs are known. An example is a custom distribution trait
> >>> that includes the number of shards in the system. The parent operator
> >> alone
> >>> may guess the distribution keys, but cannot know the number of input
> >>> shards. To mitigate this, you may create a "template" node with an
> >> infinite
> >>> cost from within the optimization rule that will propagate the
> >>> passThrough/drive calls but would never participate in the final plan.
> >>>
> >>> Currency, the top-down driver designed in a way that the nodes created
> >> from
> >>> the "passThrough" method are not notified on the "derive" stage. This
> >> leads
> >>> to the incomplete exploration of the search space. For example, the rule
> >>> may produce the node "A1.template" that will be converted into a normal
> >>> "A1" node in the derive phase. However, if the parent operator produced
> >>> "A2.template" from "A1.template" using pass-through mechanics, the
> >>> "A2.template" will never be notified about the concrete input traits,
> >>> possibly losing the optimal plan. This is especially painful in
> >> distributed
> >>> engines, where the number of shards is important for the placement of
> >>> Shuffle operators.
> >>>
> >>> It seems that the problem could be solved with relatively low effort. The
> >>> "derive" is not invoked on the nodes 

[ANNOUNCE] New Calcite PMC chair: Ruben Q L

2022-01-19 Thread Haisheng Yuan
Calcite community members,

I am pleased to announce that we have a new PMC chair and VP as per our
tradition of rotating the chair once a year. I have resigned, and Ruben Q L
was duly elected by the PMC and approved unanimously by the Board.

Please join me in congratulating Ruben!

Best,
Haisheng Yuan


Re: [DISCUSS] Draft board report for Jan 2022

2022-01-08 Thread Haisheng Yuan
I will update the report accordingly.

Thanks,
Haisheng

On 2022/01/07 19:39:57 Julian Hyde wrote:
> I would add that we are rotating the PMC chair, continuing our annual 
> rotation tradition. Thanks for serving, Haisheng.
> 
> The statement 'overall the discussion is good for community development and 
> health’ is a bromide. Some of the discussions reached consensus, others 
> merely reached a conclusion. But one major and long-time committer resigned 
> from the project. The best we can say is that we are trying to learn from the 
> situation.
> 
> Julian
>  
> 
> > On Jan 7, 2022, at 9:32 AM, Haisheng Yuan  wrote:
> > 
> > Attached below is a draft of this month's board report. I plan to submit it
> > on Jan 11.
> > Please let me know if you have additions or corrections.
> > 
> > ## Description:
> > Apache Calcite is a highly customizable framework for parsing and planning
> > queries on data in a wide variety of formats. It allows database-like
> > access,
> > and in particular a SQL interface and advanced query optimization, for data
> > not
> > residing in a traditional database.
> > 
> > Avatica is a sub-project within Calcite and provides a framework for
> > building
> > local and remote JDBC and ODBC database drivers. Avatica has an independent
> > release schedule and its own repository.
> > 
> > ## Issues:
> > There are no issues requiring board attention.
> > 
> > ## Membership Data:
> > Apache Calcite was founded 2015-10-21 (6 years ago)
> > There are currently 56 committers and 23 PMC members in this project.
> > The Committer-to-PMC ratio is roughly 7:3.
> > 
> > Community changes, past quarter:
> > - No new PMC members. Last addition was Ruben Q L on 2020-08-09.
> > - Alessandro Solimando was added as committer on 2021-12-17.
> > - Xiong Duan was added as committer on 2021-10-18.
> > 
> > ## Project Activity:
> > Calcite 1.28.0 was released on 2021-10-19, with new features including
> > the UNIQUE sub-query predicate, the MODE aggregate function,
> > PERCENTILE_CONT and PERCENTILE_DISC inverse distribution functions,
> > an Exasol dialect for the JDBC adapter, and improvements to
> > materialized view recognition.
> > 
> > Calcite 1.29.0 was released on 2021-12-26, which upgrades log4j2 to
> > 2.17.0 to fix security vulnerabilities.
> > 
> > Calcite Avatica 1.19.0 was released on 2021-10-11, which adds support
> > for BIT and NULL data types, fixes issues with values of type ARRAY.
> > 
> > Calcite Avatica 1.20.0 was released on 2021-12-13, which upgrades Log4j2 to
> > version 2.15.0 (to address CVE-2021-44228), and makes the SPNEGO
> > protocol much more efficient.
> > 
> > ## Community Health:
> > The overall activity in the community has increased slightly in the past
> > few months, specifically 20% more commits, 8% more closed PRs on GitHub.
> > 
> > There are some discussions about the proposal of changing workflow, e.g.
> > github issues vs JIRAs, merging Avatica with Calcite, people argued with
> > different opinions, but overall the discussion is good for community
> > development and health.
> > 
> > The number of non-committer (contributor) commits per month:
> > +-+-+-+
> > |year |month| contributor_commits |
> > +-+-+-+
> > | 2021| 10  | 14  |
> > | 2021| 11  | 2   |
> > | 2021| 12  | 8   |
> > +-+-+-+
> > 
> > The number of active reviewers per month:
> > +-+-+-+
> > |year |month|  active_reviewers   |
> > +-+-+-+
> > | 2021| 10  | 7   |
> > | 2021| 11  | 2   |
> > | 2021| 12  | 5   |
> > +-+-+-----+
> > 
> > Top reviewers in the last 3 months:
> > +---+-+
> > | committer |   reviews   |
> > +---+-+
> > | Julian Hyde  | 7   |
> > | Stamatis Zampetakis  | 4   |
> > | NobiGo  | 3   |
> > | Jesus Camacho Rodriguez  | 3   |
> > | rubenada  | 2   |
> > | chunwei <37774589+chunwei...@users.noreply.github.com> | 1
> >|
> > | Haisheng Yuan  | 1   |
> > | chunwei.lcw  | 1   |
> > | Wang Yanlin <1989yanlinw...@163.com> | 1   |
> > | Jacques Nadeau  | 1   |
> > +---+-+
> > 
> > Thanks,
> > Haisheng Yuan
> 
> 


[DISCUSS] Draft board report for Jan 2022

2022-01-07 Thread Haisheng Yuan
Attached below is a draft of this month's board report. I plan to submit it
on Jan 11.
Please let me know if you have additions or corrections.

## Description:
Apache Calcite is a highly customizable framework for parsing and planning
queries on data in a wide variety of formats. It allows database-like
access,
and in particular a SQL interface and advanced query optimization, for data
not
residing in a traditional database.

Avatica is a sub-project within Calcite and provides a framework for
building
local and remote JDBC and ODBC database drivers. Avatica has an independent
release schedule and its own repository.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Calcite was founded 2015-10-21 (6 years ago)
There are currently 56 committers and 23 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ruben Q L on 2020-08-09.
- Alessandro Solimando was added as committer on 2021-12-17.
- Xiong Duan was added as committer on 2021-10-18.

## Project Activity:
Calcite 1.28.0 was released on 2021-10-19, with new features including
the UNIQUE sub-query predicate, the MODE aggregate function,
PERCENTILE_CONT and PERCENTILE_DISC inverse distribution functions,
an Exasol dialect for the JDBC adapter, and improvements to
materialized view recognition.

Calcite 1.29.0 was released on 2021-12-26, which upgrades log4j2 to
2.17.0 to fix security vulnerabilities.

Calcite Avatica 1.19.0 was released on 2021-10-11, which adds support
for BIT and NULL data types, fixes issues with values of type ARRAY.

Calcite Avatica 1.20.0 was released on 2021-12-13, which upgrades Log4j2 to
version 2.15.0 (to address CVE-2021-44228), and makes the SPNEGO
protocol much more efficient.

## Community Health:
The overall activity in the community has increased slightly in the past
few months, specifically 20% more commits, 8% more closed PRs on GitHub.

There are some discussions about the proposal of changing workflow, e.g.
github issues vs JIRAs, merging Avatica with Calcite, people argued with
different opinions, but overall the discussion is good for community
development and health.

The number of non-committer (contributor) commits per month:
+-+-+-+
|year |month| contributor_commits |
+-+-+-+
| 2021| 10  | 14  |
| 2021| 11  | 2   |
| 2021| 12  | 8   |
+-+-+-+

The number of active reviewers per month:
+-+-+-+
|year |month|  active_reviewers   |
+-+-+-+
| 2021| 10  | 7   |
| 2021| 11  | 2   |
| 2021| 12  | 5   |
+-+-+-+

Top reviewers in the last 3 months:
+---+-+
| committer |   reviews   |
+---+-+
| Julian Hyde  | 7   |
| Stamatis Zampetakis  | 4   |
| NobiGo  | 3   |
| Jesus Camacho Rodriguez  | 3   |
| rubenada  | 2   |
| chunwei <37774589+chunwei...@users.noreply.github.com> | 1
    |
| Haisheng Yuan  | 1   |
| chunwei.lcw  | 1   |
| Wang Yanlin <1989yanlinw...@163.com> | 1   |
| Jacques Nadeau  | 1   |
+---+-+

Thanks,
Haisheng Yuan


Re: [VOTE] Release apache-calcite-1.29.0 (release candidate 0)

2021-12-21 Thread Haisheng Yuan
Thanks Rui for taking care of this release.

- Verified GPG signature - OK
- Verified SHA512 - OK
- Checked release notes - OK
- Ran gradle tests  - OK

Environment:
macOS Big Sur, Java 17+35-LTS-2724
Gradle 7.3

+1 (binding)

Thanks,
Haisheng Yuan

On 2021/12/21 01:27:01 Rui Wang wrote:
> Hi all,
> 
> I have created a build for Apache Calcite 1.29.0, release
> candidate 0.
> 
> Thanks to everyone who has contributed to this release.
> 
> You can read the release notes here:
> https://github.com/apache/calcite/blob/calcite-1.29.0-rc0/site/_docs/history.md
> 
> The commit to be voted upon:
> https://github.com/apache/calcite/commit/cbfe0609edcc4a843d71497f159e3687a834119e
> 
> Its hash is cbfe0609edcc4a843d71497f159e3687a834119e
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.29.0-rc0/
> (revision 51667)
> 
> The hashes of the artifacts are as follows:
> 7a59a30d0cab7b5edfcb71b960fc932de77b92f6e5863a99c721b61915805933e717a209dca880e46092d595920ba24485474f0c3b6715d152c900c0d95348b3
> *apache-calcite-1.29.0-src.tar.gz
> 
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1127/org/apache/calcite/
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/amaliujia.asc
> https://www.apache.org/dist/calcite/KEYS
> 
> To create the jars and test Apache Calcite: "gradle build" (requires an
> appropriate Gradle/JDK installation).
> 
> Please vote on releasing this package as Apache Calcite 1.29.0.
> 
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Calcite 1.27.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 


[DISCUSS] State of the project 2021

2021-12-10 Thread Haisheng Yuan
Hi Calcite community members,

It has been 6 years since Calcite graduated to a top level Apache project.
I am so excited to witness how vivid the community has become and how far
we have come.

We have seen 2 releases so far for Calcite this year (with another release
v1.29.0 ongoing), with each release containing a large amount of changes.
Specifically, there are many new features introduced: the InnoDB Adaptor,
three-value logic for SEARCH operator, Enumerable MergeUnion operator, the
UNIQUE sub-query predicate... as well as the improvements to materialized
view recognition. We won't see Calcite's continuous improvements without
the community members' collaboration and dedication.

In terms of Calcite Avatica, we had 2 releases for Avatica and 0 releases
for Avatica Go.  Avatica consistently has fewer contributors and pull
requests than Calcite, maybe because it has fewer users and the code hasn't
changed aggressively. But more contributors and code reviewers are highly
welcome and appreciated.

I am glad to see that we had several technical meetups, not to mention at
ApacheCon and StangeLoop, to talk about the use cases and improvements that
we have done to Calcite. Many thanks to those who presented talks to give
more people opportunities to know and use Calcite, and more importantly, to
enrich the Calcite community.

It's been a great privilege to be able to serve as Calcite's PMC chair in
2021. It's also been a big learning experience for me and I am grateful to
those who trusted me and helped me in the journey. I am also extremely
proud of having this opportunity to work with Calcite contributors and
committers.

Last but not least, we need to discuss who should be the new PMC chair of
Calcite after I step down in December. I would like to nominate Ruben
Quesada Lopez as the first candidate in the vote. Ruben has been a great
contributor to the project. He has shown a lot of care for the Calcite
project and the community, which I feel would make him a great PMC chair.

To repeat questions from previous years:
1) What else are we doing well in the project?
2) What areas do we need to do better?

Please take some time to share your thoughts!

Thanks,
Haisheng


Re: Re: [DISCUSS] Apache Calcite Online Meetup January 2022

2021-11-22 Thread Haisheng Yuan
+1 for online meetup.

- Haisheng

--
Sender:Julian Hyde
Date:2021/11/23 00:58:25
Recipient:
Theme:Re: [DISCUSS] Apache Calcite Online Meetup January 2022

+1

Thanks for suggesting this, Stamatis. 

Julian

> On Nov 22, 2021, at 7:12 AM, Zhe Hu  wrote:
> 
> Got it. Thanks!
> |
> 
> 
> |
>  Replied Message 
> | From | xiong duan |
> | Date | 11/22/2021 23:09 |
> | To | dev |
> | Subject | Re: [DISCUSS] Apache Calcite Online Meetup January 2022 |
> Yes. You can find the relative talks in
> http://calcite.apache.org/community/#talks .
> 
> Zhe Hu  于2021年11月22日周一 下午10:50写道:
> 
>> +1 for online meetup.
>> And I’m curious about if there is any playback for this meetup, because
>> somebody might couldn't attend due to timezone variety or any other reasons.
>> 
>> 
>> 
>> 
>> 
>> 
>> On 11/22/2021 22:26,xiong duan wrote:
>> Great idea. +1 for an online meetup.
>> 
>> Justin Swanhart  于2021年11月20日周六 上午12:42写道:
>> 
>> +1 for an online meetup
>> 
>> On Fri, Nov 19, 2021, 11:32 AM Jacques Nadeau  wrote:
>> 
>> WRT to talk structure, I propose we do 5 or 10 minute lightning talks as
>> opposed to something longer. It ensures a wider variety of topics, better
>> attendance and you never have to sit that long through something you're
>> not
>> interested in...
>> 
>> On Thu, Nov 18, 2021 at 11:47 PM Vladimir Sitnikov <
>> sitnikov.vladi...@gmail.com> wrote:
>> 
>> Are the people willing to give a talk around Calcite?
>> 
>> I may present a talk regarding Gradle, and the upcoming migration to
>> the
>> idiomatic structure:
>> https://issues.apache.org/jira/browse/CALCITE-4832
>> https://github.com/apache/calcite/pull/2566
>> 
>> https://docs.gradle.org/7.3/userguide/structuring_software_products.html
>> 
>> What do people think in terms of timing?
>> 
>> I think the new build structure is more like 30-45 min talk
>> 
>> Vladimir
>> 
>> 
>> 
>> 
> 
> 


Re: [ANNOUNCE] New committer: Xiong Duan

2021-10-23 Thread Haisheng Yuan
Congrats, Xiong!

On 2021/10/23 21:23:59, Francis Chuang  wrote: 
> Congratulations!
> 
> On 24/10/2021 12:03 am, Stamatis Zampetakis wrote:
> > Apache Calcite's Project Management Committee (PMC) has invited Xiong Duan
> > to
> > become a committer, and we are pleased to announce that they have accepted.
> > 
> > Xiong has pushed a lot of high quality patches, fixing and improving code
> > around
> > aggregations and sub-queries,  in a rather short period of time. Apart from
> > code
> > contributions, Xiong has been regularly reviewing PRs in GitHub and helping
> > out
> > others in various JIRA issues.
> > 
> > Xiong, welcome, thank you for your contributions, and we look forward to
> > your
> > further interactions with the community! If you wish, please feel free to
> > tell
> > us more about yourself and what you are working on.
> > 
> > Stamatis (on behalf of the Apache Calcite PMC)
> > 
> 


Re: [VOTE] Release Apache Calcite 1.28.0 (release candidate 0)

2021-10-17 Thread Haisheng Yuan
Thank you for taking care of this release, Julian.

1. Checked checksum and signature: OK
2. Ran Gradle test: OK
3. Checked release notes: OK

Java env: openjdk version "1.8.0_292", MacOS 10.15.7.

+1 (binding)

Thanks,
Haisheng Yuan

On 2021/10/15 18:01:11, Julian Hyde  wrote: 
> Hi all,
> 
> I have created a build for Apache Calcite 1.28.0, release
> candidate 0.
> 
> Thanks to everyone who has contributed to this release.
> 
> You can read the release notes here:
> https://github.com/apache/calcite/blob/calcite-1.28.0-rc0/site/_docs/history.md
> 
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=dec167ac18272c0cd8be477d6b162d7a31a62114
> 
> Its hash is dec167ac18272c0cd8be477d6b162d7a31a62114
> 
> Tag:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=tag;h=refs/tags/calcite-1.28.0-rc0
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.28.0-rc0
> (revision 50450)
> 
> The hashes of the artifacts are as follows:
> d61c4935d7d3b66425ff6e0c5e6e506b56f734d562f0fc6e69dc1d484da1d92d7d922930fd7b545abe0dc7b0178f6847ff4239ad52c52c8fa155668aa3d3fc38
> *apache-calcite-1.28.0-src.tar.gz
> 
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1121/org/apache/calcite/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/jhyde.asc
> https://www.apache.org/dist/calcite/KEYS
> 
> Please vote on releasing this package as Apache Calcite 1.28.0.
> 
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Calcite 1.28.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 
> Here is my vote:
> 
> +1 (binding)
> 


Re: [DISCUSS] Draft board report for Oct 2021

2021-10-06 Thread Haisheng Yuan
Thanks Stamatis and Francis for the addition, I will update the report 
accordingly.

Cheers,
Haisheng Yuan

On 2021/10/06 21:04:22, Stamatis Zampetakis  wrote: 
> LGTM and +1 for the longer description as Francis mentioned.
> 
> This is the report for Q3 (July-September) so I think Avatica 1.19 should
> anyways go to the next one.
> 
> ## Project Activity
> 
> +At BOSS 2021, Copenhagen, Denmark, August 16, 2021, Julian Hyde & Stamatis
> Zampetakis gave
> a tutorial about Calcite.
> 
> ## Community Health:
> 
> Stats from git log for Q3 in case you find them useful:
> 
> Number of active reviewers per month
> +-+-+-+
> |year |month|  active_reviewers   |
> +-+-+-+
> | 2021| 7   | 6   |
> | 2021| 8   | 2   |
> | 2021| 9   | 8   |
> +-+-+-+
> 
> Top reviewers in Q3 2021
> +---+-+
> | committer |   reviews   |
> +---+-+
> | Julian Hyde  | 14  |
> | Stamatis Zampetakis  | 7   |
> | Wang Yanlin <1989yanlinw...@163.com> | 4   |
> | Haisheng Yuan  | 3   |
> | Feng Zhu  | 3   |
> | Jesus Camacho Rodriguez  | 2   |
> | liyafan82  | 2   |
> | Vladimir Sitnikov  | 1   |
> | yuzhao.cyz  | 1   |
> | Haisheng Yuan <15352793+hsy...@users.noreply.github.com> | 1
>   |
> +---+-+
> 
> 
> On Wed, Oct 6, 2021 at 12:52 PM Francis Chuang 
> wrote:
> 
> > Looks good to me! Avatica 1.19 is currently being voted upon, so if it
> > finalizes before the 12th, it should be added to the report.
> >
> > I'd suggest using the longer description for the project:
> >
> > Apache Calcite is a highly customizable framework for parsing and
> > planning queries on data in a wide variety of formats. It allows
> > database-like access, and in particular a SQL interface and advanced
> > query optimization, for data not residing in a traditional database.
> >
> > Avatica is a sub-project within Calcite and provides a framework for
> > building local and remote JDBC and ODBC database drivers. Avatica has an
> > independent release schedule and its own repository.
> >
> > Francis
> >
> > On 6/10/2021 6:15 pm, Haisheng Yuan wrote:
> > > Attached below is a draft of this month's board report. I plan to submit
> > it on Oct 12. Please let me know if you have additions or corrections.
> > >
> > > ## Description:
> > > The mission of Calcite is the creation and maintenance of software
> > related to
> > > Dynamic data management framework
> > >
> > > ## Issues:
> > > There are no issues requiring board attention.
> > >
> > > ## Membership Data:
> > > Apache Calcite was founded 2015-10-21 (6 years ago)
> > > There are currently 53 committers and 23 PMC members in this project.
> > > The Committer-to-PMC ratio is roughly 7:3.
> > >
> > > Community changes, past quarter:
> > > - No new PMC members. Last addition was Ruben Q L on 2020-08-09.
> > > - No new committers. Last addition was Vladimir Ozerov on 2021-06-23.
> > >
> > > ## Project Activity:
> > > No new version was released.
> > >
> > > At ApacheCon 2021, September 22, 2021, Vladimir Ozerov gave a talk on
> > building
> > > modern SQL query optimizers with Apache Calcite.
> > >
> > > At Strange Loop 2021, September 30, 2021, Julian Hyde gave a talk on
> > Morel,
> > > a functional query language.
> > >
> > > ## Community Health:
> > > The overall activity in the community has increased slightly in the past
> > > few months, specifically 47% more opened PRs, 25% more closed PRs on
> > GitHub.
> > > But the number of active reviewers still remains small.
> > >
> > > Thanks,
> > > Haisheng Yuan
> > >
> >
> 


Re: [ANNOUNCE] New committer: Zhaohui Xu

2021-10-06 Thread Haisheng Yuan
Congratulations, Zhaohui, well deserved!

Haisheng

On 2021/10/06 21:14:00, Francis Chuang  wrote: 
> Congratulations!
> 
> On 7/10/2021 7:47 am, Stamatis Zampetakis wrote:
> > Apache Calcite's Project Management Committee (PMC) has invited Zhaohui Xu
> > to
> > become a committer, and we are pleased to announce that they have accepted.
> > 
> > Numbers speak for themselves and Zhaohui has over 30 commits already in
> > master
> > and more than 20 open pull requests waiting to get in. Great record so far
> > including
> > (but not limited to) improvements and fixes in the view based
> > rewriting modules,
> > JSON serialization, metadata, and field trimming.
> > 
> > Zhaohui, welcome, thank you for your contributions, and we look forward to
> > your
> > further interactions with the community! If you wish, please feel free to
> > tell
> > us more about yourself and what you are working on.
> > 
> > Stamatis (on behalf of the Apache Calcite PMC)
> > 
> 


[DISCUSS] Draft board report for Oct 2021

2021-10-06 Thread Haisheng Yuan
Attached below is a draft of this month's board report. I plan to submit it on 
Oct 12. Please let me know if you have additions or corrections.

## Description:
The mission of Calcite is the creation and maintenance of software related to 
Dynamic data management framework

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Calcite was founded 2015-10-21 (6 years ago)
There are currently 53 committers and 23 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ruben Q L on 2020-08-09.
- No new committers. Last addition was Vladimir Ozerov on 2021-06-23.

## Project Activity:
No new version was released.

At ApacheCon 2021, September 22, 2021, Vladimir Ozerov gave a talk on building
modern SQL query optimizers with Apache Calcite.

At Strange Loop 2021, September 30, 2021, Julian Hyde gave a talk on Morel,
a functional query language.

## Community Health:
The overall activity in the community has increased slightly in the past
few months, specifically 47% more opened PRs, 25% more closed PRs on GitHub.
But the number of active reviewers still remains small.

Thanks,
Haisheng Yuan


Re: [DISCUSS] Next releases

2021-09-24 Thread Haisheng Yuan
Sounds good!

On 2021/09/25 01:13:26, Julian Hyde  wrote: 
> I propose to start a vote for Avatica 1.19 one week from now, and a
> vote for Calcite 1.28 two weeks from now. I'll be RM for both. How
> does that timing sound?
> 
> Julian
> 
> On Fri, Sep 24, 2021 at 10:12 AM James Starr  wrote:
> >
> > When is calcite planning on going to 2.0?
> >
> > On Fri, Sep 24, 2021 at 3:35 AM Stamatis Zampetakis 
> > wrote:
> >
> > > From the last discussion [1], the list was the following for Calcite:
> > >
> > > Julian Hyde for 1.28.0
> > > Rui Wang for 1.29.0
> > > Andrei Sereda for 1.30.0
> > > Liya Fan for 1.31.0
> > >
> > > [1]
> > >
> > > https://lists.apache.org/thread.html/r86eead4dc703200c0f7182632bad79198d9c3fab006358c6e7dae8e8%40%3Cdev.calcite.apache.org%3E
> > >
> > > On Fri, Sep 24, 2021 at 6:36 AM Julian Hyde  wrote:
> > >
> > > > We're about due for releases of Avatica and Calcite. Who's next up on
> > > > the release manager schedule?
> > > >
> > >
> 


Re: Looking for a committer to merge: CALCITE-4652 (fix AggregateExpandDistinctAggregatesRule when SUM type is expanded)

2021-08-12 Thread Haisheng Yuan
I will merge it.

On 2021/08/12 09:17:07, Taras Ledkov  wrote: 
> Hi Calcite Devs.
> 
> The patch for CALCITE-4652 [1] (see PR#2439 [2]) is reviewed and ready 
> for merge.
> I'm looking for a committer to merge the patch.
> 
> [1]. https://issues.apache.org/jira/browse/CALCITE-4652
> [2]. https://github.com/apache/calcite/pull/2439
> 
> On 10.08.2021 15:10, Taras Ledkov wrote:
> > Hi Calcite Devs.
> >
> > I just remind about review/merge the patch for the issue CALCITE-4652 
> > [1], see PR#2439 [2].
> > I've fixed the patch according with Julian comments. Also PR contains 
> > two 'LGTM' comments.
> > Is the patch ready for merge?
> >
> > [1]. https://issues.apache.org/jira/browse/CALCITE-4652
> > [2]. https://github.com/apache/calcite/pull/2439
> >
> > On 12.07.2021 15:05, xiong duan wrote:
> >> Hi. Ledkov. I'll do some code reviews in the next two days.
> >>
> >> Taras Ledkov  于2021年7月12日周一 下午7:58写道:
> >>
> >>> Hi,
> >>>
> >>> Please review the patch for the issue CALCITE-4652 [1], see PR#2439 
> >>> [2].
> >>>
> >>> I tried to draw attention to the issue in the topic:
> >>> "[HELP] Return type of the SUM aggregate function and
> >>> AggregateExpandDistinctAggregatesRule​",
> >>> but did not receive any answer, so I do not give a link to the 
> >>> discussion.
> >>>
> >>> Stamatis advised me to send a reminder to the devlist.
> >>>
> >>> [1]. https://issues.apache.org/jira/browse/CALCITE-4652
> >>> [2]. https://github.com/apache/calcite/pull/2439
> >>>
> >>> -- 
> >>> Taras Ledkov
> >>> Mail-To: tled...@gridgain.com
> >>>
> >>>
> -- 
> Taras Ledkov
> Mail-To: tled...@gridgain.com
> 
> 


[jira] [Created] (CALCITE-4712) Add RelHashDistribution

2021-07-31 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-4712:
--

 Summary: Add RelHashDistribution
 Key: CALCITE-4712
 URL: https://issues.apache.org/jira/browse/CALCITE-4712
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


Add RelHashDistribution. The hash distribution should have the following 
properties:
1. ImmutableIntList keys; // distribution keys
2. ImmutableList equivKeys; // equivalent keys for each 
distribution key
3. int bucketNum; // number of buckets or shards
4. boolean nullsColocated; // are NULLS colocated?
5. String hashFunc; // name or identity of hash function



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Draft board report for Jul 2021

2021-07-07 Thread Haisheng Yuan
Sure, I will update the draft.

Thanks for the suggestion, Francis.

Haisheng

On 2021/07/07 11:26:24, Francis Chuang  wrote: 
> +1 Looks great!
> 
> I'd suggest using the longer description of the project:
> 
> Apache Calcite is a highly customizable framework for parsing and 
> planning queries on data in a wide variety of formats. It allows 
> database-like access, and in particular a SQL interface and advanced 
> query optimization, for data not residing in a traditional database.
> 
> Avatica is a sub-project within Calcite and provides a framework for
> building local and remote JDBC and ODBC database drivers. Avatica has an 
> independent release schedule and its own repository.
> 
> On 7/07/2021 1:19 pm, Haisheng Yuan wrote:
> > Attached below is a draft of this month's board report. I plan to submit it 
> > on
> > Jul 13. Please let me know if you have any additions or corrections.
> > 
> > ## Description:
> > The mission of Calcite is the creation and maintenance of software related 
> > to
> > Dynamic data management framework
> > 
> > ## Issues:
> > There are no issues requiring board attention.
> > 
> > ## Membership Data:
> > Apache Calcite was founded 2015-10-21 (6 years ago)
> > There are currently 53 committers and 23 PMC members in this project.
> > The Committer-to-PMC ratio is roughly 7:3.
> > 
> > Community changes, past quarter:
> > - No new PMC members. Last addition was Ruben Q L on 2020-08-09.
> > - Vladimir Ozerov was added as committer on 2021-06-23.
> > 
> > ## Project Activity:
> > Calcite Avatica 1.18.0 was released on 2021-05-18. Calcite 1.27.0 was
> > released on 2021-06-03, with important features such as InnoDB adapter,
> > Three-valued logic for SEARCH operator and MergeUnion operator in
> > Enumerable convention.
> > 
> > On Apr 26th, we had online meeting to discuss the proposal of extending
> > Calcite to an incremental query optimizer.
> > 
> > ## Community Health:
> > The overall activity in the community has slightly decreased in the past
> > few months without this being worrisome. Specifically we have less active
> > reviewers in the past 3 months. We need more committers to spend some
> > time to review open pull requests.
> > 
> > the number of non-committer (contributor) commits per month
> > +-+-+-+
> > |year |month| contributor_commits |
> > +-+-+-+
> > | 2021| 4   | 14  |
> > | 2021| 5   | 4   |
> > | 2021| 6   | 5   |
> > +-+-+-+
> > 
> > top reviewers in the last 3 months
> > +---+-+
> > | committer |   reviews   |
> > +---+-+
> > | Julian Hyde  | 13      |
> > | Stamatis Zampetakis  | 5   |
> > | rubenada  | 1   |
> > | amaliujia  | 1   |
> > | ForwardXu  | 1   |
> > | yuzhao.cyz  | 1   |
> > | liyafan82  | 1   |
> > +---+-+
> > 
> > Thanks,
> > Haisheng Yuan
> > 
> 


Re: Proposal to extend Calcite into a incremental query optimizer

2021-07-07 Thread Haisheng Yuan
Hi Botong,

We haven't heard from you for a while.
Feel free to reach out if you get stuck or need help on rebasing code.

Thanks,
Haisheng

On 2021/05/15 00:54:02, Botong Huang  wrote: 
> Hi all,
> 
> Thank you all for the interest, and thanks Julian for the update!
> 
> I am having problems uploading the pdf files into the jira CALCITE-4568
> , so I attached the
> slides in our code base:
> https://github.com/alibaba/cost-based-incremental-optimizer/blob/main/Tempura_Calcite_presentation.pdf
> 
> The slides contain a walking example of how Tempura expands its memo. The
> current version of the code also has two e2e unit tests at
> TvrOptimizationTest.java and TvrExecutionTest.java. Please feel free to
> start playing with them, and feel free to reach out and possibly schedule
> another meeting if needed.
> 
> As agreed in the meeting, we will rebase our code to a newer version of
> Calcite.
> 
> Best,
> Botong
> 
> On Thu, May 13, 2021 at 12:47 PM Julian Hyde  wrote:
> 
> > During the meeting we agreed to start progressing this contribution in the
> > usual Apache Way, with conversations on the dev list and in the
> > https://issues.apache.org/jira/browse/CALCITE-4568 <
> > https://issues.apache.org/jira/browse/CALCITE-4568> JIRA case. So, it
> > should be easy for you to participate.
> >
> > Botong said he would share the slides. (He might be unwilling to make them
> > public, because they are his presentation for a conference that has not
> > happened yet. Reach out to him one-to-one.)
> >
> > Next step is for someone on the Alibaba side to create a PR that is
> > rebased on the latest Calcite master, and add a comment to the JIRA case.
> > Then we can discuss what needs to be done for that PR. Code quality, adding
> > comments, breaking up into smaller commits, additional tests, renaming
> > packages/classes, restructuring into plugins are all possibilities.
> >
> > Our side of the bargain, as committers, is that we should review in a
> > timely manner, and not move the goal posts — if the contributors make the
> > changes we request then we will land this code in master in a reasonable
> > amount of time.
> >
> > We also discussed incremental view maintenance (IVM). Tempura solves a
> > more general problem (finding the optimal K steps to maintain a
> > materialized view as data arrives in K points in time) but if we set K=2,
> > we can generate a plan for how to update a materialized view given a delta
> > table. The plan will be different based on cost - e.g. whether the delta
> > table is small or large. This is a problem that many of our users would
> > like to solve. It will exercise much of Tempura’s code base, and encourage
> > contributions.
> >
> > In my opinion, we should do IVM at launch. It should be the main example
> > we use in conference talks, blog posts, etc. When people understand that
> > case, we can explain how we generalize from K=2 to arbitrary K.
> >
> > Julian
> >
> >
> > > On May 13, 2021, at 9:51 AM, Rui Wang  wrote:
> > >
> > > I apologize that I had a wrong impression on the meeting time (I thought
> > it
> > > should be on Thursday but it is Wednesday). I can follow up your meeting
> > > records if you have any.
> > >
> > >
> > > -Rui
> > >
> > > On Tue, May 11, 2021 at 8:17 PM Botong Huang  wrote:
> > >
> > >> Hi all,
> > >>
> > >> This is a reminder that we are going to have our second discussion
> > meeting
> > >> tomorrow at 10-11pm PST. Please find the link below, everyone is
> > welcome to
> > >> join!
> > >>
> > >> Join Zoom Meeting
> > >> https://uci.zoom.us/j/91986206610
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91986206610=D=calendar=2=AOvVaw24sxPtI6hbukCSo3nlQsbn
> > >>>
> > >>
> > >> Meeting ID: 919 8620 6610
> > >> One tap mobile
> > >> +16699006833 <(669)%20900-6833>,,91986206610# US (San Jose)
> > >> +12532158782 <(253)%20215-8782>,,91986206610# US (Tacoma)
> > >>
> > >> Dial by your location
> > >>+1 669 900 6833 <(669)%20900-6833> US (San Jose)
> > >>+1 253 215 8782 <(253)%20215-8782> US (Tacoma)
> > >>+1 346 248 7799 <(346)%20248-7799> US (Houston)
> > >>+1 301 715 8592 <(301)%20715-8592> US (Washington DC)
> > >>+1 312 626 6799 <(312)%20626-6799> US (Chicago)
> > >>+1 646 558 8656 <(646)%20558-8656> US (New York)
> > >> Meeting ID: 919 8620 6610
> > >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FacyXcc43Cd=D=calendar=2=AOvVaw2W08kj_8hEx44dryeZlXb6
> > >>>
> > >>
> > >> Join by Skype for Business
> > >> https://uci.zoom.us/skype/91986206610
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91986206610=D=calendar=2=AOvVaw3w0M0YYbcjPyBXzNpyyk0Z
> > >>>
> > >>
> > >> Thanks,
> > >> Botong
> > >>
> > >> On Wed, May 5, 2021 at 9:55 AM Botong Huang  wrote:
> > >>
> > >>> Hi Stamatis and all,
> 

[DISCUSS] Draft board report for Jul 2021

2021-07-06 Thread Haisheng Yuan
Attached below is a draft of this month's board report. I plan to submit it on
Jul 13. Please let me know if you have any additions or corrections.

## Description:
The mission of Calcite is the creation and maintenance of software related to 
Dynamic data management framework

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Calcite was founded 2015-10-21 (6 years ago)
There are currently 53 committers and 23 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ruben Q L on 2020-08-09.
- Vladimir Ozerov was added as committer on 2021-06-23.

## Project Activity:
Calcite Avatica 1.18.0 was released on 2021-05-18. Calcite 1.27.0 was
released on 2021-06-03, with important features such as InnoDB adapter,
Three-valued logic for SEARCH operator and MergeUnion operator in
Enumerable convention.

On Apr 26th, we had online meeting to discuss the proposal of extending 
Calcite to an incremental query optimizer.

## Community Health:
The overall activity in the community has slightly decreased in the past
few months without this being worrisome. Specifically we have less active
reviewers in the past 3 months. We need more committers to spend some
time to review open pull requests.

the number of non-committer (contributor) commits per month
+-+-+-+
|year |month| contributor_commits |
+-+-+-+
| 2021| 4   | 14  |
| 2021| 5   | 4   |
| 2021| 6   | 5   |
+-+-+-+

top reviewers in the last 3 months
+---+-+
| committer |   reviews   |
+---+-+
| Julian Hyde  | 13  |
| Stamatis Zampetakis  | 5   |
| rubenada  | 1   |
| amaliujia  | 1   |
| ForwardXu  | 1   |
| yuzhao.cyz  | 1   |
| liyafan82  | 1   |
+---+-+

Thanks,
Haisheng Yuan



Re: Contributor rights

2021-06-30 Thread Haisheng Yuan
Hi Evgeniy,

I have added you to the contributor list.

Thanks,
Haisheng Yuan

On 2021/06/30 14:26:43, "stanilovsky evgeny"  
wrote: 
> Hi,
> 
> Could you please grant me contributor rights in Calcite JIRA? My username  
> is "zstan".
> 
> Thank you.
> Evgeniy.
> 


Re: Trait propagation guidelines

2021-06-25 Thread Haisheng Yuan
at in this case the derivation is needed on a node, which was
> created from the "passThrough" method, which is not supported at the
> moment. I mentioned in one of the previous emails that I was able to create
> the optimal plan by selectively excluding some nodes from the pass-through
> cache via a special marker.
> 
> Do you agree that the TopDownRuleDriver cannot handle the case I described?
> If not, how would you design the "passThrough" and "derive" routines to
> find the optimal plan described? Does MaxCompute handle such cases? I
> apologize if you already answered this, but I really cannot understand how
> we can find the optimal plan without derivation on a node Agg2_2, which
> holds the critical optimization request from the parent - to hash on [b,c]
> instead of [a,b,c].
> 
> Regards,
> Vladimir.
> 
> пн, 14 июн. 2021 г. в 20:06, Haisheng Yuan :
> 
> > > The observation is that parent operators sometimes do not know the exact
> > > traits they will have for the given child traits. Several examples:
> > > 1. PhysicalAggregate1 may request both HASH[b,c] or HASH[c,b]. Contrary
> > to
> > > the default Apache Calcite implementation, in many systems, these are two
> > > different distributions - which one should I request? To make things
> > worse,
> > > some operators may have strict requirements to the order (Join, Union),
> > > whilst others do not care about the order (Aggregate, Window).
> > select  group by b,c,a,g,z,d; if you have StreamAgg in non-distributed
> > system, what collation(s) do you request?
> > Just request either one. I already stated in the email [1], but seems like
> > you missed the 5th paragraph.
> >
> > > 2. In some systems, the distribution may also define the distribution
> > > function, e.g., a number of shards. A UNION DISTINCT of two tables with
> > the
> > > same sharding key, but the different numbers of shards must yield an
> > > exchange. The parent operator cannot know the number of shards of the
> > input
> > > in advance and cannot define the proper trait set in the "passThrough"
> > > method.
> > The parent operator doesn't need to know what number of shards to request,
> > just request hash distribution with shard number 0 or -1 or what ever to
> > indicate shard number not decided yet. Later the child operator will tell
> > parent operator the exact distribution through "derive".
> >
> > In Alibaba MaxCompute, we have customized hash distribution, which
> > contains number of buckets, hash function, null collation, we also support
> > range distribution, which contains range bucket boundaries. All of these
> > can work under current framework. With all that being said, distribution is
> > nothing special than collation, it all depends on whether you design the
> > operator "passthrough" and "derive" strategy correctly.
> >
> > [1]
> > https://lists.apache.org/thread.html/r36b25cbe4ca05fb1262c432ad9103f4126b654698481fca0d2a01fe7%40%3Cdev.calcite.apache.org%3E
> >
> > Thanks,
> > Haisheng Yuan
> >
> > On 2021/06/14 08:26:31, Vladimir Ozerov  wrote:
> > > Hi Haisheng,
> > >
> > > The observation is that parent operators sometimes do not know the exact
> > > traits they will have for the given child traits. Several examples:
> > > 1. PhysicalAggregate1 may request both HASH[b,c] or HASH[c,b]. Contrary
> > to
> > > the default Apache Calcite implementation, in many systems, these are two
> > > different distributions - which one should I request? To make things
> > worse,
> > > some operators may have strict requirements to the order (Join, Union),
> > > whilst others do not care about the order (Aggregate, Window).
> > > 2. In some systems, the distribution may also define the distribution
> > > function, e.g., a number of shards. A UNION DISTINCT of two tables with
> > the
> > > same sharding key, but the different numbers of shards must yield an
> > > exchange. The parent operator cannot know the number of shards of the
> > input
> > > in advance and cannot define the proper trait set in the "passThrough"
> > > method.
> > >
> > > We will miss the optimization opportunity in all these cases unless we
> > can
> > > clarify the real traits in the "derive" phase. But to do this, we need to
> > > know the original optimization request.
> > >
> > > Regards,
> > > Vladimir.
> > >
> > >
> > >

Re: [ANNOUNCE] New committer: Vladimir Ozerov

2021-06-23 Thread Haisheng Yuan
Congratulations and thanks for your contributions, Vladimir!

Regards,
Haisheng

On 2021/06/23 21:34:40, Stamatis Zampetakis  wrote: 
> Apache Calcite's Project Management Committee (PMC) has invited Vladimir
> Ozerov to
> become a committer, and we are pleased to announce that he has accepted.
> 
> Vladimir is among the few people who know very well the internal workings
> of the
> Calcite optimizer. He started and participated in many discussions about
> the core engine and contributed ideas and code for making it better.
> Moreover, Vladimir has blogged and talked about Calcite in various
> conferences and meetups giving publicity and showcasing the capabilities of
> the project.
> 
> Vladimir, welcome, thank you for your contributions, and we look forward to
> your
> further interactions with the community! If you wish, please feel free to
> tell
> us more about yourself and what you are working on.
> 
> Stamatis (on behalf of the Apache Calcite PMC)
> 


Re: Trait propagation guidelines

2021-06-14 Thread Haisheng Yuan
> The observation is that parent operators sometimes do not know the exact
> traits they will have for the given child traits. Several examples:
> 1. PhysicalAggregate1 may request both HASH[b,c] or HASH[c,b]. Contrary to
> the default Apache Calcite implementation, in many systems, these are two
> different distributions - which one should I request? To make things worse,
> some operators may have strict requirements to the order (Join, Union),
> whilst others do not care about the order (Aggregate, Window).
select  group by b,c,a,g,z,d; if you have StreamAgg in non-distributed 
system, what collation(s) do you request? 
Just request either one. I already stated in the email [1], but seems like you 
missed the 5th paragraph.

> 2. In some systems, the distribution may also define the distribution
> function, e.g., a number of shards. A UNION DISTINCT of two tables with the
> same sharding key, but the different numbers of shards must yield an
> exchange. The parent operator cannot know the number of shards of the input
> in advance and cannot define the proper trait set in the "passThrough"
> method.
The parent operator doesn't need to know what number of shards to request, just 
request hash distribution with shard number 0 or -1 or what ever to indicate 
shard number not decided yet. Later the child operator will tell parent 
operator the exact distribution through "derive".

In Alibaba MaxCompute, we have customized hash distribution, which contains 
number of buckets, hash function, null collation, we also support range 
distribution, which contains range bucket boundaries. All of these can work 
under current framework. With all that being said, distribution is nothing 
special than collation, it all depends on whether you design the operator 
"passthrough" and "derive" strategy correctly.

[1] 
https://lists.apache.org/thread.html/r36b25cbe4ca05fb1262c432ad9103f4126b654698481fca0d2a01fe7%40%3Cdev.calcite.apache.org%3E

Thanks,
Haisheng Yuan

On 2021/06/14 08:26:31, Vladimir Ozerov  wrote: 
> Hi Haisheng,
> 
> The observation is that parent operators sometimes do not know the exact
> traits they will have for the given child traits. Several examples:
> 1. PhysicalAggregate1 may request both HASH[b,c] or HASH[c,b]. Contrary to
> the default Apache Calcite implementation, in many systems, these are two
> different distributions - which one should I request? To make things worse,
> some operators may have strict requirements to the order (Join, Union),
> whilst others do not care about the order (Aggregate, Window).
> 2. In some systems, the distribution may also define the distribution
> function, e.g., a number of shards. A UNION DISTINCT of two tables with the
> same sharding key, but the different numbers of shards must yield an
> exchange. The parent operator cannot know the number of shards of the input
> in advance and cannot define the proper trait set in the "passThrough"
> method.
> 
> We will miss the optimization opportunity in all these cases unless we can
> clarify the real traits in the "derive" phase. But to do this, we need to
> know the original optimization request.
> 
> Regards,
> Vladimir.
> 
> 
> вс, 13 июн. 2021 г. в 22:17, Haisheng Yuan :
> 
> > How does it relate with "derive" to get the desired plan?
> >
> > Initially PhysicalAggregate1 requests HASH[b,c], PhysicalAggregate2
> > requests HASH[a,b,c]. PhysicalAggregate2 is called on "passthrough" by
> > passing HASH[b,c], then generate another PhysicalAggregate2 with trait
> > HASH[b,c]. You don't need the involvement of "derive".
> >
> > Haisheng Yuan
> >
> > On 2021/06/13 16:58:53, Vladimir Ozerov  wrote:
> > > Hi,
> > >
> > > I tried to apply different approaches, but eventually, I failed to
> > achieve
> > > my goals. It seems that the current implementation cannot handle the
> > > required scenario, as explained below.
> > >
> > > Consider the following tree:
> > > LogicalAggregate1[group=[b,c]]
> > >   LogicalAggregate2[group=[a,b,c]]
> > > LogicalInput
> > >
> > > I want to find the plan to do these two aggregations without an exchange
> > in
> > > between because they may have compatible distributions. Example:
> > > PhysicalAggregate1[group=[b,c]] // SHARDED[b,c]
> > >   PhysicalAggregate2[group=[a,b,c]] // SHARDED[b,c]
> > > Exchange// SHARDED[b,c]
> > >   PhysicalInput // SHARDED[?]
> > >
> > > The fundamental problem is that it is impossible to save the optimization
> > > request and resolve traits in th

Re: Possibly incorrect assertion in the TopDownRuleDriver.DeriveTrait.derive

2021-06-13 Thread Haisheng Yuan
I don't think the the assertion is wrong, it is expected. Equivalent relational 
operators should be in the same set. I think CALCITE-3981[1] fixed the 
canonical issue, I suspect the rel2subset mapping is not up to date, hence the 
assertion error. 

[1] https://issues.apache.org/jira/browse/CALCITE-3981

On 2021/06/13 18:35:02, Vladimir Ozerov  wrote: 
> Thanks, I created an issue [1] to improve the assertion.
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-4650
> 
> пн, 7 июн. 2021 г. в 23:30, Haisheng Yuan :
> 
> > > Shouldn't we remove the assertion above?
> > Perhaps.
> >
> > Or perhaps the rel2Subset mapping is not up to date.
> >
> > Regards,
> > Haisheng Yuan
> >
> > On 2021/06/06 13:09:16, Vladimir Ozerov  wrote:
> > > Hi,
> > >
> > > When doing a trait derivation in the non-OMAKASE mode, the following
> > lines
> > > of code are invoked:
> > > 1: RelSubset relSubset = planner.register(newRel, rel);
> > > 2: assert relSubset.set == planner.getSubset(rel).set;
> > >
> > > The assertion on the second line may fail because the "newRel" is
> > assigned
> > > not the "rel" set, but "rel" *canonical set*, which might be different.
> > >
> > > As a workaround, we may change the derive mode to OMAKASE. In this case,
> > we
> > > do not hit the assertion and planning completes successfully.
> > >
> > > Shouldn't we remove the assertion above?
> > >
> > > Regards,
> > > Vladimir.
> > >
> >
> 


Re: Trait propagation guidelines

2021-06-13 Thread Haisheng Yuan
How does it relate with "derive" to get the desired plan?

Initially PhysicalAggregate1 requests HASH[b,c], PhysicalAggregate2 requests 
HASH[a,b,c]. PhysicalAggregate2 is called on "passthrough" by passing 
HASH[b,c], then generate another PhysicalAggregate2 with trait HASH[b,c]. You 
don't need the involvement of "derive".

Haisheng Yuan

On 2021/06/13 16:58:53, Vladimir Ozerov  wrote: 
> Hi,
> 
> I tried to apply different approaches, but eventually, I failed to achieve
> my goals. It seems that the current implementation cannot handle the
> required scenario, as explained below.
> 
> Consider the following tree:
> LogicalAggregate1[group=[b,c]]
>   LogicalAggregate2[group=[a,b,c]]
> LogicalInput
> 
> I want to find the plan to do these two aggregations without an exchange in
> between because they may have compatible distributions. Example:
> PhysicalAggregate1[group=[b,c]] // SHARDED[b,c]
>   PhysicalAggregate2[group=[a,b,c]] // SHARDED[b,c]
> Exchange// SHARDED[b,c]
>   PhysicalInput // SHARDED[?]
> 
> The fundamental problem is that it is impossible to save the optimization
> request and resolve traits in the "derive" phase afterward. What we need is
> to send the optimization request "SHARDED by [b,c] in any order" to
> PhysicalAggregate2, and use it in the derive phase so that the new
> PhysicalAggregate2 is created with [b,c] or [c,b], but strictly without
> [a]. Unfortunately, this doesn't work because the nodes emitted from the
> pass-through do not participate in the "derive" phase.
> 
> This could be fixed with a trivial change - to allow certain nodes emitted
> from the "passThrough" to participate in "derive". We can do that using a
> marker interface or an extension to a PhysicalRel interface. For example:
> interface PhysicalRel {
> boolean enforceDerive();
> }
> 
> When set to "true", the node would not be added to the pass-through cache.
> This way, we may use this node as *storage* for the optimization request.
> When the "derive" is called later, we know both the parent requirements and
> the child traits. This would be sufficient to solve my problem. I already
> tried to do this by disabling the pass-through cache completely and
> confirmed that the required plan is found.
> 
> Do you have any objections to such a change?
> 
> Regards,
> Vladimir.
> 
> сб, 29 мая 2021 г. в 11:59, Vladimir Ozerov :
> 
> > Hi Haisheng, Jinpeng
> >
> > I think we are more or less on the same page:
> >
> >1. The current implementation of Apache Calcite may generate wasteful
> >alternatives because rules lack the optimization context.
> >2. But the actual impact on efficiency is not clear.
> >
> > The (2) is essential to understand whether my efforts make any practical
> > sense. And so far, I have only a vague common sense and some simple
> > examples in mind, which is not sufficient to make any claims.
> >
> > Nevertheless, I've checked the source code of the original Columbia
> > optimizer. I was wrong in my original claim that Columbia doesn't pass
> > optimization context to rules. It does [1]. The context consists of
> > required traits and cost budget. In Apache Calcite terms, the context is
> > passed to both "RelRule.matches" and "RelRule.onMatch", so that the rule
> > may decide on the optimization strategy based on parent request. This is
> > exactly what I was trying to achieve in my system with some hacks around
> > derive/passThrough.
> >
> > Regarding the example with join, my proposal is not likely to make any
> > difference because the tables are not co-located on the join key, and hence
> > join may emit several distributions. Consider the different situation -
> > data is already collocated. Without the context, I will emit both 1-phase
> > and 2-phase aggregates because I do not know which distributions are
> > available below. With the context available, I can collect propagate
> > promising optimization requests from Aggregate rules (1-phase, 2-phase).
> > Then wait for input optimization and check what is returned. If only
> > [dist=a] is returned, I can skip the 2-phase aggregate completely.
> > Aggregate[group=a]
> >   Join[foo.a=bar.b]
> > Input(foo, dist=a)
> > Input(bar, dist=b)
> >
> > Another possible use case is join on several keys. By issuing a
> > context-aware optimization request [dist a1] from Aggregate to Join, we
> > can establish tight cost bounds on Aggregate and Join equivalence groups
> > very earl

Re: Requesting a Review on a PR

2021-06-11 Thread Haisheng Yuan
Hi Nick,

I believe it is https://github.com/apache/calcite/pull/2435, right?
People are reviewing now.

Thanks,
Haisheng Yuan

On 2021/06/11 17:12:26, Nick Riasanovsky  wrote: 
> I am a new contributor who opened a PR for an issue I opened on JIRA. Could
> I get some clarity on the process for getting someone to review my PR? I
> believe my PR should be ready.
> 
> Thanks,
> Nick Riasanovsky
> 


Re: Joining JIRA as a contributor

2021-06-10 Thread Haisheng Yuan
Hi Nick,

I have added you to the contributor list.

Haisheng

On 2021/06/11 01:37:59, Nick Riasanovsky  wrote: 
> Hi I'd like to join the JIRA contributor list. My Jira username is njriasan.
> 


Re: Optimizer (Hep) NOT scanning RelNodes of Subqqueries

2021-06-09 Thread Haisheng Yuan
Did you include the subquery related rules in the HepPlanner?

Haisheng

On 2021/06/09 17:59:44, Krishnakant Agrawal  wrote: 
> Hi All,
> 
> When running a HepOptimizer on top of a RelNode which has a subquery
> embedded in it, The Optimizer does not take the RelNode representing the
> subquery up for optimization.
> 
> Is this by design,  what's the correct way for the Subquery RelNode to be
> picked up for Optimization?
> 
> If this is a bug, what is the conceptually correct way to go about fixing
> this?
> 
> Thanks,
> KK
> 


Re: JIRA account as a contributor

2021-06-08 Thread Haisheng Yuan
Hi Dmitry,

I have added you as contributor.

Thanks,
Haisheng Yuan

On 2021/06/08 13:55:58, Dmitry Sysolyatin  wrote: 
> Hi !
> I would like to join to calcite team. I already fixed a small bug
> https://issues.apache.org/jira/browse/CALCITE-4630. But I can not do
> anything with JIRA ticket. My JIRA username is dmsysolyatin
> 


Re: Possibly incorrect assertion in the TopDownRuleDriver.DeriveTrait.derive

2021-06-07 Thread Haisheng Yuan
> Shouldn't we remove the assertion above?
Perhaps.

Or perhaps the rel2Subset mapping is not up to date.

Regards,
Haisheng Yuan

On 2021/06/06 13:09:16, Vladimir Ozerov  wrote: 
> Hi,
> 
> When doing a trait derivation in the non-OMAKASE mode, the following lines
> of code are invoked:
> 1: RelSubset relSubset = planner.register(newRel, rel);
> 2: assert relSubset.set == planner.getSubset(rel).set;
> 
> The assertion on the second line may fail because the "newRel" is assigned
> not the "rel" set, but "rel" *canonical set*, which might be different.
> 
> As a workaround, we may change the derive mode to OMAKASE. In this case, we
> do not hit the assertion and planning completes successfully.
> 
> Shouldn't we remove the assertion above?
> 
> Regards,
> Vladimir.
> 


[jira] [Created] (CALCITE-4638) Volcano top-down optimizer failed to recognize transformation rule correctly

2021-06-04 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-4638:
--

 Summary: Volcano top-down optimizer failed to recognize 
transformation rule correctly 
 Key: CALCITE-4638
 URL: https://issues.apache.org/jira/browse/CALCITE-4638
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


In VolcanoPlanner.isTransformationRule, if the rule is a physical 
implementation rule, it may be recognized as a transformation rule incorrectly. 
See discussion in mailing list:
https://lists.apache.org/thread.html/r362da53656b80b727c72cc6e4b807c5b1eccb15a0aa6fec5f6bbe91e%40%3Cdev.calcite.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Release Managers

2021-06-04 Thread Haisheng Yuan
Thanks, Rui.

Now we have release managers:
Rui Wang for 1.29.0
Liya Fan for 1.30.0

Thanks,
Haisheng Yuan

On 2021/06/04 22:05:34, Rui Wang  wrote: 
> In another related thread I volunteered for releasing 1.29.0. I think I can
> still do it.
> 
> 
> -Rui
> 
> On Thu, Jun 3, 2021 at 7:09 PM Fan Liya  wrote:
> 
> > Hi Haisheng,
> >
> > I am interested in volunteering.
> >
> > Best,
> > Liya Fan
> >
> > On Fri, Jun 4, 2021 at 6:50 AM Julian Hyde  wrote:
> >
> > > Yes, I am release manager for 1.28. I’ll aim to release first week of
> > > August.
> > >
> > > Somewhat related to the call for release managers: we need active
> > > committers. (Stamatis, can you run your stats on PRs and active
> > > committers?) On May 21, I noted that PR backlog is building up, and
> > > appealed for 5 committers to look at 5 PRs each. But since then only 2
> > PRs
> > > have been committed (thanks Ruben & Stamatis). We need to do better.
> > >
> > > Julian
> > >
> > >
> > > > On Jun 3, 2021, at 3:33 PM, Haisheng Yuan  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > In the next half year, I hope we can get back to the pace of release
> > for
> > > every 2 months. We need more release managers for the next few releases,
> > is
> > > there any one interested in volunteering to be release manager?
> > > > Currently Julian is the release manager for 1.28.0, we need 3 more for
> > > 1.29.0 ~ 1.31.0.
> > > >
> > > > Thanks,
> > > > Haisheng Yuan
> > >
> > >
> >
> 


[DISCUSS] Release Managers

2021-06-03 Thread Haisheng Yuan
Hi all,

In the next half year, I hope we can get back to the pace of release for every 
2 months. We need more release managers for the next few releases, is there any 
one interested in volunteering to be release manager?
Currently Julian is the release manager for 1.28.0, we need 3 more for 1.29.0 ~ 
1.31.0.

Thanks,
Haisheng Yuan


Re: Minified javascript in source releases

2021-06-02 Thread Haisheng Yuan
+1 for removing them entirely.

Thanks,
Haisheng Yuan

On 2021/06/02 13:36:05, Alessandro Solimando  
wrote: 
> Hi all,
> +1 for removing them as well, all the mentioned sw versions in [5,6]
> are extremely, extremely old.
> 
> Best regards,
> Alessandro
> 
> On Wed, 2 Jun 2021 at 14:38, Michael Mior  wrote:
> >
> > +1 for removing these entirely. I don't believe they are currently 
> > necessary.
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > Le mer. 2 juin 2021 à 06:13, Stamatis Zampetakis  a 
> > écrit :
> > >
> > > Hi all,
> > >
> > > In the recent votes for Calcite [1] and Avatica [2] there were some -1
> > > votes regarding minified javascript files present in the source releases.
> > >
> > > The concerns are raised for the following files:
> > > site/js/html5shiv.min.js
> > > site/js/respond.min.js
> > >
> > > From discussions in other lists it seems that there are Apache source
> > > releases including minified files so I don't think we are violating any
> > > policy by including them but there are deferring opinions on this topic
> > > [3,4].
> > >
> > > To avoid bringing back the discussion to if minified javascript should be
> > > present or not I would like to propose one of the following alternatives:
> > >
> > > 1. It seems that both of the aforementioned scripts [5,6] are used for
> > > compatibility reasons with old browsers. I think most of the people are 
> > > not
> > > using these browsers anymore so I would suggest removing them altogether
> > > from the code.
> > >
> > > 2. If for some reason we need to keep them then we could directly use the
> > > non minified version. The difference of minified vs unminified for both
> > > files is ~6KB so I doubt it will have any real impact with the internet
> > > connections that we use nowadays.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/r049cf5b953eb5824ba37df21094dd7c1e92146d659db4735ad121593%40%3Cdev.calcite.apache.org%3E
> > > [2]
> > > https://lists.apache.org/thread.html/r074720fc062bde09a7e69dc77f77754c7fd4c22f1357514f6b2196d0%40%3Cdev.calcite.apache.org%3E
> > > [3]
> > > https://lists.apache.org/thread.html/rcddc30dd1f0c7f20709e09de7202d4d6885d6235a7fce1c1ab46e4ed%40%3Clegal-discuss.apache.org%3E
> > > [4]
> > > http://mail-archives.apache.org/mod_mbox/incubator-general/201805.mbox/%3c89a47c0b-e74e-4571-b282-dbe505291...@classsoftware.com%3e
> > > [5] https://github.com/scottjehl/Respond
> > > [6] https://github.com/aFarkas/html5shiv
> 


Re: [VOTE] Release apache-calcite-1.27.0 (release candidate 0)

2021-06-01 Thread Haisheng Yuan
Thank you for pushing the new release forward, Stamatis.

1. Checked checksum and signature: OK
2. Ran Gradle test: OK
3. Checked release notes: OK

Gradle 6.8.3, Java env: openjdk version "1.8.0_292", MacOS 10.15.7 (19H1030).

+1 (binding)

Thanks,
Haisheng Yuan

On 2021/06/01 22:28:40, Julian Hyde  wrote: 
> FYI, I have just discovered 
> https://issues.apache.org/jira/browse/CALCITE-4629 
> <https://issues.apache.org/jira/browse/CALCITE-4629> in 1.27 RC 0. I don’t 
> have a repro case. I don’t think it is a show-stopper so my vote remains +1.
> 
> > On Jun 1, 2021, at 2:59 PM, Julian Hyde  wrote:
> > 
> > +1
> > 
> > Downloaded, checked hashes and signature, compared tar.gz contents to git 
> > commit 60f07118f3. Confirmed that gradle-wrapper.jar is missing from 
> > tar.gz. Using instructions in site/_docs/howto.md, downloaded and installed 
> > gradle-6.8.3 from https://gradle.org/releases/ 
> > <https://gradle.org/releases/>, and compiled and ran tests using Ubuntu 
> > Linux, JDK 11.0.4, Gradle 6.8.1 and 6.8.3.
> > 
> > Checked LICENSE, NOTICE. Ran RAT.
> > 
> > Julian
> > 
> > 
> >> On May 31, 2021, at 8:28 AM, Stamatis Zampetakis  >> <mailto:zabe...@gmail.com>> wrote:
> >> 
> >> Hi all,
> >> 
> >> I have created a build for Apache Calcite 1.27.0, release
> >> candidate 0.
> >> 
> >> Thanks to everyone who has contributed to this release.
> >> 
> >> You can read the release notes here:
> >> https://github.com/apache/calcite/blob/calcite-1.27.0-rc0/site/_docs/history.md
> >>  
> >> <https://github.com/apache/calcite/blob/calcite-1.27.0-rc0/site/_docs/history.md>
> >> 
> >> The commit to be voted upon:
> >> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=60f07118f31776462ea35ffdaa1f46c633251f69
> >> 
> >> Its hash is 60f07118f31776462ea35ffdaa1f46c633251f69
> >> 
> >> The artifacts to be voted on are located here:
> >> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.27.0-rc0
> >> (revision 48026)
> >> 
> >> The hashes of the artifacts are as follows:
> >> 6e81b8b155cbece67d5b0c3f81ee3cf086da7e28b95d20ad8d819080b6caaf03f021af00339e9d70a2d7b1a0b27a86fac14ad94c0a2967a12991fb9170dee4d3
> >> *apache-calcite-1.27.0-src.tar.gz
> >> 
> >> A staged Maven repository is available for review at:
> >> https://repository.apache.org/content/repositories/orgapachecalcite-1115/org/apache/calcite/
> >> 
> >> Release artifacts are signed with the following key:
> >> https://people.apache.org/keys/committer/zabetak.asc
> >> https://www.apache.org/dist/calcite/KEYS
> >> 
> >> To create the jars and test Apache Calcite: "gradle build" (requires an
> >> appropriate Gradle/JDK installation).
> >> 
> >> Please vote on releasing this package as Apache Calcite 1.27.0.
> >> 
> >> The vote is open for the next 72 hours and passes if a majority of at
> >> least three +1 PMC votes are cast.
> >> 
> >> [ ] +1 Release this package as Apache Calcite 1.27.0
> >> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> >> [ ] -1 Do not release this package because...
> >> 
> >> 
> >> Here is my vote:
> >> 
> >> +1 (binding)
> >> 
> >> Stamatis
> > 
> 
> 


Re: Trait propagation guidelines

2021-05-28 Thread Haisheng Yuan
> 2) Optimization requests are basically sent to RelSet-s, not RelSubset-s,
> as we make pairwise comparisons between the requested RelSubset and other
> subsets in the set [5][6].

I agree with you. There could be some waste when the new delivered / required 
traitset is generated by "passThrough"/ "derive", in which case, we only need 
enforcer between the pair of subsets, instead of pairing with all other 
required / delivered subsets in the RelSet. i.e.
In the MEMO group, we have 2 required traitsets:
1) Hash[a] Sort[b]
2) Hash[b] Sort[c]

When we try to pass Hash[a] Sort[b] to one of physical operators say Project, 
we found that we can pass down Hash[a] down to its child, then we get a new 
physical Project with traitset Hash[a], we only need enforcer between Hash[a] 
and Hash[a]Sort[b], but currently in method "addConverters", we also generate 
enforcer between Hash[a] and Hash[b]Sort[c], which is not actually what we want.

I think it is definitely worth trying to optimize.

Regards,
Haisheng Yuan
On 2021/05/28 19:15:03, Haisheng Yuan  wrote: 
> Hi Vladimir,
> 
> The top-down optimizer does NOT require implementation rule to generate 1 to 
> 1 physical operator for a logical operator, as you can see, if you generate a 
> 2 phase physical aggregates for the logical aggregate in the implementation 
> rule, it still works. Window is special because we can reshuffle the 
> execution order of window functions, and that order makes a difference 
> according to different parent physical property request. A single converged 
> physical Window operator catered for this speciality. However as I said I 
> don't think it is a common scenario.
> 
> > the whole decision of whether to go with 1-phase or 2-phase
> > aggregate is a physical decision that should be made based on available (or
> > assumed) input traits.
> What is the problem of generating both 1-phase and 2-phase aggregates and 
> choose the best one based on the cost?
> 
> Let's see the following query:
> select a, min(b) from (select * from foo, bar where foo.a=bar.a) t group by a;
> suppose foo is randomly distributed fact table, and bar is randomly 
> distributed dimension table.
> Consider the 2 following plans:
> 1) 
> PhysicalAggregate
>+-- HashJoin
>   +--  HashDistribute by a
>  +-- TableScan on foo
>   +--  HashDistribute by a
>  +-- TableScan on bar
> 
> 2) 
> PhysicalAggregate(global)
>+--  HashDistribute by a
> + PhysicalAggregate(local)
> + HashJoin
>  +-- TableScan on foo
>  +--  Broadcast
>+-- TableScan on bar
> 
> Can you tell that the single phase aggregate plan is always better than the 2 
> phase aggregate plan?
> 
> > Therefore, the typical way to optimize
> > LogicalAggregate is to split in the physical phase (implementation rule,
> > pass-through, derive). Practical systems like Dremio [1] and Flink [2]
> > work this way.
> Dremio and Flink work this way doesn't mean it is a good way. Greenplum Orca 
> and Alibaba MaxCompute optimizer work in another way. In Flink and Dremio, 
> they have HashAggPRule to generate 1 phase HashAgg and 2 phase HashAgg, 
> SortAggPRule to generate 1 phase SortAgg and 2 phase SortAgg. However do you 
> think there is possibility that the global SortAgg combined with local 
> HashAgg, or the global HashAgg combined with local SortAgg may perform better 
> in difference cases? Are you going to generate all the 4 combinations in the 
> implementation rule? There are some cases we found we'd better to split the 
> aggregate into 3 phase aggregate [1], in which case, will the implementation 
> rule generate 3 HashAggs or 3 SortAggs, or all the 6 combinations?
> 
> In our system, we have 1 phase, 2 phase, 3 phase logical aggregate rules to 
> transform the LogicalAggregate to another kind of logical aggregate(s) with 
> phase info, say LogicalXXXAggregate, then our physical aggregate rules match 
> this kind of node to generate HashAgg or StreamAgg. Of course, in the logical 
> rules, we can add business logic to guess the possible traits delivered by 
> child nodes to determine whether the rule definitely won't generate a better 
> alternative and may decide to abort this transformation early. But I would 
> rather let the cost model decide.
> 
> Admittedly, the current top-down optimization is not pure on-demand request 
> oriented, because it will always generate a physical request regardless the 
> parent nodes' trait request. For example the following query in a 
> non-distributed environment:
&

Re: Trait propagation guidelines

2021-05-28 Thread Haisheng Yuan
Hi Vladimir,

The top-down optimizer does NOT require implementation rule to generate 1 to 1 
physical operator for a logical operator, as you can see, if you generate a 2 
phase physical aggregates for the logical aggregate in the implementation rule, 
it still works. Window is special because we can reshuffle the execution order 
of window functions, and that order makes a difference according to different 
parent physical property request. A single converged physical Window operator 
catered for this speciality. However as I said I don't think it is a common 
scenario.

> the whole decision of whether to go with 1-phase or 2-phase
> aggregate is a physical decision that should be made based on available (or
> assumed) input traits.
What is the problem of generating both 1-phase and 2-phase aggregates and 
choose the best one based on the cost?

Let's see the following query:
select a, min(b) from (select * from foo, bar where foo.a=bar.a) t group by a;
suppose foo is randomly distributed fact table, and bar is randomly distributed 
dimension table.
Consider the 2 following plans:
1) 
PhysicalAggregate
   +-- HashJoin
  +--  HashDistribute by a
 +-- TableScan on foo
  +--  HashDistribute by a
 +-- TableScan on bar

2) 
PhysicalAggregate(global)
   +--  HashDistribute by a
+ PhysicalAggregate(local)
+ HashJoin
 +-- TableScan on foo
 +--  Broadcast
   +-- TableScan on bar

Can you tell that the single phase aggregate plan is always better than the 2 
phase aggregate plan?

> Therefore, the typical way to optimize
> LogicalAggregate is to split in the physical phase (implementation rule,
> pass-through, derive). Practical systems like Dremio [1] and Flink [2]
> work this way.
Dremio and Flink work this way doesn't mean it is a good way. Greenplum Orca 
and Alibaba MaxCompute optimizer work in another way. In Flink and Dremio, they 
have HashAggPRule to generate 1 phase HashAgg and 2 phase HashAgg, SortAggPRule 
to generate 1 phase SortAgg and 2 phase SortAgg. However do you think there is 
possibility that the global SortAgg combined with local HashAgg, or the global 
HashAgg combined with local SortAgg may perform better in difference cases? Are 
you going to generate all the 4 combinations in the implementation rule? There 
are some cases we found we'd better to split the aggregate into 3 phase 
aggregate [1], in which case, will the implementation rule generate 3 HashAggs 
or 3 SortAggs, or all the 6 combinations?

In our system, we have 1 phase, 2 phase, 3 phase logical aggregate rules to 
transform the LogicalAggregate to another kind of logical aggregate(s) with 
phase info, say LogicalXXXAggregate, then our physical aggregate rules match 
this kind of node to generate HashAgg or StreamAgg. Of course, in the logical 
rules, we can add business logic to guess the possible traits delivered by 
child nodes to determine whether the rule definitely won't generate a better 
alternative and may decide to abort this transformation early. But I would 
rather let the cost model decide.

Admittedly, the current top-down optimization is not pure on-demand request 
oriented, because it will always generate a physical request regardless the 
parent nodes' trait request. For example the following query in a 
non-distributed environment:
select a, b, c, max(d) from foo group by a, b, c order by a desc;

It will first generate a StreamAgg[a ASC, b ASC, c ASC] no matter what the 
parent node requires, then the "passThrough" tells StreamAgg that parent 
requires [a DESC], we get a StreamAgg[a DESC, b ASC, c ASC]. It would be ideal 
if we only generate StreamAgg[a DESC, b ASC, c ASC] by request, but I don't 
think that will make much difference, the bottleneck relies on the join order 
enumeration and the Project related operation.

Regards,
Haisheng Yuan

[1] 
https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/xforms/CXformSplitDQA.cpp

On 2021/05/28 09:17:45, Vladimir Ozerov  wrote: 
> Hi Jinpeng, Haisheng,
> 
> Thank you for your inputs. I really appreciate that. Let me try to address
> some of your comments and share some experience with the implementation of
> optimizers for a distributed engine I am currently working with.
> 
> First of all, I would argue that multiple logical operators do not have a
> 1-1 mapping to physical operators, and Window is not special here. For
> instance, LogicalAggregate doesn't have 1-1 mapping to physical aggregates
> because the physical implementation can be either 1-phase or 2-phase. It
> doesn't matter that the 2-phase aggregate is a composition of two 1-phase
> aggregates: the whole decision of whether to go with 1-phase or 2-phase
> aggregate is a physical decis

Re: Top-down optimizer cannot explore the search space because physical rule is treated as transformation rule

2021-05-28 Thread Haisheng Yuan
Great, that is the correct way to change it and that should be the default 
implementation.

On 2021/05/28 17:41:15, Vladimir Ozerov  wrote: 
> BTW, I tried to change the implementation to:
> 
>  1: protected boolean isTransformationRule(VolcanoRuleCall match) {
>  2:return match.getRule() instanceof TransformationRule;
>  3: }
> 
> It solved my problem - plans returned to normal. In the Apache Calcite
> repo, only 4 tests in the TopDowOptTest class failed due to a minor
> operator reordering.
> 
> пт, 28 мая 2021 г. в 20:37, Vladimir Ozerov :
> 
> > Hi Jinpeng,
> >
> > Thank you for the clarification. When I saw the code in question for the
> > first time, my first thought was that it was perhaps designed for gradual
> > migration. The main problem is that the current implementation discards
> > parts of the plan *silently*, which might be difficult to spot. I
> > only spotted the problem in my specific case because I had ~100 tests with
> > complex queries. Otherwise, I would happily proceed with the new rule
> > without knowing that I lost important parts of the search space.
> >
> > That said, I think we can do the following:
> >
> >1. Emit a warning if or even throw an exception if the transformation
> >rule produced a physical node. This should be trivial to implement - add 
> > an
> >expected node type to VolcanoRuleCall (e.g., "logical", "physical", 
> > "any").
> >The warning/exception should contain a proper fix suggestion - to 
> > override
> >the VolcanoPlanner.isTransformationRule.
> >2. Alternatively - do a breaking change. Apache Calcite doesn't have a
> >major release cadence. It is normal practice in many products to do
> >breaking changes in minor releases. Even popular products like Mongo or
> >DataStax do it regularly. We may inform the user in the first release and
> >change to "rule instanceof TransformationRule" in the next release.
> >
> > Does it make sense?
> >
> > Regards,
> > Vladimir.
> >
> > пт, 28 мая 2021 г. в 19:33, Jinpeng Wu :
> >
> >> Hi, Vladimir. Good catch! There could be some improvements here.
> >>
> >> Actually, this problem was discovered early when the top-down rule driver
> >> was designed. At that time, no rule was annotated as "TransformationRule".
> >> Moreover, it is impossible to ask every calcite user who designed their
> >> own
> >> rules to annotate the existing code. So the top-down rule driver was
> >> designed so that it can:
> >> 1. Work in chaos: even if there are no hints for rule types, it can still
> >> work. Some opportunities may be lost, but NO failures, NO exceptions, and
> >> NO worse than the original driver. People can migrate to the new driver
> >> without concern.
> >> 2. Be Improvable: Users can refactor their code, if they want, step by
> >> step. As rule types become more and more accurate, the system achieves
> >> more
> >> and more benefits
> >> 3. Be easy to customize: the default implementation is designed for the
> >> most common cases, so that most users can benefit from it without much
> >> effort. But it is not possible to fulfill all requirements as different
> >> systems could have very different patterns to define logical and
> >> physical. That's why the isTransformationRule method is put in
> >> VolcanoPlanner and marked as protected: overwriting it can be very simple.
> >>
> >> Moreover, losing some "derive" opportunities is not as serious as
> >> imagination. As I mentioned in previous discussions, parents are in charge
> >> of raising as many requirements as possible. During "derive", if specific
> >> traits were not built by children, it means that no parents were requiring
> >> that. And if parents finally require that traits in the latter
> >> optimization, passThrough methods get called and new physical nodes are
> >> generated and "derive" get called again.
> >> I tested it on millions of queries, with or without correct rule types, in
> >> my own product. The performance of group pruning varies a lot. But the
> >> output plans are almost the same. Only one obvious exception was
> >> discovered: the spool node. That's because spool nodes cannot "passThough"
> >> parent traits (it could have multiple parents and current framework cannot
> >> handle such a situation) while it can "derive" input traits.
> >>
> >> Of course, this conclusion may not apply to your product as we could have
> >> quite different rule sets. I am just sharing some of my experiences. Maybe
> >> the current implementation of "isTransformationRule" is not good enough.
> >> If
> >> you have any better solutions, please share them.
> >>
> >> Thanks,
> >> Jinpeng Wu
> >>
> >> On Fri, May 28, 2021 at 7:10 PM Vladimir Ozerov 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have an optimizer that uses top-down VolcanoPlanner and has a
> >> > ConverterRule for every LogicalNode. I have a new requirement when one
> >> of
> >> > the physical rules must emit several physical nodes instead of one. I
> >> tried
> >> 

Re: Introduction

2021-05-27 Thread Haisheng Yuan
Hi Andras,

Thank you for introducing yourself and dianemo DB to us. 
Welcome to Calcite community and look forward to conference to know more about 
your algorithm.

Thanks,
Haisheng Yuan

On 2021/05/27 16:59:00, András Gerlits  wrote: 
> Hello everyone,
> 
> I've been asked by the Getting Started page on the Calcite website to send
> a short introduction email about what I'm doing here.
> 
> I invented a new distributed transaction algorithm, upon which I've already
> built an ACID CRUD database, which can currently be used via an API. I've
> started working on creating the code for a new adapter, so that I can have
> a full-fledged SQL solution.
> 
> I'm incredibly grateful for all the effort that went into creating this
> project, which is exactly what I needed for my software to become a proper
> SQL database.
> 
> I very much hope that our startup can contribute back to this project once
> we're successfully monetizing our solution. If anyone is interested in our
> algorithm, I will be talking about it at the upcoming HydraConf conference.
> 
> https://hydraconf.com/2021/msk/talks/28w5b5fzcdc7czwlhcisbm/
> 
> I'm not linking our own website on purpose, as it needs a bit of work.
> Please feel free to reach out to me directly if you have any questions or
> suggestions.
> 
> Thanks,
> Andras Gerlits
> 


Re: Trait propagation guidelines

2021-05-27 Thread Haisheng Yuan
Getting back to your window query example:

> Consider the Window function:
> SELECT
>   AGG1 over (partition by a),
>   AGG2 over (partition by b),
>   AGG3 over (partition by c),
>   ...
> FROM input

Window is quite special because the logical vs physical operator count is not 1 
to 1, generally we generate a physical window operator for each window function 
with different partition column. That determines that once the physical 
operators are created, their order can't be changed. Hence your proposal of 
passing required traits to physical rule can mitigate the problem.

But things would be much easier if we define a different physical window 
operator. 
For the above query, we can generate the *Single* physical window operator like 
this:
PhysicalWindow[AGG1 over (partition by a), AGG2 over (partition by b), AGG3 
over (partition by c)]
or PhysicalWindow(a, b, c) for brevity.
How do we define the physical properties for it?
The operator delivers hash distribution on first window partition column a, but 
requires its child input to be hash distributed by its last window partition 
column c. 

If the parent operator request hash distribution on b, or c, the window 
operator will be called on "passthrough" method and generate PhysicalWindow(b, 
a, c), or PhysicalWindow(c, a, b). After final plan is generated, during post 
processing, we can replace the window operator with multiple layer nested 
window operators, and insert Exchange operators if necessary. But frankly 
speaking, I haven't seen any use cases of this kind in production. 

Regarding the rule alternative you proposed;
> class PhysicalAggregateRule extends PhysicalRule {
>  void onMatch(RelOptRuleCall call, *RelTraitSet requiredTraits*) {...

Consider the following plan:
InnerJoin (on a)
  +-- Agg (on b)
  +-- Scan

For the inner join, we can generate sort merge join and hash join. 
The sort merge join can request the following traits to Agg:
1) Singleton
2) hash distribution on a, sorted by a
The hash join can request the following traits to Agg:
1) Singleton
2) hash distribution on a
3) any distribution
4) broadcast distribution

The PhysicalAggregateRule will be called and executed 5 times, while generating 
the same physical aggregate candidates, unless we pass a whole list of required 
traits to the physical rule, which I have prototyped some time ago with the 
exact idea.

Regards,
Haisheng Yuan

On 2021/05/27 17:44:23, Haisheng Yuan  wrote: 
> >In distributed systems, an implementation rule may produce different
> >physical operators depending on the input traits. Examples are Aggregate,
> >Sort, Window.
> 
> No, in most cases, physical operators are generated regardless the input, 
> because the input traits are not know yet. Window might be an exception. 
> 
> >Since input traits are not known when the rule is fired, we must
> >generate *all possible combinations* of physical operators that we may
> >need. For LogicalAggregate, we must generate 1-phase and 2-phase
> >alternatives. For LogicalSort, we also have 1-phase and 2-phase
> >alternatives. Etc.
> 
> IMHO, 1 phase and 2 phase are just different logical alternatives, that is 
> also why I call it a logical rule to split the aggregate into a 2 phase 
> aggregate. But HashAggregate and StreamAggregate are indeed the different 
> physical alternatives for a LogicalAggregate.
> 
> 
> >   Unlike Aggregate or Sort, which may have only 1 or 2 phases, certain
> >   logical operators may have many physical alternatives. Consider the Window
> >   function:..
> 
> In window implementation rule, when building physical operator for Window 
> that has multiple window functions but with different partition columns, we 
> can infer the possible traits that can be delivered by input operators by 
> creating your own RelMetaData, hence multiple window combination with certain 
> order, but not exhausted enumeration. In fact, the window ordering problem 
> exists in every different kind of optimizer.
> 
> > As input traits are not known when the rule is fired, the nodes emitted
> > from the implementation rules most likely would not be used in the final
> > plan. 
> 
> That is quite normal, any operator generated by implementation rule might not 
> be used in the final plan, because there may be tens of thousands of 
> alternatives, we only choose the one with lowest cost. 
> 
> > For example, I can create a physical aggregate that demands
> > non-strict distribution {a,b} from its input, meaning that both [a,b] and
> > [b,a] is ok. However, in the final plan, we are obligated to have a strict
> > distribution - either [a, b] in that order, or [b, a] in that order -
> > otherwise, physical operators like Join and Union will not work. 
&

Re: Trait propagation guidelines

2021-05-27 Thread Haisheng Yuan
>In distributed systems, an implementation rule may produce different
>physical operators depending on the input traits. Examples are Aggregate,
>Sort, Window.

No, in most cases, physical operators are generated regardless the input, 
because the input traits are not know yet. Window might be an exception. 

>Since input traits are not known when the rule is fired, we must
>generate *all possible combinations* of physical operators that we may
>need. For LogicalAggregate, we must generate 1-phase and 2-phase
>alternatives. For LogicalSort, we also have 1-phase and 2-phase
>alternatives. Etc.

IMHO, 1 phase and 2 phase are just different logical alternatives, that is also 
why I call it a logical rule to split the aggregate into a 2 phase aggregate. 
But HashAggregate and StreamAggregate are indeed the different physical 
alternatives for a LogicalAggregate.


>   Unlike Aggregate or Sort, which may have only 1 or 2 phases, certain
>   logical operators may have many physical alternatives. Consider the Window
>   function:..

In window implementation rule, when building physical operator for Window that 
has multiple window functions but with different partition columns, we can 
infer the possible traits that can be delivered by input operators by creating 
your own RelMetaData, hence multiple window combination with certain order, but 
not exhausted enumeration. In fact, the window ordering problem exists in every 
different kind of optimizer.

> As input traits are not known when the rule is fired, the nodes emitted
> from the implementation rules most likely would not be used in the final
> plan. 

That is quite normal, any operator generated by implementation rule might not 
be used in the final plan, because there may be tens of thousands of 
alternatives, we only choose the one with lowest cost. 

> For example, I can create a physical aggregate that demands
> non-strict distribution {a,b} from its input, meaning that both [a,b] and
> [b,a] is ok. However, in the final plan, we are obligated to have a strict
> distribution - either [a, b] in that order, or [b, a] in that order -
> otherwise, physical operators like Join and Union will not work. 

It depends on your own satisfaction model and how do you coordinate property 
requirement among child operators. Unlike Orca optimizer, where there is exact 
match, partial satisfying, orderless match etc, Calcite's default 
implementation always require exact satisfying. But we can still make use of 
"passThrough" and "derive" to achieve our goal. i.e. the aggregate generated by 
implementation rule requires itself and its child to delivered distribution on 
[a,b], but the "derive" method tells Aggregate that [b,a] is available, it can 
generate another option to require [b,a] instead.

> In distributed engines, the nodes emitted from rules are basically "templates"
> that must be replaced with normal nodes.

There is no difference between distributed and non-distributed engines when 
dealing with this. In Orca and CockroachDB optimizer, the nodes emitted from 
rules are operators without physical properties, the optimizer then request 
physical properties in top-down manner, either recursively or stack, or state 
machine. Calcite is quite different. when the physical operator is generated by 
implementation rule, the physical operator must has its own traits, at the same 
time, the traits that it expects its child operators to deliver. So in Calcite, 
they are not "templates". The difference is there since Calcite's inception.

Regards,
Haisheng Yuan

On 2021/05/27 08:59:33, Vladimir Ozerov  wrote: 
> Hi Haisheng,
> 
> Thank you for your inputs. They are really helpful. Let me summarize your
> feedback in my own words to verify that I understand it correctly.
> 
>1. In distributed systems, an implementation rule may produce different
>physical operators depending on the input traits. Examples are Aggregate,
>Sort, Window.
>2. Since input traits are not known when the rule is fired, we must
>generate *all possible combinations* of physical operators that we may
>need. For LogicalAggregate, we must generate 1-phase and 2-phase
>alternatives. For LogicalSort, we also have 1-phase and 2-phase
>alternatives. Etc.
>3. If all combinations are generated, it is expected that "passThrough"
>and "derive" would be just trivial replacements of traits for most cases.
>This is why "passThroughTraits" and "deriveTraits" are recommended. A
>notable exception is TableScan that may emit alternative indexes in
>response to the pass-through requests.
> 
> If my understanding is correct, then there are several issues with this
> approach still.
> 
> 1. Unlike Aggregate 

Re: Trait propagation guidelines

2021-05-26 Thread Haisheng Yuan
Another point I would like to mention is that it is not recommended to override 
method "passThrough" and "derive" directly, override "passThroughTraits" and 
"deriveTraits" instead, so that we can make sure only the same type of physical 
node is created and no nested relnodes or additional RelSets are created, 
unless you know you have to create different type of nodes. For example, if the 
table foo has an btree index on column a, and the parent relnode is requesting 
ordering on column a, then we may consider to override "passThrough" of 
TableScan to return an IndexScan instead of a TableScan.

Regards,
Haisheng Yuan
On 2021/05/26 22:45:20, Haisheng Yuan  wrote: 
> Hi Vladimir,
> 
> 1. You need a logical rule to split the aggregate into a local aggregate and 
> global aggregate, for example:
> https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/xforms/CXformSplitGbAgg.cpp
> Only implementation rules can convert a logical node to a physical node or 
> multiple physical nodes.
> After physical implementation, you have 2 physical alternatives: 
> 1) single phase global physical aggregate, 
> 2) 2 phase physical aggregate with local and global aggregate.
> It should be up to the cost to decide which one to choose.
> 
> 2. Given a desired traitset from parent node, the current relnode only needs 
> to generate a single relnode after passing down the traitset. Given a 
> traitset delivered by child node, the current relnode only derive a single 
> relnode. Quite unlike other optimizer, in Calcite's top-down optimizer, you 
> don't need to worry about issuing multiple optimization requests to inputs, 
> which is handled by Calcite framework secretly. i.e.
> SELECT a, b, min(c) from foo group by a, b;
> In many other optimizer, we probably need ask the aggregate to issue 3 
> distribution requests for tablescan on foo, which are 
> 1) hash distributed by a, 
> 2) hash distributed by b, 
> 3) hash distributed by a, b
> However in Calcite top-down optimizer, your physical implementation rule for 
> global aggregate only need generate a single physical node with hash 
> distribution by a, b. In case the table foo happens to be distributed by a, 
> or b, the derive() method will tell you there is an opportunity. This is the 
> feature that Calcite's top-down optimizer excels over other optimizers, 
> because this can dramatically reduce the search space while keeping the 
> optimal optimization opportunity.
> 
> 3. This is by design. Nodes produced from "passThrough" and "derive" and just 
> sibling physical node with different traitset, we only need the initial 
> physical nodes after implementation to avoid unnecessary operations. The 
> fundamental reason is, unlike Orca optimizer where physical node and physical 
> property are separate things, Calcite's logical/physical nodes contains 
> traitset. With regard to the latter question, can you give an example?
> 
> Regards,
> Haisheng Yuan
> 
> 
> On 2021/05/26 20:11:57, Vladimir Ozerov  wrote: 
> > Hi,
> > 
> > I tried to optimize a certain combination of operators for the distributed
> > engine and got stuck with the trait propagation in the top-down engine. I
> > want to ask the community for advice on whether the problem is solvable
> > with the current Apache Calcite implementation or not.
> > 
> > Consider the following logical tree:
> > 3: LogicalAggregate[group=[a], F2(c)]
> > 2:  LogicalAggregate[group=[a,b], F1(c)]
> > 1:LogicalScan[t]
> > 
> > Consider that these two aggregates cannot be merged or simplified for
> > whatever reason. We have only a set of physical rules to translate this
> > logical tree to a physical tree. Also, there could be any number of
> > other operators between these two aggregates. We omit them for clarity,
> > assuming that the distribution is not destroyed.
> > 
> > In the distributed environment, non-collocated aggregates are often
> > implemented in two phases: local pre-aggregation and final aggregation,
> > with an exchange in between. Consider that the Scan operator is hash
> > distributed by some key other than [a] or [b]. If we optimize operators
> > without considering the whole plan, we may optimize each operator
> > independently, which would give us the following plan:
> > 3: PhysicalAggregate[group=[a], F2_phase2(c)] //
> > HASH_DISTRIBUTED [a]
> > 3:   Exchange[a]  //
> > HASH_DISTRIBUTED [a]
> > 3: PhysicalAggregate[group=[a], F2_phase1(c)] //
> > HASH_DISTRIBUTED [a,b]
> > 2:   PhysicalAggregate[group=[a,b], F1_phase2(c)] //
> > HASH_D

Re: Trait propagation guidelines

2021-05-26 Thread Haisheng Yuan
Hi Vladimir,

1. You need a logical rule to split the aggregate into a local aggregate and 
global aggregate, for example:
https://github.com/greenplum-db/gporca/blob/master/libgpopt/src/xforms/CXformSplitGbAgg.cpp
Only implementation rules can convert a logical node to a physical node or 
multiple physical nodes.
After physical implementation, you have 2 physical alternatives: 
1) single phase global physical aggregate, 
2) 2 phase physical aggregate with local and global aggregate.
It should be up to the cost to decide which one to choose.

2. Given a desired traitset from parent node, the current relnode only needs to 
generate a single relnode after passing down the traitset. Given a traitset 
delivered by child node, the current relnode only derive a single relnode. 
Quite unlike other optimizer, in Calcite's top-down optimizer, you don't need 
to worry about issuing multiple optimization requests to inputs, which is 
handled by Calcite framework secretly. i.e.
SELECT a, b, min(c) from foo group by a, b;
In many other optimizer, we probably need ask the aggregate to issue 3 
distribution requests for tablescan on foo, which are 
1) hash distributed by a, 
2) hash distributed by b, 
3) hash distributed by a, b
However in Calcite top-down optimizer, your physical implementation rule for 
global aggregate only need generate a single physical node with hash 
distribution by a, b. In case the table foo happens to be distributed by a, or 
b, the derive() method will tell you there is an opportunity. This is the 
feature that Calcite's top-down optimizer excels over other optimizers, because 
this can dramatically reduce the search space while keeping the optimal 
optimization opportunity.

3. This is by design. Nodes produced from "passThrough" and "derive" and just 
sibling physical node with different traitset, we only need the initial 
physical nodes after implementation to avoid unnecessary operations. The 
fundamental reason is, unlike Orca optimizer where physical node and physical 
property are separate things, Calcite's logical/physical nodes contains 
traitset. With regard to the latter question, can you give an example?

Regards,
Haisheng Yuan


On 2021/05/26 20:11:57, Vladimir Ozerov  wrote: 
> Hi,
> 
> I tried to optimize a certain combination of operators for the distributed
> engine and got stuck with the trait propagation in the top-down engine. I
> want to ask the community for advice on whether the problem is solvable
> with the current Apache Calcite implementation or not.
> 
> Consider the following logical tree:
> 3: LogicalAggregate[group=[a], F2(c)]
> 2:  LogicalAggregate[group=[a,b], F1(c)]
> 1:LogicalScan[t]
> 
> Consider that these two aggregates cannot be merged or simplified for
> whatever reason. We have only a set of physical rules to translate this
> logical tree to a physical tree. Also, there could be any number of
> other operators between these two aggregates. We omit them for clarity,
> assuming that the distribution is not destroyed.
> 
> In the distributed environment, non-collocated aggregates are often
> implemented in two phases: local pre-aggregation and final aggregation,
> with an exchange in between. Consider that the Scan operator is hash
> distributed by some key other than [a] or [b]. If we optimize operators
> without considering the whole plan, we may optimize each operator
> independently, which would give us the following plan:
> 3: PhysicalAggregate[group=[a], F2_phase2(c)] //
> HASH_DISTRIBUTED [a]
> 3:   Exchange[a]  //
> HASH_DISTRIBUTED [a]
> 3: PhysicalAggregate[group=[a], F2_phase1(c)] //
> HASH_DISTRIBUTED [a,b]
> 2:   PhysicalAggregate[group=[a,b], F1_phase2(c)] //
> HASH_DISTRIBUTED [a,b]
> 2: Exchange[a, b] //
> HASH_DISTRIBUTED [a,b]
> 2:   PhysicalAggregate[group=[a,b], F1_phase1(c)] //
> HASH_DISTRIBUTED [d]
> 1: PhysicalScan[t]//
> HASH_DISTRIBUTED [d]
> 
> This plan is not optimal, because we re-hash inputs twice. A better plan
> that we want to get:
> 3: PhysicalAggregate[group=[a], F2(c)]// HASH_DISTRIBUTED
> [a]
> 2:   PhysicalAggregate[group=[a,b], F1_phase2(c)] // HASH_DISTRIBUTED
> [a]
> 2: Exchange[a]// HASH_DISTRIBUTED
> [a]
> 2:   PhysicalAggregate[group=[a,b], F1_phase1(c)] // HASH_DISTRIBUTED
> [d]
> 1: PhysicalScan[t]// HASH_DISTRIBUTED
> [d]
> 
> In this case, we take advantage of the fact that the distribution [a] is
> compatible with [a,b]. Therefore we may enforce only [a], instead of doing
> [a,b] and then [a]. Since exchange operators are very expensive, this
> optimization may

Re: Exposing multiple values of a trait from the operator

2021-05-25 Thread Haisheng Yuan
Negative, currently.

It is indeed complicated, but still acceptable IMO. If the community have 
consensus on this matter, we can create a JIRA to add the feature to trait and 
handle equivalent keys trait satisfaction in trait.satisfy() and column 
remapping in trait.apply(), hope that can ease some pain, if you happen to use 
Calcite's default distribution/collation implementation. 

Regards,
Haisheng Yuan

On 2021/05/25 17:45:32, Vladimir Ozerov  wrote: 
> Hi Haisheng,
> 
> Thank you for the advice. This is exactly how I designed distribution at
> the moment (the approach 2 from my original email) - as a List
> instead of just int[]. My main concern was the increased complexity of the
> trait propagation/derivation, as I have to manage these nested lists by
> hand. Nevertheless, it works well. So I hoped that there are better
> built-in approaches that I may use. If the answer is negative, I'll
> continue using the original approach, when multiple alternatives managed
> manually.
> 
> Regards,
> Vladimir.
> 
> вт, 25 мая 2021 г. в 20:30, Haisheng Yuan :
> 
> > Hi Vladimir,
> >
> > Glad to see you raised the question.
> >
> > Here is the advice:
> > Do not use RelMultipleTrait/RelCompositeTrait, which is fundamentally
> > flawed and has many bugs. It can't work properly no matter for top-down or
> > bottom-up.
> >
> > Instead, we need to add equivalent keys bitmap as the property of physical
> > trait like RelCollation, RelDistribution.
> >
> > For example:
> > class RelDistributionImpl {
> >   // list of distribution keys
> >   private ImmutableIntList keys;
> >
> >// list of equivalent bitset for each distribution key
> >   private ImmutableList equivBitSets;
> > }
> >
> > In the trait satisfy and column remapping, we also need to take equivalent
> > keys into consideration. Some of the work need to be done in Calcite core
> > framework.
> >
> > Greenplum Orca optimizer has similar strategy:
> >
> > https://github.com/greenplum-db/gporca/blob/master/libgpopt/include/gpopt/base/CDistributionSpecHashed.h#L44
> >
> > Regards,
> > Haisheng Yuan
> >
> > On 2021/05/25 15:37:32, Vladimir Ozerov  wrote:
> > > Hi,
> > >
> > > Consider the distributed SQL engine that uses a distribution property to
> > > model exchanges. Consider the following physical tree. To do the
> > > distributed join, we co-locate tuples using the equijoin key. Now the
> > Join
> > > operator has two equivalent distributions - [a1] and [b1]. It is critical
> > > to expose both distributions so that the top Aggregate can take advantage
> > > of the co-location.
> > >
> > > Aggregate[group=b1]
> > >   DistributedJoin[a.a1=b.b1]   // SHARDED[a1], SHARDED[b1]
> > > Input[a]   // SHARDED[a1]
> > > Input[b]   // SHARDED[b1]
> > >
> > > A similar example for the Project:
> > > Aggregate[group=$1]
> > >   Project[$0=a, $1=a] // SHARDED[$0], SHARDED[$1]
> > > Input // SHARDED[a]
> > >
> > > The question is how to model this situation properly?
> > >
> > > First, it seems that RelMultipleTrait and RelCompositeTrait were designed
> > > to handle this situation. However, I couldn't make them work with the
> > > top-down optimizer. The reason is that when we register a RelNode with a
> > > composite trait in MEMO, VolcanoPlanner flattens the composite trait into
> > > the default trait value in RelSet.add -> RelTraitSet.simplify. That is,
> > the
> > > trait [SHARDED[a], SHARDED[b]] will be converted to [ANY] so that the
> > > original traits could not be derived in the PhysicalNode.derive methods.
> > >
> > > Second, we may try to model multiple sharding keys in a single trait. But
> > > this complicates the implementation of PhysicalNode.passThrough/derive
> > > significantly.
> > > SHARDED[a1, a2], SHARDED[b1, b2] -> SHARDED[[a1, a2], [b1, b2]]
> > >
> > > Third, we may expose multiple traits using metadata. RelMdDistribution
> > > would not work, because it exposes only a single distribution. But a
> > custom
> > > handler may potentially fix that. However, it will not be integrated with
> > > the top-down optimizer still, which makes the idea questionable.
> > >
> > > To summarize, it seems that currently there is no easy way to handle
> > > composite traits with a top-down optimizer. I wonder whether someone from
> > > the devlist already solved similar issues in Apache Calcite or other
> > > optimizers. If so, what was the approach or best practices? Intuitively,
> > it
> > > seems that RelMultipleTrait/RelCompositeTrait approach might be the way
> > to
> > > go. But why do we replace the original composite trait set with the
> > default
> > > value in the RelTraitSet.simplify routine?
> > >
> > > Regards,
> > > Vladimir.
> > >
> >
> 


Re: Exposing multiple values of a trait from the operator

2021-05-25 Thread Haisheng Yuan
Hi Vladimir,

Glad to see you raised the question.

Here is the advice:
Do not use RelMultipleTrait/RelCompositeTrait, which is fundamentally flawed 
and has many bugs. It can't work properly no matter for top-down or bottom-up.

Instead, we need to add equivalent keys bitmap as the property of physical 
trait like RelCollation, RelDistribution.

For example:
class RelDistributionImpl {
  // list of distribution keys
  private ImmutableIntList keys; 

   // list of equivalent bitset for each distribution key
  private ImmutableList equivBitSets; 
}

In the trait satisfy and column remapping, we also need to take equivalent keys 
into consideration. Some of the work need to be done in Calcite core framework.

Greenplum Orca optimizer has similar strategy:
https://github.com/greenplum-db/gporca/blob/master/libgpopt/include/gpopt/base/CDistributionSpecHashed.h#L44

Regards,
Haisheng Yuan

On 2021/05/25 15:37:32, Vladimir Ozerov  wrote: 
> Hi,
> 
> Consider the distributed SQL engine that uses a distribution property to
> model exchanges. Consider the following physical tree. To do the
> distributed join, we co-locate tuples using the equijoin key. Now the Join
> operator has two equivalent distributions - [a1] and [b1]. It is critical
> to expose both distributions so that the top Aggregate can take advantage
> of the co-location.
> 
> Aggregate[group=b1]
>   DistributedJoin[a.a1=b.b1]   // SHARDED[a1], SHARDED[b1]
> Input[a]   // SHARDED[a1]
> Input[b]   // SHARDED[b1]
> 
> A similar example for the Project:
> Aggregate[group=$1]
>   Project[$0=a, $1=a] // SHARDED[$0], SHARDED[$1]
> Input // SHARDED[a]
> 
> The question is how to model this situation properly?
> 
> First, it seems that RelMultipleTrait and RelCompositeTrait were designed
> to handle this situation. However, I couldn't make them work with the
> top-down optimizer. The reason is that when we register a RelNode with a
> composite trait in MEMO, VolcanoPlanner flattens the composite trait into
> the default trait value in RelSet.add -> RelTraitSet.simplify. That is, the
> trait [SHARDED[a], SHARDED[b]] will be converted to [ANY] so that the
> original traits could not be derived in the PhysicalNode.derive methods.
> 
> Second, we may try to model multiple sharding keys in a single trait. But
> this complicates the implementation of PhysicalNode.passThrough/derive
> significantly.
> SHARDED[a1, a2], SHARDED[b1, b2] -> SHARDED[[a1, a2], [b1, b2]]
> 
> Third, we may expose multiple traits using metadata. RelMdDistribution
> would not work, because it exposes only a single distribution. But a custom
> handler may potentially fix that. However, it will not be integrated with
> the top-down optimizer still, which makes the idea questionable.
> 
> To summarize, it seems that currently there is no easy way to handle
> composite traits with a top-down optimizer. I wonder whether someone from
> the devlist already solved similar issues in Apache Calcite or other
> optimizers. If so, what was the approach or best practices? Intuitively, it
> seems that RelMultipleTrait/RelCompositeTrait approach might be the way to
> go. But why do we replace the original composite trait set with the default
> value in the RelTraitSet.simplify routine?
> 
> Regards,
> Vladimir.
> 


Re: Enumerable rules lost hint info

2021-05-20 Thread Haisheng Yuan
The hint can be used to specify the degree of parallelism (DOP), MIN/MAX memory 
allocated for the operator. In that case, we need to keep them in the physical 
operators. But I am not sure whether there are downstream projects that are 
using hints for physical resource.

On 2021/05/19 17:05:44, Julian Hyde  wrote: 
> Sure, there’s a slot to hold the hints. But is there a benefit to keeping 
> them in the physical algebra? There is certainly a cost to storing them, 
> propagating them, fixing them up as we apply rules.
> 
> The main source of hints is formatted comments in the SQL. As we get further 
> from the SQL source and the AST, the benefit decreases.
> 
> Danny,
> 
> Were you envisioning hints being propagated into physical algebra? There’s no 
> explicit mention in 
> https://docs.google.com/document/d/1mykz-w2t1Yw7CH6NjUWpWqCAf_6YNKxSc59gXafrNCs/edit
>  
> 
>  or https://issues.apache.org/jira/browse/CALCITE-482 
> .
> 
> Julian
> 
> 
> > On May 18, 2021, at 7:49 PM, JiaTao Tao  wrote:
> > 
> > Hi Julian
> > 
> > When hints were introduced, was that part of the plan?
> > 
> > "hint" is a field in the base class, so the type of the node doesn't matter
> > 
> > public abstract class Project extends SingleRel implements Hintable {
> >  //~ Instance fields 
> > 
> > 
> >  protected final ImmutableList exps;
> > 
> >  protected final ImmutableList hints;
> > 
> > 
> > And in RelBuilder#project, we just construct a project without
> > hints(ImmutableList.of()), so it will lose hints too.
> > 
> >  public RelBuilder project(Iterable nodes,
> >  Iterable fieldNames, boolean force) {
> >return project_(nodes, fieldNames, ImmutableList.of(), force);
> >  }
> > 
> > 
> > 
> > Regards!
> > 
> > Aron Tao
> > 
> > 
> > Julian Hyde  于2021年5月19日周三 上午12:30写道:
> > 
> >> Do we really need hints in physical operators? When hints were introduced,
> >> was that part of the plan?
> >> 
> >> Julian
> >> 
> >>> On May 17, 2021, at 11:50 PM, JiaTao Tao  wrote:
> >>> 
> >>> Hi
> >>> Recently, I'm integrating Calcite's hint into our system, and I've found
> >>> the Enumerable rules just throw the hint info(ImmutableList.of()) away:
> >>> 
> >>> public EnumerableProject(
> >>> RelOptCluster cluster,
> >>> RelTraitSet traitSet,
> >>> RelNode input,
> >>> List projects,
> >>> RelDataType rowType) {
> >>>   super(cluster, traitSet, ImmutableList.of(), input, projects,
> >> rowType);
> >>>   assert getConvention() instanceof EnumerableConvention;
> >>> }
> >>> 
> >>> 
> >>> Regards!
> >>> 
> >>> Aron Tao
> >> 
> 
> 


Re: Using Calcite as a Distributed Optimizer

2021-05-13 Thread Haisheng Yuan
Yes, definitely. Many distributed big data systems use Apache Calcite to 
optimize queries and generate distributed plans.

On 2021/05/13 23:16:10, Atri Sharma  wrote: 
> Thank you.
> 
> So my use case is such that I wish to use Calcite as a two phase optimizer
> -- Get a SQL query,  compile it and optimize it and convert it to a SQL
> fragment.
> 
> Then run the query on worker nodes, get results on master and merge results.
> 
> This question spans both Calcite and Avatica, but I wanted to understand if
> achieving the above is possible with Calcite today.
> 
> Atri
> 
> On Fri, 14 May 2021, 01:20 Julian Hyde,  wrote:
> 
> > Calcite has no user@ list.  So, ask away here!
> >
> > > On May 13, 2021, at 10:52 AM, Atri Sharma  wrote:
> > >
> > > Sorry, didn't realize I had sent to the dev list. I will send to the user
> > > list
> > >
> > > On Thu, 13 May 2021, 15:57 Atri Sharma,  wrote:
> > >
> > >> Hi All,
> > >>
> > >> Are there examples of using Calcite to compile and optimize queries to
> > >> be run on a set of nodes, and then merge partial results back?
> > >>
> > >> Atri
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Atri
> > >> l'apprenant
> > >>
> >
> >
> 


Re: How to match root node in a HEP Rule?

2021-05-03 Thread Haisheng Yuan
Hi Ian,

Is there any specific reason or use case that you have to match the root node 
and find the parent node in your customized rule?

Thanks,
Haisheng Yuan

On 2021/05/03 20:20:22, Julian Hyde  wrote: 
> > Is there a way to identify a node as being a root node during RelRule.match?
> 
> Not currently. One workaround would be to create 'class MyRoot extends
> SingleRel', add it as the root of the tree, and write your rules to
> match on it.
> 
> > Or more generally, is there a way to collect the parents of an arbitrary 
> > RelNode?
> 
> By design, the only way to find the parents of a RelNode are to write
> a rule that matches those parents. This ensures that rules will work
> in the Volcano planner, where a RelNode can have many parents.
> 
> Julian
> 
> On Mon, May 3, 2021 at 12:52 PM Ian Bertolacci
>  wrote:
> >
> > Hello,
> > I am trying to write a rule that matches (among other things) the root of a 
> > RelNode tree.
> >
> > Unfortunately, it seems that HepRuleCall.parents is only initialized and 
> > populated if the rule extends CommonRelSubExpr, which doesn’t really apply 
> > in my case.
> > (see: 
> > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L526)
> >
> > Is there a way to identify a node as being a root node during 
> > RelRule.match? Or more generally, is there a way to collect the parents of 
> > an arbitrary RelNode?
> >
> > Thanks.
> 


Re: Trait propagation in heterogeneous plans

2021-04-20 Thread Haisheng Yuan
Hi Vladimir,

> There are two problems here. First, the project operator potentially
> destroys any trait which depends on column order, such as distribution or
> collation. Therefore, EnumerableProject has an incorrect value of the
> distribution trait.

The enumerable convention is intended for in-memory, non-distributed 
environment.
Therefore, we only consider 2 traits: collation and convention. Other traits 
are not
guaranteed to work correctly. If you want it work with distribution, you have 
to create
your own operators, rules, either by extending or overriding, in which case, 
you will need
to remap distribution columns to get the correct distribution trait, just like 
how collation does.

> Second, which distribution should I assign to the CustomToEnumerable node?
> As I know that parent convention cannot handle the distribution properly,
> my natural thought is to set it to ANY.

You can assume CustomToEnumerable to be an Enforcer operator, like Sort, 
Exchange.
Sort only changes data collation, Exchange changes data distribution and 
collation, similarly 
CustomToEnumerable only change convention, but retains collation and 
distribution, I assume.
But in practice, it should be decided by the operator inventor and the 
underlying physical
implementation.

Hope that answers your question. Feel free to ask if you have more questions.

Thanks,
Haisheng Yuan

On 2021/03/27 08:43:15, Vladimir Ozerov  wrote: 
> Hi,
> 
> Apache Calcite supports heterogeneous optimization when nodes may have
> different conventions. The Enumerable rules propagate all traits from
> inputs. We have doubts whether this is correct or not.
> 
> Consider the following initial plan, which was created by Apache Calcite
> after sql-to-rel conversion and invocation of TranslatableTable.toRel. The
> table is in the CUSTOM convention. In this convention, there is an
> additional Distribution trait that tracks which attribute is used for
> sharding. It could be either SHARDED or ANY. The latter is the default
> distribution value which is used when the distribution is unknown. Suppose
> that the table is distributed by the attribute $0.
> LogicalProject [convention=NONE,   distribution=ANY]
>   CustomTable  [convention=CUSTOM, distribution=SHARDED($0)]
> 
> Now suppose that we run VolcanoPlanner with two rules: EnumerableProjectRule
> and converter rules that translate the CUSTOM node to ENUMERABLE node.
> First, the EnumerableProjectRule is executed. This rule propagates traits
> from the input, replacing only convention. Notice, how it propagated the
> distribution trait.
> EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)]
>   CustomTable [convention=CUSTOM, distribution=SHARDED($0)]
> 
> Next, the converter will be invoked, yielding the following final plan:
> EnumerableProject[convention=ENUMERABLE, distribution=SHARDED($0)]
>   CustomToEnumerable [convention=ENUMERABLE, distribution=???]
> CustomTable  [convention=CUSTOM, distribution=SHARDED($0)]
> 
> There are two problems here. First, the project operator potentially
> destroys any trait which depends on column order, such as distribution or
> collation. Therefore, EnumerableProject has an incorrect value of the
> distribution trait.
> Second, which distribution should I assign to the CustomToEnumerable node?
> As I know that parent convention cannot handle the distribution properly,
> my natural thought is to set it to ANY. However, at least in the top-down
> optimizer, this will lead to CannotPlanException, unless I declare that [ANY
> satisfies SHARDED($0)], which is not the case: ANY is unknown distribution,
> so all distribution satisfies ANY, but not vice versa.
> 
> My question is - shouldn't we ensure that only the collation trait is
> propagated from child nodes in Enumerable rules? For example, in the
> EnumerableProjectRule instead of doing:
> input.getTraitSet()
>   .replace(EnumerableConvention.INSTANCE)
>   .replace()
> 
> we may do:
> RelOptCluster.traitSet().
>   .replace(EnumerableConvention.INSTANCE)
>   .replace()
> 
> This would ensure that all other traits are set to the default value. The
> generalization of this idea is that every convention has a set of supported
> traits. Every unsupported trait should be set to the default value.
> 
> I would appreciate your feedback on the matter.
> 
> Regards,
> Vladimir.
> 


Re: [DISCUSS] Draft board report for Apr 2021

2021-04-18 Thread Haisheng Yuan
Thanks a lot for your comments, Francis and Stamatis. I will delete the meeting 
of Apr 26th.

> I second Stamatis' suggestion to include the more detailed description 
> of the project that we used previously.
Do you mean "commits by non-committers" and "top-5 reviewers"? Do you know the 
tool or query that I can use to collect these metrics?

Thanks!
Haisheng Yuan

On 2021/04/18 22:50:54, Francis Chuang  wrote: 
> +1 looks great!
> I second Stamatis' suggestion to include the more detailed description 
> of the project that we used previously.
> 
> On 19/04/2021 8:10 am, Stamatis Zampetakis wrote:
> > Thanks for putting this together Haisheng. I have only a few small comments.
> > 
> > I like the more extended description that we had in previous reports. I
> > would keep the same.
> > 
> > The meeting of Apr 26th should be on the next report for Q2, not this one.
> > 
> > Usually the board asks for some insights apart from numbers in community
> > health section. I guess it's a bit late to include them right now but worth
> > keeping it in mind for the next report.
> > 
> > Best,
> > Stamatis
> > 
> > On Sat, Apr 17, 2021 at 12:53 AM Haisheng Yuan  wrote:
> > 
> >> Attached below is a draft of this month's board report. Please let me know
> >> if you have any additions or corrections.
> >>
> >> ## Description:
> >> The mission of Calcite is the creation and maintenance of software related
> >> to
> >> Dynamic data management framework
> >>
> >> ## Issues:
> >> There are no issues requiring board attention.
> >>
> >> ## Membership Data:
> >> Apache Calcite was founded 2015-10-21 (5 years ago)
> >> There are currently 52 committers and 23 PMC members in this project.
> >> The Committer-to-PMC ratio is roughly 7:3.
> >>
> >> Community changes, past quarter:
> >> - No new PMC members. Last addition was Ruben Q L on 2020-08-09.
> >> - Liya Fan was added as committer on 2021-02-09.
> >>
> >> ## Project Activity:
> >> Avatica 1.18 is being released.
> >> Calcite 1.27.0 will be released after Avatica 1.18.
> >>
> >> On Jan 20th, we had an online meetup for Calcite community with
> >> presentations covering Calcite concepts, recent work on streams, spatial
> >> query
> >> implementation.
> >>
> >> On Apr 26th, there will be online meeting to discuss the proposal of
> >> extending
> >> Calcite to an incremental query optimizer.
> >>
> >> ## Community Health:
> >> The overall activity in the community has slightly decreased in the past
> >> few
> >> months without this being worrisome.
> >>
> >> 117 JIRA tickets created and 55 JIRA tickets closed/resolved in the last 3
> >> months, and 80 commits in the past quarter, slight decrease comparing with
> >> last quarter.
> >>
> >> Thanks,
> >> Haisheng Yuan
> >>
> > 
> 


[DISCUSS] Draft board report for Apr 2021

2021-04-16 Thread Haisheng Yuan
Attached below is a draft of this month's board report. Please let me know if 
you have any additions or corrections.

## Description:
The mission of Calcite is the creation and maintenance of software related to 
Dynamic data management framework

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Calcite was founded 2015-10-21 (5 years ago)
There are currently 52 committers and 23 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:3.

Community changes, past quarter:
- No new PMC members. Last addition was Ruben Q L on 2020-08-09.
- Liya Fan was added as committer on 2021-02-09.

## Project Activity:
Avatica 1.18 is being released.
Calcite 1.27.0 will be released after Avatica 1.18.

On Jan 20th, we had an online meetup for Calcite community with
presentations covering Calcite concepts, recent work on streams, spatial query
implementation.

On Apr 26th, there will be online meeting to discuss the proposal of extending 
Calcite to an incremental query optimizer.

## Community Health:
The overall activity in the community has slightly decreased in the past few
months without this being worrisome.

117 JIRA tickets created and 55 JIRA tickets closed/resolved in the last 3
months, and 80 commits in the past quarter, slight decrease comparing with last 
quarter.

Thanks,
Haisheng Yuan


Re: [DISCUSS] Draft board report for Jan 2021

2021-01-12 Thread Haisheng Yuan
Stamatis,
Thanks a lot for the numbers you collected, I will add them into the report.

Julian,
Thanks for the remind, I will add the mention of new chair and our tradition of 
annual rotation.

Haisheng

On 2021/01/12 23:29:05, Stamatis Zampetakis  wrote: 
> Thanks for putting this together Haisheng, looks good. Some minor comments:
> 
> Project activity:  possibly mention the usage of Calcite in LinkedIn and
> the Coral project
> Community health: include the new metrics that we discussed during the
> previous board report
> 
> If I didn't do any mistake the numbers/names for the period between
> 2020-10-01 and 2021-01-01 are as follows:
> 
> "commits by non-committers" (used the query provided by Vladimir)
> +--+++-+
> | quarter_date | committers | non_committers |total|
> +--+++-+
> | 2020-10-01   | 118| 36 | 154 |
> 
> "top-5 reviewers"
> +-+---+
> |   reviews   | committer |
> +-+---+
> | 7   | Julian Hyde |
> | 7   | Stamatis Zampetakis |
> | 6   | Chunwei Lei |
> | 6   | Ruben Quesada Lopez |
> | 4   | Danny Chan |
> | 4   | Wang Yanlin |
> 
> Best,
> Stamatis
> 
> On Wed, Jan 13, 2021 at 12:23 AM Francis Chuang 
> wrote:
> 
> > +1 Thanks for sorting this out, Haisheng!
> >
> > Francis
> >
> > On 13/01/2021 10:20 am, Julian Hyde wrote:
> > > Looks good. I'd add a mention of the new chair, and the fact that we
> > > are continuing our tradition of annual rotation. (And our annual
> > > tradition of talking about the tradition, per
> > > https://whimsy.apache.org/board/minutes/Calcite.html#2020-01-15.)
> > >
> > > On Tue, Jan 12, 2021 at 2:01 PM Haisheng Yuan  wrote:
> > >>
> > >> Attached below is a draft of this month's board report. I plan to
> > submit it on
> > >> Jan 13 (Sorry for the late email). Please let me know if you have any
> > additions or corrections.
> > >>
> > >> ## Description:
> > >> Apache Calcite is a highly customizable framework for parsing and
> > planning
> > >> queries on data in a wide variety of formats. It allows database-like
> > access,
> > >> and in particular a SQL interface and advanced query optimization, for
> > data not
> > >> residing in a traditional database.
> > >>
> > >> Avatica is a sub-project within Calcite and provides a framework for
> > building
> > >> local and remote JDBC and ODBC database drivers. Avatica has an
> > independent
> > >> release schedule and its own repository.
> > >>
> > >> ## Issues:
> > >> There are no issues requiring board attention.
> > >>
> > >> ## Membership Data:
> > >> There are currently 51 committers and 23 PMC members in
> > >> this project.  The Committer-to-PMC ratio is roughly 7:3.
> > >> No new committers or PMC members were added in the last 3 months.
> > >>
> > >> ## Project Activity:
> > >> Calcite 1.26.0 was released on 2020-10-06, with important features such
> > as
> > >> supporting SEARCH operator and Sarg literal, adding PIVOT operator in
> > SQL,
> > >> adding support for JDK 15 and Guava version 29.0-jre.
> > >>
> > >> On Jan 20th, there will be online meetup for Calcite community with
> > >> presentations covering Calcite concepts, recent work on streams,
> > spatial query
> > >> implementation.
> > >>
> > >> ## Community Health:
> > >> The overall activity in the community has slightly decreased in the
> > past few
> > >> months without this being worrisome.
> > >>
> > >> 153 JIRA tickets created and 94 JIRA tickets closed/resolved in the
> > last 3
> > >> months, 20% decrease comparing with last quarter.
> > >>
> > >> The fewer number of design discussions and JIRA activities can be
> > explained by
> > >> the holiday season and low number of active committers.
> > >>
> > >>
> >
> 


Re: [ANNOUNCE] New Calcite PMC chair: Haisheng Yuan

2020-12-21 Thread Haisheng Yuan
Thanks everyone.
It is my great honor to be appointed to serve as the community's PMC chair, and 
thanks Stamatis for your hard work and contribution you have done for the 
Calcite community.

Regards,
Haisheng Yuan

On 2020/12/18 09:07:07, Ruben Q L  wrote: 
> Congratulations Haisheng!
> Thanks for your work Stamatis!
> 
> 
> On Fri, Dec 18, 2020 at 8:17 AM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
> 
> > Thanks Stamatis for your hard work and dedication, and congratulations to
> > Haisheng for this appointment!
> >
> > Best regards,
> > Alessandro
> >
> > On Fri, 18 Dec 2020 at 07:31, Xin Wang  wrote:
> >
> > > Congrats Haisheng! Thanks for your work, Stamatis!
> > >
> > >
> > > Fan Liya  于2020年12月18日周五 下午12:08写道:
> > >
> > > > Congratulations, Haisheng!
> > > > Looking forward to your great work in the coming year!
> > > >
> > > > Stamatis, thanks for your great work in the past year!
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > >
> > > > On Fri, Dec 18, 2020 at 11:11 AM Feng Zhu 
> > wrote:
> > > >
> > > > > Thanks for your work and effort, Stamatis!
> > > > > Congratulations, Haisheng!
> > > > >
> > > > > Stamatis Zampetakis  于2020年12月17日周四 下午9:49写道:
> > > > >
> > > > > > Calcite community members,
> > > > > >
> > > > > > I am pleased to announce that we have a new PMC chair and VP as per
> > > our
> > > > > > tradition of rotating the chair once a year. I have resigned, and
> > > > > > Haisheng was duly elected by the PMC and approved unanimously by
> > the
> > > > > Board.
> > > > > >
> > > > > > Please join me in congratulating Haisheng!
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Thanks,
> > > Xin
> > >
> >
> 


Re: [CALCITE-4431] Use requireNonNull(var, "var") instead of requireNonNull(var) for better error messages

2020-12-09 Thread Haisheng Yuan
Makes sense. 

On 2020/12/09 22:44:35, Stamatis Zampetakis  wrote: 
> Sounds reasonable and shares some goals with JEP358 [1] so why not.
> 
> [1] https://openjdk.java.net/jeps/358
> 
> On Wed, Dec 9, 2020 at 8:26 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
> 
> > Hi,
> >
> > I suggest we use requireNonNull(var, "var") pattern rather
> > than requireNonNull(var).
> >
> > Then error message would include which variable/field turned out to be
> > null.
> > It would make a big difference in case a constructor verifies multiple
> > parameters like in Correlate:
> >
> >   protected Correlate(
> >   RelOptCluster cluster,
> >   RelTraitSet traitSet,
> >   RelNode left,
> >   RelNode right,
> >   CorrelationId correlationId,
> >   ImmutableBitSet requiredColumns,
> >   JoinRelType joinType) {
> > super(cluster, traitSet, left, right);
> > assert !joinType.generatesNullsOnLeft() : "Correlate has invalid join
> > type " + joinType;
> > this.joinType = requireNonNull(joinType);
> > this.correlationId = requireNonNull(correlationId);
> > this.requiredColumns = requireNonNull(requiredColumns);
> >
> > If a user makes a mistake and passes null to joinType or correlationId,
> > then Calcite would throw NPE, however, it would be hard to tell which
> > parameter was null.
> > Line numbers are known to drift over time.
> >
> > I suggest we use the following pattern consistently:
> >
> > this.joinType = requireNonNull(joinType, "joinType");
> > this.correlationId = requireNonNull(correlationId, "correlationId");
> > this.requiredColumns = requireNonNull(requiredColumns,
> > "requiredColumns");
> >
> > The PR is https://github.com/apache/calcite/pull/2293
> >
> > The suggested change automatically aligns single-word message with the
> > first argument to requireNonNull.
> > In other words, "./gradlew style" command would insert/fix the messages
> > automatically.
> > I assume modern IDEs enable developers to configure var =>
> > requireNonNull(var, "var") expansion (I use it in IDEA).
> >
> > Currently we have
> > 545 usages of requireNonNull(var) <-- this is to be updated to (var, "var")
> > 359 usages of requireNonNull(var, "var")
> > 37 usages of requireNonNull(var, () -> ..)
> > 33 usages of requireNonNull(var, "custom message")
> >
> > If no-one objects within three days, I'll assume lazy consensus and commit
> > it.
> >
> > Vladimir
> >
> 


Re: Decouple core from linq4j and Avatica

2020-11-24 Thread Haisheng Yuan
> I would like to propose to decouple the "core" module from "ling4j" and 
> Avatica.
I like the idea.

Moving Enumerable out of core may be time consuming and disruptive, because 
many core tests are using Enumerable to verify plan quality and correctness.

Best,
Haisheng

On 2020/11/24 23:42:19, Stamatis Zampetakis  wrote: 
> Hi Vladimir,
> 
> Personally, I like the idea.
> I had similar thoughts in the past but it didn't try to break it down since
> those dependencies were not causing any real trouble.
> 
> Let's see what the others think.
> 
> Best,
> Stamatis
> 
> 
> On Tue, Nov 24, 2020 at 7:30 PM Vladimir Ozerov  wrote:
> 
> > Hi colleagues,
> >
> > Many Calcite integrations use only part of the framework. Specifically, it
> > is common to use only the parser/optimizer part. JDBC and runtime are used
> > less frequently because they are not very well suited for mature processing
> > engines (e.g. Enumerable runs out of memory easily).
> >
> > However, in order to use the parser/optimizer from the core module, you
> > also need to add "linq4j" and Avatica modules to the classpath, which is
> > not convenient - why to include modules, that you do not use?
> >
> > It turns out that most of the dependencies are indeed leaky abstractions,
> > that could be decoupled easily. For example, the RelOptUtil class from the
> > "core" depends on ... two string constants from the Avatica module.
> >
> > I would like to propose to decouple the "core" module from "ling4j" and
> > Avatica. For example, we may introduce the new "common" module, that will
> > hold common constants, utility classes, and interfaces (e.g. Meta). Then,
> > we can organize the dependencies like this:
> > common -> core
> > common -> linq4j
> > common -> Avatica
> >
> > Finally, we may shade and relocate the "common" module into the "core"
> > during the build. In the end, we will have -2 runtime dependencies with
> > relatively little effort. In principle, the same approach could be applied
> > to Janino and Jackson dependencies, but it could be more complex, so my
> > proposal is only about "linq4" and Avatica.
> >
> > How do you feel about it? Does this proposal sense to the community? If
> > yes, I can try implementing the POC for this.
> >
> > Regards,
> > Vladimir.
> >
> 


Re: [DISCUSS] Does anybody think this is debuging unfriendly: "call.transformTo(relBuilder.build())"

2020-11-23 Thread Haisheng Yuan
Agree with Jiatao, I had the same experience and feeling. But it mainly depends 
on the rule creator's preference.

On 2020/11/23 02:42:21, Danny Chan  wrote: 
> I kind of agree, but it's more like a programming specification, we can
> tell people how to write codes but they may not follow those rules.
> 
> JiaTao Tao  于2020年11月22日周日 下午5:27写道:
> 
> > Why I don't want to debug into "transformTo":
> >
> > 1. It's a common method, if you directly stop here, every rule will stop,
> > or you must stop the specific rule, then step into this method call, it's
> > one more step.
> > 2. There are many contexts in the rule, if you debug into "transformTo",
> > you have to go back to see these.
> >
> >
> > Regards!
> >
> > Aron Tao
> >
> >
> > JiaTao Tao  于2020年11月22日周日 下午5:23写道:
> >
> > > Hi
> > > I've been developed Calcite full time for a quite long time, and I ofter
> > > debug in the rule to see the transformations, but code like this is not
> > > debuging friendly in my opinion:  "call.transformTo(relBuilder.build())"
> > >
> > > I want to see the relBuilder.build()'s result, I have to debug into the
> > > "transformTo" method(you can not evaluate "relBuilder.build()" cuz it's a
> > > stack), if we split this into two lines, we can just stop at the last
> > link:
> > >
> > > RelNode ret = relBuilder.build()
> > > call.transformTo(ret)
> > >
> > > It's not a big deal, but every time I occur this, it has poor
> > experience, hope
> > > to hear the community's opinion.
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> >
> 


Re: [DISCUSS] State of the project 2020

2020-11-20 Thread Haisheng Yuan
Thank you for serving as Calcite PMC chair, you are a great help for Calcite 
community.

I feel more than happy and honored to be nominated as candidate of chair for 
next year, I will do my best and continue serving the community if I were 
selected.

Thanks,
Haisheng Yuan

On 2020/11/18 18:53:08, Julian Hyde  wrote: 
> I think Haisheng is a great choice for chair, if he is willing.
> 
> Thanks for serving as chair this year, Stamatis. You've done a great
> job keeping the community civilized and on track.
> 
> Julian
> 
> On Sat, Nov 14, 2020 at 10:23 PM Forward Xu  wrote:
> >
> > Hi Stamatis,
> >
> > Thanks for your great work.
> >
> >
> > Haisheng, Danny, and Ruben are all excellent. I believe they can do the
> > work.
> >
> > This time I will vote for Haisheng first.
> >
> >
> > Finally, +1 on Haisheng being our next PMC chair.
> >
> > Best,
> > Forward
> >
> >
> >
> > Stamatis Zampetakis  于2020年11月15日周日 上午7:15写道:
> >
> > > Thanks to all those who shared their thoughts so far!
> > >
> > > Most of the suggestions around reviews and PRs sound promising. 
> > > Summarizing
> > > we have:
> > > * Adding component owners for reviews;
> > > * Cleaning up stale PRs;
> > > * Freezing master till every JIRA marked for the next release is reviewed;
> > >
> > > Having component owners has been discussed in the past [1, 2] but
> > > the general feeling was against.
> > > I don't know if now things are different but we could definitely follow-up
> > > in a separate discussion if people are interested.
> > >
> > > Stale PRs is an issue and becoming stale is what is more problematic. If 
> > > we
> > > don't review PRs in a timely manner then
> > > there is a high chance that the contributor will lose interest and the PR
> > > will never merge. In the past [3], we discussed the possibility of adding 
> > > a
> > > bot
> > > for helping us in this task but the idea was abandoned in the end. If done
> > > right, I think it can be useful. For instance, sending a reminder on our
> > > dev list
> > > if the PR does not have any activity for a certain time could help us 
> > > maybe
> > > prioritize this over others.
> > >
> > > One simple thing that we can do without too much discussion is active
> > > committers to assign a number 5/10 PRs to themselves and we make sure
> > > that they are resolved (merged or closed) for the next release.
> > >
> > > Regarding the role of the PMC chair, there were three names mentioned so
> > > far: Haisheng, Danny, and Ruben. I believe in all of them!
> > > Depending on the interest from their side and the feedback from the rest 
> > > of
> > > the community we will proceed to a vote.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> > > https://lists.apache.org/thread.html/3b1f43b15306b2f1496fd73bda01dcb3a1dba08747b1267e1b5f8187%40%3Cdev.calcite.apache.org%3E
> > > [2]
> > >
> > > https://lists.apache.org/thread.html/7231a09fc9d1bad52c229664e8cb2a3631314b34464c2df10c594745%40%3Cdev.calcite.apache.org%3E
> > > [3]
> > >
> > > https://lists.apache.org/thread.html/e794ec7f36246ec918a409b80759efc30d5433eea08ee564d4ba0c67%40%3Cdev.calcite.apache.org%3E
> > >
> > >
> > > On Wed, Nov 11, 2020 at 10:19 PM Rui Wang  wrote:
> > >
> > > > Thanks Stamatis for serving as the PMC chair for Calcite in 2020!
> > > >
> > > > One of the highlights I want to give for the PR reviewing/merging area
> > > was
> > > > the practice we started to adopt in recent releases: we started to check
> > > > each JIRA marked with the right release version and see whether there is
> > > a
> > > > PR in a good shape that can be merged. And we have built a contract that
> > > > unless those promising PR are merged, we won't unlock the Calcite
> > > master. I
> > > > feel like this practice has proven to be useful (and of course thanks 
> > > > for
> > > > release managers who have been working on this!).
> > > >
> > > >
> > > > Finally, +1 on Haisheng being our next PMC chair.
> > > >
> > > >
> > > > -Rui
> > > >
> > > > On Wed, Nov 11, 2020 at 1:57 AM Ruben Q L  wrote:
> > > >
> > > > >  Hello,
> > > > >

Re: Search/Sarg: untested feature merged to the default branch

2020-11-09 Thread Haisheng Yuan
I have the same feeling with Stamatis.

I completely understand that Vladimir is trying to keep it working correctly. 
However, I know some downstream project already updated Calcite version and 
changed their code to adapt to the Sarg/SEARCH operator. Reverting it and in 
case we failed to fix the regressions you mentioned, if we release the new 
version without getting it back, there will be another major breaking change 
again. If that is the case, I would rather push for the fixes instead of revert 
before we can release next version.

Thanks,
Haisheng Yuan

On 2020/11/09 08:32:20, Vladimir Sitnikov  wrote: 
> Danny>-1 for the revert, we should fix the issues we encountered instead of
> Danny>reverting the code brainless for a whole release.
> 
> Hi Danny, did you just veto the code change? (see
> https://www.apache.org/foundation/voting.html#Veto )
> I am afraid I fail to find a technical justification behind the veto.
> 
> Danny>So, logger the fail cases with issues and fix them, that is the way
> to go.
> 
> Thanks for the suggestion. I would tentatively decline it as it's going to
> be difficult to find time to do this.
> 
> Vladimir
> 


Re: [DISCUSS] Make SqlNodeList implement ListSqlNode>

2020-11-04 Thread Haisheng Yuan
The performance gain depends on the size of sqlnode list.
I am in favor of making SqlNodeList implement List.
+1 to the proposal.

Haisheng Yuan

On 2020/11/05 02:32:01, Chunwei Lei  wrote: 
> Thank you for raising this, Julian.
> 
> How much performance improvement can we get?
> 
> 
> 
> Best,
> Chunwei
> 
> 
> On Thu, Nov 5, 2020 at 5:32 AM Julian Hyde  wrote:
> 
> > Currently class SqlNodeList [1] implements Iterable but it
> > does not implement List. How do people feel about doing that?
> > (Or Collection?)
> >
> > The main potential benefit is performance. Consider this code:
> >
> >   SqlNodeList nodeList;
> >   ImmutableList immutableList1 = ImmutableList.copyOf(nodeList);
> >
> >   List list = nodeList.toList();
> >   ImmutableList immutableList2 = ImmutableList.copyOf(list);
> >
> > Today, the second form is faster, because ImmutableList.copyOf can
> > call size() and preallocate a list of the right size.
> >
> > The second benefit is that we can remove calls to '.toList()'.
> >
> > The downside is that a few locations are overloaded. For example,
> > class Span has overloaded methods
> >
> >   static Span of(SqlNode node);
> >   static Span of(Collection nodes)
> >
> > If SqlNodeList implements List, then the code
> >
> >   SqlNodeList nodeList;
> >   Span s = Span.of(nodeList);
> >
> > becomes ambiguous. We can disambiguate by adding Span.of(SqlNodeList).
> > But there may be other locations in client code that cannot be
> > disambiguated.
> >
> > Julian
> >
> > [1]
> > https://github.com/apache/calcite/blob/155276591288615c4d02d55fb7d77eceb2e24b2d/core/src/main/java/org/apache/calcite/sql/SqlNodeList.java#L38
> >
> 


Re: Questions about `order by column`

2020-11-03 Thread Haisheng Yuan
>  If `order by` is
> not pushed down to the underlying database, then how does Calcite handle
> this operator? 

It will generate a plan with sort operator on top of the data source.

> Will it read all data provided by the underlying database
> and enumerate it to sort?
I believe so.

> If so, how to guarantee the performance while a
> large amount of data?
It is not guaranteed. So better to push down sort into data source, if not, you 
need to teach the Calcite-based optimizer to push it down.


On 2020/11/03 03:05:10, Xiangwei Wei  wrote: 
> Hi all,
> 
> I'd like to ask some questions about the Calcite adapter. If `order by` is
> not pushed down to the underlying database, then how does Calcite handle
> this operator? Will it read all data provided by the underlying database
> and enumerate it to sort? If so, how to guarantee the performance while a
> large amount of data ?
> 
> Sorry to trouble you guys. But look forward to hearing from you :D
> 
> -- 
> Best,
> Xiangwei Wei
> 


Re: A question regarding querying Google Cloud BigTable or Spanner through Apache Calcite

2020-11-03 Thread Haisheng Yuan
Hi Jason,

Absolutely it is.

On 2020/11/03 20:53:08, Jason Chen  wrote: 
> Hey,
> 
> I am Jason Chen from Shopify Data Science and Engineering team. I have a few 
> questions regarding the Apache Calcite, and I am not sure if the Apache 
> Calcite fits our use cases. Feel free to point me to the correct email or 
> Slack channel if this email is not the correct one for asking questions.
> 
> We are exploring the approaches to do mixed querying across multiple storage 
> resources. One use cases is doing the “JOIN” in query time of query results 
> from both Druid and BigTable/Spanner. Is this a good use case for Apache 
> Calcite?
> 
> Thank you for any help!
> 
> Regards,
> Jason Chen
> 
> 
> Jason (Jianbin) Chen
> Senior Data Developer
> p: +1 2066608351 | e: jason.c...@shopify.com
> a: 234 Laurier Ave W Ottawa, ON K1N 5X8
> 


Re: Draft board report for October 2020

2020-10-05 Thread Haisheng Yuan
Looks good to me, thanks!

- Haisheng

--
发件人:Stamatis Zampetakis
日 期:2020年10月06日 06:04:06
收件人:
主 题:Draft board report for October 2020

Attached below is a draft of this month's board report. I plan to submit it
on October 7.
Please let me know if you have any additions or corrections.

## Description:
Apache Calcite is a highly customizable framework for parsing and planning
queries on data in a wide variety of formats. It allows database-like
access,
and in particular a SQL interface and advanced query optimization, for data
not
residing in a traditional database.

Avatica is a sub-project within Calcite and provides a framework for
building
local and remote JDBC and ODBC database drivers. Avatica has an independent
release schedule and its own repository.

## Issues:
There are no issues requiring board attention.

## Membership Data:
There are currently 51 committers and 23 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:3.

Community changes, past quarter:
- Ruben Quesada Lopez was added to the PMC on 2020-08-10
- Rui Wang was added as committer on 2020-09-07

## Project Activity:
Avatica Go 5.0.0 was released on 2020-07-16. It is a major release of
Avatica
Go with a number of improvements and a breaking change affecting connection
metadata. Worth mentioning the support for batching query string parameters
in
the DSN, allowing updates to the server to be executed once Close() is
called
on the prepared statement.

Calcite 1.24.0 was released on 2020-07-24, including more than 80 resolved
issues. Among those it’s worth highlighting some new features of the
optimizer
for more efficient search space pruning as well as the support of a new SQL
dialect for Presto.

Calcite 1.25.0 was released on 2020-08-22, with fewer but important features
improving the parameterization of optimizer rules, adding support for new
spatial functions, and interval expressions. The release also introduced a
few
breaking changes but so far the users seem to have embraced the changes.

Members of the community gave talks at ApacheCon 2020, bringing to the
surface
Calcite through discussions about other projects. Although we didn’t have
talks dedicated to Calcite, it is nice to receive mentions and witness the
adoption of Calcite by other open source projects and people in Academia.

## Community Health:
The overall activity in the community has slightly decreased in the past few
months without this being worrisome. Releases occur often, the community is
growing, and most of the time users' questions do not remain unanswered.

The design discussions were fewer in the past quarter and people leading
these
efforts had quickly led them to consensus. As a result, we didn't have
lengthy
debates thus it's normal to see the activity of the dev@ and issues@ lists
slightly decreased (5% and 9% accordingly).

The number of closed issues and pull requests decreased by 19% which can be
explained by the low number of active committers (~12 during this period)
out
of which the majority pushed mostly individual contributions. A small
decrease
in closed issues can be attributed to those opened after applying static
code
analysis frameworks on the project that remain as tasks for the next
versions.



Re: [ANNOUNCE] New committer: Rui Wang

2020-09-10 Thread Haisheng Yuan
Congratulations, Rui!
Thank you for your work on the new planner, well deserved.
Keep up the good work.

Haisheng Yuan

On 2020/09/10 20:14:29, Michael Mior  wrote: 
> Congratulations Rui!
> 
> --
> Michael Mior
> mm...@apache.org
> 
> Le mer. 9 sept. 2020 à 17:51, Stamatis Zampetakis  a écrit 
> :
> >
> > Apache Calcite's Project Management Committee (PMC) has invited Rui Wang to
> > become a committer, and we are pleased to announce that he has accepted.
> >
> > Rui has checked in a lot of high quality patches in the project including
> > many nice
> > features around streams and SQL as well as improvements in the trait
> > propagation
> > framework. Apart from code contributions, Rui provides valuable help to the
> > community by doing reviews and answering questions in the devlist.
> >
> > Rui, welcome, thank you for your contributions, and we look forward to your
> > further interactions with the community! If you wish, please feel free to
> > tell
> > us more about yourself and what you are working on.
> >
> > Stamatis (on behalf of the Apache Calcite PMC)
> 


Re: Fwd: [ANNOUNCE] Apache Calcite 1.25.0 released

2020-08-23 Thread Haisheng Yuan
Thanks for reminding, Craig.

There is a JIRA ticket tracking this issue, we will do it soon.

Thanks,
Haisheng Yuan

On 2020/08/23 21:36:19, Craig Russell  wrote: 
> Hi,
> 
> I've approved (moderated) this announcement, but please consider changing 
> your download page.
> 
> 1. Don't include action=download in your links to the mirror selection page. 
> This makes it impossible for the moderators to verify that the mirror 
> selection is working.
> 2. Use the "new" link downloads.apache.org/calcite/ 
> <http://downloads.apache.org/calcite/> instead of www.apache.org/dist/calcite 
> <http://www.apache.org/dist/calcite> for your checksums and signatures.
> 
> Thanks,
> Craig 
> 
> > Begin forwarded message:
> > 
> > From: Andrei Sereda 
> > Subject: [ANNOUNCE] Apache Calcite 1.25.0 released
> > Date: August 23, 2020 at 1:19:33 PM PDT
> > To: annou...@apache.org
> > 
> > The Apache Calcite team is pleased to announce the release of Apache 
> > Calcite 1.25.0.
> > 
> > Calcite is a dynamic data management framework. Its cost-based optimizer 
> > converts queries, represented in relational algebra, into executable plans. 
> > Calcite supports many front-end languages and back-end data engines, and 
> > includes an SQL parser and, as a sub-project, the Avatica JDBC driver.
> > 
> > This release comes one month after 1.24.0 and removes methods deprecated in 
> > previous versions. In addition, it adds support for spatial functions and 
> > SQL interval expressions.
> > 
> > You can start using it in Maven by simply updating your dependency to:
> > 
> >   
> > org.apache.calcite
> > calcite-core
> > 1.25.0
> >   
> > 
> > If you'd like to download the source release, you can find it here:
> > 
> >   https://calcite.apache.org/downloads/ 
> > <https://calcite.apache.org/downloads/>
> > 
> > You can read more about the release (including release notes) here:
> > 
> >   https://calcite.apache.org/news/2020/08/22/release-1.25.0/ 
> > <https://calcite.apache.org/news/2020/08/22/release-1.25.0/>
> > 
> > We welcome your help and feedback. For more information on how to report 
> > problems, and to get involved, visit the project website at:
> > 
> >https://calcite.apache.org/ <https://calcite.apache.org/>
> > 
> > Thanks to everyone involved!
> > 
> > Andrei Sereda, on behalf of the Apache Calcite Team
> 
> Craig L Russell
> c...@apache.org
> 
> 


Re: [DISCUSS] Release Managers

2020-08-13 Thread Haisheng Yuan
Thanks for volunteering, Ruben! I think you can be the release manager for 
1.26.0.

We still need 2 more volunteers for the next 2 versions.

Thanks,
Haisheng

On 2020/07/25 15:24:43, Ruben Q L  wrote: 
> Hi,
> 
> I can volunteer for one of them, for example 1.26.0
> 
> Best regards,
> Ruben
> 
> 
> Le sam. 25 juil. 2020 à 15:23, Haisheng Yuan  a écrit :
> 
> > Hi,
> >
> > Would anyone be interested in being release manager for v1.26.0, v1.27.0
> > or v1.28.0?
> > We need 3 volunteers (must be PMC or committer) for these 3 versions.
> >
> > Thanks,
> > Haisheng Yuan
> >
> 


Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-13 Thread Haisheng Yuan
>>>
> >>> > >>>> Hi All,
> >>> > >>>>
> >>> > >>>> I plan to close the vote tomorrow (Aug 12th). If you still want to
> >>> > >>>> validate RC0 please do so before Wednesday.
> >>> > >>>>
> >>> > >>>> Regards,
> >>> > >>>> Andrei.
> >>> > >>>>
> >>> > >>>> On Tue, Aug 11, 2020 at 1:13 PM Andrei Sereda 
> >>> > >> wrote:
> >>> > >>>>
> >>> > >>>>> Thanks, Julian, for the hint. I'll update the KEYS file.
> >>> > >>>>>
> >>> > >>>>> On Tue, Aug 11, 2020 at 11:59 AM Julian Hyde <
> >>> jhyde.apa...@gmail.com
> >>> > >
> >>> > >>>>> wrote:
> >>> > >>>>>
> >>> > >>>>>> Andrei,
> >>> > >>>>>>
> >>> > >>>>>> Your key’s signature appears in KEYS because you signed
> >>> Stamatis’s
> >>> > >> key.
> >>> > >>>>>> But your key isn’t actually defined in the file. After I
> >>> imported
> >>> > the
> >>> > >>> file,
> >>> > >>>>>> your key was still ‘unknown’ according to gpg.
> >>> > >>>>>>
> >>> > >>>>>> Julian
> >>> > >>>>>>
> >>> > >>>>>>> On Aug 11, 2020, at 8:04 AM, Andrei Sereda 
> >>> > >> wrote:
> >>> > >>>>>>>
> >>> > >>>>>>> 
> >>> > >>>>>>>>
> >>> > >>>>>>>> * Andrei, I don’t think your key is in KEYS. Be sure to add it
> >>> > >>> before
> >>> > >>>>>> the
> >>> > >>>>>>> release announcement.
> >>> > >>>>>>>
> >>> > >>>>>>> I see my signing key in KEYS
> >>> > >>>>>>> $ curl -s
> >>> https://dist.apache.org/repos/dist/release/calcite/KEYS
> >>> > >> |
> >>> > >>>>>> grep
> >>> > >>>>>>> sereda
> >>> > >>>>>>> sig  C41CFDDFED34C028 2019-08-19  Andrei Sereda (CODE
> >>> > >> SIGNING
> >>> > >>>>>> KEY) <
> >>> > >>>>>>> ser...@apache.org>
> >>> > >>>>>>> sig  C41CFDDFED34C028 2019-08-19  Andrei Sereda (CODE
> >>> > >> SIGNING
> >>> > >>>>>> KEY) <
> >>> > >>>>>>> ser...@apache.org>
> >>> > >>>>>>>
> >>> > >>>>>>>> * There are several files in the source distro that are not in
> >>> > >> git:
> >>> > >>> an
> >>> > >>>>>>> empty ‘arrow’ directory, and a non-empty ‘licenses’ directory.
> >>> > >>>>>>> Yes you're right. I presume you're still OK with RC0 (judging
> >>> by
> >>> > >> your
> >>> > >>>>>> +1) ?
> >>> > >>>>>>>
> >>> > >>>>>>>
> >>> > >>>>>>>> On Tue, Aug 11, 2020 at 5:53 AM Julian Hyde <
> >>> > >> jhyde.apa...@gmail.com
> >>> > >>>>
> >>> > >>>>>> wrote:
> >>> > >>>>>>>>
> >>> > >>>>>>>> +1 (binding)
> >>> > >>>>>>>>
> >>> > >>>>>>>> Checked keys, hashes, LICENSE, NOTICE, README; compiled and
> >>> ran
> >>> > >>> tests
> >>> > >>>>>> on
> >>> > >>>>>>>> Ubuntu/JDK 14; ran RAT.
> >>> > >>>>>>>>
> >>> > >>>>>>>> * Andrei, I don’t think your key is in KEYS. Be sure to add it
> >>> > >>> 

Re: [ANNOUNCE] Ruben Quesada Lopez joins Calcite PMC

2020-08-11 Thread Haisheng Yuan
Congrats, Ruben!

On 2020/08/11 21:53:47, Stamatis Zampetakis  wrote: 
> I'm pleased to announce that Ruben has accepted an invitation to
> join the Calcite PMC. Ruben has been a consistent and helpful
> figure in the Calcite community for which we are very grateful. We
> look forward to the continued contributions and support.
> 
> Please join me in congratulating Ruben!
> 
> - Stamatis (on behalf of the Calcite PMC)
> 


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-11 Thread Haisheng Yuan
> I am proposing to use sargs only where we would today use RexCall(IN). The 
> data structures have about the same size. Sargs are sorted. I cannot see any 
> cases that would become more expensive.

IIRC, RexCall(IN) is not supported yet.

With sargs, you have to sort it, no matter explicitly or implicitly, in case 
20k values, it will take some time. I am not saying the data structure itself 
is expensive, but how it is used, it all depends on how downstream projects 
derive stats from it. I have seen Greenplum optimizer experienced performance 
issue for large IN list when deriving all the stats info from the thousands of 
disjoint range intervals. Some downstream project may not even use histogram. 
Some may just use the number of IN constants to derive the number of distinct 
values, I would imagine people have to construct the Sarg back to constant 
list, this is an extra burden. 

Suppose we have col in (1,2,4,6,8,10, 12,13, 15,17,19, 21,22, 24, 24+2.) 
10k values, now the range sets will be {[1,2], [4,4], [10,10], [12,13], 
[15,15][21,22], [24,24]}, in the execution engine, what if I want to do 
the look up for this expression? Find out the points, extract ranges and 
disjoint them, or infer points from integer ranges?

IMHO, the interval constraint is more like a kind of logical property and can 
be strategy for simplification, people can derive the interval constraints from 
the IN list, do simplification on it (but I doubt the value in practice, which 
usually brings more trouble than benefits), but we don't have to represent it 
with sarg in RexNode world.

> Requiring sargs to support geospatial operations seems an unreasonably high 
> bar. Many techniques that we use (sorting, hashing, sketches) do not support 
> geospatial.
> Areas that have not-totally-ordered data types tend to have their own 
> operators already. For your example I’d write ST_Intersects(col, 
> ST_Collect(area1, area2, area3)), and that would be a perfectly suitable 
> representation in RexNode-land.

I am not asking sargs to support geospatial operations. And geospatial 
intersect is just an example of potential customized operation, which are not 
limited to geospatials. It can be some other binary operations. We can't 
require all the customized operation to have something like ST_Collect. 

Oracle support this:
select * from foo where a > ANY(1, 2, 3, ., 999);
select * from foo where a <= ANY(b+1, b+2, b+3, ., 999);
Although in reality that kind of query might be rare, how would Calcite 
represent in Oracle dialect? And would it be good to support operations other 
than equal, like <, >=, or even customized operation? Especially when 
downstream projects may use calcite to build their own in-house system, some 
are not sql standard. 

BTW, what is your concern about the proposal of RexListCmp [1]?

Haisheng

[1] 
https://lists.apache.org/thread.html/ra36f256e3f86f78ee126455b3ba67dbcb7bd1862214886c65ffba432%40%3Cdev.calcite.apache.org%3E

On 2020/08/11 16:44:56, Julian Hyde  wrote: 
> Let’s bring this discussion back to the original question: for these kinds of 
> predicates, is Sarg a better data structure than IN?
> 
> It absolutely is. There are algorithms in RexSimplify that, for each term in 
> an OR list, simplify using all available predicates, and when they have 
> simplified that term, add it to the list of predicates. It is therefore 
> cartesian in the size of the OR list.
> 
> The sarg data structure converts many kinds of large OR-lists and predicate 
> sets into a single term. Because that term is immutable and based on sorted 
> ranges (or points) it can be handled efficiently (e.g. two sargs can be 
> intersected using a merge, rather than nested loops). That will make our 
> simplification process more efficient.
> 
> Whether we then add new classes of optimization is a discussion for another 
> day.
> 
> Julian
> 
> 
> > On Aug 10, 2020, at 1:13 PM, Vladimir Sitnikov 
> >  wrote:
> > 
> > Julian>I cannot see any cases that would become more expensive.
> > 
> > I mean the optimization passes might be complicated, not the storage of
> > sargs themselves.
> > For instance, CALCITE-4155 converts a in (1, 2, 3, 4, 5) to a >= 1 and a <=
> > 5
> > Is it a significant improvement?
> > Is between representation always better?
> > Does the case appear very often in practice?
> > 
> > However, the optimization does take time, so it looks like extra logic with
> > doubtful gains.
> > Of course, it is unlikely sargs would be a major time consumer, however, if
> > we keep adding tiny simplifications we might
> > end up with a condition where the planning is slow and we have no way to
> > fix it.
> > 
> > I'm ok with people updating RexSimplify (and 4155 looks like an innocent
> > feature), however, I think the current design is not really scalable
> > (e.g. it might process the same expression multiple times).
> > 
> > Vladimir
> 
> 


Re: [DISCUSS] Sarg (search argument) to generalize and replace IN in RexCall

2020-08-10 Thread Haisheng Yuan
I am against the proposal, for the following reasons:

1. It has to support range operations, not every type supports, especially for 
User Defined Types that are used in IN constant lists, they might only support 
equal operation.

2. The range optimization of ">2 AND <4" to "3” is only valid for integer-like 
types, but I don't think it will bring much gains. Optimizing  col in 
(1,2,3,4,5) to a >=1 and a <=5 is ok, but only ok for a single or very limited 
number of disjoint ranges, but this is a very limited corner case, most likely 
we may end up with many disjoint ranges, especially when there are 10k values. 
I don't think it is worth doing SARG for IN for this sake.

3. For non-integer data types, like double or string, we will end up with 
ranges {[a,a], [b,b], [c,c]...}, the stats derivation e.g. inferring 
selectivity from histogram may take much longer time depends on the 
implementation. And when passing to executor, we have to transform it back to 
form of col = ANY(a,b,c) or col IN (a,b,c) to be able to utilize hash look up.

4. It is not extensible. It can only be used for iN or NOT IN. What about 
customized operators like geospatial intersect? e.g. col intersect ANY(area1, 
area2, area3)

Haisheng

On 2020/08/10 16:15:56, Michael Mior  wrote: 
> The simplifications you present would also of course assume that the
> column has integer type. In any case, overall the proposal seems solid
> to me.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> Le lun. 10 août 2020 à 04:21, Fan Liya  a écrit :
> >
> > Hi Julian,
> >
> > Thanks for opening the discussion.
> >
> > In general, I am convinced of the value and importance of the proposal.
> >
> > I want to discuss more about the algebra involved, as the simplified range
> > sets may not have the minimum computational costs.
> > Some examples:
> >
> > 1. Suppose we have expression x in (1, 2, 4, 5, 6).
> >
> > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> > 4 and x <= 6), which represents 4 comparisons and 3 logical operations.
> > Another equivalent expression is x = 1 or x = 2 or x = 4 or x = 5 or x = 6,
> > which represents 5 comparisons and 4 logical operations.
> > It's hard to say which one has a smaller computational cost.
> >
> > 2. Suppose we have expression x in (1, 2, 4, 5)
> > The range set implementation may reduce it to (x >= 1 and x <= 2) or (x >=
> > 4 and x <= 5).
> > Another (possibly cheaper) reduction should be x >= 1 and x <= 5 and x != 3.
> > I am not sure if the range set implementation supports this type of
> > simplification.
> >
> > Therefore, to address above problems,
> > 1. We may need a model to estimate the cost, and the simplification process
> > should be guided by the cost model.
> > 2. Maybe we need to extend the range set implementation so it produces
> > expressions with minimum computational cost.
> >
> > Best,
> > Liya Fan
> >
> >
> >
> >
> >
> > On Mon, Aug 10, 2020 at 3:27 AM Julian Hyde  wrote:
> >
> > > We have had several discussions over the years about how to represent
> > > IN-lists (e.g. “x IN (1, 3, 5)”) in RexNode-land.
> > >
> > > I have generally taken the position that we should expand to ORs (e.g. “x
> > > = 1 OR x = 3 OR x = 5”) but a few months ago accepted that we should allow
> > > IN in RexCall.
> > >
> > > I have given this some further thought, as part of a couple of RexSimplify
> > > bugs [1] [2] and I now think we should replace IN with something more
> > > powerful, namely sargs. A sarg (or search argument [3]) is an ordered set
> > > of intervals, and can be represented as a Guava ImmutableRangeSet, such as
> > > "[[0‥1), (1‥2], [3‥3], (5‥+∞)]". It can represent an IN-list of constants,
> > > but also ranges.
> > >
> > > Today you would write
> > >
> > >   RexCall(IN, RexInputRef(0), RexLiteral(1), RexLiteral(3), RexLiteral(5))
> > >
> > > With sargs, you would instead write
> > >
> > >   RexCall(IN_SARG, RexInputRef(0), RexSarg(ImmutableRangeSet(“[[1..1],
> > > [3..3], [5..5]]”)))
> > >
> > > There is a new operator IN_SARG, and a new node type RexSarg (it could
> > > almost be a RexLiteral).
> > >
> > > Sargs (and Guava RangeSets) have a powerful and consistent algebra, so if
> > > we invest in sarg support in RexSimplify and RelOptPredicateList, that
> > > investment is likely to pay dividends in better plans.
> > >
> > > Guava RangeSets, and hence sargs, have support for discrete domains, so
> > > they can easily optimize ">2 AND <4" to "3”.
> > >
> > > Sargs would be the preferred (therefore canonical) form for any AND or OR
> > > list that has more than one comparison on the same operand (e.g. "x > 3 
> > > AND
> > > x < 17 AND x <> 10”).
> > >
> > > This proposal would subsume IN and therefore we would stop supporting IN
> > > in RexCall.
> > >
> > > Julian
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-4155 <
> > > https://issues.apache.org/jira/browse/CALCITE-4155>
> > >
> > > [2] 

Re: [VOTE] Release apache-calcite-1.25.0 (release candidate 0)

2020-08-10 Thread Haisheng Yuan
+1 (binding)

Environment: 
Mac OS X 10.15.1 , JDK 1.8.0_162
- Ran unit tests (./gradlew build), OK
- Checked release notes, CALCITE-4156, CALCITE-4022 CALCITE-4115 CALCITE-4129 
CALCITE-4111 are not part of Build and test suite changes, those can be updated 
after release.

- Haisheng

On 2020/08/09 03:22:28, Andrei Sereda  wrote: 
> Hi all,
> 
> I have created a build for Apache Calcite 1.25.0, release
> candidate 0.
> 
> Thanks to everyone who has contributed to this release.
> 
> You can read the release notes here:
> https://apache.github.io/calcite-site-preview/docs/history.html
> 
> The commit to be voted upon:
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=68b02dfd4af15bc94a91a0cd2a30655d04439555
> 
> Its hash is 68b02dfd4af15bc94a91a0cd2a30655d04439555
> 
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.25.0-rc0
> 
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.25.0-rc0
> (revision 40922)
> 
> RAT report:
> https://apache.github.io/calcite-site-preview/rat/rat-report.txt
> 
> Site preview is here:
> https://apache.github.io/calcite-site-preview/
> 
> JavaDoc API preview is here:
> https://apache.github.io/calcite-site-preview/api
> 
> The hashes of the artifacts are as follows:
> a5e61bd93657a274ee8a1d1ecbde68e3e471fd27b85bea179991b372f094ae3cdf692672245506a08b996534f9136be26569de81af2bb7d8f026799313957e87
> *apache-calcite-1.25.0-src.tar.gz
> 
> A staged Maven repository is available for review at:
> https://repository.apache.org/content/repositories/orgapachecalcite-1097/org/apache/calcite/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/sereda.asc
> https://www.apache.org/dist/calcite/KEYS
> 
> N.B.
> To create the jars and test Apache Calcite: "./gradlew build".
> 
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
> 
> Please vote on releasing this package as Apache Calcite 1.25.0.
> 
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Calcite 1.25.0
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
> 
> 
> Here is my vote:
> 
> +1 (non binding)
> 


Re: Pushing down aggregates through rhs of a left join

2020-07-31 Thread Haisheng Yuan
>  Is there interest
>  in PRing something like this to calcite, either as a new rule or part
>  of AggregateJoinTranspose?

Yes. It is better to be part of AggregateJoinTranspose.

On 2020/07/31 05:41:56, Alex Baden  wrote: 
> Hi all,
> 
> I have a query of the form:
> 
> SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key
> = b.join_key GROUP BY a.x ORDER BY a.x;
> 
> If table b has a large number of duplicate keys for `join_key`, the
> left join can be very expensive to compute. Instead, we would like to
> run:
> 
> WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY
> b.join_key)
> SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON
> a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x;
> 
> Essentially, since we are only projecting aggregates from the rhs of
> the join, and the aggregate functions are associative, we can group by
> the join key to compute the aggregates up front, then join on the
> grouped results, and finally aggregate among join matches.
> 
> Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted
> the following:
> // OUTER joins are supported for group by without aggregate functions
> 
> But based on the above, if we have a left join I believe we can
> transpose the aggregate and the join if the following conditions hold:
> 1) only expressions from the rhs of the join are aggregated
> 2) all aggregate functions from (1) are associative (can be split)
> 3) at least one expression from the lhs of the join is grouped
> 
> I am interested in implementing this rule (assuming the conditions
> above are strong enough to guarantee correctness). Is there interest
> in PRing something like this to calcite, either as a new rule or part
> of AggregateJoinTranspose?
> 
> Thanks,
> Alex
> 


[DISCUSS] Release Managers

2020-07-25 Thread Haisheng Yuan
Hi,

Would anyone be interested in being release manager for v1.26.0, v1.27.0 or 
v1.28.0?
We need 3 volunteers (must be PMC or committer) for these 3 versions.

Thanks,
Haisheng Yuan


Re: Changing JIRAs' status to “Closed”

2020-07-25 Thread Haisheng Yuan
Hi Chunwei,

You have to choose transition issues instead of edit issues.
I had the same confusion when releasing v1.23.0 [1].

I think we'd better add this into the document, so that no one would be wasted 
time on this in the future.

[1] 
https://lists.apache.org/thread.html/rc375dc347f94691f7a0ef415e727d715548b4927ba83c16c85fb1b93%40%3Cdev.calcite.apache.org%3E

On 2020/07/25 07:55:00, Francis Chuang  wrote: 
> I've added you to the administrator role. You should be able to do that now.
> 
> Francis
> 
> On 25/07/2020 4:45 pm, Chunwei Lei wrote:
> > Hi,
> > 
> > I tried to do a bulk update changing JIRAs' status to “Closed”. But I can
> > not
> > find the button. Is it because I don't have such permission?
> > 
> > Appreciate your help!
> > 
> > Best,
> > Chunwei
> > 
> 


Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-24 Thread Haisheng Yuan
I am not sure I get your idea.
What will the logical plan and physical plan look like for the following query?
SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is nullable


On 2020/07/23 01:35:44, Julian Hyde  wrote: 
> How about a semi-join algorithm that adds column that hold the nature of the 
> match? This algorithm can be used to evaluate 3-valued IN and 3-valued NOT IN 
> queries.
> 
> Generalizing further, it could compute any “ANY (predicate)” or “ALL 
> (predicate)” condition as a column with values TRUE/FALSE/UNKNOWN. For 
> instance “empno NOT IN …” is equivalent to “ALL(empno <> …)”.
> 
> Here is an example.
> 
> Emp table:
> 
> empno deptno  mgr
> = == 
>   100 10 NULL
>   101 10  100
>   102 20  101
>   103 30  100
>   104 30  103
>   105 40 NULL
> 
> 
> The semi-join algorithm calculates the following intermediate result SJ:
> 
> empno deptno mgrsmatch anyNulls allNulls
> = == === =  
>   100 10 [NULL, 100] TRUE  TRUE FALSE
>   101 10 [NULL, 100] FALSE TRUE FALSE
>   102 20 [101]   FALSE FALSEFALSE
>   103 30 [100, 103]  TRUE  FALSEFALSE
>   104 30 [100, 103]  FALSE FALSETRUE
>   105 40 [NULL]  FALSE TRUE TRUE
> 
> So it can answer the following query:
> 
> SELECT empno,
>   deptno,
>   empno IN (SELECT mgr FROM Emp
> WHERE deptno = e.deptno) AS “in",
>   empno NOT IN (SELECT mgr FROM Emp
> WHERE deptno = e.deptno) AS “notIn"
> FROM Emp AS e;
> 
> empno deptno in  notIn
> = == === =
>   100 10 TRUEFALSE
>   101 10 FALSE   UNKNOWN
>   102 20 FALSE   TRUE
>   103 30 TRUEFALSE
>   104 40 FALSE   FALSE
>   105 50 UNKNOWN UNKNOWN
> 
> by mapping it to the output of the semi-join algorithm:
> 
> SELECT empno, deptno,
>   case when allNulls then unknown else match end AS “in”,
>   case when anyNulls then unknown else not match end AS “notIn”
> FROM SJ
>   
> 
> 
> 
>  
> 
> 
> 
> > On Jul 22, 2020, at 8:57 AM, Vladimir Sitnikov 
> >  wrote:
> > 
> > Julian> Vladimir said he *expected* Oracle would implement (3-valued) NOT
> > IN efficiently. (Back in the day, when I was at Oracle, they certainly did
> > not.) Does anyone have any evidence that they do?
> > 
> > Well, Oracle has "null aware" joins since Oracle 11g which is more than 10
> > years old.
> > I have not tested the actual performance of the `null-aware join`, however,
> > it would be extremely surprising, if `null-aware join` was less efficient
> > than "manually crafted aggregate + join + join + whatever".
> > 
> > That is why I think it is important to be able to generate regular "not in"
> > plans.
> > 
> > Vladimir
> 
> 


Re: [DISCUSS] Towards Calcite 1.25.0

2020-07-24 Thread Haisheng Yuan
Do we have release manager for v1.25.0?


On 2020/07/23 17:15:12, Julian Hyde  wrote: 
> The release vote for 1.24 RC0 just passed [1], and it will be released 
> shortly.
> 
> We planned that release 1.24 is a transitional release, and 1.25 will follow 
> soon afterwards. 1.24 deprecates some APIs that we intend to remove in 1.25. 
> We had to make the release so that people can see that the APIs are 
> deprecated, and replace them with the new versions.
> 
> During the release vote we ran into 4136 [2]. This is a real bug, caused in 
> part by the API changes to deprecate APIs (and provide long-term 
> replacements) and in part caused by 4079 [3], the real bug that we are trying 
> to solve in 1.25.
> 
> So, it is important that we release 1.25 soon, and that it solves all of 
> these problems. I don’t know whether people will perceive 1.24 as a release 
> “lower than our usual quality” (even though there are good reasons for it) 
> but 1.25 needs to be back to the usual quality.
> 
> Julian
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-4117 
> 
> 
> [2] https://issues.apache.org/jira/browse/CALCITE-4136 
> 
> 
> [3] https://issues.apache.org/jira/browse/CALCITE-4079 
> 


Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
I dry run it, it will remove all the artifacts before 1.24.0. I think that is 
not Calcite's policy. At least it should keep 1.23.0.

On 2020/07/24 04:40:50, Francis Chuang  wrote: 
> orgapachecalcite-1096 is closed and released, so that's fine. The other 
> open ones are for rcs that did not pass, so I don't think they should be 
> closed and released.
> 
> There are currently 3 calcite releases in the release folder: 
> https://dist.apache.org/repos/dist/release/calcite/
> 
> We should just keep the latest 2 releases, so I think you can just use 
> svn to remove 1.22.0. There's the `removeStaleArtifacts` command [1], 
> but I have not tested it myself.
> 
> Francis
> 
> [1] 
> https://github.com/vlsi/vlsi-release-plugins/tree/master/plugins/stage-vote-release-plugin#removing-stale-artifacts
> 
> 
> On 24/07/2020 2:38 pm, Haisheng Yuan wrote:
> > Skipped the step of releaseRepository too.
> > Now finished release.
> > 
> > There are several closed or open repos in 
> > https://repository.apache.org/#stagingRepositories,
> > should I manually release all of them?
> > 
> > On 2020/07/24 04:25:43, Haisheng Yuan  wrote:
> >> I skipped the step, but now have this error:
> >> ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf -x publishSvnDist
> >>
> >>> Configure project :
> >> Building Apache Calcite 1.24.0
> >>
> >>> Task :releaseRepository
> >> Initialized stagingRepositoryId orgapachecalcite-1096
> >>   for repository nexus
> >> POST request failed. 500: Server Error, body: [errors:[[id:*, 
> >> msg:Unhandled: Missing staging repository: orgapachecalcite-1096
> >> ]]]
> >>
> >>> Task :releaseRepository FAILED
> >>
> >> Build calcite FAILURE reason:
> >>  Execution failed for task ':releaseRepository':
> >>  io.codearte.gradle.nexus.infra.NexusHttpResponseException: 500: 
> >> Server Error, body: [errors:[[id:*, msg:Unhandled: Missing staging 
> >> repository: orgapachecalcite-1096
> >>  ]]]
> >>  at 
> >> io.codearte.gradle.nexus.infra.SimplifiedHttpJsonRestClient.sendRequestHandlingErrors(SimplifiedHttpJsonRestClient.groovy:52)
> >>
> >>
> >>
> >> On 2020/07/24 04:19:14, Haisheng Yuan  wrote:
> >>> How can I skip this step?
> >>>
> >>> ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf
> >>>
> >>>> Configure project :
> >>> Building Apache Calcite 1.24.0
> >>>
> >>>> Task :publishSvnDist FAILED
> >>> Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> >>> https://dist.apache.org/repos/dist
> >>>
> >>> Build calcite FAILURE reason:
> >>>  Execution failed for task ':publishSvnDist':
> >>>  Unable to list folders [dev/calcite/apache-calcite-1.24.0-rc0] 
> >>> at https://dist.apache.org/repos/dist: svn: warning: W160013: URL 
> >>> 'https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0'
> >>>  non-existent in revision 40660
> >>>  svn: E29: Could not list all targets because some targets 
> >>> don't exist
> >>>
> >>>
> >>> I met an error when executing the command. But the command executed some 
> >>> steps successfully before error out. After I fixed the issue and 
> >>> re-execute the command, now I see this error. Because the folders is 
> >>> already removed from dev directory.
> >>>
> >>> On 2020/07/24 04:00:14, Haisheng Yuan  wrote:
> >>>> I will help push the artifacts.
> >>>>
> >>>> On 2020/07/24 03:04:05, Francis Chuang  wrote:
> >>>>> I wonder if this is because only PMCs can push to the SVN repo. If this
> >>>>> is the case, I think we need to nominate a PMC member to push the
> >>>>> artifacts for you.
> >>>>>
> >>>>> On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> >>>>>> Hi, Francis.
> >>>>>>
> >>>>>> I already put those settings in ~/.gradle/gradle.properties:
> >>>>>>
> >>>>>> $cat ~/.gradle/gradle.properties
> >>>>>> asfCommitterId=chunwei
> >>>>>> asfNexusUsername=chunwei
> >>>>>> asfNexusPassword=xxx
> >>>>>>
> >>>>>> asfSvnUsername=chunwei
> >>>>>> asfSvnPassword=xx

Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
Can everyone please not commit to master branch for a moment?
The release manager hasn't open the master branch yet.

On 2020/07/24 04:38:07, Haisheng Yuan  wrote: 
> Skipped the step of releaseRepository too.
> Now finished release.
> 
> There are several closed or open repos in 
> https://repository.apache.org/#stagingRepositories,
> should I manually release all of them?
> 
> On 2020/07/24 04:25:43, Haisheng Yuan  wrote: 
> > I skipped the step, but now have this error:
> > ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf -x publishSvnDist
> > 
> > > Configure project :
> > Building Apache Calcite 1.24.0
> > 
> > > Task :releaseRepository
> > Initialized stagingRepositoryId orgapachecalcite-1096
> >  for repository nexus
> > POST request failed. 500: Server Error, body: [errors:[[id:*, 
> > msg:Unhandled: Missing staging repository: orgapachecalcite-1096
> > ]]]
> > 
> > > Task :releaseRepository FAILED
> > 
> > Build calcite FAILURE reason:
> > Execution failed for task ':releaseRepository':
> > io.codearte.gradle.nexus.infra.NexusHttpResponseException: 500: 
> > Server Error, body: [errors:[[id:*, msg:Unhandled: Missing staging 
> > repository: orgapachecalcite-1096
> > ]]]
> > at 
> > io.codearte.gradle.nexus.infra.SimplifiedHttpJsonRestClient.sendRequestHandlingErrors(SimplifiedHttpJsonRestClient.groovy:52)
> > 
> > 
> > 
> > On 2020/07/24 04:19:14, Haisheng Yuan  wrote: 
> > > How can I skip this step?
> > > 
> > > ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf
> > > 
> > > > Configure project :
> > > Building Apache Calcite 1.24.0
> > > 
> > > > Task :publishSvnDist FAILED
> > > Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> > > https://dist.apache.org/repos/dist
> > > 
> > > Build calcite FAILURE reason:
> > > Execution failed for task ':publishSvnDist':
> > > Unable to list folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> > > https://dist.apache.org/repos/dist: svn: warning: W160013: URL 
> > > 'https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0'
> > >  non-existent in revision 40660
> > > svn: E29: Could not list all targets because some targets 
> > > don't exist
> > > 
> > > 
> > > I met an error when executing the command. But the command executed some 
> > > steps successfully before error out. After I fixed the issue and 
> > > re-execute the command, now I see this error. Because the folders is 
> > > already removed from dev directory.
> > > 
> > > On 2020/07/24 04:00:14, Haisheng Yuan  wrote: 
> > > > I will help push the artifacts.
> > > > 
> > > > On 2020/07/24 03:04:05, Francis Chuang  
> > > > wrote: 
> > > > > I wonder if this is because only PMCs can push to the SVN repo. If 
> > > > > this 
> > > > > is the case, I think we need to nominate a PMC member to push the 
> > > > > artifacts for you.
> > > > > 
> > > > > On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> > > > > > Hi, Francis.
> > > > > > 
> > > > > > I already put those settings in ~/.gradle/gradle.properties:
> > > > > > 
> > > > > > $cat ~/.gradle/gradle.properties
> > > > > > asfCommitterId=chunwei
> > > > > > asfNexusUsername=chunwei
> > > > > > asfNexusPassword=xxx
> > > > > > 
> > > > > > asfSvnUsername=chunwei
> > > > > > asfSvnPassword=xxx
> > > > > > 
> > > > > > asfGitSourceUsername=chunweilei
> > > > > > asfGitSourcePassword=xxx
> > > > > > 
> > > > > > useGpgCmd=true
> > > > > > signing.gnupg.useLegacyGpg=true
> > > > > > signing.gnupg.keyName=1F1597F99BCE51A1
> > > > > > 
> > > > > > asfTestSvnPassword=test
> > > > > > asfTestSvnUsername=test
> > > > > > asfTestGitSourceUsername=test
> > > > > > asfTestGitSourcePassword=test
> > > > > > asfTestNexusUsername=test
> > > > > > asfTestNexusPassword=test
> > > > > > 
> > > > > > 
> > > > > > Best,
> > > > > > Chunwei
> > > > > > 
> > > > > >

Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
Skipped the step of releaseRepository too.
Now finished release.

There are several closed or open repos in 
https://repository.apache.org/#stagingRepositories,
should I manually release all of them?

On 2020/07/24 04:25:43, Haisheng Yuan  wrote: 
> I skipped the step, but now have this error:
> ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf -x publishSvnDist
> 
> > Configure project :
> Building Apache Calcite 1.24.0
> 
> > Task :releaseRepository
> Initialized stagingRepositoryId orgapachecalcite-1096
>  for repository nexus
> POST request failed. 500: Server Error, body: [errors:[[id:*, msg:Unhandled: 
> Missing staging repository: orgapachecalcite-1096
> ]]]
> 
> > Task :releaseRepository FAILED
> 
> Build calcite FAILURE reason:
> Execution failed for task ':releaseRepository':
> io.codearte.gradle.nexus.infra.NexusHttpResponseException: 500: 
> Server Error, body: [errors:[[id:*, msg:Unhandled: Missing staging 
> repository: orgapachecalcite-1096
> ]]]
> at 
> io.codearte.gradle.nexus.infra.SimplifiedHttpJsonRestClient.sendRequestHandlingErrors(SimplifiedHttpJsonRestClient.groovy:52)
> 
> 
> 
> On 2020/07/24 04:19:14, Haisheng Yuan  wrote: 
> > How can I skip this step?
> > 
> > ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf
> > 
> > > Configure project :
> > Building Apache Calcite 1.24.0
> > 
> > > Task :publishSvnDist FAILED
> > Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> > https://dist.apache.org/repos/dist
> > 
> > Build calcite FAILURE reason:
> > Execution failed for task ':publishSvnDist':
> > Unable to list folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> > https://dist.apache.org/repos/dist: svn: warning: W160013: URL 
> > 'https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0' 
> > non-existent in revision 40660
> > svn: E29: Could not list all targets because some targets don't 
> > exist
> > 
> > 
> > I met an error when executing the command. But the command executed some 
> > steps successfully before error out. After I fixed the issue and re-execute 
> > the command, now I see this error. Because the folders is already removed 
> > from dev directory.
> > 
> > On 2020/07/24 04:00:14, Haisheng Yuan  wrote: 
> > > I will help push the artifacts.
> > > 
> > > On 2020/07/24 03:04:05, Francis Chuang  wrote: 
> > > > I wonder if this is because only PMCs can push to the SVN repo. If this 
> > > > is the case, I think we need to nominate a PMC member to push the 
> > > > artifacts for you.
> > > > 
> > > > On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> > > > > Hi, Francis.
> > > > > 
> > > > > I already put those settings in ~/.gradle/gradle.properties:
> > > > > 
> > > > > $cat ~/.gradle/gradle.properties
> > > > > asfCommitterId=chunwei
> > > > > asfNexusUsername=chunwei
> > > > > asfNexusPassword=xxx
> > > > > 
> > > > > asfSvnUsername=chunwei
> > > > > asfSvnPassword=xxx
> > > > > 
> > > > > asfGitSourceUsername=chunweilei
> > > > > asfGitSourcePassword=xxx
> > > > > 
> > > > > useGpgCmd=true
> > > > > signing.gnupg.useLegacyGpg=true
> > > > > signing.gnupg.keyName=1F1597F99BCE51A1
> > > > > 
> > > > > asfTestSvnPassword=test
> > > > > asfTestSvnUsername=test
> > > > > asfTestGitSourceUsername=test
> > > > > asfTestGitSourcePassword=test
> > > > > asfTestNexusUsername=test
> > > > > asfTestNexusPassword=test
> > > > > 
> > > > > 
> > > > > Best,
> > > > > Chunwei
> > > > > 
> > > > > 
> > > > > On Fri, Jul 24, 2020 at 10:46 AM Francis Chuang 
> > > > > 
> > > > > wrote:
> > > > > 
> > > > >> You need to include your username and password using the -Pasf* 
> > > > >> flags.
> > > > >>
> > > > >> Here's how I use those flags to publish an avatica release using 
> > > > >> docker:
> > > > >> https://github.com/apache/calcite-avatica/blob/master/docker.sh#L251
> > > > >>
> > > > >> Francis
> > > > >>
> > > > >> On 24/07/2020 12:43 pm, Chunwei Lei wrote:
> 

Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
I skipped the step, but now have this error:
~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf -x publishSvnDist

> Configure project :
Building Apache Calcite 1.24.0

> Task :releaseRepository
Initialized stagingRepositoryId orgapachecalcite-1096
 for repository nexus
POST request failed. 500: Server Error, body: [errors:[[id:*, msg:Unhandled: 
Missing staging repository: orgapachecalcite-1096
]]]

> Task :releaseRepository FAILED

Build calcite FAILURE reason:
Execution failed for task ':releaseRepository':
io.codearte.gradle.nexus.infra.NexusHttpResponseException: 500: Server 
Error, body: [errors:[[id:*, msg:Unhandled: Missing staging repository: 
orgapachecalcite-1096
]]]
at 
io.codearte.gradle.nexus.infra.SimplifiedHttpJsonRestClient.sendRequestHandlingErrors(SimplifiedHttpJsonRestClient.groovy:52)



On 2020/07/24 04:19:14, Haisheng Yuan  wrote: 
> How can I skip this step?
> 
> ~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf
> 
> > Configure project :
> Building Apache Calcite 1.24.0
> 
> > Task :publishSvnDist FAILED
> Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> https://dist.apache.org/repos/dist
> 
> Build calcite FAILURE reason:
> Execution failed for task ':publishSvnDist':
> Unable to list folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
> https://dist.apache.org/repos/dist: svn: warning: W160013: URL 
> 'https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0' 
> non-existent in revision 40660
> svn: E29: Could not list all targets because some targets don't 
> exist
> 
> 
> I met an error when executing the command. But the command executed some 
> steps successfully before error out. After I fixed the issue and re-execute 
> the command, now I see this error. Because the folders is already removed 
> from dev directory.
> 
> On 2020/07/24 04:00:14, Haisheng Yuan  wrote: 
> > I will help push the artifacts.
> > 
> > On 2020/07/24 03:04:05, Francis Chuang  wrote: 
> > > I wonder if this is because only PMCs can push to the SVN repo. If this 
> > > is the case, I think we need to nominate a PMC member to push the 
> > > artifacts for you.
> > > 
> > > On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> > > > Hi, Francis.
> > > > 
> > > > I already put those settings in ~/.gradle/gradle.properties:
> > > > 
> > > > $cat ~/.gradle/gradle.properties
> > > > asfCommitterId=chunwei
> > > > asfNexusUsername=chunwei
> > > > asfNexusPassword=xxx
> > > > 
> > > > asfSvnUsername=chunwei
> > > > asfSvnPassword=xxx
> > > > 
> > > > asfGitSourceUsername=chunweilei
> > > > asfGitSourcePassword=xxx
> > > > 
> > > > useGpgCmd=true
> > > > signing.gnupg.useLegacyGpg=true
> > > > signing.gnupg.keyName=1F1597F99BCE51A1
> > > > 
> > > > asfTestSvnPassword=test
> > > > asfTestSvnUsername=test
> > > > asfTestGitSourceUsername=test
> > > > asfTestGitSourcePassword=test
> > > > asfTestNexusUsername=test
> > > > asfTestNexusPassword=test
> > > > 
> > > > 
> > > > Best,
> > > > Chunwei
> > > > 
> > > > 
> > > > On Fri, Jul 24, 2020 at 10:46 AM Francis Chuang 
> > > > 
> > > > wrote:
> > > > 
> > > >> You need to include your username and password using the -Pasf* flags.
> > > >>
> > > >> Here's how I use those flags to publish an avatica release using 
> > > >> docker:
> > > >> https://github.com/apache/calcite-avatica/blob/master/docker.sh#L251
> > > >>
> > > >> Francis
> > > >>
> > > >> On 24/07/2020 12:43 pm, Chunwei Lei wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> I tried to push the release out to mirrors. But I met an error:
> > > >>>
> > > >>> $ ./gradlew publishDist -Prc=0 -Pasf
> > > >>> Starting a Gradle Daemon, 3 busy and 1 incompatible Daemons could not 
> > > >>> be
> > > >>> reused, use --status for details
> > > >>>
> > > >>>> Configure project :
> > > >>> Building Apache Calcite 1.24.0
> > > >>>
> > > >>>> Task :publishSvnDist
> > > >>> Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at
> > > >>> https://dist.apache.org/repos/dis

Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
How can I skip this step?

~/calcite ❯❯❯ ./gradlew publishDist -Prc=0 -Pasf

> Configure project :
Building Apache Calcite 1.24.0

> Task :publishSvnDist FAILED
Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
https://dist.apache.org/repos/dist

Build calcite FAILURE reason:
Execution failed for task ':publishSvnDist':
Unable to list folders [dev/calcite/apache-calcite-1.24.0-rc0] at 
https://dist.apache.org/repos/dist: svn: warning: W160013: URL 
'https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0' 
non-existent in revision 40660
svn: E29: Could not list all targets because some targets don't 
exist


I met an error when executing the command. But the command executed some steps 
successfully before error out. After I fixed the issue and re-execute the 
command, now I see this error. Because the folders is already removed from dev 
directory.

On 2020/07/24 04:00:14, Haisheng Yuan  wrote: 
> I will help push the artifacts.
> 
> On 2020/07/24 03:04:05, Francis Chuang  wrote: 
> > I wonder if this is because only PMCs can push to the SVN repo. If this 
> > is the case, I think we need to nominate a PMC member to push the 
> > artifacts for you.
> > 
> > On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> > > Hi, Francis.
> > > 
> > > I already put those settings in ~/.gradle/gradle.properties:
> > > 
> > > $cat ~/.gradle/gradle.properties
> > > asfCommitterId=chunwei
> > > asfNexusUsername=chunwei
> > > asfNexusPassword=xxx
> > > 
> > > asfSvnUsername=chunwei
> > > asfSvnPassword=xxx
> > > 
> > > asfGitSourceUsername=chunweilei
> > > asfGitSourcePassword=xxx
> > > 
> > > useGpgCmd=true
> > > signing.gnupg.useLegacyGpg=true
> > > signing.gnupg.keyName=1F1597F99BCE51A1
> > > 
> > > asfTestSvnPassword=test
> > > asfTestSvnUsername=test
> > > asfTestGitSourceUsername=test
> > > asfTestGitSourcePassword=test
> > > asfTestNexusUsername=test
> > > asfTestNexusPassword=test
> > > 
> > > 
> > > Best,
> > > Chunwei
> > > 
> > > 
> > > On Fri, Jul 24, 2020 at 10:46 AM Francis Chuang 
> > > wrote:
> > > 
> > >> You need to include your username and password using the -Pasf* flags.
> > >>
> > >> Here's how I use those flags to publish an avatica release using docker:
> > >> https://github.com/apache/calcite-avatica/blob/master/docker.sh#L251
> > >>
> > >> Francis
> > >>
> > >> On 24/07/2020 12:43 pm, Chunwei Lei wrote:
> > >>> Hi all,
> > >>>
> > >>> I tried to push the release out to mirrors. But I met an error:
> > >>>
> > >>> $ ./gradlew publishDist -Prc=0 -Pasf
> > >>> Starting a Gradle Daemon, 3 busy and 1 incompatible Daemons could not be
> > >>> reused, use --status for details
> > >>>
> > >>>> Configure project :
> > >>> Building Apache Calcite 1.24.0
> > >>>
> > >>>> Task :publishSvnDist
> > >>> Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at
> > >>> https://dist.apache.org/repos/dist
> > >>> Executing svnmucc. root=https://dist.apache.org/repos/dist,
> > >>> message=Promoting Apache Calcite calcite-1.24.0-rc0 -> calcite-1.24.0 to
> > >>> release area, commands:
> > >>> mkdir
> > >>> release/calcite/apache-calcite-1.24.0
> > >>> cp
> > >>> 40574
> > >>> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz
> > >>> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz
> > >>> cp
> > >>> 40574
> > >>>
> > >> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz.asc
> > >>>
> > >> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz.asc
> > >>> cp
> > >>> 40574
> > >>>
> > >> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz.sha512
> > >>>
> > >> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz.sha512
> > >>> rm
> > >>> dev/calcite/apache-calcite-1.24.0-rc0
> > >>> svnmucc: E175013: Access to
> > >>> '/repos/dist/!svn/txr/40659-yas/release/calcite/apache-calcite-1.24.0'
> > >>> forbidden
> &

Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
Yes, I agree.
I will add the details to the JIRA.
Thanks for reminding.

On 2020/07/23 17:43:24, Julian Hyde  wrote: 
> 
> Haisheng,
> 
> > > Julian wrote:
> > > * why is 4032 'breaking'?
> 
> > Haisheng wrote:
> > With that change, the CalcMergeRule won't match PhysicalNode
> > (including EnumerableCalc) in VolcanoPlanner. Perhaps I should
> > elaborate in the release notes.
> 
> Maybe a little more in the release notes. But please add the full details 
> (including link to the email discussion) in CALCITE-4032. In my opinion JIRA 
> cases are a better place for long-term information than release notes or 
> commit messages.
> 
> Julian
> 
> 


Re: [RESULT] [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-23 Thread Haisheng Yuan
I will help push the artifacts.

On 2020/07/24 03:04:05, Francis Chuang  wrote: 
> I wonder if this is because only PMCs can push to the SVN repo. If this 
> is the case, I think we need to nominate a PMC member to push the 
> artifacts for you.
> 
> On 24/07/2020 12:58 pm, Chunwei Lei wrote:
> > Hi, Francis.
> > 
> > I already put those settings in ~/.gradle/gradle.properties:
> > 
> > $cat ~/.gradle/gradle.properties
> > asfCommitterId=chunwei
> > asfNexusUsername=chunwei
> > asfNexusPassword=xxx
> > 
> > asfSvnUsername=chunwei
> > asfSvnPassword=xxx
> > 
> > asfGitSourceUsername=chunweilei
> > asfGitSourcePassword=xxx
> > 
> > useGpgCmd=true
> > signing.gnupg.useLegacyGpg=true
> > signing.gnupg.keyName=1F1597F99BCE51A1
> > 
> > asfTestSvnPassword=test
> > asfTestSvnUsername=test
> > asfTestGitSourceUsername=test
> > asfTestGitSourcePassword=test
> > asfTestNexusUsername=test
> > asfTestNexusPassword=test
> > 
> > 
> > Best,
> > Chunwei
> > 
> > 
> > On Fri, Jul 24, 2020 at 10:46 AM Francis Chuang 
> > wrote:
> > 
> >> You need to include your username and password using the -Pasf* flags.
> >>
> >> Here's how I use those flags to publish an avatica release using docker:
> >> https://github.com/apache/calcite-avatica/blob/master/docker.sh#L251
> >>
> >> Francis
> >>
> >> On 24/07/2020 12:43 pm, Chunwei Lei wrote:
> >>> Hi all,
> >>>
> >>> I tried to push the release out to mirrors. But I met an error:
> >>>
> >>> $ ./gradlew publishDist -Prc=0 -Pasf
> >>> Starting a Gradle Daemon, 3 busy and 1 incompatible Daemons could not be
> >>> reused, use --status for details
> >>>
> >>>> Configure project :
> >>> Building Apache Calcite 1.24.0
> >>>
> >>>> Task :publishSvnDist
> >>> Listing SVN folders [dev/calcite/apache-calcite-1.24.0-rc0] at
> >>> https://dist.apache.org/repos/dist
> >>> Executing svnmucc. root=https://dist.apache.org/repos/dist,
> >>> message=Promoting Apache Calcite calcite-1.24.0-rc0 -> calcite-1.24.0 to
> >>> release area, commands:
> >>> mkdir
> >>> release/calcite/apache-calcite-1.24.0
> >>> cp
> >>> 40574
> >>> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz
> >>> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz
> >>> cp
> >>> 40574
> >>>
> >> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz.asc
> >>>
> >> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz.asc
> >>> cp
> >>> 40574
> >>>
> >> dev/calcite/apache-calcite-1.24.0-rc0/apache-calcite-1.24.0-src.tar.gz.sha512
> >>>
> >> release/calcite/apache-calcite-1.24.0/apache-calcite-1.24.0-src.tar.gz.sha512
> >>> rm
> >>> dev/calcite/apache-calcite-1.24.0-rc0
> >>> svnmucc: E175013: Access to
> >>> '/repos/dist/!svn/txr/40659-yas/release/calcite/apache-calcite-1.24.0'
> >>> forbidden
> >>>
> >>>> Task :publishSvnDist FAILED
> >>>
> >>> Build calcite FAILURE reason:
> >>>   Execution failed for task ':publishSvnDist':
> >>>   org.gradle.process.internal.ExecException: Process 'command
> >>> 'svnmucc'' finished with non-zero exit value 1
> >>>   at
> >>>
> >> org.gradle.process.internal.DefaultExecHandle$ExecResultImpl.assertNormalExitValue(DefaultExecHandle.java:417)
> >>>   at
> >>>
> >> org.gradle.process.internal.DefaultExecAction.execute(DefaultExecAction.java:38)
> >>>   at
> >>>
> >> org.gradle.process.internal.DefaultExecActionFactory.exec(DefaultExecActionFactory.java:156)
> >>>   at
> >>>
> >> org.gradle.api.internal.project.DefaultProject.exec(DefaultProject.java:1147)
> >>>   at
> >>> com.github.vlsi.gradle.release.SvnmuccTask.mucc(SvnmuccTask.kt:137)
> >>>   at
> >>> org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:104)
> >>>
> >>>
> >>> Any step I may miss?
> >>>
> >>>
> >>> Best,
> >>> Chunwei
> >>>
> >>>
> >>> On Fri, Jul 24, 2020 at 12:28 AM Chunwei Lei 
> >> wrote:
> >>>
> >>>> Thanks to everyone who has tested the release candidate and given
> >>>> their comments and votes.
> >>>>
> >>>> The tally is as follows.
> >>>>
> >>>> 5 binding +1s:
> >>>> Michael Mior
> >>>> Julian Hyde
> >>>> Haisheng Yuan
> >>>> Francis Chuang
> >>>> Danny Chen
> >>>>
> >>>> 4 non-binding +1s:
> >>>> Enrico Olivelli
> >>>> Ruleben Q L
> >>>> Rui Wang
> >>>> Chunwei Lei
> >>>>
> >>>> 1 binding 0s:
> >>>> Stamatis Zampetakis
> >>>>
> >>>> No  -1s.
> >>>>
> >>>> Therefore I am delighted to announce that the proposal to release
> >>>> Apache Calcite 1.24.0 has passed.
> >>>>
> >>>> Thanks everyone. We’ll now roll the release out to the mirrors.
> >>>>
> >>>> The release notes will be updated afterward according to the feedback.
> >>>>
> >>>>
> >>>> Best,
> >>>> Chunwei
> >>>>
> >>>
> >>
> > 
> 


Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-21 Thread Haisheng Yuan
Thanks Julian for explaining the NOT IN to everyone. I am sorry for not making 
myself clear.

For that kind of simple 2-valued logic uncorrelated NOT IN subquery, SQL Server 
and Greenplum can generate efficient plan with null aware ANTI SEMI join.

Hash ANTI SEMI Join for NOT IN 
 Join Condition: t1.a = t2.a
 ->  Table Scan on t1  
 ->  Hash 
   ->  Table Scan on t2 

ANTI_NOTIN can be interpreted as null awareness ANTI joins. As vlsi pointed 
out, Oracle supports that too.

Of course, that requires the query executor to deal specifically with this kind 
of join. But optimizer should not be the database's bottleneck, it should 
generate as good plan as possible, if the query executor can support the 
operator, that is good. If not, downstream project just disable it.

Just like "a in (1,2,3...1000)", many databases (Greenplum, Postgres, Tidb ...) 
support implicit lookup inside the Filter operator, it doesn't need to be 
expanded to OR, or an explicit inner join. The implicit lookup is more 
efficient. I think the downstream would love to adapt to support the good plan, 
if not, they can choose to disable it, or rewrite to the form they support 
during post-processing stage after best plan is generated. But we can't limit 
us not to generate this kind of plan because some downstream project might not 
support it.

On 2020/07/21 18:41:39, Julian Hyde  wrote: 
> I want to remind everyone how hard it is to evaluate NOT IN queries.
> Here is an example query:
> 
> sqlline> !connect
> jdbc:calcite:model=core/build/resources/test/hsqldb-model.json sa sa
> > !set outputFormat csv
> > EXPLAIN PLAN FOR SELECT * FROM dept WHERE deptno NOT IN (SELECT mgr FROM 
> > emp);
> EnumerableCalc(expr#0..6=[{inputs}], expr#7=[0], expr#8=[=($t3, $t7)],
> expr#9=[IS NULL($t6)], expr#10=[>=($t4, $t3)], expr#11=[AND($t9,
> $t10)], expr#12=[OR($t8, $t11)], proj#0..2=[{exprs}],
> $condition=[$t12])
>   EnumerableHashJoin(condition=[=($0, $5)], joinType=[left])
> EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
>   JdbcToEnumerableConverter
> JdbcTableScan(table=[[SCOTT, DEPT]])
>   JdbcToEnumerableConverter
> JdbcAggregate(group=[{}], c=[COUNT()], ck=[COUNT($3)])
>   JdbcTableScan(table=[[SCOTT, EMP]])
> JdbcToEnumerableConverter
>   JdbcAggregate(group=[{0, 1}])
> JdbcProject(MGR=[$3], i=[true])
>   JdbcTableScan(table=[[SCOTT, EMP]])
> > EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR SELECT * FROM dept WHERE deptno NOT 
> > IN (SELECT mgr FROM emp);
> LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
>   LogicalFilter(condition=[NOT(IN($0, {
> LogicalProject(MGR=[$3])
>   JdbcTableScan(table=[[SCOTT, EMP]])
> }))])
> JdbcTableScan(table=[[SCOTT, DEPT]])
> 
> Why is the physical plan so complicated? Because 'mgr' is nullable. We
> need to account for 3 different cases:
> 
> 1. dept.deptno is NULL (and therefore 'deptno NOT IN ...' evaluates to
> UNKNOWN for every row where dept.deptno is NULL);
> 2. dept.deptno is not NULL and the sub-query returns at least one NULL
> value for mgr (and therefore 'deptno not in ...' evaluates to UNKNOWN
> for every row);
> 3. dept.deptno is not NULL and the sub-query returns only non-NULL
> values of mgr (and therefore 'deptno not in ...' evaluates to TRUE or
> FALSE for every row).
> 
> To distinguish between cases 2 and 3, the plan counts the number of
> values and the number of not-null values from the sub-query.
> 
> I say that 'NOT IN is toxic' because a single null value in the
> sub-query affects the result. The IN sub-query returns 3 values and
> relational join can only account for two - match or not match.
> 
> There are actually cases where we care about the 3 values of IN. For
> example 'SELECT *, deptno IN (SELECT mgr FROM emp) FROM dept'. But
> usually IN occurs inside WHERE, and we can safely fold UNKNOWN into
> FALSE.
> 
> It is tempting to talk about the cases where there are no NULL keys,
> or UNKNOWN can safely be folded into FALSE. But I think we should be
> talking about 3-valued IN (e.g. the scalar sub-query in the previous
> paragraph). If we can solve that, we can easily convert to a solution
> for 3-valued NOT IN.
> 
> Julian
> 
> On Mon, Jul 20, 2020 at 11:25 PM Haisheng Yuan  wrote:
> >
> > I think they might be orthogonal.
> > It is all about sub-query.
> >
> > On 2020/07/21 05:48:54, Danny Chan  wrote:
> > > If it is only constant NOT IN predicate, how difficult it is to rewrite 
> > > it into a normal composite AND predicate before entering the planning 
> > > phrase ?
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年7月21日 +0800 PM12:

Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-21 Thread Haisheng Yuan
I think they might be orthogonal.
It is all about sub-query.

On 2020/07/21 05:48:54, Danny Chan  wrote: 
> If it is only constant NOT IN predicate, how difficult it is to rewrite it 
> into a normal composite AND predicate before entering the planning phrase ?
> 
> Best,
> Danny Chan
> 在 2020年7月21日 +0800 PM12:35,Haisheng Yuan ,写道:
> > Thanks Jinpeng for providing a good example for not in subquery.
> >
> > I 100% agree with you that correlated query won't be represented by 
> > ANTI_NOTIN join type, and it is not the proposal's intention. Here what we 
> > are discussing is not to use ANTI_NOTIN to represent all the NOT IN 
> > sub-queries, that is impossible. Instead, if you take a close look at the 
> > example query, it is a simple uncorrelated NOT IN sub-query. That is the 
> > target. Let's focus on that kind of query, ask ourselves this question: Can 
> > such a simple query be transformed into a ANTI join to make the plan 
> > efficient?
> >
> > Sadly no. The reality is that this kind of query is not uncommon, may be 
> > much more common than correlated NOT IN sub-queries.
> >
> >
> > Reply to Julian:
> > > > How about making a sub-query type (in RexSubQuery), so it is gone
> > > > before we reach algebra.
> > It will be nice to have a NOT_IN subquery type, without expanding NOT IN to 
> > NOT(IN).
> > However, if there is no ANTI_NOTIN in the join type (without reaching 
> > algebra), does that mean the optimizer still can't generate efficient plan 
> > for simple NOT IN sub-queries?
> >
> > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > > > the opposite of NOT IN?!
> > It depends how people interpret ANTI. You interpret it as "opposite", I 
> > interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT 
> > EXISTS". But it is just a naming issue, I am OK to change it whatever name 
> > that makes sense to the community, as long as it can convey the meaning.
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/07/21 03:02:20, Jinpeng Wu  wrote:
> > > Hi.
> > >
> > > In some SQL engine, the query
> > > select * from A where c1 not in ( select c1 from B where B.c2 = A.c2);
> > > is transformed to a plan like
> > > select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 
> > > is
> > > null OR B.c1 is null);
> > >
> > > Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. 
> > > One
> > > thing seems to be a problem is that A.c1 cannot be used as a join key in
> > > the new plan. However, the problem is also there for ANTI_NOTIN, and even
> > > other NOT-IN-SUBQUERY physical implementations.
> > >
> > > Thanks,
> > > Qiupeng
> > >
> > > On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde  wrote:
> > >
> > > > How about making a sub-query type (in RexSubQuery), so it is gone
> > > > before we reach algebra.
> > > >
> > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > > > the opposite of NOT IN?!
> > > >
> > > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan  wrote:
> > > > >
> > > > > Typo:
> > > > > We can just add a security guard saying that it is supported.
> > > > > Should be
> > > > > We can just add a security guard saying that it is NOT supported.
> > > > >
> > > > > On 2020/07/20 19:57:34, Haisheng Yuan  wrote:
> > > > > > I am not sure I got your implication by "pollute". If you mean
> > > > changes, yes, it requires some changes in rules. Do we need to change
> > > > enumerables? Not necessary. We can just add a security guard saying 
> > > > that it
> > > > is supported. Not everyone requires the Enumerable operators to support
> > > > everything. More importantly, currently there is no logic or rules to
> > > > translate sub-query directly to SEMI/ANTI joins, let alone translating
> > > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) 
> > > > before
> > > > entering RelNode land. That means we don't even have the chance to 
> > > > generate
> > > > the NOT IN anti join. Is that still a concern?
> > > > > >
> > > > > > Even if some day, some contributor extends Calcite's parser and
> > > > SubqueryRemovalRule to b

Re: Re: [DISCUSS] New Join Type: ANTI_NOTIN

2020-07-20 Thread Haisheng Yuan
Thanks Jinpeng for providing a good example for not in subquery.

I 100% agree with you that correlated query won't be represented by ANTI_NOTIN 
join type, and it is not the proposal's intention. Here what we are discussing 
is not to use ANTI_NOTIN to represent all the NOT IN sub-queries, that is 
impossible. Instead, if you take a close look at the example query, it is a 
simple uncorrelated NOT IN sub-query. That is the target. Let's focus on that 
kind of query, ask ourselves this question:  Can such a simple query be 
transformed into a ANTI join to make the plan efficient?

Sadly no. The reality is that this kind of query is not uncommon, may be much 
more common than correlated NOT IN sub-queries.


Reply to Julian:
> > How about making a sub-query type (in RexSubQuery), so it is gone
> > before we reach algebra.
It will be nice to have a NOT_IN subquery type, without expanding NOT IN to 
NOT(IN). 
However, if there is no  ANTI_NOTIN in the join type (without reaching 
algebra), does that mean the optimizer still can't generate efficient plan for 
simple NOT IN sub-queries?

> > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > the opposite of NOT IN?!
It depends how people interpret ANTI. You interpret it as "opposite", I 
interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT 
EXISTS". But it is just a naming issue, I am OK to change it whatever name that 
makes sense to the community, as long as it can convey the meaning.

Thanks,
Haisheng

On 2020/07/21 03:02:20, Jinpeng Wu  wrote: 
> Hi.
> 
> In some SQL engine, the query
> select * from A where c1 not in ( select c1 from B where B.c2 = A.c2);
> is transformed to a plan like
> select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 is
> null OR B.c1 is null);
> 
> Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. One
> thing seems to be a problem is that A.c1 cannot be used as a join key in
> the new plan. However, the problem is also there for ANTI_NOTIN, and even
> other NOT-IN-SUBQUERY physical implementations.
> 
> Thanks,
> Qiupeng
> 
> On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde  wrote:
> 
> > How about making a sub-query type (in RexSubQuery), so it is gone
> > before we reach algebra.
> >
> > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is
> > the opposite of NOT IN?!
> >
> > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan  wrote:
> > >
> > > Typo:
> > > We can just add a security guard saying that it is supported.
> > > Should be
> > > We can just add a security guard saying that it is NOT supported.
> > >
> > > On 2020/07/20 19:57:34, Haisheng Yuan  wrote:
> > > > I am not sure I got your implication by "pollute". If you mean
> > changes, yes, it requires some changes in rules. Do we need to change
> > enumerables? Not necessary. We can just add a security guard saying that it
> > is supported. Not everyone requires the Enumerable operators to support
> > everything. More importantly, currently there is no logic or rules to
> > translate sub-query directly to SEMI/ANTI joins, let alone translating
> > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before
> > entering RelNode land. That means we don't even have the chance to generate
> > the NOT IN anti join. Is that still a concern?
> > > >
> > > > Even if some day, some contributor extends Calcite's parser and
> > SubqueryRemovalRule to be able to  transform NOT_IN subquery into NOT IN
> > anti join, we still have chance to disable it. Is that still a concern?
> > > >
> > > > There are many ways to play it safe.
> > > >
> > > > > Brainstorming: maybe we could consider it as a separate logical
> > operator
> > > > > (with its corresponding enumerable implementation)?
> > > > It doesn't sound cool. It requires much more work. You have to
> > duplicate all the rules, metadata handler that deal with LogicalJoin, and
> > for some rule that matches Join base class, you have to check it is a
> > LogicalJoin or the logical operator for ANTI_NOTIN.
> > > >
> > > > On 2020/07/20 08:28:42, Ruben Q L  wrote:
> > > > > I have some concerns that this new type would "pollute" the existing
> > Join
> > > > > logic, rules and enumerable implementations.
> > > > >
> > > > > Brainstorming: maybe we could consider it as a separate logical
> > operator
> > > > > (with its corresponding enumerable implementation)?
> > > > >

Re: [VOTE] Release apache-calcite-1.24.0 (release candidate 0)

2020-07-20 Thread Haisheng Yuan
Environment: 
Mac OS X 10.15.1, JDK 1.8.0_162

- Checked signatures and checksums, OK
- Ran unit tests (./gradlew build), OK

+1 (binding)

> * why is 4032 'breaking'?
With that change, the CalcMergeRule won't match PhysicalNode(including 
EnumerableCalc) in VolcanoPlanner. Perhaps I should elaborate in the release 
notes.

> * why is 3786 breaking? (recomputeDigest was not present in 1.23; the
> remarks about caching digests are useful, so why aren't they in the
> javadoc?)
recomputeDigest() has been there since b0dab68 (2012-05-07). I will add the 
remarks into the javadoc after release.

Thanks,
Haisheng

On 2020/07/21 01:14:17, Julian Hyde  wrote: 
> Downloaded, checked hashes, built and ran tests on Ubuntu/JDK 14;
> checked distro against git (see issue 1); reviewed release notes (see
> issue 2).
> 
> +1 (binding) but issues 1 and 2 need to be fixed right after the release.
> 
> Issue 1. License file is not the same as in source control:
> 
> diff -r ./LICENSE /tmp/apache-calcite-1.24.0-src/LICENSE
> 177a178,189
> >
> > Additional License files can be found in the 'licenses' folder located in 
> > the same directory as the LICENSE file (i.e. this file)
> >
> > - Software produced outside the ASF which is available under other licenses 
> > (not Apache-2.0)
> >
> > MIT
> > * cobyism:html5shiv:3.7.2
> > * font-awesome:font-awesome-code:4.2.0
> > * gridsim:gridsim:
> > * jekyll:jekyll:
> > * normalize:normalize:3.0.2
> > * respond:respond:1.4.2
> 
> Can you fix the release instructions that the generated LICENSE needs
> to be committed (probably at the same time you revise the release
> notes).
> 
> Issue 2. Release notes
> 
> For the 'highlights', I prefer a paragraph with hyperlinks over a list
> (see 
> https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md#1180--2018-12-21).
> 
> Regarding categorization:
> * why is 4032 'breaking'?
> * why is 3786 breaking? (recomputeDigest was not present in 1.23; the
> remarks about caching digests are useful, so why aren't they in the
> javadoc?)
> * we need a note that a bunch of methods are deprecated in this
> release and will be removed before 1.25 (see 3923, 4023 and 4079).
> This will break semantic versioning in 1.25, so is a big deal.
> * 4073, 3224, 4056, 4008, 3972, 4060 are listed as new features, but I
> think they are bug fixes or improved implementations
> * 3946, 4089, 4087 are listed as fixes but could be listed as new features
> * 4075 should be under 'test suite'
> * 4094 description does not need 'follow-up after review comments'
> * 4086 is an upgrade, so should be in 'bug fixes', not documentation
> * A few places SQL and Java keywords are not in code font (e.g. NPE,
> IllegalArgumentException, RexNode, Expression, HAVING, ARRAY, MAP,
> CAST)
> 
> Julian
> 
> On Mon, Jul 20, 2020 at 12:01 PM Michael Mior  wrote:
> >
> > +1
> >
> > Checked hash and signature and compiled and ran tests. Thanks Chunwei!
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > Le lun. 20 juil. 2020 à 11:41, Chunwei Lei  a écrit 
> > :
> > >
> > > Hi all,
> > >
> > > I have created a build for Apache Calcite 1.24.0, release
> > > candidate 0.
> > >
> > > Thanks to everyone who has contributed to this release.
> > >
> > > You can read the release notes here:
> > > https://github.com/apache/calcite/blob/calcite-1.24.0-rc0/site/_docs/history.md
> > >
> > > The commit to be voted upon:
> > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=4b5b9100e59ae4a43424156c9beabec6805f3d7c
> > >
> > > Its hash is 4b5b9100e59ae4a43424156c9beabec6805f3d7c
> > >
> > > Tag:
> > > https://github.com/apache/calcite/tree/calcite-1.24.0-rc0
> > >
> > > The artifacts to be voted on are located here:
> > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.24.0-rc0
> > > (revision 40574)
> > >
> > > The hashes of the artifacts are as follows:
> > > ffc7821089a444d50be228b0f0d9d8fb875c98f3b31ed0ad5a81cf5f56b9139dd353fd2c866b5bfd42a06c2a09bca579bcf6ed1e05322be1ae228fd7848f4aec
> > > *apache-calcite-1.24.0-src.tar.gz
> > >
> > > A staged Maven repository is available for review at:
> > > https://repository.apache.org/content/repositories/orgapachecalcite-1096/org/apache/calcite/
> > >
> > > Release artifacts are signed with the following key:
> > > https://www.apache.org/dist/calcite/KEYS
> > >
> > > N.B.
> > > To create the jars and test Apache Calcite: "./gradlew build".
> > >
> > > If you do not have a Java environment available, you can run the tests
> > > using docker. To do so, install docker and docker-compose, then run
> > > "docker-compose run test" from the root of the directory.
> > >
> > > Please vote on releasing this package as Apache Calcite 1.24.0.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Calcite 1.24.0
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do 

  1   2   3   4   5   >