Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-11 Thread Haisheng Yuan
Oops, thanks for the reminder. lol

On 2020/05/11 09:41:33, Stamatis Zampetakis  wrote: 
> Previously you mentioned:
> ~/vlsi-release-plugins ❯❯❯ ./gradlew prepareVote -Prc=0
> 
> Just to be sure: for releasing Calcite you should execute ./gradlew
> prepareVote -Prc=0 in the calcite directory (not in vlsi-release-plugins).
> 
> On Mon, May 11, 2020 at 11:03 AM Haisheng Yuan  wrote:
> 
> > Hi Stamatis,
> >
> > I tried to add all the info to the property file, still no luck:
> >
> > signing.gnupg.keyName=3CD22ABAC50DDCEF
> > asfCommitterId=hyuan
> > ghTestGitSourceUsername=test
> > ghTestGitSourcePassword=test
> > ghTestNexusUsername=test
> > ghTestNexusPassword=test
> > asfGitSourceUsername=test
> > asfGitSourcePassword=test
> > asfNexusUsername=test
> > asfNexusPassword=test
> > asfGitSourceUsername=test
> > asfGitSourcePassword=test
> > asfSvnUsername=test
> > asfSvnPassword=test
> >
> > useGpgCmd=true
> > signing.gnupg.useLegacyGpg=true
> >
> > 1: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task ':initializeNexusStagingRepository'.
> > > Failed to find staging profile for package group: com.github.vlsi
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full insights.
> >
> > ==
> >
> > 2: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task
> > ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication'.
> > > Process 'command 'gpg'' finished with non-zero exit value 2
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full insights.
> >
> > ==
> >
> > 3: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task
> > ':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'.
> > > Process 'command 'gpg'' finished with non-zero exit value 2
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/05/11 08:41:16, Stamatis Zampetakis  wrote:
> > > Hi Haisheng,
> > >
> > > You might be missing some gradle.properties entries, especially those
> > > required to identify your PGP key.
> > >
> > > Have a look at the discussion in [1]. Among other info, it also
> > > contains the complete list of properties.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1]
> > >
> > https://lists.apache.org/thread.html/r71feef9a44c1f4ea0f2211dc8c74b25dab143d2e3215cf7ffcfd6f56%40%3Cdev.calcite.apache.org%3E
> > >
> > > On Mon, May 11, 2020 at 9:55 AM Haisheng Yuan  wrote:
> > >
> > > > Hi,
> > > >
> > > > I saw this error during the dry-run:
> > > > ~/vlsi-release-plugins ❯❯❯ ./gradlew prepareVote -Prc=0
> > > > > Task :plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication FAILED
> > > > > Task
> > > >
> > :plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication
> > > > FAILED
> > > > > Task :initializeNexusStagingRepository FAILED
> > > >
> > > > FAILURE: Build completed with 3 failures.
> > > >
> > > > 1: Task failed with an exception.
> > > > ---
> > > > * What went wrong:
> > > > Execution failed for task
> > > > ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication'.
> > > > > Cannot perform signing task
> > > > ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication' because it
> > has
> > > > no configured signatory
> > > >
> > > > * Try:
> > > > Run with --stacktrace option to get the stack trace. Run with --info or
> > > > --debug option to get more log output. Run with --scan to get full
> > insights.
> > > >
> > > >
> > ==
> > > >
> > > > 2: Task failed with an exception.
> > > > ---
>

Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-11 Thread Haisheng Yuan
Hi Stamatis,

I tried to add all the info to the property file, but still no luck:

signing.gnupg.keyName=3CD22ABAC50DDCEF
asfCommitterId=hyuan
ghTestGitSourceUsername=test
ghTestGitSourcePassword=test
ghTestNexusUsername=test
ghTestNexusPassword=test
asfGitSourceUsername=test
asfGitSourcePassword=test
asfNexusUsername=test
asfNexusPassword=test
asfGitSourceUsername=test
asfGitSourcePassword=test
asfSvnUsername=test
asfSvnPassword=test

useGpgCmd=true
signing.gnupg.useLegacyGpg=true

1: Task failed with an exception.
---
* What went wrong:
Execution failed for task ':initializeNexusStagingRepository'.
> Failed to find staging profile for package group: com.github.vlsi

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.
==

2: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication'.
> Process 'command 'gpg'' finished with non-zero exit value 2

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.
==

3: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'.
> Process 'command 'gpg'' finished with non-zero exit value 2

Thanks,
Haisheng

On 2020/05/11 08:41:16, Stamatis Zampetakis  wrote: 
> Hi Haisheng,
> 
> You might be missing some gradle.properties entries, especially those
> required to identify your PGP key.
> 
> Have a look at the discussion in [1]. Among other info, it also contains the
> complete list of properties.
> 
> Best,
> Stamatis
> 
> [1]
> https://lists.apache.org/thread.html/r71feef9a44c1f4ea0f2211dc8c74b25dab143d2e3215cf7ffcfd6f56%40%3Cdev.calcite.apache.org%3E
> 
> On Mon, May 11, 2020 at 9:55 AM Haisheng Yuan  wrote:
> 
> > Hi,
> >
> > I saw this error during the dry-run:
> > ~/vlsi-release-plugins ❯❯❯ ./gradlew prepareVote -Prc=0
> > > Task :plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication FAILED
> > > Task
> > :plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication
> > FAILED
> > > Task :initializeNexusStagingRepository FAILED
> >
> > FAILURE: Build completed with 3 failures.
> >
> > 1: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task
> > ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication'.
> > > Cannot perform signing task
> > ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication' because it has
> > no configured signatory
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full insights.
> >
> > ==
> >
> > 2: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task
> > ':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'.
> > > Cannot perform signing task
> > ':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'
> > because it has no configured signatory
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full insights.
> >
> > ==
> >
> > 3: Task failed with an exception.
> > ---
> > * What went wrong:
> > Execution failed for task ':initializeNexusStagingRepository'.
> > > Failed to find staging profile for package group: com.github.vlsi
> >
> > Here is the gradle.properties settings:
> > ghTestGitSourceUsername=test
> > ghTestGitSourcePassword=test
> > ghTestNexusUsername=test
> > ghTestNexusPassword=test
> >
> >
> > BTW, my gpg public key has been uploaded:
> > https://pgp.mit.edu/pks/lookup?search=0x3CD22ABAC50DDCEF&op=vindex&exact=on
> >
> > And the gpg fingerprint has been updated in my apache id profile at
> > https://id.apache.org/.
> >
> > Any kind of help is appreciated.

Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-11 Thread Haisheng Yuan
Hi,

I saw this error during the dry-run:
~/vlsi-release-plugins ❯❯❯ ./gradlew prepareVote -Prc=0
> Task :plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication FAILED
> Task 
> :plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication
>  FAILED
> Task :initializeNexusStagingRepository FAILED

FAILURE: Build completed with 3 failures.

1: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication'.
> Cannot perform signing task 
> ':plugins:crlf-plugin:signCrlfPluginMarkerMavenPublication' because it has no 
> configured signatory

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.
==

2: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'.
> Cannot perform signing task 
> ':plugins:checksum-dependency-plugin:signChecksum-dependencyPluginMarkerMavenPublication'
>  because it has no configured signatory

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.
==

3: Task failed with an exception.
---
* What went wrong:
Execution failed for task ':initializeNexusStagingRepository'.
> Failed to find staging profile for package group: com.github.vlsi

Here is the gradle.properties settings:
ghTestGitSourceUsername=test
ghTestGitSourcePassword=test
ghTestNexusUsername=test
ghTestNexusPassword=test
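The "no configured signatory" failures below usually mean the Gradle signing plugin cannot find a key. A hedged sketch of the additional gradle.properties entries that address this (the property names and key ID are the ones used elsewhere in this thread; whether they suffice depends on the vlsi release-plugins setup):

```properties
# Point the Gradle signing plugin at an external gpg key
signing.gnupg.keyName=3CD22ABAC50DDCEF
# Sign with the gpg command-line tool instead of an in-JVM signer
useGpgCmd=true
# Only needed when the installed binary is a legacy gpg 1.x
signing.gnupg.useLegacyGpg=true
```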


BTW, my gpg public key has been uploaded:
https://pgp.mit.edu/pks/lookup?search=0x3CD22ABAC50DDCEF&op=vindex&exact=on

And the gpg fingerprint has been updated in my apache id profile at 
https://id.apache.org/.

Any kind of help is appreciated.

Thanks,
Haisheng

On 2020/05/06 02:09:11, Chunwei Lei  wrote: 
> Hi, Haisheng, I would like to review the PR for CALCITE-3896.
> 
> 
> Best,
> Chunwei
> 
> 
> On Wed, May 6, 2020 at 4:50 AM Rui Wang  wrote:
> 
> > Thanks for leading 1.23.0 release, Haisheng!
> >
> > I will actively contact Danny and Feng to see if we can reach a conclusion
> > on https://github.com/apache/calcite/pull/1761 by next monday.
> >
> >
> > -Rui
> >
> > On Tue, May 5, 2020 at 11:31 AM Haisheng Yuan  wrote:
> >
> > > Yes, no problem. Will have an RC before next Monday, PDT.
> > >
> > > @Roman, thanks for helping.
> > > This is the PR: https://github.com/apache/calcite/pull/1953
> > >
> > > Haisheng
> > >
> > > On 2020/05/05 14:57:48, Julian Hyde  wrote:
> > > > Thanks for volunteering, Haisheng.
> > > >
> > > > Before everyone piles in with wish-lists, can we set a timescale for
> > the
> > > release?
> > > >
> > > > I would suggest an aggressive timescale, for example an RC a week from
> > > today. Hold the RC only for mandatory cases. Other stuff gets in if it’s
> > > ready.
> > > >
> > > > Julian
> > > >
> > > > > On May 5, 2020, at 5:51 AM, Ruben Q L  wrote:
> > > > >
> > > > > Thanks for starting the discussion, Stamatis.
> > > > >
> > > > > Just a reminder that there is a blocking issue [1] that needs to be
> > > solved
> > > > > (it is a regression introduced after 1.22).
> > > > > Apart from that, it would be nice if we could include [2] in 1.23.
> > > > >
> > > > > Best regards,
> > > > > Ruben
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/CALCITE-3926
> > > > > [2] https://issues.apache.org/jira/browse/CALCITE-3951
> > > > >
> > > > >
> > > > >
> > > > >> Le mar. 5 mai 2020 à 12:34, Roman Kondakov
> > 
> > > a
> > > > >> écrit :
> > > > >>
> > > > >> Hi Haisheng,
> > > > >>
> > > > >> I would like to review CALCITE-3896. Do you have a PR?
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Kind Regards
> > > > >> Roman Kondakov
> > > > >>
> > > > >>
> > > > >>> On 05.05.2020 06:37, Haisheng Yuan wrote:

Re: Calcite in research - The ReProVide Query-Sequence Optimization in a Hardware-Accelerated DBMS

2020-05-09 Thread Haisheng Yuan
Glad to hear that.
More papers are on the way.

Haisheng

On 2020/05/10 02:54:00, Julian Hyde  wrote: 
> Calcite is being used in a research project that accelerates queries using 
> FPGAs (The ReProVide Query-Sequence Optimization in a Hardware-Accelerated 
> DBMS, by Lekshmi B. G., Andreas Becher, Klaus Meyer-Wegener, 
> https://arxiv.org/pdf/2005.01511.pdf .)
> 
> "Hardware acceleration of database query processing can be done with the help 
> of FPGAs. In particular, they are partially reconfigurable at runtime, which 
> allows for the runtime adaption of the hardware to a variety of queries.  
> This paper presents optimizations based on query sequences, which reduces the 
> impact of the reconfigurations."
> 
> "We have implemented a ReProVide prototype, which is demonstrated at the 2020 
> EDBT conference [5]. The host is running Apache Calcite [6] as a DBMS. It has 
> been extended with optimization rules to push down available operators to RPU 
> based on cost.”
> 
> Always good to hear about Calcite being used in research.
> 
> Julian
> 
> 


[jira] [Created] (CALCITE-3983) Add utility methods to RelTraitSet

2020-05-08 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3983:
--

 Summary: Add utility methods to RelTraitSet
 Key: CALCITE-3983
 URL: https://issues.apache.org/jira/browse/CALCITE-3983
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Add the following methods to RelTraitSet:
{code:java}
// Java has no default parameter values, so expose two overloads;
// the no-arg form behaves as isDefault(true).
boolean isDefault();
boolean isDefault(boolean ignoreConvention);

// Convenience methods to get the traits defined by
// Calcite; should throw an exception if the default
// trait defs are not registered.
Convention getConvention();
RelDistribution getDistribution();
RelCollation getCollation();{code}
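To illustrate the intended semantics of the proposed ignoreConvention flag, here is a small Python model; the trait kinds and default values are invented for illustration, and this is not Calcite's actual implementation:

```python
# Hypothetical model of the proposed RelTraitSet.isDefault() semantics.
# Trait kinds and default values are invented for illustration.
DEFAULTS = {"convention": "NONE", "distribution": "ANY", "collation": "[]"}

def is_default(traits, ignore_convention=True):
    """True if every trait equals its trait-def default,
    optionally skipping the convention trait."""
    for kind, value in traits.items():
        if ignore_convention and kind == "convention":
            continue
        if value != DEFAULTS[kind]:
            return False
    return True

# A rel with a physical convention but otherwise default traits:
traits = {"convention": "ENUMERABLE", "distribution": "ANY", "collation": "[]"}
print(is_default(traits))                           # True: convention ignored
print(is_default(traits, ignore_convention=False))  # False: non-default convention
```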



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-05 Thread Haisheng Yuan
Yes, no problem. Will have an RC before next Monday, PDT.

@Roman, thanks for helping.
This is the PR: https://github.com/apache/calcite/pull/1953

Haisheng

On 2020/05/05 14:57:48, Julian Hyde  wrote: 
> Thanks for volunteering, Haisheng.
> 
> Before everyone piles in with wish-lists, can we set a timescale for the 
> release? 
> 
> I would suggest an aggressive timescale, for example an RC a week from today. 
> Hold the RC only for mandatory cases. Other stuff gets in if it’s ready. 
> 
> Julian
> 
> > On May 5, 2020, at 5:51 AM, Ruben Q L  wrote:
> > 
> > Thanks for starting the discussion, Stamatis.
> > 
> > Just a reminder that there is a blocking issue [1] that needs to be solved
> > (it is a regression introduced after 1.22).
> > Apart from that, it would be nice if we could include [2] in 1.23.
> > 
> > Best regards,
> > Ruben
> > 
> > [1] https://issues.apache.org/jira/browse/CALCITE-3926
> > [2] https://issues.apache.org/jira/browse/CALCITE-3951
> > 
> > 
> > 
> >> Le mar. 5 mai 2020 à 12:34, Roman Kondakov  a
> >> écrit :
> >> 
> >> Hi Haisheng,
> >> 
> >> I would like to review CALCITE-3896. Do you have a PR?
> >> 
> >> 
> >> --
> >> Kind Regards
> >> Roman Kondakov
> >> 
> >> 
> >>> On 05.05.2020 06:37, Haisheng Yuan wrote:
> >>> IIRC, I am the release manager for 1.23.0.
> >>> 
> >>> I think CALCITE-3896 (top-down trait request) is in good shape, it will
> >> be nice if it can go into 1.23.0. Can someone help review?
> >>> 
> >>> On 2020/05/04 22:58:25, Stamatis Zampetakis  wrote:
> >>>> Hello,
> >>>> 
> >>>> Calcite 1.22.0 was released about 2 months ago (on March 5, 2020).
> >> During
> >>>> the vote we said to try to have a release sooner than usual so I am
> >>>> launching the discussion now hoping we could get to 1.23.0 rather soon.
> >>>> 
> >>>> I was checking the status of the current release [1] and we have already
> >>>> over 100 resolved issues with 91 fixes so it is already pretty big.
> >>>> 
> >>>> I know that there are quite a few discussions in progress so it would be
> >> good
> >>>> to see how many of the ongoing cases should be fixed for 1.23.0.
> >>>> 
> >>>> More importantly do we have a release manager for 1.23.0?
> >>>> 
> >>>> Best,
> >>>> Stamatis
> >>>> 
> >>>> [1]
> >>>> 
> >> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> >>>> 
> >> 
> 


Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-04 Thread Haisheng Yuan
IIRC, I am the release manager for 1.23.0.

I think CALCITE-3896 (top-down trait request) is in good shape; it would be nice 
if it could go into 1.23.0. Can someone help review?

On 2020/05/04 22:58:25, Stamatis Zampetakis  wrote: 
> Hello,
> 
> Calcite 1.22.0 was released about 2 months ago (on March 5, 2020). During
> the vote we said to try to have a release sooner than usual so I am
> launching the discussion now hoping we could get to 1.23.0 rather soon.
> 
> I was checking the status of the current release [1] and we have already
> over 100 resolved issues with 91 fixes so it is already pretty big.
> 
> I know that there are quite a few discussions in progress so it would be good
> to see how many of the ongoing cases should be fixed for 1.23.0.
> 
> More importantly do we have a release manager for 1.23.0?
> 
> Best,
> Stamatis
> 
> [1]
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> 


[jira] [Created] (CALCITE-3968) Disable JoinPushThroughJoinRule.left by default

2020-05-02 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3968:
--

 Summary: Disable JoinPushThroughJoinRule.left by default
 Key: CALCITE-3968
 URL: https://issues.apache.org/jira/browse/CALCITE-3968
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


JoinPushThroughJoinRule.right does 
(RS)T -> (RT)S

JoinPushThroughJoinRule.left does
(RS)T -> (TS)R 

If JoinCommuteRule is enabled, enabling only JoinPushThroughJoinRule.right can 
produce the same alternatives as JoinPushThroughJoinRule.left, and vice versa 
(assuming the joins are connected). So there is no need to enable the left and 
right instances at the same time, which slows down optimization dramatically; 
e.g., TPC-H q05 and q07 in TpchTest.java never finish. The same bug was also 
reported in [1].

Meanwhile, JoinPushThroughJoinRule matches RelNode.class, which is unnecessary. 
It should match just a group (RelSet / RelSubset) as a placeholder, because we 
don't care what exactly the top join's right child is. But since the rule is 
designed to work with both HepPlanner and VolcanoPlanner, we just bear with the 
slowness.

[1] 
https://lists.apache.org/thread.html/r195c267ef15f50aa21bbcefd7bf523f8bf2495b2345fd163e91e3c36%40%3Cdev.calcite.apache.org%3E
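The claim that commute plus the .right instance subsumes the .left instance can be sanity-checked by enumerating join trees. An illustrative sketch in plain Python, not Calcite code (the rule names mirror the rule instances above):

```python
# Check that JoinCommuteRule plus JoinPushThroughJoinRule.right alone can
# reach the plans produced by JoinPushThroughJoinRule.left, and vice versa.
# A join tree is a nested tuple: (("R", "S"), "T") means (R join S) join T.

def commute(t):                     # A B -> B A
    a, b = t
    return (b, a)

def push_right(t):                  # (R S) T -> (R T) S
    (r, s), top_right = t
    return ((r, top_right), s)

def push_left(t):                   # (R S) T -> (T S) R
    (r, s), top_right = t
    return ((top_right, s), r)

def closure(start, push_rules):
    """All join trees reachable via commute (applied to the top join and
    to a nested left-child join) plus the given push rules."""
    seen, frontier = {start}, [start]
    while frontier:
        t = frontier.pop()
        candidates = [commute(t)]
        if isinstance(t[0], tuple):  # left child is itself a join
            candidates.append((commute(t[0]), t[1]))
            candidates.extend(rule(t) for rule in push_rules)
        for c in candidates:
            if c not in seen:
                seen.add(c)
                frontier.append(c)
    return seen

start = (("R", "S"), "T")
# The .left output is reachable without ever applying push_left:
print(push_left(start) in closure(start, [push_right]))  # True
# ... and symmetrically for .right:
print(push_right(start) in closure(start, [push_left]))  # True
```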



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New committer: Forward Xu

2020-05-01 Thread Haisheng Yuan
Congrats, Forward.

Haisheng


On 2020/05/01 20:19:47, Vineet G  wrote: 
> Congratulations!
> 
> > On Apr 30, 2020, at 10:57 AM, Julian Hyde  wrote:
> > 
> > Congratulations and welcome, Forward! Thank you for your contributions.
> > 
> > It would be great to add TBDS (and its logo) to the “powered by” page[1]. 
> > What do you think?
> > 
> > Julian
> > 
> > [1] https://calcite.apache.org/docs/powered_by.html 
> > 
> > 
> >> On Apr 29, 2020, at 5:34 AM, Forward Xu  wrote:
> >> 
> >> Thank you everyone for your warm welcome!
> >> I'm working in the TBDS team of Tencent in Shenzhen. TBDS (Tencent Big Data
> >> Suite) is similar to Alibaba's EMR; TBDS is a big data ecosystem. I am
> >> responsible for Oceanus (Flink streaming jobs) and Tdbank (Tencent real-time
> >> data collection system). I'm very happy to become a Calcite committer and
> >> look forward to making more contributions.
> >> 
> >> Best,
> >> Forward
> >> 
> >> Zoltan Haindrich  于2020年4月29日周三 下午1:58写道:
> >> 
> >>> Congratulations!
> >>> 
> >>> On 4/29/20 7:31 AM, Enrico Olivelli wrote:
>  Congrats!
>  
>  Enrico
>  
>  Il Mer 29 Apr 2020, 04:52 Feng Zhu  ha scritto:
>  
> > Congratulations, Forward!
> > 
> > best,
> > Feng
> > 
> > Chunwei Lei  于2020年4月29日周三 上午10:17写道:
> > 
> >> Congrats, Forward!
> >> 
> >> 
> >> 
> >> Best,
> >> Chunwei
> >> 
> >> 
> >> On Wed, Apr 29, 2020 at 6:46 AM Rui Wang  wrote:
> >> 
> >>> Congrats!
> >>> 
> >>> 
> >>> -Rui
> >>> 
> >>> On Tue, Apr 28, 2020 at 3:04 PM Francis Chuang <
> > francischu...@apache.org
> >>> 
> >>> wrote:
> >>> 
>  Congrats, Forward!
>  
>  Francis
>  
>  On 29/04/2020 7:53 am, Stamatis Zampetakis wrote:
> > Apache Calcite's Project Management Committee (PMC) has invited
> >> Forward
>  Xu
> > to
> > become a committer, and we are pleased to announce that he has
> >>> accepted.
> > 
> > Forward has been helping the project for some time now. He added many
> > new SQL functions to the project and is one of our JSON experts. On top
> > of that, and other fixes, he is the one who added the Redis adapter to
> > the project.
> > 
> > Forward, welcome, thank you for your contributions, and we look forward
> > to your further interactions with the community! If you wish, please
> > feel free to tell us more about yourself and what you are working on.
> > 
> > Stamatis (on behalf of the Apache Calcite PMC)
> > 
>  
> >>> 
> >> 
> > 
>  
> >>> 
> > 
> 
> 


Re: [ANNOUNCE] New committer: Wang Yanlin

2020-05-01 Thread Haisheng Yuan
Congrats, Yanlin. Looking forward to more.

Haisheng


On 2020/05/01 20:19:27, Vineet G  wrote: 
> Congratulations!
> 
> > On May 1, 2020, at 12:11 PM, Julian Hyde  wrote:
> > 
> > Yanlin,
> > 
> > Please create a PR adding an Ant Financial paragraph to that page. in
> > the PR, include the URL of the image that you would like me to add to
> > the logo diagram.
> > 
> > Julian
> > 
> > On Fri, May 1, 2020 at 8:02 AM Wang Yanlin <1989yanlinw...@163.com> wrote:
> >> 
> >> 
> >> 
> >> 
> >> Thanks Julian, having an entry on the powered by page for Ant Financial 
> >> would be nice.
> >> 
> >> 
> >> 
> >> [1] is the official homepage of the company I'm serving.
> >> 
> >> 
> >> Yanlin
> >> 
> >> 
> >> 
> >> 
> >> [1]  https://www.antfin.com/  
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> --
> >> 
> >> Best,
> >> Wang Yanlin
> >> 
> >> 
> >> 
> >> 
> >> 
> >> At 2020-05-01 01:54:38, "Julian Hyde"  wrote:
> >>> Welcome, Yanlin!
> >>> 
> >>> Thanks for your contributions so far, and thanks for introducing 
> >>> yourself. I often learn so much from committers’ self-introductions about 
> >>> how Calcite is being used.
> >>> 
> >>> I know we have other Alibaba-related projects on the “powered by” page 
> >>> [1] (Flink/Ververica, MaxCompute) but it seems that Ant Financial is a 
> >>> distinct business, so deserves its own entry on the page, and logo. What 
> >>> do you think?
> >>> 
> >>> Julian
> >>> 
> >>> [1] https://calcite.apache.org/docs/powered_by.html#alibaba-maxcompute 
> >>> 
> >>> 
>  On Apr 29, 2020, at 5:50 AM, Wang Yanlin <1989yanlinw...@163.com> wrote:
>  
>  Hi, guys, thanks for your warm welcome.
>  
>  
>  
>  I'm working in Ant Financial, Alibaba Group. Currently my team is working 
>  on building a system to process big data in the form of SQL.
>  We use Calcite to parse SQL, optimize RelNode, and rewrite SqlNode to 
>  execute on different engines, like Spark, MaxCompute, HBase and so on.
>  Calcite is really a great community, and it's really an honor for me to 
>  become a Calcite committer; I hope to make more contributions to Calcite.
>  
>  
>  Thanks again.
>  
>  --
>  
>  Best,
>  Wang Yanlin
>  
>  
>  
>  
>  
>  在 2020-04-29 13:58:35,"Zoltan Haindrich"  写道:
> > Congratulations!
> > 
> > On 4/29/20 7:32 AM, Enrico Olivelli wrote:
> >> Congrats!
> >> 
> >> Enrico
> >> 
> >> Il Mer 29 Apr 2020, 04:51 Feng Zhu  ha scritto:
> >> 
> >>> Congratulations, Yanlin!
> >>> 
> >>> best,
> >>> Feng
> >>> 
> >>> Chunwei Lei  于2020年4月29日周三 上午10:16写道:
> >>> 
>  Congrats, Yanlin!
>  
>  
>  Best,
>  Chunwei
>  
>  
>  On Wed, Apr 29, 2020 at 10:07 AM Forward Xu 
>  wrote:
>  
> > Congrats
> > 
> > 
> > Best,
> > 
> > Forward
> > 
> > 953396112 <953396...@qq.com> 于2020年4月29日周三 上午8:26写道:
> > 
> >> Congrats, Wang Yanlin!
> >> 
> >> 
> >> 
> >> 
> >> ---Original---
> >> From: "Stamatis Zampetakis"
> >> Date: Wed, Apr 29, 2020 05:51 AM
> >> To: "dev";
> >> Subject: [ANNOUNCE] New committer: Wang Yanlin
> >> 
> >> 
> >> Apache Calcite's Project Management Committee (PMC) has invited 
> >> Wang
> > Yanlin
> >> to
> >> become a committer, and we are pleased to announce that he has
>  accepted.
> >> 
> >> Wang has pushed numerous fixes and improvements to the project, landing
> >> in total the impressive number of 30 commits to the master. Among other
> >> things, he contributed some important features in the Interpreter.
> >> 
> >> Wang, welcome, thank you for your contributions, and we look forward
> >> to your further interactions with the community! If you wish, please
> >> feel free to tell us more about yourself and what you are working on.
> >> 
> >> Stamatis (on behalf of the Apache Calcite PMC)
> > 
>  
> >>> 
> >> 
> >>> 
> 
> 


Re: [ANNOUNCE] New committer: Jin Xing

2020-05-01 Thread Haisheng Yuan
Welcome, Jin Xing. Looking forward to more.

Haisheng

On 2020/05/01 20:43:39, Xiening Dai  wrote: 
> Congrats Jin Xing. Well deserved!
> 
> > On Apr 30, 2020, at 7:54 PM, XING JIN  wrote:
> > 
> > Thanks a lot, Julian ~
> > I'm not from MaxCompute team, but from big data platform in Alibaba Ant
> > Financial Group.
> > Actually we cooperate a lot with MaxCompute, it's our sister team.
> > 
> > Jin
> > 
> > Julian Hyde  于2020年5月1日周五 上午1:48写道:
> > 
> >> Welcome Jin! Thanks for your contributions so far, looking forward to more!
> >> 
> >> Are you on the MaxCompute project? It’s already on our “powered by”
> >> page[1], so I think people are familiar with it.
> >> 
> >> Julian
> >> 
> >> [1] https://calcite.apache.org/docs/powered_by.html#alibaba-maxcompute
> >> 
> >> 
> >>> On Apr 29, 2020, at 5:06 AM, XING JIN  wrote:
> >>> 
> >>> Thanks a lot ~
> >>> Calcite is a great project and it's a great honor for me to work with you
> >>> guys. I really appreciate the help from the community.
> >>> I'm working in Alibaba. My team builds a big data system to optimize batch
> >>> and streaming jobs. We use Calcite to process SQL queries and adapt them
> >>> to different physical engines.
> >>> I'm very excited to become a Calcite committer and look forward to making
> >>> more contributions.
> >>> 
> >>> Best regards,
> >>> Jin
> >>> 
> >>> 
> >>> Zoltan Haindrich  于2020年4月29日周三 下午1:58写道:
> >>> 
>  Congrats!
>  
>  On 4/29/20 7:32 AM, Enrico Olivelli wrote:
> > Congratulations!
> > 
> > Enrico
> > 
> > Il Mer 29 Apr 2020, 04:51 Feng Zhu  ha scritto:
> > 
> >> Congratulations!
> >> 
> >> best,
> >> Feng
> >> 
> >> Chunwei Lei  于2020年4月29日周三 上午10:16写道:
> >> 
> >>> Congrats, Jin!
> >>> 
> >>> 
> >>> Best,
> >>> Chunwei
> >>> 
> >>> 
> >>> On Wed, Apr 29, 2020 at 10:07 AM Forward Xu 
> >>> wrote:
> >>> 
>  Congrats
>  
>  
>  best,
>  
>  Forward
>  
>  953396112 <953396...@qq.com> 于2020年4月29日周三 上午8:21写道:
>  
> > Congrats, Jin Xing!
> > 
> > 
> > ---Original---
> > From: "Stamatis Zampetakis"
> > Date: Wed, Apr 29, 2020 05:47 AM
> > To: "dev";
> > Subject: [ANNOUNCE] New committer: Jin Xing
> > 
> > 
> > Apache Calcite's Project Management Committee (PMC) has invited Jin
> >>> Xing
>  to
> > become a committer, and we are pleased to announce that he has
> >>> accepted.
> > 
> > Jin has contributed a lot of code in the project and many recent
> > improvements in materialized view matching have his signature on them.
> > Apart from code contributions, Jin provides valuable help to the
> > community by doing reviews and answering questions in the devlist.
> > 
> > Jin, welcome, thank you for your contributions, and we look forward to
> > your further interactions with the community! If you wish, please feel
> > free to tell us more about yourself and what you are working on.
> > 
> > Stamatis (on behalf of the Apache Calcite PMC)
>  
> >>> 
> >> 
> > 
>  
> >> 
> >> 
> 
> 


[jira] [Created] (CALCITE-3966) Trigger rules for existing RelSubset when it becomes delivered

2020-05-01 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3966:
--

 Summary: Trigger rules for existing RelSubset when it becomes 
delivered
 Key: CALCITE-3966
 URL: https://issues.apache.org/jira/browse/CALCITE-3966
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan
 Fix For: 1.23.0


An existing RelSubset may first be created by a parent trait request, so its 
state is "required". But when a new RelNode in the same RelSet can deliver the 
same traitset, we mark the subset as derived/delivered too, in which case we 
should fire rules for the subset as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-30 Thread Haisheng Yuan
es planner. I think I'll move it to some common superclass:
> >>>> either
> >>>>>>>> to existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.
> >>>>>>>> 
> >>>>>>>>> Like that it will be easier for everybody to test and see if the
> >>>> changes make this better or worse. From a backward compatibility
> >>>> perspective it seems feasible to keep the new features configurable for a
> >>>> certain amount of time.
> >>>>>>>> 
> >>>>>>>> I was thinking about backward compatibility in this way: what if we
> >>>> can
> >>>>>>>> switch planner in tests by setting some flag (system property?)
> >>>>>>>> somewhere and check what happens with new planner if we replace
> >>>> Volcano
> >>>>>>>> with it. And if ever it turns out that the new planner passes all
> >>>>>>>> Volcano's tests and at the same time it works more efficiently, we
> >> can
> >>>>>>>> safely replace Volcano planner with Cascades planner.
> >>>>>>>> 
> >>>>>>>>> A design doc would definitely help, especially if it has a few
> >>>> end-to-end (from logical to physical plan) examples showing how the
> >>>> optimizer works
> >>>>>>>> at each step before/after the changes. This is actually what is
> >>>> usually
> >>>>>>>> missing in research papers that makes them hard to understand.
> >>>>>>>>> I am thinking of something similar to the examples that Haisheng
> >>>>>>>>> sent in the first email, but possibly a bit more detailed.
> >>>>>>>> 
> >>>>>>>> I agree, I'll add some examples to the design doc very soon.
> >>>>>>>> 
> >>>>>>>>> I looked very briefly in the PR by Roman but I think I didn't see
> >>>> tests where the final plan contains operators from multiple conventions.
> >>>> Multiple conventions are among the choices that complicate certain parts
> >>>> of the existing planner, so we should make sure that we take this into
> >>>> account.
> >>>>>>>> 
> >>>>>>>> It's on my radar. I'm going to add these tests.
> >>>>>>>> 
> >>>>>>>> I would like to ask the community about my next steps for the
> >>>>>>>> transition of the Cascades planner from prototype status to
> >>>>>>>> becoming a part of the project. I see these steps like this:
> >>>>>>>> 
> >>>>>>>> 1. Create a jira ticket.
> >>>>>>>> 2. Update design document with examples.
> >>>>>>>> 3. Do some research to obtain backward compatibility with the Volcano
> >>>>>>>> planner, to be able to replace the Volcano planner with the Cascades
> >>>>>>>> planner in tests and to understand the current problems with the planner.
> >>>>>>>> 4. Solve known problems
> >>>>>>>> - materialized views
> >>>>>>>> - hints
> >>>>>>>> - multiple conventions
> >>>>>>>> - listener hooks
> >>>>>>>> - problems from p.3.
> >>>>>>>> 5. New PR, review, and merge.
> >>>>>>>> 6. Replace the Volcano planner with Cascades after several releases.
> >>>>>>>> 
> >>>>>>>> What do you think about this roadmap?
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> Kind Regards
> >>>>>>>> Roman Kondakov
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> On 27.04.2020 01:55, Stamatis Zampetakis wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>> 
> >>>>>>>>> I am very excited about the ideas discussed so far and especially
> >> by
> >>>> the
> >>>>>>>>> enthusiasm of many people that are ready to help

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Haisheng Yuan
Hi Roman,

Excellent! This is definitely a helpful contribution to the Calcite community.
Thank you for your endeavors.

Haisheng

On 2020/04/26 19:25:00, Roman Kondakov  wrote: 
> Hi everyone!
> 
> Haisheng, thank you for bringing this subject up. A new Cascades-style
> optimizer should definitely be the next step for Apache Calcite. Many
> projects suffer from the lack of this kind of optimizer.
> 
> That was the reason why several weeks ago I started working on the
> prototype of Cascades optimizer for Apache Calcite. I was not sure that
> I would build something useful without much experience in this area. But
> now I'd like to share my work with the community. You can find the
> Cascades prototype in PR [1]. This prototype is based on the well-known
> paper [2].
> 
> What prototype can do:
> - Top-down trait request
> - Convert traits without Abstract converters
> - Top-down rule apply
> - Bottom-up trait derivation
> 
> What is not supported yet:
> - Search space pruning (but I'm going to fix it in the next commits)
> - Materialized views
> - Planner hooks
> - Hints
> - Backward compatibility with Volcano planner (some research needed)
> 
> I prepared a design doc for this planner [3], you can find many details
> there. I also opened it for comments.
> 
> I've written several basic test cases in
> org.apache.calcite.plan.cascades.CascadesPlannerTest, including the one that
> was discussed in this thread previously in the context of trait requests:
> 
> >  MergeJoin hash[a] 
> >   | TableScan R hash[a] (RelSubset)
> >   + TableScan S hash[a] (RelSubset)
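The top-down trait request shown in the plan above can be sketched with a toy model (plain Java, not Calcite's actual API; the `request` method, the string-encoded traits, and the `Exchange` wrapper are illustrative assumptions): the join requires `hash[a]` from each input, and an input that cannot deliver it natively is wrapped in an exchange that enforces the trait.

```java
// Toy model of a top-down trait request. Real Calcite works with
// RelTraitSet/RelSubset; here traits are plain strings for brevity.
public final class TraitRequestDemo {
    // Returns the plan fragment for an input after the parent requests
    // distribution `required`; `delivered` is what the input provides natively.
    static String request(String input, String delivered, String required) {
        return delivered.equals(required)
            ? input + " " + delivered                      // trait already satisfied
            : "Exchange " + required + "(" + input + ")";  // enforce via exchange
    }

    public static void main(String[] args) {
        // MergeJoin on column a requires both inputs hash-distributed on a.
        // R already delivers hash[a]; S delivers hash[b] and must be re-shuffled.
        System.out.println("MergeJoin hash[a]");
        System.out.println("  " + request("TableScan R", "hash[a]", "hash[a]"));
        System.out.println("  " + request("TableScan S", "hash[b]", "hash[a]"));
    }
}
```

Running the sketch prints the satisfied scan of R as-is and wraps S in an exchange, mirroring how a trait request either matches an existing subset or forces an enforcer.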
> 
> 
> Haisheng, this is very similar to what you propose. Answering your question:
> 
> > There are 2 ways:
> > a) Modify on current VolcanoPlanner.
> >   Pros: code reuse, existing numerous test cases and infrastructure, fast 
> > integration
> >   Cons: changing code always brings risk
> > 
> > b) Add a new Planner
> >   Pros: no risk, no tech debt, no need to worry about backward compatibility
> >   Cons: separate test cases for new planner, one more planner to maintain
> 
> I've chosen the second approach, because I don't have a clear
> understanding of how to fix the Volcano planner gradually.
> 
> This new planner is very far from perfect, but I think it can be a good
> starting point for the community.
> 
> Please, share your thoughts about this planner.
> 
> 
> [1] PR: https://github.com/apache/calcite/pull/1948
> [2] Paper:
> https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf
> [3] Design doc:
> https://docs.google.com/document/d/1qaV3eSKTw4gfLuBR3XB_LVIc51hxIng9k1J4VD6qnRg/edit?usp=sharing
> 
> 
> 
> -- 
> Kind Regards
> Roman Kondakov
> 
> 
> On 22.04.2020 09:52, Danny Chan wrote:
> >> Is there any recommended approach to make that happen smoothly besides
> > coding and testing work? We need to be aware that the new planner might
> > co-exist with VolcanoPlanner for 5 or more years, or even never replace
> > VolcanoPlanner.
> > 
> > If that is true, I might say the new planner probably has a design that is
> > not that good; we should anticipate in advance the cases/reasons for which
> > users would want to keep the old VolcanoPlanner, and we *must* give a
> > solution for those problems in the new design.
> > 
> > I was expecting that migrating to a new planner would take at least 1 year
> > of development. If that is true, modifying the current planner directly
> > means that for the next 3~4 Calcite versions, each release would bring in
> > huge plan changes/bugs, which I believe none of the users of
> > Calcite want to see. And no one can guarantee that modifying directly
> > can keep good stability and compatibility; only the test set can.
> > 
> > From the experience of Alibaba Blink planner which has contributed to
> > Apache Flink, yes, the old/new planner would co-exist at least for 2 years.
> > For the reasons that the new and old planner has different ability in some
> > corner cases.
> > 
> > From my point of view, we should at least:
> > - Give a convincing test set for the new planner that makes us believe the
> > new planner is stable and powerful enough. I mean obviously the current
> > rule tests are far from enough to support the new planner
> > - We should give a more detailed design doc about the new planner,
> > especially about the interfaces changes and any change that would bring in
> > the compatibility problem. Then we can make more accurate decision how much
> > work the new planner woul

Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-25 Thread Haisheng Yuan
Congrats, Vineet!

On 2020/04/25 22:18:35, Forward Xu  wrote: 
> Congratulations
> 
> best,
> Forward
> 
> Francis Chuang  于2020年4月26日周日 上午6:04写道:
> 
> > Congrats, Vineet!
> >
> > On 26/04/2020 7:52 am, Stamatis Zampetakis wrote:
> > > Apache Calcite's Project Management Committee (PMC) has invited Vineet
> > > Garg to become a committer, and we are pleased to announce that he
> > > has accepted.
> > >
> > > With the first code contribution in Calcite back in 2017, Vineet is
> > > definitely
> > > not new to the project. Since then he has contributed many patches,
> > > fixing and improving various modules of Calcite, notably things around
> > > subqueries.
> > >
> > > Vineet, welcome, thank you for your contributions, and we look forward
> > > to your further interactions with the community! If you wish, please feel
> > > free to tell us more about yourself and what you are working on.
> > >
> > > Stamatis (on behalf of the Apache Calcite PMC)
> > >
> >
> 


[jira] [Created] (CALCITE-3949) RelDistributions.of() and RelCollations.of() should canonize trait instance

2020-04-21 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3949:
--

 Summary: RelDistributions.of() and RelCollations.of() should 
canonize trait instance
 Key: CALCITE-3949
 URL: https://issues.apache.org/jira/browse/CALCITE-3949
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan
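The ticket's point — that trait factory methods like `RelDistributions.of()` should canonize (intern) trait instances — can be sketched as follows (plain Java, not Calcite's code; the `Dist` record and `of` factory are illustrative assumptions): equal trait values map to a single shared instance, so traits can then be compared cheaply with `==`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of trait canonization: one interned instance per distinct value.
public final class CanonizeDemo {
    // Toy stand-in for a distribution trait (kind + key columns).
    record Dist(String kind, List<Integer> keys) {}

    private static final Map<Dist, Dist> INTERNED = new ConcurrentHashMap<>();

    // of() canonizes: repeated calls with equal arguments return the
    // same object, so identity comparison is safe downstream.
    static Dist of(String kind, List<Integer> keys) {
        return INTERNED.computeIfAbsent(new Dist(kind, List.copyOf(keys)), d -> d);
    }

    public static void main(String[] args) {
        Dist a = of("hash", List.of(0));
        Dist b = of("hash", List.of(0));
        System.out.println(a == b);  // true: canonized to one instance
    }
}
```

Without canonization, two structurally equal trait instances would fail identity checks inside the planner, which is the bug class the ticket targets.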






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Haisheng Yuan
Hi Andrii,

> Obviously, from what is written here, I could guess that this would require 
> me to change my physical planning rules, even if only by implementing a 
> marker interface.
You don't need to change your physical rules; they will be treated the same as 
logical rules and be applied together with the real logical rules, with no more 
logical/physical rule difference. This is also how the current VolcanoPlanner 
works. 

> I don't want you to think that I somehow resent the changes you are pushing.
Don't get me wrong. I am seriously thinking of reverting these changes: since most 
people like the idea of adding a new planner, why don't we make all the plan 
changes in the new planner, instead of forcing people to change test cases for 
code changes that they might not need in VolcanoPlanner during an upgrade?

I didn't intend to replace VolcanoPlanner, but just to change the search 
strategy and add a trait derivation mechanism, because most of the code in 
VolcanoPlanner can be reused. But since many agree to add a new planner with 
replacing VolcanoPlanner as the final goal, I won't be against most people's 
decision. 

Is there any recommended approach to make that happen smoothly besides coding 
and testing work? We need to be aware that the new planner might co-exist 
with VolcanoPlanner for 5 or more years, or even never replace VolcanoPlanner.

More thoughts are welcome.

Haisheng

On 2020/04/21 19:56:25, Андрей Цвелодуб  wrote: 
> Hello Haisheng,
> 
> > To keep backward compatibility, all the un-marked rules will be treated
> as logical rules, except rules that use AbstractConverter as a rule operand;
> these rules still need to be applied top-down, or in random order.
> Obviously, from what is written here, I could guess that this would require
> me to change my physical planning rules, even if only by implementing a
> marker interface. I am not saying this is a bad thing, but this is a thing
> that should be communicated and planned ahead in case the VolcanoPlanner is
> modified.
> 
> > Looks like I have to revert changes in CALCITE-2970 and CALCITE-3753,
> because they will cause another tons of plan changes.
> I see you are still bitter due to all the discussions on this list lately,
> I'm sorry. I don't want you to think that I somehow resent the changes you
> are pushing, au contraire I support them and would be happy to help if I
> can. I just want the process of these changes to be executed in the best
> possible way.
> As I see there are already several opinions in this thread that basically
> align with what I am saying, so I guess I am not the crazy guy running
> around and yelling "the end is nigh!".
> 
> Thank you for taking these mumbled thoughts into account.
> 
> Bestest Regards,
> Andrii Tsvielodub
> 
> On Tue, 21 Apr 2020 at 21:08, Haisheng Yuan  wrote:
> 
> > Hi Andrii,
> >
> > > I guess changing the planner would lead to changes in tons of rules and
> > even more tests.
> > Obviously you didn't read through my email. You are not required to make any
> > changes to your rule if you don't want to, but if you do, you just need to mark
> > the rule to tell the planner whether it is a physical rule or not, simply by
> > implementing an empty interface.
> >
> > > many on this list already experienced problems with upgrading even
> > between the minor versions of Calcite.
> > Sorry to see the problem you have experienced when upgrading Calcite.
> > Looks like I have to revert changes in CALCITE-2970 and CALCITE-3753,
> > because they will cause another tons of plan changes.
> >
> > But I will see if I can add a setting to use the old search strategy,
> > which can be left untouched.
> >
> > Haisheng
> >
> > On 2020/04/21 06:33:08, Андрей Цвелодуб  wrote:
> > > Hello everyone,
> > >
> > > First of all, thanks for this great effort of improving the core parts of
> > > the framework we all are using,
> > > I believe this is long overdue and hope this will have benefits both for
> > > the maintainers and users of the library.
> > >
> > > I don't have anything to say about the general idea at the moment,
> > > but I want to make a point that maintaining the old implementation of
> > > VolcanoPlanner during
> > > the initial stages of implementing the new planner is absolutely
> > CRITICAL.
> > > As a lot of users of Calcite do various customizations to the engine, to
> > > the rules
> > > and all that is there in between, I believe changing the implementation
> > of
> > > the core component
> > > would have a huge impact on most users of the library. I think 

Re: RelMetadataQuery.getRowCount stackoverflow

2020-04-21 Thread Haisheng Yuan
I think the ProjectMergeRule-related cyclic triggering has been worked around by 
CALCITE-3774 [1].

Haisheng

[1] https://issues.apache.org/jira/browse/CALCITE-3774

On 2020/04/21 21:20:35, Stamatis Zampetakis  wrote: 
> I've seen it also quite a few times and most often the reason was cycles in
> the subsets that were triggering some rules (e.g., ProjectMergeRule [1]) to
> fire an infinite number of times [2, 3, 4].
> 
> Best,
> Stamatis
> 
> [1]
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/ProjectMergeRule.java
> [2] https://issues.apache.org/jira/browse/CALCITE-3505
> [3] https://issues.apache.org/jira/browse/CALCITE-2223
> [4] https://issues.apache.org/jira/browse/CALCITE-3124
> 
> On Tue, Apr 21, 2020 at 6:43 AM Haisheng Yuan  wrote:
> 
> > Can you add a reproducible test case and log a JIRA? It may be different
> > from CALCITE-2057. People who are interested will investigate the issue.
> >
> > On 2020/04/21 04:24:31, JiaTao Tao  wrote:
> > > Thanks
> > > I didn't add any new rule, just these:
> > >
> > > CONSTANT_REDUCTION_RULES
> > > ABSTRACT_RELATIONAL_RULES
> > > BASE_RULES
> > > ABSTRACT_RULES
> > > ENUMERABLE_RULES
> > >
> > > So is this a bug, or is it just because the call stack is too deep (if
> > the latter,
> > > I can adjust the JVM parameters)?
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> > >
> > > Scott Reynolds  于2020年4月21日周二 上午1:10写道:
> > >
> > > > I have had this happen numerous times when writing new planner rules.
> > Most
> > > > of the time my rule is missing some boolean logic to prevent itself
> > from
> > > > transforming the call. This results in the rule continuously
> > transforming
> > > its previous transformations.
> > > >
> > > > I can usually see this happening when I add a
> > > > System.out.println(RelOptUtil.dumpPlan()) to the line before the
> > > > call.transformTo(newRelationNode)
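The missing "boolean logic" Scott describes can be illustrated with a minimal self-contained sketch (plain Java, not Calcite's RelOptRule API; `transform`, `matches`, and `fireToFixpoint` are hypothetical names): without a guard condition, the rule keeps matching and transforming its own output forever; with the guard, it fires exactly once.

```java
// Minimal illustration of why a planner rule needs a guard condition:
// an unguarded rule re-matches its own output and never terminates.
public final class RuleGuardDemo {
    // Hypothetical rewrite: wrap the node in a marker.
    static String transform(String node) { return "Rewritten(" + node + ")"; }

    // Guard: refuse to match a node this rule already produced.
    static boolean matches(String node) { return !node.startsWith("Rewritten("); }

    // Fire the rule until the guard stops it (capped to avoid runaway loops).
    static int fireToFixpoint(String node, int maxIterations) {
        int fired = 0;
        while (fired < maxIterations && matches(node)) {
            node = transform(node);
            fired++;
        }
        return fired;  // with the guard, this is at most 1
    }

    public static void main(String[] args) {
        System.out.println(fireToFixpoint("Project(Scan)", 100));  // prints 1
    }
}
```

Dropping the `matches` check here is the analogue of a rule whose `onMatch` unconditionally calls `transformTo`: each transformation re-enters the rule queue and the planner loops or overflows the stack.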
> > > >
> > > > On Mon, Apr 20, 2020 at 3:13 AM JiaTao Tao 
> > wrote:
> > > >
> > > > > Hi
> > > > > Has anyone encountered this problem before? Just a simple query(no
> > more
> > > > > than 20 lines, two joins, no union).
> > > > >
> > > > > And I see this ticket:
> > > > https://issues.apache.org/jira/browse/CALCITE-2057,
> > > > > but there's no follow-up; also I see Flink may hit this problem(
> > > > > https://developer.aliyun.com/ask/129548)
> > > > >
> > > > > java.lang.StackOverflowError
> > > > > at java.util.HashMap.hash(HashMap.java:339)
> > > > > at java.util.HashMap.put(HashMap.java:612)
> > > > > at
> > > > >
> > > >
> > com.google.common.collect.StandardTable.getOrCreate(StandardTable.java:165)
> > > > > at
> > > > com.google.common.collect.StandardTable.put(StandardTable.java:174)
> > > > > at
> > > > com.google.common.collect.HashBasedTable.put(HashBasedTable.java:55)
> > > > > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > > > > at
> > > > >
> > > >
> > org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:208)
> > > > > at
> > > > >
> > > >
> > org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:72)
> > > > > at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown
> > Source)
> > > > > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > > > > at ...
> > > > >
> > > > > Regards!
> > > > >
> > > > > Aron Tao
> > > > >
> > > >
> > >
> >
> 


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Haisheng Yuan
revent issues becoming disasters, we are not
> > vulnerable to making mistakes. This not only enables faster iterations but
> > also let us have enough time to resolve big bugs, like considering it in
> > detail and applying a time-consuming refactoring for it. To work around a
> > critical bug using tricky ways usually introduces more issues.
> >
> > Thanks,
> > Jinpeng
> >
> > On Tue, Apr 21, 2020 at 2:04 AM Xiening Dai  wrote:
> >
> > > Hi Jinpeng,
> > >
> > > Regarding this comment - I believe there are ways to calculate the
> > logical
> > > cost, but I think it’s not as simple as  "cardinality *
> > unit_copy_cost.”,
> > > would  you provide more details of other different ways? Just the
> > algorithm
> > > description or pseudo code would help us understand. Thanks.
> > >
> > > Regarding the approach of creating new planner, I don’t think a new
> > > planner would lower the risk. We don’t know what we don’t know. If we
> > > introduced an issue while modifying the planner, most likely we would do
> > > the same with new planner class. A new planner doesn’t necessarily
> > prevent
> > > the issue from happening, but just delay its surfacing, which is worse
> > IMO.
> > >
> > > There’s one obvious benefit with new planner is that we can provide some
> > > sort of isolation so the change won’t cause test baseline updates, which
> > > could be painful at times.  We should see if we could use planner config
> > to
> > > achieve the same if we decide to just modify the current planner.
> > >
> > >
> > > > On Apr 20, 2020, at 8:32 AM, Haisheng Yuan  wrote:
> > > >
> > > > Igor,
> > > >
> > > > That's great.
> > > >
> > > > On 2020/04/20 11:17:49, Seliverstov Igor  wrote:
> > > >> Haisheng, Xiening,
> > > >>
> > > >> Ok, Now I see how it should work.
> > > >>
> > > >> Thanks for your replies.
> > > >>
> > > >> Regards,
> > > >> Igor
> > > >>
> > > >>> 20 апр. 2020 г., в 09:56, Seliverstov Igor 
> > > написал(а):
> > > >>>
> > > >>> Haisheng, Xiening,
> > > >>>
> > > >>> Thanks for clarifying.
> > > >>>
> > > >>> In this proposal, we are not trying to split logical and physical
> > > planning entirely. - actually I was in doubt about an idea of entire
> > > splitting logical and physical phases, if you aren't going to, I have no
> > > objections.
> > > >>>
> > > >>> But it returns me to my first question: how we will propagate traits
> > > in bottom-up manner using proposed approach. (See [DISCUSS] Proposal to
> > add
> > > API to force rules matching specific rels for details)
> > > >>>
> > > >>> One of inconveniences of current VolcanoPlanner implementation is
> > > amount of tricks that we need to get desired behaviour. It would be great
> > > if some of issues (or all of them) were solved in the new approach.
> > > >>>
> > > >>> Regards,
> > > >>> Igor
> > > >>>
> > > >>> пн, 20 апр. 2020 г., 7:02 Xiening Dai  > > xndai@gmail.com>>:
> > > >>> Hi Igor,
> > > >>>
> > > >>> Your comment - "because actual cost may be calculated correctly using
> > > physical operators only. So won't be able to implement Branch and Bound
> > > Space Pruning.“ is actually not true. In Cascade’s lower bound / upper
> > > bound pruning algorithm, you can get cost lower bound of input RelNode
> > > using cardinality * unit_copy_cost. The unit_copy_cost is a constant,
> > which
> > > stands for the minimal cost for processing a tuple from the input. So
> > this
> > > will be the minimal cost the input RelNode can achieve, and if this is
> > > indeed larger than the current best cost, this input node can be pruned.
> > > >>>
> > > >>> In this proposal, we are not trying to split logical and physical
> > > planning entirely. But for any given equivalent set, we would need to
> > > finish exploration before implementation. But for the entire memo tree,
> > > each set could be in different planning stage, and an equivalent set can
> > be
> > > prune

[jira] [Created] (CALCITE-3944) Move dumpSets and dumpGraphviz out of VolcanoPlanner

2020-04-21 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3944:
--

 Summary: Move dumpSets and dumpGraphviz out of VolcanoPlanner
 Key: CALCITE-3944
 URL: https://issues.apache.org/jira/browse/CALCITE-3944
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


They are helper methods, will move to Dumper.java.





Re: RelMetadataQuery.getRowCount stackoverflow

2020-04-20 Thread Haisheng Yuan
Can you add a reproducible test case and log a JIRA? It may be different from 
CALCITE-2057. People who are interested will investigate the issue.

On 2020/04/21 04:24:31, JiaTao Tao  wrote: 
> Thanks
> I didn't add any new rule, just these:
> 
> CONSTANT_REDUCTION_RULES
> ABSTRACT_RELATIONAL_RULES
> BASE_RULES
> ABSTRACT_RULES
> ENUMERABLE_RULES
> 
> So is this a bug, or is it just because the call stack is too deep (if the latter,
> I can adjust the JVM parameters)?
> 
> Regards!
> 
> Aron Tao
> 
> 
> Scott Reynolds  于2020年4月21日周二 上午1:10写道:
> 
> > I have had this happen numerous times when writing new planner rules. Most
> > of the time my rule is missing some boolean logic to prevent itself from
> > transforming the call. This results in the rule continuously transforming
> > its previous transformations.
> >
> > I can usually see this happening when I add a
> > System.out.println(RelOptUtil.dumpPlan()) to the line before the
> > call.transformTo(newRelationNode)
> >
> > On Mon, Apr 20, 2020 at 3:13 AM JiaTao Tao  wrote:
> >
> > > Hi
> > > Has anyone encountered this problem before? Just a simple query(no more
> > > than 20 lines, two joins, no union).
> > >
> > > And I see this ticket:
> > https://issues.apache.org/jira/browse/CALCITE-2057,
> > > but there's no follow-up; also I see Flink may hit this problem(
> > > https://developer.aliyun.com/ask/129548)
> > >
> > > java.lang.StackOverflowError
> > > at java.util.HashMap.hash(HashMap.java:339)
> > > at java.util.HashMap.put(HashMap.java:612)
> > > at
> > >
> > com.google.common.collect.StandardTable.getOrCreate(StandardTable.java:165)
> > > at
> > com.google.common.collect.StandardTable.put(StandardTable.java:174)
> > > at
> > com.google.common.collect.HashBasedTable.put(HashBasedTable.java:55)
> > > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > > at
> > >
> > org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:208)
> > > at
> > >
> > org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:72)
> > > at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
> > > at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> > > at ...
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> >
> 


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Haisheng Yuan
Hi Hanumath,

The trait in the example is for distribution only, for brevity, not including 
collation. No matter whether it is hash join, merge join, or nested-loop join, the 
same distribution applies.

> Are you planning to use the same interface as that of VolcanoPlanner?
Yes, not only for compatibility, but also because the design of RelNode and 
Rule has a huge impact on how the planner can be refactored. They are integrated 
components, not really separable. I have no choice.

> The concern that the optimizer can take up huge time for optimizing large 
> queries is a genuine one. 
Because very few efforts were spent on improving the speed of the core framework 
for the last 10 years. Most queries (putting MVs aside), as long as there is no 
dynamic-programming-style join reordering, should be optimized in a reasonable 
span of time; otherwise it is a **bug**. Alibaba MaxCompute processes millions 
of queries per day; they are mission-critical and time-critical, and all have 
optimization and execution time limits. We spent a lot of effort on improving 
optimization latency, and all the changes we make on Calcite will be 
battle-tested. The inefficiency caused by the search strategy and the lack of a 
trait propagation mechanism, which is experienced by other Calcite-based 
systems, will be resolved by this proposal. If you genuinely want 
iteration-based optimization, you will be disappointed. Because even Cascades' 
first adopter, SQL Server, doesn't provide iteration-based optimization; instead 
it has a phase with very limited optimization rules for transactional 
processing. You can't have your cake and eat it.

I am inclined to modify VolcanoPlanner. A separate new planner won't make it 
move faster or lower online production risk. Nobody knows the pain of 
maintaining 2 optimizers more than me. Ask Greenplum database developers how 
they feel about working on both the Orca and Postgres planners.

Haisheng

On 2020/04/20 18:07:06, hanu mapr  wrote: 
> Hello Haisheng,
> 
> 
> Thanks for the detailed analysis on the support for cascades framework. I
> am quite interested to be part of the new optimization framework. I believe
> this a very important infrastructural work to make calcite a robust query
> optimizer.
> 
> 
> I like your approach on the trait propagation and derivation of the traits
> from child nodes. I have a question though in the example you provided.
> Shouldn't HashJoin make more sense in place of MergeJoin as it can
> passThrough the traits like Hash whereas MergeJoin needs the sorted
> traits? (Please correct me if I am missing anything here.)
> 
> 
> How are you imagining the new Optimizer interface from user point of view.
> Are you planning to use the same interface as that of VolcanoPlanner?
> 
> 
> The concern that the optimizer can take up huge time optimizing large
> queries is a genuine one. At the time when I was part of writing an optimizer
> for a federated engine, I also dealt with it by making the search
> time-bound. As mentioned even iteration based approach might be viable but
> I think a little more thought might be needed to finalize.
> 
> 
> I am of the opinion that option b (add a new planner) might be the right
> approach. We can continuously work on the new optimizer and make it robust
> so that switching to the new optimizer can be seamless and can be
> controlled by the user. From my experience any bug in the optimizer is
> quite subtle and can have high magnitude degradation in performance, so
> tweaking the existing VolcanoOptimizer can be risky.
> 
> 
> Thanks,
> 
> -Hanumath Maduri
> 
> On Mon, Apr 20, 2020 at 11:04 AM Xiening Dai  wrote:
> 
> > Hi Jinpeng,
> >
> > Regarding this comment - I believe there are ways to calculate the logical
> > cost, but I think it’s not as simple as  "cardinality * unit_copy_cost”,
> > would  you provide more details of other different ways? Just the algorithm
> > description or pseudo code would help us understand. Thanks.
> >
> > Regarding the approach of creating new planner, I don’t think a new
> > planner would lower the risk. We don’t know what we don’t know. If we
> > introduced an issue while modifying the planner, most likely we would do
> > the same with new planner class. A new planner doesn’t necessarily prevent
> > the issue from happening, but just delay its surfacing, which is worse IMO.
> >
> > There’s one obvious benefit with new planner is that we can provide some
> > sort of isolation so the change won’t cause test baseline updates, which
> > could be painful at times.  We should see if we could use planner config to
> > achieve the same if we decide to just modify the current planner.
> >
> >
> > > On Apr 20, 2020, a

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Haisheng Yuan
Igor,

That's great.

On 2020/04/20 11:17:49, Seliverstov Igor  wrote: 
> Haisheng, Xiening,
> 
> Ok, Now I see how it should work.
> 
> Thanks for your replies.
> 
> Regards,
> Igor
> 
> > 20 апр. 2020 г., в 09:56, Seliverstov Igor  
> > написал(а):
> > 
> > Haisheng, Xiening,
> > 
> > Thanks for clarifying. 
> > 
> > In this proposal, we are not trying to split logical and physical planning 
> > entirely. - actually I was in doubt about an idea of entire splitting 
> > logical and physical phases, if you aren't going to, I have no objections.
> > 
> > But it returns me to my first question: how we will propagate traits in 
> > bottom-up manner using proposed approach. (See [DISCUSS] Proposal to add 
> > API to force rules matching specific rels for details)
> > 
> > One of inconveniences of current VolcanoPlanner implementation is amount of 
> > tricks that we need to get desired behaviour. It would be great if some of 
> > issues (or all of them) were solved in the new approach.
> > 
> > Regards,
> > Igor
> > 
> > пн, 20 апр. 2020 г., 7:02 Xiening Dai  > <mailto:xndai@gmail.com>>:
> > Hi Igor,
> > 
> > Your comment - "because actual cost may be calculated correctly using 
> > physical operators only. So won't be able to implement Branch and Bound 
> > Space Pruning.“ is actually not true. In Cascade’s lower bound / upper 
> > bound pruning algorithm, you can get cost lower bound of input RelNode 
> > using cardinality * unit_copy_cost. The unit_copy_cost is a constant, which 
> > stands for the minimal cost for processing a tuple from the input. So this 
> > will be the minimal cost the input RelNode can achieve, and if this is 
> > indeed larger than the current best cost, this input node can be pruned. 
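The lower-bound pruning test described above can be sketched as follows (illustrative Java, not Calcite code; `UNIT_COPY_COST` and the method names are assumptions): an input group can be discarded without ever being implemented when even its cheapest conceivable cost already exceeds the best complete plan found so far.

```java
// Sketch of Cascades-style lower-bound (branch-and-bound) pruning.
public final class LowerBoundPruning {
    // Assumed constant: the minimal cost for processing one tuple of input.
    static final double UNIT_COPY_COST = 0.1;

    // Lower bound on the cost of ANY physical plan for this input:
    // every plan must at least touch each tuple once.
    static double costLowerBound(double cardinality) {
        return cardinality * UNIT_COPY_COST;
    }

    // Prune when even the optimistic bound exceeds the best plan so far.
    static boolean canPrune(double inputCardinality, double bestCostSoFar) {
        return costLowerBound(inputCardinality) > bestCostSoFar;
    }

    public static void main(String[] args) {
        // Best complete plan so far costs 50. An input of 1000 rows has a
        // lower bound of 100 > 50, so it is pruned unimplemented; an input
        // of 100 rows (bound 10) must still be explored.
        System.out.println(canPrune(1000, 50));  // true
        System.out.println(canPrune(100, 50));   // false
    }
}
```

The point of the constant bound is that it needs no physical operators at all, which is why the pruning decision can be made before the group is implemented.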
> > 
> > In this proposal, we are not trying to split logical and physical planning 
> > entirely. But for any given equivalent set, we would need to finish 
> > exploration before implementation. But for the entire memo tree, each set 
> > could be in different planning stage, and an equivalent set can be pruned 
> > even before it’s implemented. A text book example of aforementioned “lower 
> > bound / upper bound pruning” would be the join ordering case.
> > 
> > Regarding #3, I think we can still achieve that (partially) through this 
> > proposal. Remember every time when we start the optimization, we pass down 
> > an upper bound limit. Initially this upper bound for root is infinite - 
> > meaning that no feasible plan is available. Every time when we find a 
> > physical plan we update this upper bound, then start the search again. We 
> > could stop the search when the cost is less than a pre-defined threshold - 
> > which gives you a “good enough” plan with early termination. Still this 
> > wouldn't avoid the logical exploration. For that, you would probably 
> > archive through rule configurations, and avoid some expansive 
> > transformation to keep the cost down. 
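The upper-bound loop with early termination described in the paragraph above can be sketched like this (illustrative Java; the plan search is faked with a fixed sequence of candidate costs, and `goodEnough` is an assumed pre-defined threshold):

```java
// Sketch of the upper-bound search: start with an infinite bound (no
// feasible plan yet), tighten it whenever a cheaper complete plan is
// found, and stop early once the plan is "good enough".
public final class UpperBoundSearch {
    static double search(double[] candidatePlanCosts, double goodEnough) {
        double upperBound = Double.POSITIVE_INFINITY;  // no feasible plan yet
        for (double cost : candidatePlanCosts) {
            if (cost < upperBound) {
                upperBound = cost;  // found a cheaper complete plan
            }
            if (upperBound <= goodEnough) {
                break;              // early termination: good enough
            }
        }
        return upperBound;
    }

    public static void main(String[] args) {
        // Candidate plans cost 120, 80, 40, 35; threshold 50 stops the
        // search at 40 without examining the 35-cost plan.
        System.out.println(search(new double[]{120, 80, 40, 35}, 50));  // 40.0
    }
}
```

As the email notes, this bounds the implementation work but does not by itself avoid logical exploration; that part is controlled via rule configuration.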
> > 
> > 
> > > On Apr 19, 2020, at 7:30 PM, Haisheng Yuan  > > <mailto:hy...@apache.org>> wrote:
> > > 
> > > Igor,
> > > 
> > > a) Given current Calcite's stats derivation strategy, mixing logical and 
> > > physical planning won't make it better. I hope you went through my email 
> > > to the end, currently operators inside a memo group don't share stats 
> > > info, each operator's stats may differ with the other one, and the 
> > > RelSubset's best node and stats may vary during optimization. So logical 
> > > and physical rule pruning are not safe at the moment, otherwise it almost 
> > > implies changing downstream project's cost model silently. 
> > > 
> > > On the other hand, ensuring child group exploration task finish first 
> > > will make rule mutual exclusivity check possible, like the result of 
> > > ReduceExpressionRule won't need trigger the same rule again, The join 
> > > result of JoinCommuteRule won't need trigger JoinCommuteRule and 
> > > ReduceExpressionRule again. 
> > > 
> > > More importantly, most if not all of the long planning queries in 
> > > Calcite are not caused by too many alternatives, but mutual triggering, 
> > > or cyclic triggering, which can be avoided by customizing the rules 
> > > instead of using the default one. Unless you use dynamic programming 
> > > (Calcite use heuristic) on join reordering (left-deep, right-deep, 
> > > bushy)

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Igor,

a) Given current Calcite's stats derivation strategy, mixing logical and 
physical planning won't make it better. I hope you went through my email to the 
end: currently operators inside a memo group don't share stats info, each 
operator's stats may differ from the others', and the RelSubset's best node 
and stats may vary during optimization. So logical and physical rule pruning 
are not safe at the moment; otherwise it almost implies changing downstream 
projects' cost models silently. 

On the other hand, ensuring that child group exploration tasks finish first will 
make rule mutual exclusivity checks possible: e.g., the result of ReduceExpressionRule 
won't need to trigger the same rule again, and the join result of JoinCommuteRule 
won't need to trigger JoinCommuteRule and ReduceExpressionRule again. 

More importantly, most if not all of the long planning queries in Calcite are 
not caused by too many alternatives, but by mutual triggering, or cyclic 
triggering, which can be avoided by customizing the rules instead of using the 
default ones. Unless you use dynamic programming (Calcite uses heuristics) on join 
reordering (left-deep, right-deep, bushy), space pruning won't help make long- or 
infinite-running queries faster.

b) No evidence shows the current version of Calcite will return the most promising 
plan in the first planning iteration. Instead of praying for a good enough plan in 
the first iteration, why not focus on fixing the rules that cause the 
issue?

c) That is not the goal.

On 2020/04/19 15:14:57, Seliverstov Igor  wrote: 
> Haisheng,
> 
> From my point of view splitting logical and physical planning parts isn’t a 
> good idea. 
> 
> I think so because actual cost may be calculated correctly using physical 
> operators only. So that we:
>  a) won't be able to implement Branch and Bound Space Pruning (as far as I 
> understand, at exploring time there are no physical operators, no correct 
> costs, but assumptions only, I don’t think we should rely on them)
>  b) won’t be able to get good enough plan (without Branch and Bound Space 
> Pruning it’s non-trivial task to get right search direction and most 
> promising plans in first planning iterations)
>  c) won’t be able to tune the good enough plan during several similar queries 
> are executed
> 
> Regards,
> Igor
> 
> 
> > On 19 Apr 2020, at 17:37, Haisheng Yuan wrote:
> > 
> > Hi Igor,
> > 
> > You can't have your cake and eat it.
> > But one feasible work item definitely we can do is that once timeout, stop 
> > exploring, use the first available physical operator in each group and 
> > optimize it.
> > 
> > Because most, if not all, of the long / infinite running optimizations are 
> > caused by project related transpose, join reordering (join commute + 
> > associative), constant reduction for large expression (see CALCITE-3460), 
> > all of which are logical transformations rules and many of which have 
> > corresponding JIRAs. So another alternative is, let's fix these bugs to 
> > improve Calcite to make timeout option less usable.
> > 
> > Another thing worth noting is that sql server optimizer timeout mechanism 
> > is not based on clock time, but the number of optimization tasks it has 
> > done [1].
> > 
> > [1] 
> > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> > 
> > Regards,
> > Haisheng 
> > 
> > On 2020/04/19 11:31:27, Seliverstov Igor  wrote: 
> >> Haisheng,
> >> 
> >> Ok, then such notification isn't needed
> >> 
> >> But in this case we don't have any control over how long planning takes
> >> 
> >> For some systems it's necessary to get good enough plan right now instead
> >> of best one after while
> >> 
> >> For example we've been considering a case when a query is optimised several
> >> times in short iterations in case it's impossible to get the best plan in
> >> reasonable period of time (for example there is an SLA for response time)
> >> 
> >> This mean we need all needed physical implementations after each logical
> >> transformation is applied.
> >> 
> >> Regards,
> >> Igor
> >> 
> >> 
> >> On Sun, Apr 19, 2020 at 13:55, Haisheng Yuan wrote:
> >> 
> >>> Hi Igor,
> >>> 
> >>> There will be no rule queue anymore. Y will be fully explored (logical
> >>> rules are matched and applied) before it can be implemented and optimized.
> >>> 
> >>> Thanks,
> >>>

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor,

You can't have your cake and eat it.
But one feasible work item we can definitely do is: once we hit the timeout, 
stop exploring, use the first available physical operator in each group, and 
optimize it.

Because most, if not all, of the long-running or non-terminating optimizations 
are caused by project-related transposes, join reordering (join commute + 
associative), and constant reduction of large expressions (see CALCITE-3460), 
all of which are logical transformation rules and many of which have 
corresponding JIRAs. So another alternative is: let's fix these bugs to improve 
Calcite so that the timeout option is rarely needed.

Another thing worth noting is that the SQL Server optimizer's timeout mechanism 
is not based on clock time, but on the number of optimization tasks it has done [1].

[1] 
https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188

Regards,
Haisheng 

On 2020/04/19 11:31:27, Seliverstov Igor  wrote: 
> Haisheng,
> 
> Ok, then such notification isn't needed
> 
> But in this case we don't have any control over how long planning takes
> 
> For some systems it's necessary to get good enough plan right now instead
> of best one after while
> 
> For example we've been considering a case when a query is optimised several
> times in short iterations in case it's impossible to get the best plan in
> reasonable period of time (for example there is an SLA for response time)
> 
> This mean we need all needed physical implementations after each logical
> transformation is applied.
> 
> Regards,
> Igor
> 
> 
> On Sun, Apr 19, 2020 at 13:55, Haisheng Yuan wrote:
> 
> > Hi Igor,
> >
> > There will be no rule queue anymore. Y will be fully explored (logical
> > rules are matched and applied) before it can be implemented and optimized.
> >
> > Thanks,
> > Haisheng
> >
> > On 2020/04/19 10:11:45, Seliverstov Igor  wrote:
> > > Hi Haisheng,
> > >
> > > Great explanation, thanks.
> > >
> > > One thing I'd like to cover in advance is trait propagation process (I
> > need
> > > it for Apache Ignite SQL engine implementation).
> > >
> > > For example:
> > >
> > > There are two nodes: Rel X and its child node Rel Y
> > >
> > > Both nodes are in Optimized state, and there is a Logical rule for Rel Y
> > in
> > > a rules queue matched,
> > >
> > > After logical rule is applied, there is a logical rel Rel Y' and
> > containing
> > > it set moves into Exploring state (since there is a logical node which
> > has
> > > to be implemented)
> > >
> > > After whole process Exploring -> ... -> Optimized we need to force parent
> > > Rel X set to switch its state from Optimized to Optimizing to peek the
> > > newly implemented child and (possible) produce a new Rel X' physical rel
> > on
> > > the basis of previously requested from the Rel X set traits.
> > >
> > > If a new Rel X' is produced, a Rel X' parent should move its state
> > > Optimized -> Optimizing and repeat described above operations.
> > >
> > > Does it look like true?
> > >
> > > Regards,
> > > Igor
> > >
> > >
> > > On Sun, Apr 19, 2020 at 6:52, Haisheng Yuan wrote:
> > >
> > > > Hi,
> > > >
> > > > In the past few months, we have discussed a lot about Cascades style
> > > > top-down optimization, including on-demand trait derivation/request,
> > rule
> > > > apply, branch and bound space pruning. Now we think it is time to move
> > > > towards these targets.
> > > >
> > > > We will separate it into several small issues, and each one can be
> > > > integrated as a standalone, independent feature, and most importantly,
> > > > meanwhile keep backward compatibility.
> > > >
> > > > 1. Top-down trait request
> > > > In other words, pass traits requirements from parent nodes to child
> > nodes.
> > > > The trait requests happens after all the logical transformation rules
> > and
> > > > physical implementation rules are done, in a top-down manner, driven
> > from
> > > > root set. e.g.:
> > > > SELECT a, sum(c) FROM
> > > > (SELECT * FROM R JOIN S USING (a, b)) t
> > > > GROUP BY a;
> > > >
> > > > Suppose we have the following plan tree in the MEMO, and let's only
> > > > consider distribution for 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor,

There will be no rule queue anymore. Y will be fully explored (logical rules 
are matched and applied) before it can be implemented and optimized. 

Thanks,
Haisheng

On 2020/04/19 10:11:45, Seliverstov Igor  wrote: 
> Hi Haisheng,
> 
> Great explanation, thanks.
> 
> One thing I'd like to cover in advance is trait propagation process (I need
> it for Apache Ignite SQL engine implementation).
> 
> For example:
> 
> There are two nodes: Rel X and its child node Rel Y
> 
> Both nodes are in Optimized state, and there is a Logical rule for Rel Y in
> a rules queue matched,
> 
> After logical rule is applied, there is a logical rel Rel Y' and containing
> it set moves into Exploring state (since there is a logical node which has
> to be implemented)
> 
> After whole process Exploring -> ... -> Optimized we need to force parent
> Rel X set to switch its state from Optimized to Optimizing to peek the
> newly implemented child and (possible) produce a new Rel X' physical rel on
> the basis of previously requested from the Rel X set traits.
> 
> If a new Rel X' is produced, a Rel X' parent should move its state
> Optimized -> Optimizing and repeat described above operations.
> 
> Does it look like true?
> 
> Regards,
> Igor
> 
> 
> On Sun, Apr 19, 2020 at 6:52, Haisheng Yuan wrote:
> 
> > Hi,
> >
> > In the past few months, we have discussed a lot about Cascades style
> > top-down optimization, including on-demand trait derivation/request, rule
> > apply, branch and bound space pruning. Now we think it is time to move
> > towards these targets.
> >
> > We will separate it into several small issues, and each one can be
> > integrated as a standalone, independent feature, and most importantly,
> > meanwhile keep backward compatibility.
> >
> > 1. Top-down trait request
> > In other words, pass traits requirements from parent nodes to child nodes.
> > The trait requests happens after all the logical transformation rules and
> > physical implementation rules are done, in a top-down manner, driven from
> > root set. e.g.:
> > SELECT a, sum(c) FROM
> > (SELECT * FROM R JOIN S USING (a, b)) t
> > GROUP BY a;
> >
> > Suppose we have the following plan tree in the MEMO, and let's only
> > consider distribution for simplicity, each group represents a RelSet in the
> > MEMO.
> >
> >Group 1: Agg on [a]
> >Group 2:   +-- MergeJoin on [a, b]
> >Group 3:   |--- TableScan R
> >Group 4:   +--- TableScan S
> >    | Group No | Operator    | Derived Traits | Required Traits |
> >    | -------- | ----------- | -------------- | --------------- |
> >    | Group 1  | Aggregate   | Hash[a]        | N/A             |
> >    | Group 2  | MergeJoin   | Hash[a,b]      | Hash[a]         |
> >    | Group 3  | TableScan R | None           | Hash[a,b]       |
> >    | Group 4  | TableScan S | None           | Hash[a,b]       |
> >
> > We will add new interface PhysicalNode (or extending RelNode) with
> > methods:
> >
> > - Pair<RelTraitSet, List<RelTraitSet>> requireTraits(RelTraitSet required);
> >   pair.left is the current operator's new traitset; pair.right is the list
> > of the children's required traitsets.
> >
> > - RelNode passThrough(RelTraitSet required);
> >   The default implementation will call the above requireTraits() and
> > RelNode.copy() to create a new RelNode. It can be overridden by physical
> > operators to customize the logic.
> >
> > The planner will call above method on MergeJoin operator to pass the
> > required traits (Hash[a]) to Mergejoin's child operators.
> >
> > We will get a new MergeJoin:
> >  MergeJoin hash[a]
> >   | TableScan R hash[a] (RelSubset)
> >   + TableScan S hash[a] (RelSubset)
> >
> > Now the MEMO group looks like:
> >    | Group No | Operator    | Derived Traits     | Required Traits    |
> >    | -------- | ----------- | ------------------ | ------------------ |
> >    | Group 1  | Aggregate   | Hash[a]            | N/A                |
> >    | Group 2  | MergeJoin   | Hash[a,b], Hash[a] | Hash[a]            |
> >    | Group 3  | TableScan R | None               | Hash[a,b], Hash[a] |
> >    | Group 4  | TableScan S | None               | Hash[a,b], Hash[a] |
> >
> > Calcite user may choose to ignore / not implement the interface to keep
> > the original behavior. Each physical operator, according to its own logic,
>

[DISCUSS] Towards Cascades Optimizer

2020-04-18 Thread Haisheng Yuan
Hi,

In the past few months, we have discussed a lot about Cascades style top-down 
optimization, including on-demand trait derivation/request, rule apply, branch 
and bound space pruning. Now we think it is time to move towards these targets.

We will separate the work into several small issues; each one can be integrated 
as a standalone, independent feature while, most importantly, keeping backward 
compatibility.

1. Top-down trait request
In other words, pass trait requirements from parent nodes to child nodes. The 
trait requests happen after all the logical transformation rules and physical 
implementation rules are done, in a top-down manner, driven from the root set, e.g.:
SELECT a, sum(c) FROM 
(SELECT * FROM R JOIN S USING (a, b)) t
GROUP BY a;

Suppose we have the following plan tree in the MEMO, and let's only consider 
distribution for simplicity, each group represents a RelSet in the MEMO.

   Group 1: Agg on [a]  
   Group 2:   +-- MergeJoin on [a, b]
   Group 3:   |--- TableScan R
   Group 4:   +--- TableScan S
   | Group No | Operator    | Derived Traits | Required Traits |
   | -------- | ----------- | -------------- | --------------- |
   | Group 1  | Aggregate   | Hash[a]        | N/A             |
   | Group 2  | MergeJoin   | Hash[a,b]      | Hash[a]         |
   | Group 3  | TableScan R | None           | Hash[a,b]       |
   | Group 4  | TableScan S | None           | Hash[a,b]       |

We will add a new interface PhysicalNode (or extend RelNode) with methods: 

- Pair<RelTraitSet, List<RelTraitSet>> requireTraits(RelTraitSet required);
  pair.left is the current operator's new traitset; pair.right is the list of 
the children's required traitsets.

- RelNode passThrough(RelTraitSet required); 
  The default implementation will call the above requireTraits() and 
RelNode.copy() to create a new RelNode. It can be overridden by physical 
operators to customize the logic.
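To make the intended contract concrete, here is a self-contained toy sketch in plain Java (not Calcite's real API; every class name below is invented for illustration) of how a merge join could answer such a trait request: accept a hash-distribution requirement only if the requested keys are covered by the join keys, and forward the same requirement to both children.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TraitRequestSketch {
    // Stand-in for Pair<RelTraitSet, List<RelTraitSet>>: the operator's new
    // traitset plus one required traitset per child.
    static final class TraitRequest {
        final List<String> selfHashKeys;
        final List<List<String>> childHashKeys;
        TraitRequest(List<String> selfHashKeys, List<List<String>> childHashKeys) {
            this.selfHashKeys = selfHashKeys;
            this.childHashKeys = childHashKeys;
        }
    }

    // Returns null when the request cannot be passed down, mirroring the
    // null-return contract described for passThrough().
    static TraitRequest requireTraits(List<String> joinKeys, List<String> requiredHashKeys) {
        if (requiredHashKeys.isEmpty() || !joinKeys.containsAll(requiredHashKeys)) {
            return null;  // e.g. Hash[c] cannot be derived from join keys [a, b]
        }
        // A join hashed on a subset of its keys stays valid, so forward the
        // same requirement to both inputs (as in the Hash[a] example above).
        return new TraitRequest(requiredHashKeys,
            Arrays.asList(requiredHashKeys, requiredHashKeys));
    }

    public static void main(String[] args) {
        List<String> joinKeys = Arrays.asList("a", "b");
        System.out.println(requireTraits(joinKeys, Collections.singletonList("a")) != null); // true
        System.out.println(requireTraits(joinKeys, Collections.singletonList("c")) != null); // false
    }
}
```

The real interface would of course work over RelTraitSet and handle more than distribution; the sketch only shows the accept / reject / forward decision.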

The planner will call the above method on the MergeJoin operator to pass the 
required traits (Hash[a]) to MergeJoin's child operators.

We will get a new MergeJoin:
 MergeJoin hash[a] 
  | TableScan R hash[a] (RelSubset)
  + TableScan S hash[a] (RelSubset)

Now the MEMO group looks like:
   | Group No | Operator    | Derived Traits     | Required Traits    |
   | -------- | ----------- | ------------------ | ------------------ |
   | Group 1  | Aggregate   | Hash[a]            | N/A                |
   | Group 2  | MergeJoin   | Hash[a,b], Hash[a] | Hash[a]            |
   | Group 3  | TableScan R | None               | Hash[a,b], Hash[a] |
   | Group 4  | TableScan S | None               | Hash[a,b], Hash[a] |

Calcite users may choose to ignore / not implement the interface to keep the 
original behavior. Each physical operator, according to its own logic, decides 
whether passThrough() should pass traits down or not by returning:
- a non-null RelNode, which means it can pass down;
- null, which means it can't pass down.

2. Provide an option to disable AbstractConverter
Once the plan can request traits top-down in the framework, many systems won't 
need AbstractConverter anymore, since it is just an intermediate operator used 
to generate physical sort / exchange. For those systems, we can provide an 
option to disable AbstractConverter and generate the physical enforcer 
directly, by adding a method to the interface Convention:
- RelNode enforce(RelNode node, RelTraitSet traits);

The default implementation may just call changeTraitsUsingConverters(), but 
people may choose to override it if the system has special needs, e.g. several 
traits must be implemented together, or collation is placed before distribution 
in the RelTraitSet.
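As a rough illustration of the enforce() idea, here is a toy model in plain Java (not Calcite's API; Node and the key lists are invented stand-ins for RelNode and RelTraitSet). The point is that the convention can emit its physical enforcer directly instead of routing through an AbstractConverter:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EnforceSketch {
    static final class Node {
        final String name;
        final List<String> hashKeys;  // delivered hash distribution; empty = none
        final Node input;
        Node(String name, List<String> hashKeys, Node input) {
            this.name = name;
            this.hashKeys = hashKeys;
            this.input = input;
        }
    }

    // Toy analogue of Convention.enforce(RelNode, RelTraitSet): emit the
    // physical enforcer directly instead of an intermediate AbstractConverter.
    static Node enforce(Node node, List<String> requiredKeys) {
        if (node.hashKeys.equals(requiredKeys)) {
            return node;  // requirement already satisfied
        }
        return new Node("HashExchange" + requiredKeys, requiredKeys, node);
    }

    public static void main(String[] args) {
        Node scan = new Node("TableScan", Collections.<String>emptyList(), null);
        System.out.println(enforce(scan, Arrays.asList("a")).name);  // prints HashExchange[a]
        Node ready = new Node("TableScan", Arrays.asList("a"), null);
        System.out.println(enforce(ready, Arrays.asList("a")).name); // prints TableScan
    }
}
```

A system overriding enforce() could equally decide to emit a combined sort-and-exchange node here, which is exactly the "several traits must be implemented together" case mentioned above.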

3. Top-down driven, on-demand rule apply
For every RelNode in a RelSet, rules are matched and applied sequentially; 
newly generated RelNodes are added to the end of the RelNode list in the 
RelSet, waiting for rule apply. RuleQueue and DeferringRuleCall are not needed 
anymore. This will make space pruning and rule mutual-exclusivity checks possible.

There are 3 stages for each RelSet:
a). Exploration: logical transformation, yields logical nodes
b). Implementation: physical transformation, yields physical nodes
c). Optimization: trait request, enforcement

The general process looks like:
- optimize RelSet X:
    implement RelSet X
    for each physical relnode in RelSet X:
        // pass down trait requests to child RelSets
        for each child RelSet Y of current relnode:
            optimize RelSet Y

- implement RelSet X:
    if X has been implemented:
        return
    explore RelSet X
    for each relnode in RelSet X:
        apply physical rules

- explore RelSet X:
    if X has been explored:
        return
    for each relnode in RelSet X:
        // ensure each child RelSet of current relnode is explored
        for each child RelSet Y of current relnode:
            explore RelSet Y
        apply logical rules on cur
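The recursion above can be sketched as a self-contained toy driver (plain Java with invented names; the real planner tasks carry far more state, and optimize() would iterate physical relnodes rather than the set's children directly). It shows each RelSet being explored exactly once, children before parents, then implemented, while optimization walks top-down:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopDownDriver {
    static final class RelSet {
        final String name;
        final List<RelSet> children;
        boolean explored;
        boolean implemented;
        RelSet(String name, RelSet... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    final List<String> trace = new ArrayList<>();

    void optimize(RelSet x) {
        implement(x);
        for (RelSet y : x.children) {
            optimize(y);                  // pass trait requests down to children
        }
        trace.add("optimize " + x.name);  // trait request / enforcement stage
    }

    void implement(RelSet x) {
        if (x.implemented) return;        // implement each RelSet only once
        x.implemented = true;
        explore(x);
        trace.add("implement " + x.name); // apply physical rules
    }

    void explore(RelSet x) {
        if (x.explored) return;           // explore each RelSet only once
        x.explored = true;
        for (RelSet y : x.children) {
            explore(y);                   // children are fully explored first
        }
        trace.add("explore " + x.name);   // apply logical rules
    }

    public static void main(String[] args) {
        RelSet scan = new RelSet("Scan");
        RelSet join = new RelSet("Join", scan);
        RelSet agg = new RelSet("Agg", join);
        TopDownDriver driver = new TopDownDriver();
        driver.optimize(agg);
        System.out.println(driver.trace);
    }
}
```

Running it shows the leaf set explored first and the root set optimized last, which is exactly the property that makes mutual-exclusivity checks and space pruning feasible.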

[jira] [Created] (CALCITE-3937) Only fire rule for RelSubset when it is derived

2020-04-18 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3937:
--

 Summary: Only fire rule for RelSubset when it is derived 
 Key: CALCITE-3937
 URL: https://issues.apache.org/jira/browse/CALCITE-3937
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


It is meaningless to fire rules for a RelSubset when it is generated by its 
parent's trait requirement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3932) Make data type cache thread local, non-evictable

2020-04-16 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3932:
--

 Summary: Make data type cache thread local, non-evictable
 Key: CALCITE-3932
 URL: https://issues.apache.org/jira/browse/CALCITE-3932
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Data types can be evicted out of the cache, which is global and thread-safe. 
`==`-based type comparison may suffer from cache eviction. 

It seems unnecessary to cache them globally, because most of them are 
RelRecordType, which is query-dependent and not sharable between different queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3927) RelSubset is not fired for rule when set gets merged

2020-04-15 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3927:
--

 Summary: RelSubset is not fired for rule when set gets merged
 Key: CALCITE-3927
 URL: https://issues.apache.org/jira/browse/CALCITE-3927
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


In VolcanoPlanner, when sets get merged, the planner fires rules again for the 
RelNodes in both sets, but not for the RelSubsets. We might miss something 
because of this. 

If all the logical transformation rules and physical implementation rules were 
separated out, and physical rules didn't do logical work, we might be OK. But 
the reality is that everything is mixed together at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Flakey test on Linux (JDK 11), Avatica master

2020-04-14 Thread Haisheng Yuan
Hi Xu,

Thanks for your quick response and looking into this issue.

Looks like neoremind just opened a pull request based on your analysis (thanks 
neoremind!), can you help review?
https://github.com/apache/calcite/pull/1916

- Haisheng

--
From: xu
Date: 2020-04-14 20:13:15
To: 
Subject: Re: Flakey test on Linux (JDK 11), Avatica master

Hi Haisheng,



I spent some time debugging and found the possible root cause.



In BasicSqlType, the `toString` result and the instance member
`digest` might differ. For TIMESTAMP without precision (precision =
-1), toString returns TIMESTAMP while digest is TIMESTAMP(0), which conflicts
with the real TIMESTAMP(0) whose precision is 0.


`Interner<RelDataType> DATATYPE_CACHE = Interners.newWeakInterner()` makes
sure each SqlType is a singleton between invocations, so TIMESTAMP(0) might get
back the SqlType for TIMESTAMP without precision, which is wrong. Because the
global cache uses weak references, the cache usually invalidates
after a Java GC, which mitigates the impact of the hidden bug. As for the
Linux JDK 11 situation specifically, I cannot give a reasonable
explanation yet, but I suppose the cause is clear.



I found that `org.apache.calcite.rel.rules.DateRangeRulesTest` can create a
TIMESTAMP without precision just before the errors occurred, and the
error always starts from org.apache.calcite.test.SqlLimitsTest >
testPrintLimits(), according to the two failing builds in your mail.



If the analysis above makes sense, I can create an issue in JIRA and fix
the problem by making digest & toString return the same value for TIMESTAMP.
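The failure mode described above can be reproduced with a simplified, self-contained model: a plain HashMap stands in for Guava's weak interner, and the SqlType class below is invented for illustration (it is not Calcite's BasicSqlType). Two distinct types that share a digest get canonicalized to whichever instance entered the cache first.

```java
import java.util.HashMap;
import java.util.Map;

final class InternerSketch {
    static final class SqlType {
        final int precision;      // -1 means "no precision specified"
        final String digest;      // the buggy digest: TIMESTAMP(0) either way
        SqlType(int precision) {
            this.precision = precision;
            this.digest = "TIMESTAMP(" + Math.max(precision, 0) + ")";
        }
        @Override public boolean equals(Object o) {
            return o instanceof SqlType && ((SqlType) o).digest.equals(digest);
        }
        @Override public int hashCode() { return digest.hashCode(); }
        @Override public String toString() {
            return precision < 0 ? "TIMESTAMP" : "TIMESTAMP(" + precision + ")";
        }
    }

    // An interner canonicalizes by equals/hashCode, so digest collisions
    // merge genuinely different types into one cached instance.
    static final Map<SqlType, SqlType> CACHE = new HashMap<>();
    static SqlType intern(SqlType t) { return CACHE.computeIfAbsent(t, k -> k); }

    public static void main(String[] args) {
        SqlType noPrecision = intern(new SqlType(-1));   // cached first
        SqlType zeroPrecision = intern(new SqlType(0));  // collides on digest
        // The interner hands back the wrong singleton:
        System.out.println(zeroPrecision);               // prints TIMESTAMP, not TIMESTAMP(0)
    }
}
```

With a weak interner the stale entry disappears after a GC, which is why the real bug only surfaces intermittently, as noted above.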


Best,

Xu



Haisheng Yuan wrote on Tue, Apr 14, 2020 at 10:00 AM:

> Hi,
>
> There are frequently flakey test on Linux(JDK 11) related with
> TIMESTAMP(0) vs TIMESTAMP.
>
> Below is failure test from [1].
> FAILURE 0.0sec, org.apache.calcite.test.SqlToRelConverterExtendedTest >
> testTableValuedFunctionTumbleWithSubQueryParam()
> 1663 org.opentest4j.AssertionFailedError: plan ==> expected: <
> 1664 LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2],
> window_end=[$3])
> 1665 LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1),
> 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0)
> ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)])
> 1666 LogicalProject(ORDERID=[$0], ROWTIME=[$1])
> 1667 LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
> 1668 > but was: <
> 1669 LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2],
> window_end=[$3])
> 1670 LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1),
> 6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP
> ROWTIME, TIMESTAMP window_start, TIMESTAMP window_end)])
> 1671 LogicalProject(ORDERID=[$0], ROWTIME=[$1])
> 1672 LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]])
> 1673 >
>
> Below is failure test from [2]:
> testPrintLimits
> The only diff is TIMESTAMP(0) vs TIMESTAMP
>
> Any clue?
>
> [1] https://github.com/apache/calcite/runs/576436249
> [2] https://github.com/apache/calcite/runs/584340845
>
>
> - Haisheng
>
>

-- 

Best regards,

Xu Zhang (张旭)



Flakey test on Linux (JDK 11), Avatica master

2020-04-13 Thread Haisheng Yuan
Hi,

There are frequent flaky test failures on Linux (JDK 11) related to 
TIMESTAMP(0) vs TIMESTAMP.

Below is failure test from [1].
FAILURE 0.0sec, org.apache.calcite.test.SqlToRelConverterExtendedTest > 
testTableValuedFunctionTumbleWithSubQueryParam()
1663 org.opentest4j.AssertionFailedError: plan ==> expected: < 
1664 LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], 
window_end=[$3]) 
1665 LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), 
6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP(0) 
ROWTIME, TIMESTAMP(0) window_start, TIMESTAMP(0) window_end)]) 
1666 LogicalProject(ORDERID=[$0], ROWTIME=[$1]) 
1667 LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]]) 
1668 > but was: < 
1669 LogicalProject(ORDERID=[$0], ROWTIME=[$1], window_start=[$2], 
window_end=[$3]) 
1670 LogicalTableFunctionScan(invocation=[TUMBLE($1, DESCRIPTOR($1), 
6:INTERVAL MINUTE)], rowType=[RecordType(INTEGER ORDERID, TIMESTAMP 
ROWTIME, TIMESTAMP window_start, TIMESTAMP window_end)]) 
1671 LogicalProject(ORDERID=[$0], ROWTIME=[$1]) 
1672 LogicalTableScan(table=[[CATALOG, SALES, SHIPMENTS]]) 
1673 >

Below is failure test from [2]:
testPrintLimits
The only diff is TIMESTAMP(0) vs TIMESTAMP

Any clue?

[1] https://github.com/apache/calcite/runs/576436249
[2] https://github.com/apache/calcite/runs/584340845


- Haisheng



[jira] [Created] (CALCITE-3918) SubQueryFilterRemoveRule failed to decorrelate subquery for TPCH q17

2020-04-11 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3918:
--

 Summary: SubQueryFilterRemoveRule failed to decorrelate subquery 
for TPCH q17
 Key: CALCITE-3918
 URL: https://issues.apache.org/jira/browse/CALCITE-3918
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


Disable RelDecorrelator and run TpchTest.testQuery17(); 
SubQueryFilterRemoveRule generates a plan with a Correlate, which is not expected.


{code:java}
EnumerableProject(AVG_YEARLY=[/($0, 7.0:DECIMAL(2, 1))])
  EnumerableAggregate(group=[{}], agg#0=[SUM($2)])
EnumerableFilter(condition=[AND(=($3, $0), =(CAST($4):VARCHAR, 'Brand#13'), 
=(CAST($5):VARCHAR, 'JUMBO CAN'), <($1, $6))])
  EnumerableCorrelate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{3}])
EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
  EnumerableProject(L_PARTKEY=[$1], L_QUANTITY=[$4], 
L_EXTENDEDPRICE=[$5])
EnumerableTableScan(table=[[TPCH_01, LINEITEM]])
  EnumerableProject(P_PARTKEY=[$0], P_BRAND=[$3], P_CONTAINER=[$6])
EnumerableTableScan(table=[[TPCH_01, PART]])
EnumerableProject($f0=[*(0.2:DECIMAL(2, 1), CAST(/(CASE(=($1, 0), 
null:JavaType(class java.lang.Long), $0), $1)):JavaType(class java.lang.Long))])
  EnumerableAggregate(group=[{}], agg#0=[$SUM0($4)], agg#1=[COUNT($4)])
EnumerableFilter(condition=[=($1, $cor0.P_PARTKEY)])
  EnumerableTableScan(table=[[TPCH_01, LINEITEM]])
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3917) Revive pruned node when a rule generates RelNode that is already pruned

2020-04-11 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3917:
--

 Summary: Revive pruned node when a rule generates RelNode that is 
already pruned
 Key: CALCITE-3917
 URL: https://issues.apache.org/jira/browse/CALCITE-3917
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Manually pruning nodes in a rule can be dangerous, e.g.:
Logical transformation rule X matches RelNode A and generates RelNode B. Rule X 
thinks B is always better than A, so it prunes A inside the rule. But B may 
already be in the MEMO, matched and pruned by another rule Y before X ran, and 
rule Y generated A. At the end, we may see CannotPlanException, because there 
is no available RelNode in the RelSet.

In this case, we'd better revive a node that was already pruned by the 
planner; even though this brings additional rule applications and overhead, it 
is better than failing to plan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3916) Apply rules bottom up by RelSet

2020-04-11 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3916:
--

 Summary: Apply rules bottom up by RelSet
 Key: CALCITE-3916
 URL: https://issues.apache.org/jira/browse/CALCITE-3916
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Apply rules in leaf RelSet -> root RelSet order. For every RelNode in a RelSet, 
rules are matched and applied sequentially. No RuleQueue or DeferringRuleCall 
is needed anymore. This will make space pruning and rule mutual-exclusivity 
checks possible.

Rules that use AbstractConverter as an operand are an exception; to keep 
backward compatibility, this kind of rule still needs top-down apply.

This should be done after CALCITE-3896.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Cannot create physical plan with join

2020-04-11 Thread Haisheng Yuan
I have to admit that this is the most error-prone point for first-time users 
creating a physical operator. Perhaps we can add some highlighted notes or 
examples to the site docs to prevent people from making a similar mistake again.

In Calcite, a logical / physical operator is not only an operator, but also a 
plan expression. A physical operator is bound to physical traits; ideally, 
though, they might be better separated.
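For reference, a converter rule that does convert its join inputs might look roughly like the sketch below. PhysicalJoin, FLOWDBConvention, and the PhysicalJoin constructor arguments are hypothetical placeholders modeled on Tim's setup; this is an untested illustration of the RelOptRule#convert pattern Roman describes in the quoted message, not a drop-in implementation:

```java
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalJoin;

public class PhysicalJoinRule extends ConverterRule {
  PhysicalJoinRule() {
    super(LogicalJoin.class, Convention.NONE, FLOWDBConvention.INSTANCE,
        "PhysicalJoinRule");
  }

  @Override public RelNode convert(RelNode rel) {
    LogicalJoin join = (LogicalJoin) rel;
    RelTraitSet traits = join.getTraitSet().replace(FLOWDBConvention.INSTANCE);
    // Key step: convert() registers a requirement for the FLOWDB convention
    // on each input, so the planner looks for physical children instead of
    // costing the infinite-cost logical scans.
    RelNode left = convert(join.getLeft(), traits);
    RelNode right = convert(join.getRight(), traits);
    return new PhysicalJoin(join.getCluster(), traits, left, right,
        join.getCondition(), join.getJoinType());
  }
}
```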

On 2020/04/11 14:00:16, Roman Kondakov  wrote: 
> Hi Tim,
> 
> it looks like your physical converter rule for a Join node does not
> convert it's inputs to your custom FLOWDB convention. And because of it
> the PhysicalJoin is trying to get input rows from the LogicalScan.
> You have:
> 
> PhysicalJoin[FLOWDB]
>   LogicalTableScan[NONE] <- logical rels have infinite cost
>   LogicalTableScan[NONE] <- logical rels have infinite cost
> 
> but it should be
> 
> PhysicalJoin[FLOWDB]
>   PhysicalTableScan[FLOWDB]
>   PhysicalTableScan[FLOWDB]
> 
> In order to achieve it you need to convert both inputs of the
> PhysicalJoin node to the FLOWDB convention using  RelOptRule#convert()
> and RelTraitSet#replace(FLOWDBConvention.INSTANCE) methods. You can find
> examples in any join converter rule, i.e. BindableJoinRule#convert [1]
> 
> 
> [1]
> https://github.com/apache/calcite/blob/3755eb5871860f1fd5dc51990129784caa8ac0a4/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L476
> 
> -- 
> Kind Regards
> Roman Kondakov
> 
> 
> On 11.04.2020 14:22, Tim Fox wrote:
> > Hi All,
> > 
> > I have recently started using Calcite as the query parser/planner for a
> > side project. I have created a set of RelNodes corresponding to my physical
> > nodes, and a set of rules. I have created my own convention.
> > 
> > All works well for queries without a join - my physical nodes are
> > created fine (aggregates, projections, filters, table scans, all ok).
> > 
> > When I try and transform to my physical plan where the query contains a
> > join, I get the following error:
> > 
> > "There are not enough rules to produce a node with desired properties:
> > convention=FLOWDB, sort=[]. All the inputs have relevant nodes, however the
> > cost is still infinite."
> > 
> > (full error output at bottom of the post)
> > 
> > I stumbled upon this post when googling this:
> > 
> > https://issues.apache.org/jira/browse/CALCITE-3255
> > 
> > I have checked and I am specifying my convention when transforming to the
> > physical plan, and my rules seem to be set up ok.
> > 
> > There is one comment in the above linked issue that is perhaps relevant
> > 
> > "You should also supply the metadata in you convention nodes, so that our
> > metadata system can compute the cumulative cost correctly."
> > 
> > But I don't really understand what this means. Can someone explain to a
> > newb like me what metadata is required and how I provide it?
> > 
> > Many thanks,
> > 
> > full error report:
> > 
> > _INITIAL: There are not enough rules to produce a node with desired
> > properties: convention=FLOWDB, sort=[]. All the inputs have relevant nodes,
> > however the cost is still infinite.
> > 
> > Root: rel#76:Subset#3.FLOWDB.[]
> > 
> > Original rel:
> > 
> > LogicalProject(subset=[rel#76:Subset#3.FLOWDB.[]], sensor_id=[$0],
> > temp=[$2], name=[$4], country=[$5]): rowcount = 1500.0, cumulative cost =
> > {1500.0 rows, 6000.0 cpu, 0.0 io}, id = 74
> > 
> >   LogicalJoin(subset=[rel#73:Subset#2.NONE.[]], condition=[=($1, $3)],
> > joinType=[left]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 0.0
> > cpu, 0.0 io}, id = 72
> > 
> > LogicalTableScan(subset=[rel#70:Subset#0.NONE.[]],
> > table=[[latest_sensor_readings]]): rowcount = 100.0, cumulative cost =
> > {100.0 rows, 101.0 cpu, 0.0 io}, id = 65
> > 
> > LogicalTableScan(subset=[rel#71:Subset#1.NONE.[]],
> > table=[[current_locations]]): rowcount = 100.0, cumulative cost = {100.0
> > rows, 101.0 cpu, 0.0 io}, id = 66
> > 
> > Sets:
> > 
> > Set#0, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp)
> > 
> > rel#70:Subset#0.NONE.[], best=null, importance=0.7291
> > 
> > rel#65:LogicalTableScan.NONE.[](table=[latest_sensor_readings]),
> > rowcount=100.0, cumulative cost={inf}
> > 
> > rel#84:Subset#0.FLOWDB.[], best=rel#83, importance=0.36455
> > 
> > rel#83:PhysicalTableScan.FLOWDB.[](table=[latest_sensor_readings]),
> > rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> > 
> > Set#1, type: RecordType(BIGINT location_id, VARCHAR name, VARCHAR country)
> > 
> > rel#71:Subset#1.NONE.[], best=null, importance=0.7291
> > 
> > rel#66:LogicalTableScan.NONE.[](table=[current_locations]), rowcount=100.0,
> > cumulative cost={inf}
> > 
> > rel#82:Subset#1.FLOWDB.[], best=rel#81, importance=0.36455
> > 
> > rel#81:PhysicalTableScan.FLOWDB.[](table=[current_locations]),
> > rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> > 
> > Set#2, type: RecordType(VARCHAR sensor_id, BIGINT l

[jira] [Created] (CALCITE-3911) JoinCommuteRule may generate wrong plan for SEMI/ANTI join

2020-04-10 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3911:
--

 Summary: JoinCommuteRule may generate wrong plan for SEMI/ANTI join
 Key: CALCITE-3911
 URL: https://issues.apache.org/jira/browse/CALCITE-3911
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


JoinCommuteRule generates a wrong plan for SEMI/ANTI joins when swapOuter is 
true. Semi / anti joins are not swappable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Github autolinks to JIRA issues

2020-04-08 Thread Haisheng Yuan
Thanks Francis, I have made a request:
https://issues.apache.org/jira/browse/INFRA-20096

On 2020/04/09 00:27:43, Francis Chuang  wrote: 
> We can make a request to infra to enable this.
> See [1] as an example.
> 
> [1] 
> https://issues.apache.org/jira/browse/INFRA-19655?jql=project%20%3D%20INFRA%20AND%20text%20~%20autolinks
> 
> On 9/04/2020 10:24 am, Haisheng Yuan wrote:
> > Is it possible to add autolinks to JIRA issues as shown in [1]?
> > 
> > [1] 
> > https://help.github.com/en/github/administering-a-repository/configuring-autolinks-to-reference-external-resources
> > 
> > - Haisheng
> > 
> 


Github autolinks to JIRA issues

2020-04-08 Thread Haisheng Yuan
Is it possible to add autolinks to JIRA issues as shown in [1]?

[1] 
https://help.github.com/en/github/administering-a-repository/configuring-autolinks-to-reference-external-resources

- Haisheng



Re: [DISCUSS] Support building physical RelNode in Calcite

2020-04-07 Thread Haisheng Yuan
Thanks Xiening for moving this to dev list.

Unlike logical operators, whose structure many systems share, physical operators 
are highly implementation-dependent. RelBuilder provides a common interface to 
create aggregates and joins, but what kind of physical aggregate? HashAgg? 
StreamAgg? How do I specify it as a local aggregate, or a global aggregate? In 
many cases these physical operator constructors have system-dependent arguments; 
will RelBuilder support those? Shuffle/Exchange operators of different systems 
may also differ in their arguments.

The worst case is that only the physical filter, project and sort can be created 
using the physical RelBuilder. 

- Haisheng

--
From: Xiening Dai
Date: 2020-04-08 07:36:43
To: 
Subject: [DISCUSS] Support building physical RelNode in Calcite

Hi all,

In light of CALCITE-2970, I’d like to initiate a discussion.

Currently the framework itself does not have a way to create physical RelNode 
(RelNode with a particular convention). We completely rely on adapter rules to 
convert logical nodes into physical ones. There are a few major drawbacks with 
this approach -

1. During trait enforcement, we have to create a logical node and then get it 
implemented by a converter rule, even though we know the target convention. This 
results in an increase of memo size and planning time. CALCITE-2970 is one 
good example.

2. A lot of implementation rules today simply copy the inputs and create a 
physical node (examples like EnumerableProjectRule, EnumerableFilterRule, etc.). 
If the framework were able to create physical nodes, a lot of these trivial 
rules could be eliminated.

3. There are a number of adapter rules today that use RelBuilder and create 
logical nodes. This is not desirable, since in a lot of cases what they need are 
physical nodes instead; creating logical nodes and then getting them implemented 
by converters again is inefficient.

To solve this problem, there have been two proposals so far -

a) Extend the current RelBuilder interface to support withConvention() syntax, 
which means you could build a physical node with the convention specified. This 
can be done by extending the RelNode factory held by the builder. Calcite 
users can register a particular factory for their own convention.

b) Extend the Convention interface and add something like "RelNode enforce(RelNode 
input, RelTrait trait)". This would be used to address issue #1. A convention 
could implement this method to create the exact physical node to be used for 
satisfying a given trait.
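A rough sketch of what proposal a) could look like. This is a toy simulation in plain Java: the names (Convention, withConvention, scan, filter) mimic the shape of Calcite's RelBuilder, but none of this is the actual Calcite API, and a real implementation would dispatch to a RelNode factory registered per convention.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ConventionBuilderSketch {
    enum Convention { NONE, ENUMERABLE }

    static final class Node {
        final String digest;
        Node(String digest) { this.digest = digest; }
        public String toString() { return digest; }
    }

    private final Convention convention;
    private final Deque<Node> stack = new ArrayDeque<>();

    ConventionBuilderSketch(Convention convention) { this.convention = convention; }

    // Returns a builder whose subsequent calls create nodes in the given
    // convention; a real implementation would look up a registered factory.
    ConventionBuilderSketch withConvention(Convention c) {
        return new ConventionBuilderSketch(c);
    }

    ConventionBuilderSketch scan(String table) {
        String op = convention == Convention.NONE ? "LogicalTableScan" : "EnumerableTableScan";
        stack.push(new Node(op + "(" + table + ")"));
        return this;
    }

    ConventionBuilderSketch filter(String condition) {
        String op = convention == Convention.NONE ? "LogicalFilter" : "EnumerableFilter";
        stack.push(new Node(op + "(" + condition + ", " + stack.pop() + ")"));
        return this;
    }

    Node build() { return stack.pop(); }

    public static void main(String[] args) {
        Node physical = new ConventionBuilderSketch(Convention.NONE)
            .withConvention(Convention.ENUMERABLE)
            .scan("emp")
            .filter("deptno = 10")
            .build();
        // Prints: EnumerableFilter(deptno = 10, EnumerableTableScan(emp))
        System.out.println(physical);
    }
}
```

The point is that the same builder calls yield physical nodes directly, so no logical intermediate node enters the memo.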

I personally prefer a) since I believe this is a basic building block and it's 
not just for enforcement. Also, extending RelBuilder feels natural to me - the 
intention was clearly there when this class was created (look at the class 
comments). There have been discussions about whether or not RelBuilder should 
support creating physical nodes. Although we have been using it solely for 
building logical nodes today, that doesn't mean we shouldn't extend it to 
physical ones. I haven't seen clear reasoning to the contrary.

Would like to hear your thoughts.


Re: Re: Draft board report for April 2020

2020-04-05 Thread Haisheng Yuan
Looks good to me, thanks!

- Haisheng

--
From: Stamatis Zampetakis
Date: 2020-04-05 23:52:54
To: 
Subject: Re: Draft board report for April 2020

Thanks a lot for the feedback everyone, much appreciated.

As Danny mentioned, the number of active committers is a problem that we
encounter quite often in the project. I am also rather busy this period
(both with work and family) and trying to help as much as I can; unfortunately
not even close to how much I would like to. On the bright side, I do see some
motivated individuals who seem to be on the right track for committership.
Bringing in new members should improve the situation a bit. Let's all do our
best to make this happen sooner rather than later (committers by reviewing
PRs and contributors by submitting high-quality patches).

I don't know why I use passive voice so often; indeed active is far
superior.

Below you can find the updated version of the report based on your
suggestions.

## Description:
Apache Calcite is a highly customizable framework for parsing and planning
queries on data in a wide variety of formats. It allows database-like access,
and in particular a SQL interface and advanced query optimization, for data
not residing in a traditional database.

Avatica is a sub-project within Calcite and provides a framework for building
local and remote JDBC and ODBC database drivers. Avatica has an independent
release schedule and its own repository.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Calcite was founded 2015-10-22 (4 years ago). There are currently 46
committers and 22 PMC members in this project. The Committer-to-PMC ratio is
roughly 2:1.

Community changes, past quarter:
- No new PMC members. Last addition was Haisheng Yuan on 2019-11-11.
- Feng Zhu was added as committer on 2020-02-29

## Project Activity:
Avatica 1.16.0 was released in the middle of December, including numerous bug
fixes and security improvements, while the build system has been migrated from
Maven to Gradle.

Calcite 1.22.0 was released at the beginning of March, with more than 250
resolved issues, including long-awaited features such as support for SQL hints,
important bug fixes in the core of the query planner, and many changes in
build and test infrastructure. This release was the biggest in the history of
the project and one of the most complicated to pull off due to the large
number of contributions.

Stamatis gave a talk about query planning and Calcite at the University of
Cergy-Pontoise, Paris, France, in March. The audience was not aware of the
existence of the project, which shows that we should organize such events more
often, at least in Europe, if we want to expand our community.

Finally, we were informed that the SuperSQL project by Tencent uses Calcite to
provide a unified entry point to various data platforms such as RDBMS,
ElasticSearch, Hive, Flink, Spark, Presto, ClickHouse, etc.

## Community Health:

Activity levels on mailing lists (-8%), git (-24%) and JIRA (opened -20%,
closed -30%) have decreased in the first quarter of 2020. Most of the big
changes that started during the last quarter of 2019 (and initiated a lot of
activity) have been finalized and incorporated in the release of Calcite
1.22.0, making things calmer in the community. The big decrease of closed
issues in JIRA may be explained by the frequency of releases, which has
slightly dropped. The number of active committers has also decreased the past
few months, with work and family constraints getting in the middle. To the
above it is worth adding the period of holidays in December and January that
in general slows things down.

Following the decreased activity in the lists, git, and JIRA, it is normal to
see the activity rates on pull requests drop (-20% opened, -31% closed), more
or less for the same reasons. Although our backlog keeps increasing (~174 open
pull requests), the community is very reactive in commenting and reviewing
almost every contribution. On the bright side, we continue to see promising
contributors who are on the right track for committership.

On Thu, Apr 2, 2020, 8:49 PM Julian Hyde  wrote:

> “A talk  was given ...”. You’re too modest, Stamatis. And as my copy
> editor said, passive voice should rarely be used.
>
> “Stamatis gave a talk ...” is better; fewer words, more information, more
> interesting to read.
>
> Julian
>
> > On Apr 1, 2020, at 3:08 PM, Stamatis Zampetakis 
> wrote:
> >
> > Attached below is a draft of this month's board report. I plan to
> submit it
> > on April 6. Please let me know if you have any additions or corrections.
> >
> > ## Description:
> > Apache Calcite is a highly customizable framework for parsing and
> planning
> > queries on data in a wide variety of formats. It allows database-like
> > access,
> > and in parti

[jira] [Created] (CALCITE-3896) Pass through parent trait requests to child operators

2020-04-04 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3896:
--

 Summary: Pass through parent trait requests to child operators
 Key: CALCITE-3896
 URL: https://issues.apache.org/jira/browse/CALCITE-3896
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


This is not the on-demand trait request mechanism described in the [mailing 
list|http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e],
 which requires an overhaul of the core planner. This ticket tries to give 
VolcanoPlanner a basic and minimal ability to pass parent trait requests 
through to child operators without rules; it may not be flexible or powerful, 
but it should work with current Calcite applications with minimal changes.

The method for physical operators to implement would be:

{code:java}
interface PhysicalNode {
  RelNode passThrough(RelTraitSet required);
}
{code}
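As a rough illustration of the intent (a toy simulation in plain Java; a List of column indices stands in for RelTraitSet, and none of these classes are Calcite's), a physical operator could answer a parent's collation request by adopting it and forwarding the corresponding request to its input:

```java
import java.util.List;

public class PassThroughSketch {
    interface PhysicalNode {
        PhysicalNode passThrough(List<Integer> requiredCollation);
    }

    static final class Scan implements PhysicalNode {
        final List<Integer> collation;
        Scan(List<Integer> collation) { this.collation = collation; }
        public PhysicalNode passThrough(List<Integer> required) {
            // Pretend this scan (e.g. an index scan) can deliver any order.
            return new Scan(required);
        }
    }

    static final class Project implements PhysicalNode {
        final PhysicalNode input;
        final List<Integer> collation;
        Project(PhysicalNode input, List<Integer> collation) {
            this.input = input;
            this.collation = collation;
        }
        public PhysicalNode passThrough(List<Integer> required) {
            // Adopt the parent's request and push the same request to the
            // input. A real implementation would remap the keys through the
            // projection, and return null if the request cannot be satisfied.
            return new Project(input.passThrough(required), required);
        }
    }

    public static void main(String[] args) {
        Project p = new Project(new Scan(List.of()), List.of());
        Project sorted = (Project) p.passThrough(List.of(0, 1)); // parent wants [0, 1]
        System.out.println(sorted.collation);                    // [0, 1]
        System.out.println(((Scan) sorted.input).collation);     // [0, 1]
    }
}
```

No rule fires here: the request travels down the physical tree directly, which is what lets AbstractConverter be avoided.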

Given that Calcite's physical operators decide their child operators' traits 
when the physical operator is created in a physical implementation rule, there 
are some drawbacks that can't be avoided. E.g., given the following plan:
{code:java}
StreamAgg on [a]
   +-- MergeJoin on [a, b, c]
   |--- TableScan foo
   +--- TableScan bar
{code}

Suppose the MergeJoin implementation rule generates several mergejoins that 
distribute by [a], [a,b], [a,b,c]. Then we pass the parent operator StreamAgg's 
trait request to MergeJoin[a]. Since MergeJoin[a] satisfies the parent's 
request, there is nothing to do. Next we pass the request to MergeJoin[a,b] and 
get MergeJoin[a]; then we pass the request to MergeJoin[a,b,c] and get 
MergeJoin[a] again. We know these are redundant and there is no need to pass 
the parent operator's trait request through, but the MergeJoin operators are 
independent and agnostic of each other's existence.

The ideal way is that in the physical implementation rule, during the creation 
of the physical operator, the rule should not have to care about the physical 
traits of the operator or its child operators. But that is a different topic.

Anyway, it is better than nothing; once it is done, we can provide the option 
to deprecate or disable {{AbstractConverter}}. 





Re: Re: Join Pruning/Rewrites

2020-04-01 Thread Haisheng Yuan
In many cases, they are not equivalent, unless the top-level join is a 
semi-join, e.g.
 (T1⋈T2)⋉(T2⟖T3) on T1.a=T3.a
can be optimized to
 (T1⋈T2)⋉T3 on T1.a=T3.a

It is better to give a concrete example, so that we can discuss case by case.

- Haisheng

--
From: Stamatis Zampetakis
Date: 2020-03-28 01:47:30
To: 
Subject: Re: Join Pruning/Rewrites

Hi Chris,

I think your question is equivalent to the following:

Does Calcite perform query minimization [1, 2] ?

I don't remember seeing any part in Calcite that does this at the moment
but I may be wrong.

In general the problem is NP-Hard, so no hope for treating it entirely for
arbitrary queries.
Technically it could be done for small queries, and for larger queries we
can adopt some heuristics (e.g., not look for minimal queries but try to
remove one-two-k tables and see if the queries are equivalent).
Certainly a part where contributions are very welcomed!

Best,
Stamatis

[1] http://webdam.inria.fr/Alice/pdfs/all.pdf
[2] http://pages.cs.wisc.edu/~paris/cs838-s16/lecture-notes/lecture2.pdf


On Fri, Mar 27, 2020 at 6:17 PM Chris Channing <
christopher.chann...@gmail.com> wrote:

> A note I forgot to mention earlier is that I’m starting with a
> RelBuilder...
>
> On Fri, 27 Mar 2020 at 15:59, Chris Channing <
> christopher.chann...@gmail.com>
> wrote:
>
> > Hey Folks,
> >
> > Is it currently possible to have Calcite optimize-away unnecessary joins?
> > The case I'm thinking of is where (hopefully the formatting holds up!):
> >
> > ...you have a bushy tree as follows:
> >
> >  -
> >  |   |
> >  |   |
> > || |-|
> > T1 T2  T2   T3
> >
> > ...which could be rewritten as a left-deep tree to avoid scanning T2
> again:
> >
> > |---|
> > |   |
> >   |--||
> > T1 T2 T3
> >
> > If so, does any have an example of how to do this please?
> >
> > Many thanks,
> > Chris
> >
> >
>



[jira] [Created] (CALCITE-3891) Remove use of Pair.zip in RelTraitSet.satisfies

2020-04-01 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3891:
--

 Summary: Remove use of Pair.zip in RelTraitSet.satisfies
 Key: CALCITE-3891
 URL: https://issues.apache.org/jira/browse/CALCITE-3891
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan
 Attachments: image-2020-04-01-19-05-58-306.png

It accounts for too much of the method's time, even though it is a tiny portion overall.

 !image-2020-04-01-19-05-58-306.png! 





Re: Query planning takes forever on TPC-H queries #5 and #7

2020-04-01 Thread Haisheng Yuan
These are known issues. Calcite isn't doing well on join reordering; there are 
multiple reasons contributing to the inefficiency. Try turning off the rule 
that does join associativity.

- Haisheng

--
From: Boyan Kolev
Date: 2020-04-01 23:31:06
To: 
Subject: Query planning takes forever on TPC-H queries #5 and #7

Hello,

I am running a TPC-H benchmark on a database system that uses Calcite as query 
engine.
However, two of the TPC-H queries (please see attached Query05.sql and 
Query07.sql) are failing to get planned by Calcite.
After waiting one hour for the result of the EXPLAIN, I simply interrupted it.

To isolate the issues from our specific environment, I was able to reproduce 
them with Calcite alone and a set of 8 CSV files (only headers + one line of 
data) that defines the TPC-H schema.

It is easy to reproduce on a clean Calcite 1.22.0 deployment, by copying the 
attached tpchmodel.json and unzipping the attached tpch.zip into 
calcite/example/csv/src/test/resources.
Then, as in the tutorial, run the queries with ./sqlline launched from the 
calcite/example/csv folder and connected with:
!connect jdbc:calcite:model=src/test/resources/tpchmodel.json admin admin

My observations so far on the complexity of the queries are that:

- Query #5 has a cycle in the join conditions. Removing the condition "and 
s_nationkey = c_nationkey" destroys the cycle and makes the query buildable. 
The cycle, however, is not a problem by itself - keeping the cycle, but 
removing the table "region" and the associated 2 where clause conditions, also 
makes the query planning happen.

- Query #7 contains a disjunctive predicate. Without any of the sides of the OR 
expression, the query compiles without any problem.

Thanks in advance for any help and clarity on these issues!

Sincerely,
--
Boyan Kolev




[jira] [Created] (CALCITE-3889) Add apply(Mappings.Mapping) to RelTrait and RelTraitSet

2020-03-31 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3889:
--

 Summary: Add apply(Mappings.Mapping) to RelTrait and RelTraitSet
 Key: CALCITE-3889
 URL: https://issues.apache.org/jira/browse/CALCITE-3889
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


RelTrait Collation, Distribution have key indices, when we pass down the 
traitset to child or propagate to parent operator, we have to remap these keys. 
It would be nice to have {{apply(Mappings.Mapping)}} on RelTrait and 
RelTraitSet. RelDistribution already has the method, but we may want it on 
every RelTrait except Convention. 
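To illustrate why such a method helps, here is a self-contained sketch using a plain Java map in place of Calcite's Mappings API: collation key indices are remapped through a projection's source-to-target mapping, and a key whose source column is dropped truncates the usable collation prefix. This is the kind of adjustment {{apply(Mappings.Mapping)}} would centralize; the method and data shapes here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TraitRemapSketch {
    // Remap collation keys through a source->target column mapping. A key
    // whose source column is not projected cannot be preserved, so the
    // collation prefix stops there.
    static List<Integer> applyMapping(List<Integer> collationKeys, Map<Integer, Integer> mapping) {
        List<Integer> remapped = new ArrayList<>();
        for (int key : collationKeys) {
            Integer target = mapping.get(key);
            if (target == null) {
                break; // a dropped column truncates the collation prefix
            }
            remapped.add(target);
        }
        return remapped;
    }

    public static void main(String[] args) {
        // A project that keeps input columns 5 and 2 as outputs 0 and 1.
        Map<Integer, Integer> mapping = Map.of(5, 0, 2, 1);
        System.out.println(applyMapping(List.of(5, 2), mapping)); // [0, 1]
        System.out.println(applyMapping(List.of(5, 7), mapping)); // [0], column 7 was dropped
    }
}
```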





Re: Re: Re: Doubts about TPCH queries with volcano planner

2020-03-29 Thread Haisheng Yuan
If you enable these tests, most of them can run successfully, though maybe not 
so efficiently. All you need to do is enable the tests.

- Haisheng

--
From: Stamatis Zampetakis
Date: 2020-03-29 22:58:22
To: 
Subject: Re: Re: Doubts about TPCH queries with volcano planner

Hey Aron,

Yes, we should fix these issues. Every @Disabled test is a future TODO
otherwise it wouldn't be there :)

Contributions are welcomed!

Stamatis

On Fri, Mar 27, 2020 at 6:36 AM JiaTao Tao  wrote:

> Hi Haishen, Chunwei
> Sorry for the pic, it just the content of TpchTest, here I quote directly:
>
> >
> >  TPCH02 @Disabled("Infinite planning")
> >  TPCH04 @Disabled("NoSuchMethodException: SqlFunctions.lt(Date, Date)")
> >  TPCH05 @Disabled("OutOfMemoryError")
> >  TPCH09 @Disabled("no method found")
> >  TPCH11 @Disabled("CannotPlanException")
> >  TPCH12 @Disabled("NoSuchMethodException: SqlFunctions.lt(Date, Date)")
> >  TPCH13 @Disabled("CannotPlanException")
> >  TPCH15 @Disabled("AssertionError")
> >  TPCH19 May be timeout
> >  TPCH22 @Disabled("IllegalArgumentException during decorrelation")
>
>
> And my statement may not be clear: what I wonder is, with Calcite's
> default rule set (RelOptUtil#registerDefaultRules), why do so many
> TPCH queries fail? IMO, we should fix these issues, because the default rule
> set is meant for general optimization.
>
> Regards!
>
> Aron Tao
>
>
> Haisheng Yuan  于2020年3月27日周五 上午11:41写道:
>
> > No one can see the picture. He has to paste the link to the picture.
> >
> > Calcite's volcano planner is production ready, without any question. You
> > need to customize the adapters and rules to fit into your system.
> >
> > - Haisheng
> >
> > --
> > From: Chunwei Lei
> > Date: 2020-03-27 11:28:31
> > To: 
> > Subject: Re: Doubts about TPCH queries with volcano planner
> >
> > Calcite is a framework for parsing and optimizing queries. The volcano
> > planner is always available if you leverage it correctly.
> >
> > Besides, am I the only person who cannot see the picture?
> >
> >
> > Best,
> > Chunwei
> >
> >
> > On Thu, Mar 26, 2020 at 1:30 PM JiaTao Tao  wrote:
> >
> > > Hi all
> > > Currently, I'm studying the volcano planner and found that many TPCH
> > > queries failed to optimize.
> > >
> > > See in org.apache.calcite.adapter.tpch.TpchTest:
> > >
> > > And here comes the question: Is the volcano planner production ready?
> > > If so, why do so many TPCH queries fail? Do we have plans to support
> > > these queries?
> > >
> > > [image: image.png]
> > >
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> >
> >
>



[jira] [Created] (CALCITE-3886) Execute substitution rule according to the order they get matched

2020-03-28 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3886:
--

 Summary: Execute substitution rule according to the order they get 
matched
 Key: CALCITE-3886
 URL: https://issues.apache.org/jira/browse/CALCITE-3886
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Currently a substitution rule match is always appended to the head of the 
queue. We prefer to execute the rules in the order they get matched, so we 
need a separate queue for substitution rule matches.





Re: Re: Doubts about TPCH queries with volcano planner

2020-03-26 Thread Haisheng Yuan
No one can see the picture. He has to paste the link to the picture.

Calcite's volcano planner is production ready, without any question. You need 
to customize the adapters and rules to fit into your system.

- Haisheng

--
From: Chunwei Lei
Date: 2020-03-27 11:28:31
To: 
Subject: Re: Doubts about TPCH queries with volcano planner

Calcite is a framework for parsing and optimizing queries. The volcano
planner is always available if you leverage it correctly.

Besides, am I the only person who cannot see the picture?


Best,
Chunwei


On Thu, Mar 26, 2020 at 1:30 PM JiaTao Tao  wrote:

> Hi all
> Currently, I'm studying the volcano planner and found that many TPCH
> queries failed to optimize.
>
> See in org.apache.calcite.adapter.tpch.TpchTest:
>
> And here comes the question: Is the volcano planner production ready? If so,
> why do so many TPCH queries fail? Do we have plans to support
> these queries?
>
> [image: image.png]
>
>
> Regards!
>
> Aron Tao
>



[jira] [Created] (CALCITE-3879) Rel Id generator should not be static

2020-03-26 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3879:
--

 Summary: Rel Id generator should not be static
 Key: CALCITE-3879
 URL: https://issues.apache.org/jira/browse/CALCITE-3879
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


The rel id will keep increasing until the class is unloaded. We have observed 
the id reaching 100 million after running for just several days. Although we 
don't use the id for calculation, so integer overflow might be OK, it doesn't 
make sense to generate unique ids globally: it makes the dumped plan and the 
rel ids in the log hard to read, because the ids are so large. IMO, the id 
should be generated per query context.





Re: Re: NPE at VolcanoPlanner.setRoot()

2020-03-24 Thread Haisheng Yuan
Hi João,

Can you provide minimal reproducible test cases?
You can log a JIRA if you believe this is a bug.

- Haisheng

--
From: João Silva
Date: 2020-03-24 23:40:12
To: 
Subject: Re: NPE at VolcanoPlanner.setRoot()

Currently on 1.21.0. And no, I didn't implement any Convention.

Stamatis Zampetakis  escreveu no dia terça, 24/03/2020
à(s) 15:26:

> Hi João,
>
> Which Calcite version are you using?
>
> Did you implement your own Convention (and the method getInterface)?
>
> Best,
> Stamatis
>
> On Tue, Mar 24, 2020 at 4:12 PM João Silva  wrote:
>
> > I keep getting a NPE exception using the method setRoot() even though my
> > RelNode is not null. Does anyone have any idea about what could be the
> > problem?
> >
> > java.lang.NullPointerException
> > > at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1654)
> > > at
> > >
> >
> org.apache.calcite.plan.volcano.VolcanoPlanner.setRoot(VolcanoPlanner.java:296)
> > > at optimizer.VolcanoTest.main(VolcanoTest.java:134)
> > >
> >
> > Thank you.
> >
>



[jira] [Created] (CALCITE-3868) Remove redundant ruleSet and ruleNames in VolcanoPlanner

2020-03-23 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3868:
--

 Summary: Remove redundant ruleSet and ruleNames in VolcanoPlanner
 Key: CALCITE-3868
 URL: https://issues.apache.org/jira/browse/CALCITE-3868
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


ruleSet and ruleNames in VolcanoPlanner are redundant; remove them.





[jira] [Created] (CALCITE-3865) RelCollationTraitDef.canConvert should always return true

2020-03-19 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3865:
--

 Summary: RelCollationTraitDef.canConvert should always return true
 Key: CALCITE-3865
 URL: https://issues.apache.org/jira/browse/CALCITE-3865
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan
 Attachments: image-2020-03-19-16-44-25-621.png, 
image-2020-03-19-16-47-21-805.png

CALCITE-1148 introduced the following change to RelCollationTraitDef to fix a 
RelTrait conversion bug, but it is just hiding the underlying issue and adding 
a redundant and unnecessary check to the planner.
 !image-2020-03-19-16-47-21-805.png! 

The root cause is that logical operators, especially LogicalSort, can have 
traits, which is a bad design decision, and {{AggregateReduceFunctionsRule}} 
fails to adjust the column mapping in the RelTraitSet. The newly created 
{{LogicalProject}} has a collation on column 5 (it just copies its input's 
RelTraitSet blindly), but it only has 2 columns.
 !image-2020-03-19-16-44-25-621.png! 






Re: Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-03-14 Thread Haisheng Yuan
Thanks for your detailed explanation, Vladimir.

Indeed, the only way to propagate traits in Calcite currently is using rules, 
which is a big pain. I can feel your pain. I tried to come up with ways to 
implement trait derivation and requirement in the framework without breaking 
current usages, only to find out it is almost impossible. It has too many 
stakeholders; even a small change may incur opposition.

But before we get a real top-down Cascades framework, there is still a lot 
you can do to improve your planner's performance. 

In Calcite 1.22.0, I committed a change that enables a RelSubset to be used to 
trigger a rule, which can greatly reduce the number of rule calls for trait 
propagation. With your example, you need 2 rules:
1. Physical implementation rule
   match pattern: operand(LogicalProject.class)
   produces PhysicalProject without traits
2. Project trait propagation rule
   match pattern: operand(PhysicalProject.class, operand(RelSubset.class))
   produces PhysicalProject with derived traits.
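The two rules can be visualized with a toy simulation (plain Java, hypothetical names, not Calcite code): the implementation rule emits a trait-less physical project, and the propagation rule, matching a physical project over a subset, emits one derived project per trait exposed by the subset's members.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoRuleSketch {
    // Stand-in for RelSubset: the traits its member rels expose.
    static final class Subset {
        final List<String> memberTraits;
        Subset(List<String> memberTraits) { this.memberTraits = memberTraits; }
    }

    static final class PhysicalProject {
        final String trait;
        PhysicalProject(String trait) { this.trait = trait; }
    }

    // Rule 1: implementation rule, produces a PhysicalProject with no trait yet.
    static PhysicalProject implementProject() {
        return new PhysicalProject("ANY");
    }

    // Rule 2: propagation rule, matches (PhysicalProject, Subset) and emits
    // one derived PhysicalProject per trait exposed by the input subset.
    static List<PhysicalProject> propagateTraits(PhysicalProject project, Subset input) {
        List<PhysicalProject> derived = new ArrayList<>();
        for (String trait : input.memberTraits) {
            derived.add(new PhysicalProject(trait));
        }
        return derived;
    }

    public static void main(String[] args) {
        PhysicalProject p = implementProject();
        List<PhysicalProject> derived =
            propagateTraits(p, new Subset(List.of("SORTED[a]", "HASH[a]")));
        for (PhysicalProject d : derived) {
            System.out.println(d.trait); // SORTED[a] then HASH[a]
        }
    }
}
```

Because rule 2 fires only when the input subset actually gains new members, it avoids the repeated speculative firings that abstract converters cause.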

Since 1.23.0, we have removed rule match importances and ordering; I guess that 
can reduce the optimization time by around 10~20% for some complex queries with 
many rule calls.

- Haisheng

--
From: Vladimir Ozerov
Date: 2020-03-15 04:18:53
To: Haisheng Yuan
Cc: dev@calcite.apache.org (dev@calcite.apache.org)
Subject: Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching 
specific rels

Hi Haisheng,

You are right, the behavior I showed is not the default one; I should provide 
more context here. This is how we (Hazelcast) and at least Drill use the engine 
to ensure that the produced plan is optimal; I gave an example in [1]. 
In real distributed engines, we rely on physical properties heavily, most 
notably distribution and collation. And the fundamental problem with the 
VolcanoPlanner is that it cannot propagate traits in a controlled manner. 
This, in turn, forces us to use AbstractConverters and implement rules in ways 
that appear strange to the Calcite community :-). And this, in turn, leads to 
excessive rule calls and failure to plan complex queries. 

Let's consider the same tree again, but now assuming that this is not the 
complete tree, but a subtree, and there are some parent operators. Let's also 
assume that the ScanRule may produce two equivalent operators with different 
physical properties: PhysicalScan and PhysicalIndexScan[a ASC]. It is important 
to consider both alternatives in parent operators. Now let's consider two 
different ways to optimize that subtree.

1. Canonical Calcite way (default)
1.1 Perform initial rules ordering, parent rules fire first: [ProjectRule, 
ScanRule]
1.2 Invoke ProjectRule, which produces physical project without any physical 
traits
1.3 Invoke ScanRule, which produces, PhysicalScan and PhysicalIndexScan[a ASC]
Since ProjectRule was not aware of the different scan alternatives, it missed 
the collation, and the resulting hypergraph looks like this:

-- PhysicalProject
 [PhysicalScan, PhysicalIndexScan[a ASC]]

This plan is suboptimal because parent operators cannot take advantage of 
the collation.

2. Hazelcast/Drill way:
2.1 Enable abstract converters
2.2 Rules are ordered in the same way as in example 1: [ProjectRule, ScanRule]
2.3 ProjectRule fires, enumerates physical implementations of the input. Since 
there are no physical inputs yet, it exits without any transformations
2.4 ScanRule fires and produces two physical scans
2.5 Abstract converters ensure that the ProjectRule is scheduled for execution 
again because its input RelSet has new nodes
2.6 ProjectRule fires again, now having physical inputs, and generates one 
implementation per unique combination of physical input traits. 

As a result, we have two graphs now:

Graph 1:
-- PhysicalProject
 PhysicalScan

Graph 2:
-- PhysicalProject[a ASC]
 PhysicalIndexScan[a ASC]

Notice how we propagated physical collation. Now parent operators may take 
advantage of it, e.g. eliminate sorting, or perform streaming aggregation 
instead of blocking hash aggregation. 

This is the fundamental problem we have in Hazelcast: how to ensure the 
complete exploration of the search space without excessive rule calls. 

Very short summary: 
1. The default behavior of the VolcanoPlanner cannot explore the physical 
search space, so plans are not optimal
2. Abstract converters fix this if you follow a certain pattern in rule 
implementations (see 2.3), but generate too many rule calls, so join planning 
rules cannot be called together with other rules, which again leads to 
non-optimal plans (yet better than with p.1)
3. The "trait pull-up" proposal may fix it. But I have a feeling that pulling 
up possible trait combinations from a child node is indistinguishable from 
child node exploration, so it may again not be very efficient
4. A brand new optimizer implementation with a recursive top-down approach may 
address

Re: Re: [DISCUSS] Refactor how planner rules are parameterized

2020-03-14 Thread Haisheng Yuan
I don't think it is worth the refactoring. People who want to customize a 
rule, in most cases, won't be satisfied by a different parameter; they most 
likely still need to rewrite (copy & paste) the rule with some of their own 
logic. For many Calcite users, the rule is not reusable even with flexible 
configurations.
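For reference, the single immutable "config" pattern under discussion can be sketched as follows. All names here are hypothetical illustrations of the proposal, not any actual Calcite API: the config carries every rule parameter, offers copy-with methods, and knows which rule class to instantiate.

```java
public class ConfigRuleSketch {
    // Immutable config: copy-with methods make it cheap to derive a variant
    // of a rule without knowing the values of its other parameters.
    static final class Config {
        final boolean smart;
        final String description;
        Config(boolean smart, String description) {
            this.smart = smart;
            this.description = description;
        }
        Config withSmart(boolean s) { return new Config(s, description); }
        // The config knows which rule sub-class to create.
        FilterJoinRuleSketch toRule() { return new FilterJoinRuleSketch(this); }
    }

    // A rule has exactly one constructor, taking only its config.
    static final class FilterJoinRuleSketch {
        final Config config;
        FilterJoinRuleSketch(Config config) { this.config = config; }
    }

    public static void main(String[] args) {
        FilterJoinRuleSketch rule = new Config(true, "FilterJoinRule").toRule();
        // Derive a variant changing only 'smart'; everything else is preserved.
        FilterJoinRuleSketch variant = rule.config.withSmart(false).toRule();
        System.out.println(rule.config.smart + " " + variant.config.smart); // true false
        System.out.println(variant.config.description);                     // FilterJoinRule
    }
}
```

Adding a new property then only touches the config class, not the rule's constructors.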

- Haisheng

--
From: Stamatis Zampetakis
Date: 2020-03-14 22:54:04
To: 
Subject: Re: [DISCUSS] Refactor how planner rules are parameterized

Hello,

Apologies for the late reply but I just realised that I had written the
mail and never pressed the send button.

I think it is a nice idea and certainly a problem worth addressing. If I
understood well you're thinking something like the current constructor of
the RelBuilder [1] that accepts a Context parameter. Indeed it seems that
with this change even rules that are not designed to be configured can be
changed much more gracefully (without adding new constructors and breaking
changes).

On the other hand, some of the advantages that you mention can also be turned 
into disadvantages. For instance, copying a rule without knowing the values of 
the other parameters is a bit risky, and it might be harder to reason about its 
correctness. Moreover, private constructors, final classes, etc., are primarily 
used for encapsulation purposes, so allowing the state of the rule to escape 
somehow breaks the original design of the rule.

Another problem with respect to rules is cross convention matching and
transformations [2]. Many rules should not fire for operands that are in
different conventions; a typical example that comes in my mind is
FilterProjectTransposeRule [3]. In the same spirit most rules should not
generate mixed convention transformations. Although a different problem, I
am mentioning it here since it could affect the design of the new API.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/f5a2fe9e360f38910f112288581040e0ced5/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L155

[2]
https://lists.apache.org/thread.html/da1860f99f8bfd6ec7d26626c428ce1c55480e7c61ae7f83060a40c2%40%3Cdev.calcite.apache.org%3E
[3]
https://github.com/apache/calcite/blob/7c27b147414c64505fa33c947100ece094caa15c/core/src/main/java/org/apache/calcite/rel/rules/FilterProjectTransposeRule.java#L57

On Thu, Feb 20, 2020 at 9:20 PM Michael Mior  wrote:

> This sounds reasonable to me. It also sounds like we could make this
> backwards compatible by retaining (but deprecating) the existing
> constructors and factory methods that will no longer be needed.
> --
> Michael Mior
> mm...@apache.org
>
> Le jeu. 20 févr. 2020 à 13:11, Julian Hyde  a écrit :
> >
> > I have an idea for a refactoring to RelOptRule. I haven’t fully thought
> it through, but I’m going to sketch it out here to see whether folks agree
> about the problems/solutions.
> >
> > It will be a breaking change (in the sense that people will have to
> change their code in order to get it to compile) but relatively safe (in
> that once the code compiles, it will have the same behavior as before).
> Also it will give Calcite developers and users a lot more flexibility going
> forward.
> >
> > The problem I see is that people often want different variants of
> planner rules. An example is FilterJoinRule, which has a 'boolean smart’
> parameter, a predicate (which returns whether to pull up filter
> conditions), operands (which determine the precise sub-classes of RelNode
> that the rule should match) and a relBuilderFactory (which controls the
> type of RelNode created by this rule).
> >
> > Suppose you have an instance of FilterJoinRule and you want to change
> ‘smart’ from true to false. The ‘smart’ parameter is immutable (good!) but
> you can’t easily create a clone of the rule because you don’t know the
> values of the other parameters. Your instance might even be (unbeknownst to
> you) a sub-class with extra parameters and a private constructor.
> >
> > So, my proposal is to put all of the config information of a RelOptRule
> into a single ‘config’ parameter that contains all relevant properties.
> Each sub-class of RelOptRule would have one constructor with just a
> ‘config’ parameter. Each config knows which sub-class of RelOptRule to
> create. Therefore it is easy to copy a config, change one or more
> properties, and create a new rule instance.
> >
> > Adding a property to a rule’s config does not require us to add or
> deprecate any constructors.
> >
> > The operands are part of the config, so if you have a rule that matches
> a EnumerableFilter on an EnumerableJoin and you want to make it match an
> EnumerableFilter on an EnumerableNestedLoopJoin, you can easily create one
> with one changed operand.
> >
> > The config is immutable and self-describing, so we can use it to
> automatically generate a unique description for each rule instance.
> >
> > Julian
> >
> > [1]
> https://github.com/apache/calcite/blob/5fa41609cb0fe310
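Julian's single-config idea can be sketched as a plain-Java toy. All names
here (FilterJoinConfig, withSmart, toRule) are hypothetical illustrations,
not the actual Calcite API:

```java
import java.util.Objects;

// Toy model of "one immutable config per rule". The config is the only
// constructor argument, and it knows how to instantiate its rule.
final class FilterJoinConfig {
    final boolean smart;
    final String description;

    FilterJoinConfig(boolean smart, String description) {
        this.smart = smart;
        this.description = Objects.requireNonNull(description);
    }

    // Copy-with-change: the only way to vary a property.
    FilterJoinConfig withSmart(boolean smart) {
        return new FilterJoinConfig(smart, description);
    }

    FilterJoinRule toRule() {
        return new FilterJoinRule(this);
    }
}

final class FilterJoinRule {
    final FilterJoinConfig config;

    // Single constructor taking just the config.
    FilterJoinRule(FilterJoinConfig config) {
        this.config = config;
    }
}

public class ConfigDemo {
    public static void main(String[] args) {
        FilterJoinRule rule = new FilterJoinConfig(true, "FilterJoinRule").toRule();
        // Clone the rule with one property changed, without knowing the rest.
        FilterJoinRule dumb = rule.config.withSmart(false).toRule();
        System.out.println(rule.config.smart + " " + dumb.config.smart);
    }
}
```

Copying a rule then reduces to `rule.config.withSmart(false).toRule()`, with
no knowledge of the other properties required.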

Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-03-14 Thread Haisheng Yuan
I agree that there should be no global rule queue; we should do it on a 
per-operator basis, which is also how other major Cascades frameworks do it.

However, Calcite's VolcanoPlanner doesn't generate unnecessary rule calls as 
you described. The current process is:
1. global rule queue: ScanRule, ProjectRule
2. Call ScanRule, produce physical scan
3. Call ProjectRule, produce physical project.

Even with a global rule queue in the reversed order (ProjectRule, ScanRule), there 
are still 2 rule calls. In your step 2, ProjectRule doesn't produce a physical 
node, which is incorrect. Any rule is, and should be, independent of every other 
rule. If your rule relies on other operators or rules being explored first, 
then you should think twice about it.

- Haisheng

--
From: Vladimir Ozerov
Date: 2020-03-15 01:50:10
To: dev@calcite.apache.org (dev@calcite.apache.org)
Subject: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific 
rels

Hi Roman,

In my understanding, the proposed minor changes may only decrease the total
number of rule invocations slightly, but all principal problems remain the
same. In the top-down approach, you do not want to implement bottom logical
nodes unless it is requested explicitly by a parent operator.

It seems to me that the very first step to efficient optimizer could be a
new rule invocation infrastructure. There should be *no global rule queue*
at all. Instead, we may introduce the per-node rule queue. Then, the
optimizer performs a recursive top-down optimization dive, invoking the
rules for every operator. Consider the following simple tree:
-- LogicalProject
 LogicalScan

Assuming that we have two implementation rules ProjectRule, ScanRule, and
abstract converters enabled, VolcanoOptimizer will proceed as follows,
generating one unnecessary rule call:
1. Define global rule call queue: ProjectRule, ScanRule
2. Call ProjectRule, no new nodes are produced
3. Call ScanRule, produce physical scans, reschedule ProjectRule
4. Call ProjectRule again, produce the physical project

With the proposed approach, it will work differently:
1. Define per-operator queues:
LogicalProject -> ProjectRule
LogicalScan -> ScanRule
2. Call optimize(LogicalProject)
3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input
4. Invoke ScanRule, produce physical scans, return control back to
ProjectRule
5. Produce the physical project, finish optimization

Now we have only 2 rule invocations as expected, and we reached the same
result as in the proposed minor changes. But the crucial difference is that
now we have well-defined control flow between operators: start at the top,
delegate to children. With this infrastructure in place, we will be able to
introduce more complex features, such as pruning, or partial exploration
later on.
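The per-operator dive described in the steps above can be modeled with a small
stdlib-only toy (Node, Rule and optimize are illustrative names, not the
VolcanoPlanner API); the parent's rule requests optimization of its input
itself, so exactly two rule invocations occur:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of per-operator rule queues with a recursive top-down dive.
public class TopDownDemo {
    static final AtomicInteger ruleCalls = new AtomicInteger();

    interface Rule { String apply(Node node); }

    static final class Node {
        final String name;
        final Node input;                           // single input, for simplicity
        final Deque<Rule> rules = new ArrayDeque<>(); // this node's own queue
        Node(String name, Node input) { this.name = name; this.input = input; }
    }

    // Recursive top-down optimization: drain only this node's queue.
    static String optimize(Node node) {
        String result = node.name;
        while (!node.rules.isEmpty()) {
            result = node.rules.poll().apply(node);
        }
        return result;
    }

    public static void main(String[] args) {
        Node scan = new Node("LogicalScan", null);
        scan.rules.add(n -> { ruleCalls.incrementAndGet(); return "PhysicalScan"; });

        Node project = new Node("LogicalProject", scan);
        // ProjectRule itself calls optimize() on its input (step 3 above).
        project.rules.add(n -> {
            ruleCalls.incrementAndGet();
            return "PhysicalProject(" + optimize(n.input) + ")";
        });

        System.out.println(optimize(project));   // PhysicalProject(PhysicalScan)
        System.out.println(ruleCalls.get());     // 2 rule invocations, as expected
    }
}
```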

But notice that this change will be incompatible with the current rules,
i.e. they should be re-written for the new optimization algorithm (e.g. see
step 3), which might be a blocker for the current Calcite users. So maybe
it is better to think of a new optimizer, rather than fixing
VolcanoOptimizer.

Regards,
Vladimir.


On Tue, Jan 14, 2020 at 23:52, Haisheng Yuan wrote:

> On the other hand, if we don't preprocess and normalize the rel expression
> before going to the Volcano planner, but still compute and keep logical/relational
> properties, like cardinality, for each operator, how can we achieve group
> search space pruning? Given a physical group expression, its required
> property and upper bound cost C_limit, we need to get its child group
> cardinality and compute the local cost C_local, so that we can calculate the
> child group's upper bound cost and pass it down.
>
> I can't figure out how we can do group pruning without shared relational
> properties.
>
> - Haisheng
>
> ------
> From: Haisheng Yuan
> Date: 2020-01-14 12:06:17
> To: dev@calcite.apache.org
> Subject: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
>
> The example is valid if Calcite doesn't do normalization or preprocessing
> before going to the VolcanoPlanner.
> But many databases and big data systems will try to preprocess the
> expression (push down predicates etc.) so that expressions in the same
> group can share the logical properties, in most cases if not all. You may
> argue that it should be cost based, e.g. evaluating a filter early can still
> be bad. That is true, but how accurate will the statistics be, and how
> accurate will the cost model be?
>
> - Haisheng
>
> --
> From: Julian Hyde
> Date: 2020-01-13 08:44:54
> To: dev@calcite.apache.org
> Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels
>
> > MEMO group (RelS

Re: [DISCUSS] Adapter's performance is not that fast

2020-02-26 Thread Haisheng Yuan
The first run is indeed slow, because there is startup cost and code 
generation. Can you post how you tested it (a single run or multiple runs) and 
what the flame graph looks like?

- Haisheng

--
From: Xiangwei Wei
Date: 2020-02-26 14:01:16
To:
Subject: [DISCUSS] Adapter's performance is not that fast

Hi everyone.

My name is Xiangwei Wei and I'm new here. I recently wrote an adapter for Apache
IoTDB, a database for time-series data management, to support the use of
Apache Calcite.

However, when I finished it and tried to test its performance, I found the
adapter was not that fast. It took much more time compared with
the raw data query in the database backend. I thought maybe it was a
problem with my adapter, so I did the same test on the Cassandra adapter
that is provided with the Calcite source code. But it showed a similar result.

Executing a simple query using the Cassandra adapter takes about 1600 ms -
1800 ms on my PC. However, it costs only 5 ms to do the raw data query in
the Cassandra database backend. Is this the general performance of
adapters? Or did I do something wrong?

I did the test by writing a simple JDBC program that runs the query using
standard SQL. For the Cassandra adapter, I used the data provided in
"./cassandra/src/test/resources/twissandra.cql" and a simple SQL statement
from CassandraAdapterTest, which is "select * from \"userline\"
where \"username\"='!PUBLIC!'".

-- 
Best,
Xiangwei Wei
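Following up on Haisheng's point about startup and code-generation cost, one
way to separate first-run cost from steady-state cost is to time each run
individually. A minimal stdlib-only harness (the stand-in workload below
replaces the actual JDBC call, which is an assumption of this sketch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Times repeated runs of a query; the first entry includes any one-time
// setup cost (connection, planning, code generation).
public class WarmupDemo {
    static List<Long> timeRuns(Supplier<Object> query, int runs) {
        List<Long> nanos = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get();
            nanos.add(System.nanoTime() - start);
        }
        return nanos;   // nanos.get(0) carries the startup cost
    }

    public static void main(String[] args) {
        List<Long> t = timeRuns(() -> {
            // stand-in workload; replace with statement.executeQuery(sql)
            double x = 0;
            for (int i = 0; i < 1_000_000; i++) x += Math.sqrt(i);
            return x;
        }, 5);
        System.out.println("first run (ms): " + t.get(0) / 1_000_000);
        System.out.println("last run  (ms): " + t.get(t.size() - 1) / 1_000_000);
    }
}
```

If the first run dominates and later runs are fast, the 1600 ms is mostly
startup/codegen rather than per-query execution cost.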



[jira] [Created] (CALCITE-3819) Prune parent RelNode when merging child RelSet with parent RelSet

2020-02-24 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3819:
--

 Summary: Prune parent RelNode when merging child RelSet with 
parent RelSet
 Key: CALCITE-3819
 URL: https://issues.apache.org/jira/browse/CALCITE-3819
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Suppose we have 2 RelSets:
RelSet A: rel1
RelSet B: rel2

rel1 is the parent of rel2.

If there is a transformation rule that transforms rel1 to rel2, we will merge 
RelSet A and B. During the merge process, we can safely prune rel1 to avoid 
further rule applications on rel1 and reduce the search space, and, more 
importantly, to avoid cyclic references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: [DISCUSS] Commit messages, again

2020-02-24 Thread Haisheng Yuan
> Should we expect at least one +1 from other committers before merging the
> PR? Maybe yes, maybe no.
> But at least we can leave more time to others to review the pr, rather than
> merging it rapidly.

Agreed. If a contributor or committer opens a PR (pull request), I view it as a
request for code review; receiving a +1 before merging might help a little bit.

But anyway, I agree that the commit messages should be self explaining and
as clear as possible.

- Haisheng

--
From: Chunwei Lei
Date: 2020-02-23 11:33:15
To:
Subject: Re: [DISCUSS] Commit messages, again

Thanks, Julian, for sharing your point. I totally agree that commit messages
should make sense to end users.
But personally, I trust the process/tools more than self-discipline. So I
think a code review might help (just as Xiening said).

An individual doesn't have much time to figure out what every PR tries to
do. But does that mean there is no point in waiting for a review?
I don't think so. Conventions and requirements of this project are not only
for contributors, but also for committers.

Should we expect at least one +1 from other committers before merging the
PR? Maybe yes, maybe no.
But at least we can leave more time for others to review the PR, rather than
merging it rapidly.



Best,
Chunwei


On Sat, Feb 22, 2020 at 8:12 AM Julian Hyde  wrote:

> I agree with you both. If, as a reviewer, the JIRA case doesn’t tell you
> the purpose of the change — not what the implementation does, but the
> purpose — then you should simply say “I’m not going to review this until
> you improve the description”.
>
> We are software engineers. What we do is communicate. Not with computers,
> but with people. If a PR doesn’t communicate clearly, it is not ready.
>
> By the way, I’m not talking about English skills. A lot of us don’t have
> English as our first language. That’s OK. I’m not talking about mis-spelled
> words or poor grammar. Those are fine, and the reviewer could correct
> those. I’m talking about organizing your thoughts, and that transcends
> language.
>
> Julian
>
>
> > On Feb 21, 2020, at 2:40 PM, Xiening Dai  wrote:
> >
> > Hi Julian,
> >
> > If the reviewer doesn’t understand the commit message well, s/he might
> ask question which would be a good starting point to improve it.
> >
> > My question is more about the process, rather than specific person. Note
> that both Commit-Then-Review and Review-Then-Commit exist in Apache. And
> the C-T-R model is more recommended for a rapid-prototyping environment. I
> believe Calcite has well passed that stage.
> >
> >
> >> On Feb 21, 2020, at 1:05 PM, Julian Hyde  wrote:
> >>
> >> I don't think a code review would necessarily have solved this one.
> >> Especially if the reviewer focused on the code and tests, as reviewers
> >> often do.
> >>
> >> Danny always produces high quality work. That was part of the reason
> >> that I picked on him rather than someone else. I don't think the
> >> solution is for Danny should ask for review on every change he does. I
> >> think the solution is for committers (including committers who are
> >> reviewing their own work) to be aware that they are providing a
> >> one-line description of their change to someone who probably does not
> >> write Java code.
> >>
> >> Code review policy (CTR, RTC, etc.) could be a subject for a different
> >> email thread, but my opinion is that 'it depends'. We should not apply
> >> the same rule (CTR, RTC) for every commit. Committers should have
> >> discretion.
> >>
> >> Julian
> >>
> >> On Fri, Feb 21, 2020 at 12:51 PM Xiening Dai 
> wrote:
> >>>
> >>> I also notice that this particular change (
> https://github.com/apache/calcite/commit/938614cca8c30ed9ff48996ff0ae42e1ed4f1706)
> was committed without going through code review. Do we have any process in
> place for merging a change? With the size and complexity of Calcite
> project, I would expect at least one +1 from other committers before
> merging the PR. Issues with the commit message would have been raised up if
> there was a proper code review.
> >>>
> >>>
>  On Feb 21, 2020, at 9:47 AM, Julian Hyde  wrote:
> 
>  Committers, please remember that every time you accept a change, you
>  are writing the release notes. The commit messages will end up here:
>  https://calcite.apache.org/docs/history.html
> 
>  So, those commit messages need to make sense to end users.
> 
>  This is important because, as a project, we write very little
>  documentation. The only guide to what features are being added to the
>  project (other than reading the code) is those release notes. So,
>  those release notes need to be perfect.
> 
>  Case in point, this change:
> 
> https://github.com/apache/calcite/commit/938614cca8c30ed9ff48996ff0ae42e1ed4f1706
> ,
>  with message "Support hint option key as string literal"
> 
>  As a SQL user I might happen to know that identifiers are

Re: Re: Translation of SQL EXISTS

2020-02-19 Thread Haisheng Yuan
Hi Christian,

Yes, I do agree it is better to generate SEMI/ANTI joins during Sql-to-Rel 
conversion if possible.

With regard to the missing Rel-to-Sql conversion for SEMI/ANTI joins, you can 
log a JIRA and open a pull request for it; people will be happy to take a look.

- Haisheng

--
From: Christian Beikov
Date: 2020-02-20 03:11:02
To: Haisheng Yuan; Apache Calcite dev list
Subject: Re: Translation of SQL EXISTS


Hey Haisheng,
It is nice to have a rule that detects such patterns, but do you agree that it 
would be better to generate SEMI- and ANTI-joins already during Sql-to-Rel 
conversion?
I implemented the Rel-to-Sql conversion for SEMI- and ANTI-join and it seems to 
work for my use cases, but it feels a bit ugly as I have to do some 
transformations which might be ad-hoc. I had to replace identifiers that refer 
to the outcome of the SEMI- and ANTI-joins with the select items of the LHS 
because the SEMI- and ANTI-join does not get an alias.
See for yourself: 
https://github.com/beikov/calcite/commit/6ca67e2e8df44c2081e71407ee6893a961f36271
I'll add a few tests and a comment, but could anyone please comment on whether 
this approach looks ok?
Regards,
Christian
On 19.02.2020 at 19:03, Haisheng Yuan wrote:
 
Hi Christian,

For the query in your example, Calcite first generates an inner join plan with 
an aggregate child, then through SemiJoinRule transforms the inner join into a 
semi or anti-semi join. The reason to have the inner join is that it allows join 
commutativity, which is good for generating a potentially better plan with a 
nested-loop join or hash join.

Admittedly, this process in Calcite is counterintuitive. It should be in the 
reverse order: first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
 From: Christian Beikov
 Date: 2020-02-19 21:12:13
 To:
 Subject: Translation of SQL EXISTS

 Hello,

 I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
 assume that an EXISTS is translated in relational algebra to a SEMI- and 
 NOT EXISTS to an ANTI-join, but it's not.

 PlannerImpl p = new PlannerImpl(config);
 SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
 from order_position p where o.id = p.order_id)");
 p.validate(sqlNode);
 RelRoot rel = p.rel(sqlNode);
 RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
 SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
 sqlWriter.format(result.asStatement());

 Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
 the EXISTS part is not rendered. This is the plan I get.

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
   LogicalProject(inputs=[0])
     LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
       LogicalJoin(condition=[=($0, $1)], joinType=[inner])
         JdbcTableScan(table=[[adhoc, _order]])
         LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
           LogicalProject(exprs=[[$1, true]])
             JdbcTableScan(table=[[adhoc, order_position]])

 I'd expect something along the lines of

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
   LogicalProject(inputs=[0])
     LogicalJoin(condition=[=($0, $1)], joinType=[semi])
       JdbcTableScan(table=[[adhoc, _order]])
       JdbcTableScan(table=[[adhoc, order_position]])

 and for NOT EXISTS

 LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
   LogicalProject(inputs=[0])
     LogicalJoin(condition=[=($0, $1)], joinType=[anti])
       JdbcTableScan(table=[[adhoc, _order]])
       JdbcTableScan(table=[[adhoc, order_position]])

 Am I missing something, or does the current aggregate-based translation 
 make sense?

 I constructed relational algebra structures for some other statements 
 with SEMI- and ANTI-joins and already noticed that these join types 
 weren't handled in 
 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
 
 which I fixed locally. Is the lack of a translation intentional?

 Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
 over-simplification or would you say it's correct? As far as I 
 understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
 correct.

 I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
 translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
 assume that's not that hard and I could add that.

 Regards,

 Christian
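As a set-level sanity check of the equivalence Christian asks about: EXISTS
corresponds to a semi-join (keep left rows that have a match) and NOT EXISTS
to an anti-join (keep left rows that don't). A stdlib-only sketch over plain
key lists, not the Calcite API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SemiJoinDemo {
    // EXISTS ~ semi-join: keep left keys that have a match on the right.
    static List<Long> semiJoin(List<Long> leftKeys, List<Long> rightKeys) {
        Set<Long> right = new HashSet<>(rightKeys);
        List<Long> out = new ArrayList<>();
        for (Long k : leftKeys) if (right.contains(k)) out.add(k);
        return out;
    }

    // NOT EXISTS ~ anti-join: keep left keys with no match on the right.
    static List<Long> antiJoin(List<Long> leftKeys, List<Long> rightKeys) {
        Set<Long> right = new HashSet<>(rightKeys);
        List<Long> out = new ArrayList<>();
        for (Long k : leftKeys) if (!right.contains(k)) out.add(k);
        return out;
    }

    public static void main(String[] args) {
        // _order ids 1 and 2; order_position references order 1 only
        List<Long> orders = List.of(1L, 2L);
        List<Long> positions = List.of(1L);
        System.out.println(semiJoin(orders, positions)); // [1]
        System.out.println(antiJoin(orders, positions)); // [2]
    }
}
```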





Re: Translation of SQL EXISTS

2020-02-19 Thread Haisheng Yuan
Hi Christian,

For the query in your example, Calcite first generates an inner join plan with 
an aggregate child, then through SemiJoinRule transforms the inner join into a 
semi or anti-semi join. The reason to have the inner join is that it allows join 
commutativity, which is good for generating a potentially better plan with a 
nested-loop join or hash join.

Admittedly, this process in Calcite is counterintuitive. It should be in the 
reverse order: first generate a semi or anti-semi join, then generate an 
inner/outer join.

- Haisheng

--
From: Christian Beikov
Date: 2020-02-19 21:12:13
To:
Subject: Translation of SQL EXISTS

Hello,

I'm a bit confused about how the SQL EXISTS predicate is translated. I'd 
assume that an EXISTS is translated in relational algebra to a SEMI- and 
NOT EXISTS to an ANTI-join, but it's not.

PlannerImpl p = new PlannerImpl(config);
SqlNode sqlNode = p.parse("delete from _order o where exists (select 1 
from order_position p where o.id = p.order_id)");
p.validate(sqlNode);
RelRoot rel = p.rel(sqlNode);
RelToSqlConverter sqlConverter = new RelToSqlConverter(dialect);
SqlImplementor.Result result = sqlConverter.visitChild(0, rel.rel);
sqlWriter.format(result.asStatement());

Worse, when printing this, I only get DELETE FROM "public"."_order" i.e. 
the EXISTS part is not rendered. This is the plan I get.

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalProject(inputs=[0], exprs=[[CAST($1):BIGINT, CAST($2):BOOLEAN]])
      LogicalJoin(condition=[=($0, $1)], joinType=[inner])
        JdbcTableScan(table=[[adhoc, _order]])
        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          LogicalProject(exprs=[[$1, true]])
            JdbcTableScan(table=[[adhoc, order_position]])

I'd expect something along the lines of

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[semi])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

and for NOT EXISTS

LogicalTableModify(table=[[adhoc, _order]], operation=[DELETE], flattened=[true])
  LogicalProject(inputs=[0])
    LogicalJoin(condition=[=($0, $1)], joinType=[anti])
      JdbcTableScan(table=[[adhoc, _order]])
      JdbcTableScan(table=[[adhoc, order_position]])

Am I missing something, or does the current aggregate-based translation 
make sense?

I constructed relational algebra structures for some other statements 
with SEMI- and ANTI-joins and already noticed that these join types 
weren't handled in 
org.apache.calcite.rel.rel2sql.RelToSqlConverter#visit(org.apache.calcite.rel.core.Join),
 
which I fixed locally. Is the lack of a translation intentional?

Is such a translation of SEMI- and ANTI-join to EXISTS and NOT EXISTS an 
over-simplification or would you say it's correct? As far as I 
understood from https://en.wikipedia.org/wiki/Relational_algebra this is 
correct.

I'd be happy to contribute that back. I didn't look into the Sql-to-Rel 
translation for EXISTS and NOT EXISTS to SEMI- and ANTI-join yet, but I 
assume that's not that hard and I could add that.

Regards,

Christian




[jira] [Created] (CALCITE-3757) When merging sets, relnodes may be reregistered multiple times

2020-01-27 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3757:
--

 Summary: When merging sets, relnodes may be reregistered multiple 
times
 Key: CALCITE-3757
 URL: https://issues.apache.org/jira/browse/CALCITE-3757
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


In RelSet#mergeWith(), 

{code:java}
// merge subsets
for (RelSubset otherSubset : otherSet.subsets) {
  ..
  for (RelNode otherRel : otherSubset.getRels()) {
planner.reregister(this, otherRel);
  }
}
{code}

otherSubset.getRels() returns all the rels that satisfy the traitset. A 
relnode's traitset may satisfy many traitsets, hence it will be reregistered 
multiple times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3756) RelSubset should not match operand(RelNode.class)

2020-01-23 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3756:
--

 Summary: RelSubset should not match operand(RelNode.class)
 Key: CALCITE-3756
 URL: https://issues.apache.org/jira/browse/CALCITE-3756
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


Even RelSubset is a subclass of RelNode, but they are quite different. RelNode 
already fires these rules, it is not necessary to fire fore RelSubset. More 
importantly, it never worked due to CALCITE-3755.
In addition, it will waste time to try to iterate class operands of RelSubset 
and try to match pattern operand(RelNode), given that it never matches when 
ascending.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3755) Ascending rule match with RelSubset operand doesn't work

2020-01-23 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3755:
--

 Summary: Ascending rule match with RelSubset operand doesn't work
 Key: CALCITE-3755
 URL: https://issues.apache.org/jira/browse/CALCITE-3755
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


Given a rule with match pattern {{operand(Filter, operand(RelSubset))}}, when 
we add a new RelSubset into the set, the planner will fire rules for the 
RelSubset. But in {{VolcanoRuleCall.matchRecurse}}, if 
{{!inputRels.contains(previous)}}, it will skip. Obviously, the input rels will 
never contain a RelSubset, hence it fails to match the rule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3753) Always try to match and execute substitution rule first and remove rulematch ordering

2020-01-21 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3753:
--

 Summary: Always try to match and execute substitution rule first 
and remove rulematch ordering
 Key: CALCITE-3753
 URL: https://issues.apache.org/jira/browse/CALCITE-3753
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


Some rules, e.g. ProjectMergeRule and PruneEmptyRule, can be defined as 
SubstitutionRules, so that we always try to match and execute them first 
(without deferring the rule call). None of the other rule matches need to be 
sorted; rules can be executed in whatever order they matched, since we are 
going to execute all of them anyway, sooner or later. Computing and comparing 
importances causes a lot of latency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3744) Duplicate RuleMatches when RelSet gets merged

2020-01-16 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3744:
--

 Summary: Duplicate RuleMatches when RelSet gets merged
 Key: CALCITE-3744
 URL: https://issues.apache.org/jira/browse/CALCITE-3744
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


Sometimes RelSets can get merged, e.g.:

Each number represents a Rel id, and they are in different RelSets.
{code:java}
1    4
 \   /
  2 /
   \/
   3
{code}

Assume that in the rule queue we have rule matches with the names 12, 23 and 43. 
For simplicity, omit the rule name; the first digit represents the parent 
RelNode, the second the child.

If, after some rule, we merge the set of 3 into the set of 2, RelNode 4's 
input is replaced by the RelSubset of 2. We will retrigger rules and try to add 
rule matches 12, 42, 43 into the rule queue.

12 will be filtered out, because there is a duplicate in the set of rule-match 
names. But 43 now has a different rule-match digest than the previous match: 
because RelNode 4's input RelSubset changed, its digest also changed. So we 
can't detect the duplication and will re-apply rule match 43 twice.

Although we try to recompute the rule match's digest in popMatch(), it seems 
too late.

I think that in the rule-match digest, just using the RelNodes' ids should 
suffice, and we don't need to call recomputeDigest in popMatch().
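The id-based digest proposal can be illustrated with a stdlib-only toy (the
digest strings below are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

// Keying rule matches by the matched RelNodes' ids dedupes a re-triggered
// match even when a node's description (which embeds its input subset)
// has changed after a set merge.
public class DigestDemo {
    // Returns true if this digest was already seen, i.e. a duplicate match.
    static boolean isDuplicate(Set<String> seen, String digest) {
        return !seen.add(digest);
    }

    public static void main(String[] args) {
        // Match "rule X on rel 4 over rel 3", before and after a set merge
        // changed rel 4's input subset, and hence its description.
        Set<String> byDescription = new HashSet<>();
        isDuplicate(byDescription, "X:Rel#4(input=Subset#3)+Rel#3");
        System.out.println(isDuplicate(byDescription, "X:Rel#4(input=Subset#2)+Rel#3"));
        // false: the changed description slips past the filter,
        // so the match is re-applied

        Set<String> byId = new HashSet<>();
        isDuplicate(byId, "X:4+3");
        System.out.println(isDuplicate(byId, "X:4+3"));
        // true: an id-based digest catches the duplicate
    }
}
```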





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-14 Thread Haisheng Yuan
On the other hand, if we don't preprocess and normalize the rel expression 
before going to the Volcano planner, but still compute and keep 
logical/relational properties, like cardinality, for each operator, how can we 
achieve group search space pruning? Given a physical group expression, its 
required property and upper bound cost C_limit, we need to get its child group 
cardinality and compute the local cost C_local, so that we can calculate the 
child group's upper bound cost and pass it down.
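The bound arithmetic in the paragraph above can be sketched as follows,
assuming a simple additive cost model (childUpperBound and its parameter names
are illustrative; in practice C_local itself is derived from the child group's
cardinality):

```java
// Upper-bound (cost-limit) propagation for group pruning.
public class CostLimitDemo {
    // The child group's upper bound is what remains of the parent's budget
    // after paying the parent's local cost and, at minimum, the lower
    // bounds of the other children.
    static double childUpperBound(double limit, double localCost,
                                  double... siblingLowerBounds) {
        double remaining = limit - localCost;
        for (double lb : siblingLowerBounds) {
            remaining -= lb;
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Parent budget 100, local cost 30, one sibling bounded below by 20:
        double bound = childUpperBound(100, 30, 20);
        System.out.println(bound);   // 50.0
        // A child group whose cheapest plan already exceeds 50 can be
        // pruned without further exploration.
    }
}
```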

I can't figure out how we can do group pruning without shared relational 
properties.

- Haisheng

--
From: Haisheng Yuan
Date: 2020-01-14 12:06:17
To: dev@calcite.apache.org
Subject: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

The example is valid if Calcite doesn't do normalization or preprocessing 
before going to the VolcanoPlanner.
But many databases and big data systems will try to preprocess the expression 
(push down predicates etc.) so that expressions in the same group can share the 
logical properties, in most cases if not all. You may argue that it should be 
cost based, e.g. evaluating a filter early can still be bad. That is true, but 
how accurate will the statistics be, and how accurate will the cost model be?

- Haisheng

--
From: Julian Hyde
Date: 2020-01-13 08:44:54
To: dev@calcite.apache.org
Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

> MEMO group (RelSet) represents logically equivalent expressions.
> All the expressions in one group should share the same logical
> properties, e.g. functional dependency, constraint properties etc.
> But they are not sharing it. Computation is done for each individual operator.

It's good, and correct, that we compute for each individual operator.

Suppose that a RelSubset s contains RelNode r1 and we know that the
constraint x > 0 holds. Suppose that we also have r2 with constraint y
< 10, and we discover that r1 and r2 are equivalent and belong
together in s. Now we can safely say that both constraints (x > 0 and
y < 10) apply to both r1 and r2.

Deducing additional constraints in this way is a big win. The effort
to compute constraints for each RelNode is well-spent.

This kind of deduction applies to other logical properties (e.g.
unique keys) and it applies to RelSet as well as RelSubset.

Julian
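The constraint deduction Julian describes can be modeled as a simple set union
(a toy sketch, not the RelSet implementation):

```java
import java.util.HashSet;
import java.util.Set;

// When r1 and r2 are found equivalent and placed in one subset, the subset
// carries the union of their constraints, so each constraint now applies
// to both nodes.
public class ConstraintMergeDemo {
    static Set<String> mergeConstraints(Set<String> a, Set<String> b) {
        Set<String> merged = new HashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        Set<String> r1 = Set.of("x > 0");   // known for r1
        Set<String> r2 = Set.of("y < 10");  // known for r2
        Set<String> subset = mergeConstraints(r1, r2);
        System.out.println(subset.containsAll(Set.of("x > 0", "y < 10")));
        // true: both constraints hold for r1 and r2 alike
    }
}
```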


On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
 wrote:
>
> @Haisheng
>
> > Calcite uses the Project operator and all kinds of ProjectXXXTransposeRules to 
> > prune unused columns.
>
> I also noticed that in most cases Project-related rules took significant
> part of the planning time. But I didn't explore this problem yet.
>
> > MEMO group (RelSet) represents logically equivalent expressions. All the 
> > expressions in one group should share the same logical properties, e.g. 
> > functional dependency, constraint properties etc. But they are not sharing 
> > it. Computation is done for each individual operator.
>
> I thought the equivalence of logical properties within groups (RelSets)
> is implicit. For example, in RelSet#addInternal it is always verified
> that the newly added node has the same type as the other members of the set.
>
> Anyway, I absolutely agree with you that the problems with trait
> propagation, rule matching and the others you mentioned in the previous
> e-mails should be solved in the first place. We need first to make the
> Volcano optimizer work right, and only then make improvements like
> search space pruning.
>
> I would love to join this work to improve the Volcano planner. Looking
> forward to the design doc.
>
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 11.01.2020 11:42, Haisheng Yuan wrote:
> > Currently, Calcite uses the Project operator and all kinds of 
> > ProjectXXXTransposeRules to prune unused columns. Every operator's output 
> > columns use indexes to reference child operators' columns. If there is a 
> > Project operator whose child is a Filter, and we push the project down 
> > under the Filter, we will have Project-Filter-Project-FilterInput. All the 
> > newly generated relnodes will trigger rule matches. E.g., if we already did 
> > ReduceExpressionRule on the filter, then due to the new filter RexCall's input 
> > ref indexes changing, we have to apply ReduceExpressionRule on the new filter 
> > again, even when there is nothing that can be reduced. Similarly, new operator 
> > transpose/merge rules will be triggered. This can trigger a lot of rule 
> > matches.
> >
> > MEMO group (RelSet) represents logically equivalent expressions. All the 
> > expressions in one group should share the same logical properties, e.g. 
> > funct

Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-13 Thread Haisheng Yuan
The example is valid if Calcite doesn't do normalization or preprocessing 
before going to the VolcanoPlanner.
But many databases and big data systems will try to preprocess the expression 
(push down predicates etc.) so that expressions in the same group can share the 
logical properties, in most cases if not all. You may argue that it should be 
cost based, e.g. evaluating a filter early can still be bad. That is true, but 
how accurate will the statistics be, and how accurate will the cost model be?

- Haisheng

--
From: Julian Hyde
Date: 2020-01-13 08:44:54
To: dev@calcite.apache.org
Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

> MEMO group (RelSet) represents logically equivalent expressions.
> All the expressions in one group should share the same logical
> properties, e.g. functional dependency, constraint properties etc.
> But they are not sharing it. Computation is done for each individual operator.

It's good, and correct, that we compute for each individual operator.

Suppose that a RelSubset s contains RelNode r1 and we know that the
constraint x > 0 holds. Suppose that we also have r2 with constraint y
< 10, and we discover that r1 and r2 are equivalent and belong
together in s. Now we can safely say that both constraints (x > 0 and
y < 10) apply to both r1 and r2.

Deducing additional constraints in this way is a big win. The effort
to compute constraints for each RelNode is well-spent.

This kind of deduction applies to other logical properties (e.g.
unique keys) and it applies to RelSet as well as RelSubset.
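[Editorial note] The constraint-merging idea above can be sketched as a simple set union. This is an illustrative toy only, with hypothetical names; it is not Calcite's RelSet/RelSubset API:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only (hypothetical helper, not Calcite's API): when two
// equivalent expressions r1 and r2 are merged into one equivalence class, the
// class can keep the union of the constraints proven for either member, since
// each constraint holds for every row the class produces.
public class ConstraintMerge {
    static Set<String> mergeConstraints(Set<String> fromR1, Set<String> fromR2) {
        Set<String> merged = new HashSet<>(fromR1);
        merged.addAll(fromR2); // both constraints now apply to r1 and r2
        return merged;
    }

    public static void main(String[] args) {
        // r1 proves x > 0, r2 proves y < 10; after merging, both hold.
        Set<String> merged = mergeConstraints(Set.of("x > 0"), Set.of("y < 10"));
        System.out.println(merged.size()); // 2
    }
}
```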

Julian


On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
 wrote:
>
> @Haisheng
>
> > Calcite uses the Project operator and all kinds of ProjectXXXTransposeRule
> > to prune unused columns.
>
> I also noticed that in most cases Project-related rules took a significant
> part of the planning time. But I haven't explored this problem yet.
>
> > MEMO group (RelSet) represents logically equivalent expressions. All the 
> > expressions in one group should share the same logical properties, e.g. 
> > functional dependency, constraint properties etc. But they are not sharing 
> > it. Computation is done for each individual operator.
>
> I thought the equivalence of logical properties within groups (RelSets)
> is implicit. For example, in RelSet#addInternal it is always verified
> that the newly added node has the same type as the other members of the set.
>
> Anyway I absolutely agree with you that problems with trait
> propagation, rule matching and others that you mentioned in the previous
> e-mails should be solved in the first place. We first need to make the
> Volcano optimizer work right and only then make improvements like
> search space pruning.
>
> I would love to join this work to improve the Volcano planner. Looking
> forward to the design doc.
>
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 11.01.2020 11:42, Haisheng Yuan wrote:
> > Currently, Calcite uses the Project operator and all kinds of
> > ProjectXXXTransposeRule to prune unused columns. Every operator's output
> > columns use indexes to reference the child operators' columns. If there is
> > a Project operator whose child is a Filter, and we push the Project down
> > under the Filter, we will have Project-Filter-Project-FilterInput. All the
> > newly generated RelNodes will trigger rule matches. E.g., if we already
> > applied ReduceExpressionRule on the filter, but the new filter RexCall's
> > input ref indexes changed, we have to apply ReduceExpressionRule on the new
> > filter again, even though there is nothing that can be reduced. Similarly,
> > new operator transpose/merge rules will be triggered. This can trigger a
> > lot of rule matches.
> >
> > MEMO group (RelSet) represents logically equivalent expressions. All the 
> > expressions in one group should share the same logical properties, e.g. 
> > functional dependency, constraint properties etc. But they are not sharing 
> > it. Computation is done for each individual operator.
> >
> > Without resolving those issues, space pruning won't help much.
> >
> > There is a lot of room for improvement. I hope the community can join the
> > effort to make Calcite better.
> >
> > - Haisheng
> >
> > --
> > From: Roman Kondakov
> > Date: 2020-01-10 19:39:51
> > To:
> > Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels
> >
> > @Haisheng, could you please clarify what you mean by these points?
> >
> >> - the poor design of column pruning,
> >> - the lack of group properties, etc.
> >
> > I guess I'm not aware of these problems.
> >


Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-11 Thread Haisheng Yuan
Currently, Calcite uses the Project operator and all kinds of
ProjectXXXTransposeRule to prune unused columns. Every operator's output
columns use indexes to reference the child operators' columns. If there is a
Project operator whose child is a Filter, and we push the Project down under
the Filter, we will have Project-Filter-Project-FilterInput. All the newly
generated RelNodes will trigger rule matches. E.g., if we already applied
ReduceExpressionRule on the filter, but the new filter RexCall's input ref
indexes changed, we have to apply ReduceExpressionRule on the new filter
again, even though there is nothing that can be reduced. Similarly, new
operator transpose/merge rules will be triggered. This can trigger a lot of
rule matches.

MEMO group (RelSet) represents logically equivalent expressions. All the 
expressions in one group should share the same logical properties, e.g. 
functional dependency, constraint properties etc. But they are not sharing it. 
Computation is done for each individual operator.

Without resolving those issues, space pruning won't help much.

There is a lot of room for improvement. I hope the community can join the
effort to make Calcite better.

- Haisheng

--
From: Roman Kondakov
Date: 2020-01-10 19:39:51
To:
Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

@Haisheng, could you please clarify what you mean by these points?

> - the poor design of column pruning,
> - the lack of group properties, etc.

I guess I'm not aware of these problems.

-- 
Kind Regards
Roman Kondakov


On 08.01.2020 02:21, Haisheng Yuan wrote:
>> @Haisheng, are you doing something like that?
> Kind of, but not exactly. It is about on-demand trait propagation.
> 
> @Roman seems to be keen on space pruning for Calcite. But IMHO, for now, the
> main reason for Calcite's poor performance is not the lack of branch-and-bound
> space pruning, but
> - rule application on physical nodes,
> - the randomness of rule matching,
> - the poor design of column pruning,
> - the lack of on-demand trait propagation,
> - the lack of group properties, etc.
> 
> We tried a change similar to Roman's on our product. We totally removed rule
> match importance and its comparison, and split the process into three phases
> (exploration, implementation, enforcement) with a specific top-down/bottom-up
> order; it achieved almost a 100% speedup.
> Even @vlsi's RexNode normalization can improve it to some degree.
> 
> Calcite currently generates only 1 join-order alternative for 6-way joins in
> testJoinManyWay, not even the top 10, 100 or N! ordering alternatives, but it
> still can't finish within a reasonable amount of time when abstract converters
> are allowed. If there is only 1 join-order alternative, the query optimizer
> should finish the optimization quickly even for clique or chain queries with
> 20-way joins, without space pruning. But this is not the case for Calcite.
> 
> Simply put, space pruning is important for optimization, especially for
> join reordering, but it is not an urgent issue for Calcite.
> 
> - Haisheng
> 
> --
> From: Roman Kondakov
> Date: 2020-01-08 02:39:19
> To:
> Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels
> 
> I forgot to mention that this approach was inspired by Stamatis's idea [1]
> 
> [1]
> https://ponymail-vm.apache.org/_GUI_/thread.html/d8f8bc0efd091c0750534ca5cd224f4dfe8940c9d0a99ce486516fd5@%3Cdev.calcite.apache.org%3E
> 
> 


Re: Question about Volcano's planner root converters

2020-01-07 Thread Haisheng Yuan
> 1. Should we change the condition of root converters creation from the
direct traitset difference to the difference with considering traits
hierarchy (like [] includes [1])?

I think so. Specifically, a non-empty required-trait diff. Singleton is the
only diff in your example; it doesn't care about sort order.

> 2. Should we cover the case when the actual traits difference size is
more than 1? I.e. root traits are SINGLETON.[2] and neighbor traits are
HASH.[1].

Yes. This is quite common for MPP databases, where there is a master node to
gather all the data from segments. Note that it should not be N!. Ideally,
each physical operator should have an API to derive properties from its child.
E.g. a Singleton/Gather physical operator will deliver singleton distribution
but will break the sort order unless you gather-merge the data, even if the
child is a Sort node. But a Sort operator will keep the distribution of its
child operator.

So we can either do

Sort
  +-- GatherMotion
        +-- TableScan

or

GatherMerge
  +-- Sort
        +-- TableScan

- Haisheng

--
From: Roman Kondakov
Date: 2020-01-07 18:35:00
To:
Subject: Question about Volcano's planner root converters

Hello!

I have a couple of questions about the method
VolcanoPlanner#ensureRootConverters [1]. In this method we first
calculate the difference in traits between the root subset and the other
subsets which belong to the same set as the root:

ImmutableList<RelTrait> difference =
    root.getTraitSet().difference(subset.getTraitSet());

and then, if the difference list's size equals 1, we register a new
AbstractConverter from the given subset's trait to the root trait.

The problem I see here is caused by the strict condition that
difference.size() must equal 1 in order to create converters to
the root traits. This requirement may often lead to a CannotPlanException.
Let's consider an example: in the root set we have a root node with
traits of distribution SINGLETON and empty sortedness [] (I omitted the
convention trait for simplicity) and a subset of physical nodes with
distribution HASH and sortedness [1]. The set dump would look like this
in the given case:

Set#1 <-- root set
  rel#29:Subset#1.SINGLETON.[] <-- root subset
[empty subset]
  rel#20:Subset#2.HASH.[1]
rel#27:IgniteTableScan.HASH.[1]

Root converters were not created here because the difference size
between the traitsets was 2: HASH != SINGLETON and [] != [1]. But this
looks strange because an empty collation [] includes [1] as well as all
possible collations. So the actual traits difference size is 1, and a
converter from HASH to SINGLETON might be created here.

My questions are
1. Should we change the condition of root converters creation from the
direct traitset difference to the difference with considering traits
hierarchy (like [] includes [1])?
2. Should we cover the case when the actual traits difference size is
more than 1? I.e. root traits are SINGLETON.[2] and neighbor traits are
HASH.[1]. In this case we can create the chain of converters:

Subset.HASH.[1] ->
AbstractConverter[HASH->SINGLETON] ->
AbstractConverter[[1]->[2]] ->
RootSubset[SINGLETON.[2]]

This may help us to avoid CannotPlanException in many cases, at the cost
of creating N! chains of abstract converters. I guess N should be
relatively small here (N <= 3) because trait dimensionality is usually
small.
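[Editorial note] The chain-of-converters idea can be sketched as follows. This is a hypothetical helper on plain strings, not VolcanoPlanner code: instead of requiring the trait difference to be exactly 1, emit one abstract converter per trait that differs between the subset and the root.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (hypothetical, not Calcite's implementation): build a
// chain with one abstract converter per trait that differs between a subset
// and the root, instead of handling only a single-trait difference.
public class ConverterChain {
    static List<String> chainFor(Map<String, String> subset, Map<String, String> root) {
        List<String> chain = new ArrayList<>();
        for (Map.Entry<String, String> want : root.entrySet()) {
            String have = subset.get(want.getKey());
            if (have != null && !have.equals(want.getValue())) {
                chain.add("AbstractConverter[" + have + "->" + want.getValue() + "]");
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        Map<String, String> subset = new LinkedHashMap<>();
        subset.put("distribution", "HASH");
        subset.put("collation", "[1]");
        Map<String, String> root = new LinkedHashMap<>();
        root.put("distribution", "SINGLETON");
        root.put("collation", "[2]");
        // Two traits differ, so two converters bridge Subset.HASH.[1]
        // to RootSubset.SINGLETON.[2].
        System.out.println(chainFor(subset, root));
    }
}
```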

What do you think?


[1]
https://github.com/apache/calcite/blob/963a266d7b3859d141964217f4d16cae350b10e1/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L675

-- 
Kind Regards
Roman Kondakov


Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-07 Thread Haisheng Yuan
> @Haisheng, are you doing something like that?
Kind of, but not exactly. It is about on-demand trait propagation.

@Roman seems to be keen on space pruning for Calcite. But IMHO, for now, the
main reason for Calcite's poor performance is not the lack of branch-and-bound
space pruning, but
- rule application on physical nodes,
- the randomness of rule matching,
- the poor design of column pruning,
- the lack of on-demand trait propagation,
- the lack of group properties, etc.

We tried a change similar to Roman's on our product. We totally removed rule
match importance and its comparison, and split the process into three phases
(exploration, implementation, enforcement) with a specific top-down/bottom-up
order; it achieved almost a 100% speedup.
Even @vlsi's RexNode normalization can improve it to some degree.

Calcite currently generates only 1 join-order alternative for 6-way joins in
testJoinManyWay, not even the top 10, 100 or N! ordering alternatives, but it
still can't finish within a reasonable amount of time when abstract converters
are allowed. If there is only 1 join-order alternative, the query optimizer
should finish the optimization quickly even for clique or chain queries with
20-way joins, without space pruning. But this is not the case for Calcite.

Simply put, space pruning is important for optimization, especially for
join reordering, but it is not an urgent issue for Calcite.

- Haisheng

--
From: Roman Kondakov
Date: 2020-01-08 02:39:19
To:
Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

I forgot to mention that this approach was inspired by Stamatis's idea [1]

[1]
https://ponymail-vm.apache.org/_GUI_/thread.html/d8f8bc0efd091c0750534ca5cd224f4dfe8940c9d0a99ce486516fd5@%3Cdev.calcite.apache.org%3E


-- 
Kind Regards
Roman Kondakov


On 07.01.2020 21:14, Roman Kondakov wrote:
> Hello!
> 
> As Vladimir Ozerov mentioned, the general problem here is that
> VolcanoPlanner applies both logical and physical rules in a top-down
> manner. It's more like the original Volcano paper approach [1]:
> 
>> In the Volcano search strategy, a first phase applied all
>> transformation rules to create all possible logical expressions for a query 
>> and all its subtrees. The second phase,
>> which performed the actual optimization, navigated within that network of 
>> equivalence classes and expressions,
>> applied implementation rules to obtain plans, and determined the best plan.
> The Cascades paper [2] says
> 
>> In the Cascades optimizer, this separation into two phases is abolished, 
>> because it is not useful to derive all
>> logically equivalent forms of all expressions, e.g., of a predicate. A group 
>> is explored using transformation rules
>> only on demand
> 
> Unfortunately, it is not clear from this paper what the actual "on
> demand" phrase means. But a great explanation of the Cascades
> optimization process can be found in [3]:
> 
>> It begins with the input query and, ..., 
>> proceeds top-down, using recursion on the inputs
>> of the current multiexpression MExpr. However, plan
>> costing actually proceeds bottom-up, based on the order
>> of the returns from the top-down recursive calls. 
> 
> and this is what VolcanoPlanner lacks: when a logical rule is
> applied in a top-down way and a new LogicalRelNode is created, we need
> to understand whether the overall plan cost was improved by this move or
> not. In order to understand that, we need to know the actual cost of the
> new plan. As [3] says:
> 
>> A plan is an expression made up entirely of physical operators.
> 
> Hence, to evaluate the plan cost we need to have a fully implemented
> physical tree of operators for our new LogicalRelNode which was created
> by applying the logical rule above.
> 
> This process can be described with the pseudo-code:
> 
> for (LogicalRuleMatch rule : LogicalRuleMatchesSortedTopDown) {
>   LogicalNode newNode = rule.apply();
>   implementRecursivelyBottomUp(newNode);
> }
> 
> we first apply logical rules in a top-down fashion, and then after each
> logical rule invocation we need to implement the newly created nodes with
> the physical rules recursively in a bottom-up way.
> 
> The outcome here is that when the physical converter rule is applied to a
> RelNode, all children of that node are already implemented, so their
> traits can easily be derived and their precise cost can be calculated.
> 
> I tried to simulate this behavior with minor changes in
> RuleQueue.RuleMatchImportanceComparator (see PR [4]) and it worked well
> for me. Tests started to perform better. I counted planner ticks (see
> VolcanoPlanner#findBestExp) to evaluate the performance improvements. For
> example, the number of ticks in JdbcTest#testJoinManyWay before the
> comparator change:
> 
> ticks=11
> ticks=1041
> ticks=31362
> 
> and after:
> 
> ticks=11
> ticks=678
> ticks=26231
> 
> In some Apache Ignite test scenarios the number of ticks was reduced by
> half.
> 
> The essence of 

Re: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-07 Thread Haisheng Yuan
It is still on my radar.

I have been busy these days, but will send out a design doc in a few days. Just
a heads up: the change will be larger than everyone expects.

- Haisheng

--
From: Danny Chan
Date: 2020-01-07 17:29:46
To:
Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels

Hi guys, it seems that the discussion has gone silent now, so do we have some
conclusion that can be contributed to the current code, i.e. the suggested API
change or new abstraction?

Or better, can someone write a design doc so that we can push to get it
implemented?

Personally, I was always looking forward to the result, because Apache Flink
suffers from bad planning performance for join re-ordering or trait
auto-adaptation.

Best,
Danny Chan
On 2019-11-20 02:14 +0800, Vladimir Ozerov wrote:
> Hi Igor,
>
> Thank you for the details. Meanwhile, I solved it with a separation of the
> conversion rules from the physical optimization rules. So the first pass
> creates physical nodes with unknown physical properties (top-down), while
> subsequent processing of the leaf nodes triggers rules which convert "bad"
> physical nodes into "good" physical nodes with known distribution and
> collation.
>
> Regards,
> Vladimir.
>
> On Mon, 18 Nov 2019 at 13:43, Seliverstov Igor wrote:
>
> > Vladimir,
> >
> > Hope it may help you.
> >
> > Currently we applied the next way (just rough description):
> >
> > 1) We created an API to derive possible trait permutations on the basis
> > of the children's traits (quite similar to the one described in the «On
> > Demand trait set request» topic)
> >
> > 2) added a general rule that copies Logical nodes, but requesting our
> > convention from their children (IGNITE convention, ANY distribution, EMPTY
> > collation) and sets the importance of the old Logical nodes to zero - so we
> > have a Logical parent whose input satisfies any possible distribution and
> > no rules match the previous logical nodes any more.
> >
> > 3) Physical rules create physical rel nodes only if physical traits may
> > be derived (there is no «barrier», described in one of the previous
> > messages) - the derived traits are a collection; we don't create a physical
> > rel node for each possible trait set, and we may also set zero importance
> > for previously created rel nodes to decrease the search space.
> >
> > Now that we know the actual and required distribution, we don't need
> > AbstractConverters and are able to just call the TraitDef.convert() method
> > inside a rule.
> > A rule is still able to produce the same output several times, but
> > "memoization" inside the planner solves that for us.
> >
> > Preliminary tests show almost zero overhead for the approach.
> >
> > Regards,
> > Igor
> >
> >
> > > On 14 Nov 2019, at 14:49, Vladimir Ozerov wrote:
> > >
> > > Hi Xing,
> > >
> > > Thanks for your suggestion. Yes, this may help, and if I get your idea
> > > right, I already had it in my reproducer:
> > > 1) Create the converted physical input:
> > >
> > https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49
> > > 2) Use it in case no physical children were found:
> > >
> > https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L79
> > >
> > > This idea is borrowed from Apache Drill's physical rules. But the problem
> > > is that this approach leads to a suboptimal plan - the parent node doesn't
> > > know the future distribution of a child node. And as a result, it doesn't
> > > know its own distribution. So the final plan is constructed in this way:
> > > 1.1) Root enforced "SINGLETON" on its child:
> > > -> PhysicalRoot[SINGLETON]
> > > -> Converter[SINGLETON]
> > > -> PhysicalProject[*ANY*]
> > > -> PhysicalScan[REPLICATED]
> > >
> > > 1.2) But since the child (PhysicalProject) failed to infer its
> > > distribution during the rule call, it falls back to ANY distribution. In
> > > order to satisfy the SINGLETON distribution of the parent, we inject an
> > > exchange into the final plan:
> > > -> PhysicalRoot[SINGLETON]
> > > * -> Exchange[SINGLETON]*
> > > -> PhysicalProject[*ANY*]
> > > -> PhysicalScan[REPLICATED]
> > >
> > > 2) But this is a suboptimal plan. The optimal plan is:
> > > -> PhysicalRoot[SINGLETON]
> > > -> PhysicalProject[REPLICATED]
> > > -> PhysicalScan[REPLICATED]
> > >
> > > You may observe it in my tests:
> > > 1)
> > >
> > https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L46
> > > -
> > > works as you described and produces not optimal plan with exchange
> > > 2)
> > >
> > https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L30
> > > -
> > > rely on AbstractConverter-s and produce an optimal plan with bottom-up
> > > trait propagation at the cost of significantly increased planning time
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > On Fri, 8 Nov 2019 at 16:

[jira] [Created] (CALCITE-3676) VolcanoPlanner.dumpGraphviz should handle exception gracefully

2020-01-03 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3676:
--

 Summary: VolcanoPlanner.dumpGraphviz should handle exception gracefully
 Key: CALCITE-3676
 URL: https://issues.apache.org/jira/browse/CALCITE-3676
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Haisheng Yuan


In case VolcanoPlanner can't generate a plan because some RelSubset doesn't
have a best rel, it will dump the sets and Graphviz. Since there is an error
during planning, we want to see the sets in the exception message, but when
dumping Graphviz it might encounter another exception (we can't guarantee all
the sets and subsets are correctly generated for Graphviz), causing a failure
to dump the sets info, which is rather helpful in a production system.


{code:java}
Caused by: java.lang.AssertionError
at 
org.apache.calcite.util.PartiallyOrderedSet.findParentsChildren(PartiallyOrderedSet.java:318)
at 
org.apache.calcite.util.PartiallyOrderedSet.findParents(PartiallyOrderedSet.java:308)
at 
org.apache.calcite.util.PartiallyOrderedSet.add(PartiallyOrderedSet.java:226)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.dumpGraphviz(VolcanoPlanner.java:1320)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.dump(VolcanoPlanner.java:1194)
at 
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:606)
at 
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:307)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:649)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CALCITE-3668) VolcanoPlanner doesn't match all the RelSubSet in matchRecursive

2020-01-02 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3668:
--

 Summary: VolcanoPlanner doesn't match all the RelSubSet in 
matchRecursive
 Key: CALCITE-3668
 URL: https://issues.apache.org/jira/browse/CALCITE-3668
 Project: Calcite
  Issue Type: Bug
Reporter: Haisheng Yuan


If I have a rule to match a pattern with Filter-RelSubset, VolcanoPlanner only
matches 1 RelSubset in the RelSet, instead of all the subsets.


{code:java}
if (operand.getMatchedClass() == RelSubset.class) {
  // If the rule wants the whole subset, we just provide it
  successors = ImmutableList.of(subset);
} else {
  successors = subset.getRelList();
}
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Draft board report for January 2020

2020-01-01 Thread Haisheng Yuan
+1, looks good to me.
Thanks.

- Haisheng

--
From: Francis Chuang
Date: 2020-01-02 04:54:46
To:
Subject: Re: Draft board report for January 2020

+1, looks good, Stamatis!

On 1/01/2020 9:18 pm, Stamatis Zampetakis wrote:
> Attached below is a draft of this month's board report. I plan to submit it
> on January 7. Please let me know if you have any additions or corrections.
> 
> ## Description:
> Apache Calcite is a highly customizable framework for parsing and planning
> queries on data in a wide variety of formats. It allows database-like
> access,
> and in particular a SQL interface and advanced query optimization, for data
> not
> residing in a traditional database.
> 
> Avatica is a sub-project within Calcite and provides a framework for
> building
> local and remote JDBC and ODBC database drivers. Avatica has an independent
> release schedule and its own repository.
> 
> ## Issues:
> There are no issues requiring board attention.
> 
> ## Membership Data:
> Apache Calcite was founded 2015-10-22 (4 years ago).
> There are currently 45 committers and 22 PMC members in this project.
> The Committer-to-PMC ratio is roughly 2:1.
> 
> Community changes, past quarter:
> - Danny Chen was added to the PMC on 2019-10-30.
> - Haisheng Yuan was added to the PMC on 2019-11-11.
> - Stamatis Zampetakis was appointed as PMC chair on 2019-12-18,
> continuing the tradition of the project of rotating the chair every year.
> - No new committers. Last addition was Mohamed Mohsen on 2019-09-17.
> 
> ## Project Activity:
> Calcite 1.21.0 was released in the middle of September, including more than
> 100
> resolved issues and maintaining a release cadence of roughly one release per
> quarter.
> 
> Calcite 1.22.0 is under preparation and is expected to be released within
> January; at the moment it contains more than 230 commits and 150 resolved
> issues.
> 
> Avatica 1.16.0 was released in the middle of December, including numerous
> bug
> fixes and security improvements while the build system has been migrated
> from
> maven to gradle.
> 
> The build and test infrastructure has been modernized for both Calcite and
> Avatica, with the migration from maven to gradle, JUnit4 to JUnit5, and the
> introduction of GitHub actions as part of the CI. The changes shall improve
> developers experience, code quality, and protect better against
> regressions.
> 
> Members of the project participated in ApacheCon EU on October and Flink
> Forward
> Asia on November, representing the community, and presenting talks about
> Calcite.
> 
> Finally, the Hazelcast system has decided to adopt Calcite for query
> planning.
> 
> ## Community Health:
> 
> Activity levels on mailing lists (37%), git (40%) and JIRA (opened 15%,
> closed
> 19%) have increased significantly in the last quarter. One reason is the
> modernization of the build and test infrastructure for both Calcite and
> Avatica,
> which triggered many discussions and follow-up tasks. Another reason is
> the
> changes in the roster of the PMC and open discussions about the future of
> the
> project. Last but not least, is the involvement of new people in the
> community
> bringing up new challenges and ideas for improvements.
> 
> The rate of pull requests being closed and merged on GitHub has increased
> by
> 16%, as we work to clear our backlog. Nevertheless, the number of open pull
> requests is still big since the number of committers who get involved in
> reviews
> is rather small. Furthermore, there are pull requests which are stale,
> work in progress, or proposals that make the numbers look even bigger. On
> the
> positive side every pull request receives comments within a couple of days
> after
> being submitted and there are many which get merged without too much effort
> showing that the project attracts skilled developers who may turn into
> committers quite soon.
> 


Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

2019-12-30 Thread Haisheng Yuan
Normalizing scalar expressions is helpful, especially for deduplicating derived
constraints that are pushed down to child relations. We should not stop making
improvements just because there are many plan changes.

How about adding optimizer config option to enable/disable the feature?

Normalizing every binary operator might not seem necessary, but simply
normalizing equality expressions with input refs or literals alone can still
help a lot, I guess, because many of them are generated from equivalence
classes.

- Haisheng

--
From: Rui Wang
Date: 2019-12-31 07:09:44
To:
Subject: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

I think the concern of breaking plan comparison has appeared more than
once. Not only to this proposal but also to others (e.g. replace "$" with
"_" in names).

Looking at it from another perspective, the widely used practice of
string-comparison-based plan evaluation also reduces the flexibility of
making changes in Calcite. We need to carefully think about whether a change
has a large impact that affects toString(), or something similar.

So if the Calcite community could offer another solution for downstream
projects, and announce that it doesn't guarantee the backward compatibility of
toString(), then it could be in a better situation.

A few candidate solutions:
1. offer utils that convert a plan represented as a string to RelNodes.
2. use RelBuilder to build plans and do the comparison.

and then just maintain the backward compatibility of the solution.

just my two cents.


-Rui


On Mon, Dec 30, 2019 at 12:20 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> The change improves slow tests from 80 min to 60, and the changes are
> minimal
>
> Vladimir
>



Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

2019-12-29 Thread Haisheng Yuan
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins; however, they are equivalent.

How is the alternative generated? I would rather check how to stop generating
this alternative than reorder predicates.

- Haisheng

--
From: Vladimir Sitnikov
Date: 2019-12-30 15:04:25
To: Apache Calcite dev list
Subject: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Danny> How many cases are there in production? This example itself seems
very marginal. I'm not against it; I'm just suspicious about the value
of the feature.

It improves JdbcTest#testJoinManyWay 2 times or so.

master.
JdbcTest#testJoinManyWay: 5.8sec
https://travis-ci.org/apache/calcite/jobs/630602718#L1646

with normalization fix:
JdbcTest#testJoinManyWay: 3.1sec
https://travis-ci.org/apache/calcite/jobs/630719276#L1432

The fix is vital for the proper support of EnumerableMergeJoin as well.
If EnumerableMergeJoin is activated, then JdbcTest#testJoinManyWay can't
complete within 5 minutes (see PR 1702)

Vladimir



Re: Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

2019-12-29 Thread Haisheng Yuan
>> =(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)
I don't see the value of reordering this kind of expression. In GPDB, variable
vs. constant comparison is reordered for input refs only, without additional
function calls, like $1=5, because it is transformed into range constraints
for rex simplification. Perhaps Calcite doesn't need this.


>> What do you think re "$n.field = 42", where $n.field is a dot operator? I'm
>> not fond of adding complicated checks there; however, I think I can move the
>> literal to the right if the other side is a non-literal.

Sounds good. I am also fine to keep it unchanged.


>> "sort by numeric value" for cases like $1=$8, so that semantics would be 
>> more or less automatic.
I think this is good enough for Calcite, and it has the same effect as what I
said, because the operand with the smaller index always comes from the left
relation if the two come from different relations.


- Haisheng

--
From: Vladimir Sitnikov
Date: 2019-12-30 04:33:11
To: Apache Calcite dev list
Subject: Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Just in case, my motivation of comparing by string length first is for the
cases like below:

=(CAST(PREV(UP.$0, 0)):INTEGER NOT NULL, 100)

vs

=(100, CAST(PREV(UP.$0, 0)):INTEGER NOT NULL)

As for me, the second one is easier to understand, as the expression starts
with the simpler bits and the complicated parts come later.

In the same way, it is not amusing to see cases like
AND(...,...,...,...,,,..,  null, )

Vladimir



Re: Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

2019-12-29 Thread Haisheng Yuan
I recommend the way GPDB does it.

Normalize the logical plan expression in the preprocessing phase: 
- the variable is always on the left and the constant always on the right for applicable binary operators;
- for join conditions, the left operand always comes from the left relation and the 
right operand always comes from the right relation for reversible binary operators.
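The second rule above (left operand from the left relation) could be sketched as the guard below. This is a purely illustrative model, not GPDB or Calcite code: a join condition ref_i = ref_j is represented by two indices, and a reference belongs to the left input when its index is below the left input's field count.

```java
// Hedged sketch: for a join condition $i = $j, ensure the reference that
// belongs to the left input (index < leftFieldCount) appears on the left side.
public class NormalizeJoinCondition {
  public static String normalize(int i, int j, int leftFieldCount) {
    boolean iFromLeft = i < leftFieldCount;
    boolean jFromLeft = j < leftFieldCount;
    // Swap only when the sides are crossed: i from the right, j from the left.
    if (!iFromLeft && jFromLeft) {
      return "$" + j + " = $" + i;
    }
    return "$" + i + " = $" + j;
  }
}
```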

- Haisheng

--
发件人:Enrico Olivelli
日 期:2019年12月30日 03:28:38
收件人:
主 题:Re: [DISCUSS] CALCITE-2450 reorder predicates to a canonical form

Il dom 29 dic 2019, 20:09 Vladimir Sitnikov 
ha scritto:

> Hi,
>
> We have a 1-year old issue with an idea to sort RexNode operands so they
> are consistent.
>
> For instance, "x=5" and "5=x" have the same semantics, so it would make
> sense to stick to a single implementation.
> A discussion can be found in
> https://issues.apache.org/jira/browse/CALCITE-2450
>
> We do not normalize RexNodes, thus it results in excessive planning time,
> especially when the planner is trying to reorder joins.
> For instance, it thinks Join(A, B, $0=$1) and Join(A, B, $1=$0) are
> different joins, however, they are equivalent.
>
> The normalization does not seem to cost much; however, it enables me to
> activate more rules (e.g. EnumerableMergeJoinRule),
> so it is good as it enables the planner to consider more sophisticated plans.
>
> I see two approaches:
> a) Normalize in RexNode constructor. This seems easy to implement, however,
> there's a catch
> if someone assumed that the order of operands would be the same as the one
> that was passed to the constructor.
> I don't think there are such assumptions in the wild, but there might be.
> The javadoc for the relevant methods says nothing regarding the operand
> order.
> However, the good thing would be RexNode would feel the same in the
> debugger and in its toString representation.
>
> b) Normalize at RexCall#computeDigest only.
> In other words, keep the operands unsorted, but make sure the digest is
> created as if the operands were sorted.
> This seems to be the most transparent change, however, it might surprise
> that `toString` does not match to whatever is seen in the debugger.
>
> In any case, making `RexCall#toString` print sorted representation would
> alter lots of tests.
> For :core it is like 5540 tests completed, 358 failed, 91 skipped :((
>
> WDYT?
>

I really would love this feature.

Just my 2 cents
Enrico



> Hopefully, making the RexNode representation sorted would reduce the number
> of `$1=$0` vs `$0=$1` plan diffs.
>
> Vladimir
>
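Option (b) above (sorting operands only when computing the digest, while keeping the stored order intact) could be sketched roughly like this. DigestOnlyNormalization is a hypothetical stand-in for RexCall and its computeDigest method, not Calcite's actual implementation, and operands are modeled as strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of option (b): keep operands in their original order (visible in the
// debugger and in toString), but sort a copy when computing the digest, so
// equivalent calls land in the same planner equivalence set.
public class DigestOnlyNormalization {
  public final List<String> operands;  // original order is preserved

  public DigestOnlyNormalization(List<String> operands) {
    this.operands = operands;
  }

  public String computeDigest() {
    List<String> sorted = new ArrayList<>(operands);
    sorted.sort(null);  // natural order; real code would need a deterministic,
                        // semantics-aware comparator
    return "=(" + String.join(", ", sorted) + ")";
  }
}
```

This is exactly the surprise Vladimir mentions: `=($1, $0)` and `=($0, $1)` would print different toString values but share the digest `=($0, $1)`.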



Re: Re: [ANNOUNCE] New Calcite PMC chair: Stamatis Zampetakis

2019-12-18 Thread Haisheng Yuan
Congratulations Stamatis!

- Haisheng

--
发件人:Michael Mior
日 期:2019年12月19日 07:23:51
收件人:
主 题:Re: [ANNOUNCE] New Calcite PMC chair: Stamatis Zampetakis

Congratulations Stamatis!
--
Michael Mior
mm...@apache.org

Le mer. 18 déc. 2019 à 16:43, Kevin Risden  a écrit :
>
> Congrats Stamatis!
>
> Kevin Risden
>
>
> On Wed, Dec 18, 2019 at 4:12 PM Francis Chuang 
> wrote:
>
> > Calcite community members,
> >
> > I am pleased to announce that we have a new PMC chair and VP as per our
> > tradition of rotating the chair once a year. I have resigned, and
> > Stamatis was duly elected by the PMC and approved unanimously by the Board.
> >
> > Please join me in congratulating Stamatis!
> >
> > -Francis
> >


Re: [DISCUSS] Tests vs multiline strings

2019-12-14 Thread Haisheng Yuan
The plan becomes harder to read, which is more important to me. 
Unless we change $ to another symbol, I am inclined to keep using Java.

I am not sure why the schemas in MaterializationTest require column names to be 
quoted. If we can change that, the queries will be much easier to read (even for 
multiline strings), as RelOptRulesTest does.

- Haisheng

--
发件人:Vladimir Sitnikov
日 期:2019年12月15日 06:09:37
收件人:Apache Calcite dev list
主 题:[DISCUSS] Tests vs multiline strings

Hi,

Calcite tests sometimes require multiline strings.
For instance: input SQL, output plan.

TL;DR: let's use Kotlin to improve test code readability. What do you think?

Unfortunately, Java <14 lacks multiline strings, so the current tests are
not that pretty :(
They are hard to read, hard to understand, and it is hard to maintain them.

Sample test (from MaterializationTest):

  @Test public void testFilterQueryOnProjectView5() {
checkMaterialize(
"select \"deptno\" - 10 as \"x\", \"empid\" + 1 as ee, \"name\"\n"
+ "from \"emps\"",
"select \"name\", \"empid\" + 1 as e\n"
+ "from \"emps\" where \"deptno\" - 10 = 2",
HR_FKUK_MODEL,
CalciteAssert.checkResultContains(
"EnumerableCalc(expr#0..2=[{inputs}], expr#3=[2], "
+ "expr#4=[=($t0, $t3)], name=[$t2], E=[$t1],
$condition=[$t4])\n"
+ "  EnumerableTableScan(table=[[hr, m0]]"));
  }

What do you think if we migrate those types of tests to Kotlin?
Note: I'm well aware there are Quidem-based tests, and those do not replace
Java-based ones.

Here's how the same test might look like in Kotlin:

@Test
fun `filter query on project view5`() {
checkMaterialize(
"""select "deptno" - 10 as "x", "empid" + 1 as ee, "name" from
"emps" """,
"""select "name", "empid" + 1 as e from "emps" where "deptno" -
10 = 2""",
HR_FKUK_MODEL,
CalciteAssert.checkResultContains(
"""
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[2],
expr#4=[=(${'$'}t0, ${'$'}t3)], name=[${'$'}t2], E=[${'$'}t1],
${'$'}condition=[${'$'}t4])
  EnumerableTableScan(table=[[hr, m0]]
""".trimIndent()
)
)
}

Pros:
* SQLs are much easier to read since all the slashes are gone
* Multiline text is much easier to follow
* ctrl+c/v works much better: I can copy a substring, even multiple lines
from the middle, and it just works
* PR review would be simplified as SQL in PRs would be easier to read

Cons:
* A bit of a sad point is $ is a special symbol there, and it needs to be
escaped.
A (not that?) crazy idea might be to make Calcite use another character
instead of $ (as $ is used in string templating languages quite often
anyway),
* If a class is converted from Java to Kotlin, then default Git annotate
fails to attribute lines to commits (== the file looks like a brand new one)
* Something else?

I don't suggest going ahead and converting everything; however, in my
opinion, converting at least some of the tests, or using multiline strings
for newly added tests, makes sense.

What do you think?

Vladimir



Re: Quicksql

2019-12-11 Thread Haisheng Yuan
Nope, it doesn't use any adapters. It just submits partial SQL queries to 
different engines.

If a query contains tables from a single source only, e.g.
select count(*) from hive_table1, hive_table2 where a=b;
then the whole query will be submitted to hive.

Otherwise, e.g.
select distinct a,b from hive_table union select distinct a,b from mysql_table;

The following query will be submitted to Spark and executed by Spark:
select a,b from spark_tmp_table1 union select a,b from spark_tmp_table2;

spark_tmp_table1: select distinct a,b from hive_table 
spark_tmp_table2: select distinct a,b from mysql_table

On 2019/12/11 04:27:07, "Juan Pan"  wrote: 
> Hi Haisheng,
> 
> 
> > The query on different data source will then be registered as temp spark 
> > tables (with filter or join pushed in), the whole query is rewritten as SQL 
> > text over these temp tables and submitted to Spark.
> 
> 
> Does it mean QuickSQL also needs adaptors to execute queries on different 
> data sources? 
> 
> 
> > Yes, virtualization is one of Calcite’s goals. In fact, when I created 
> > Calcite I was thinking about virtualization + in-memory materialized views. 
> > Not only the Spark convention but any of the “engine” conventions (Drill, 
> > Flink, Beam, Enumerable) could be used to create a virtual query engine.
> 
> 
> Basically, i like and agree with Julian’s statement. It is a great idea which 
> personally hope Calcite move towards.
> 
> 
> Give my best wishes to Calcite community. 
> 
> 
> Thanks,
> Trista
> 
> 
>  Juan Pan
> 
> 
> panj...@apache.org
> Juan Pan(Trista), Apache ShardingSphere
> 
> 
> On 12/11/2019 10:53,Haisheng Yuan wrote:
> As far as I know, users still need to register tables from other data sources 
> before querying them. QuickSQL uses Calcite for parsing queries and optimizing 
> logical expressions with several transformation rules. The subqueries on 
> different data sources are then registered as temp Spark tables (with filters 
> or joins pushed in); the whole query is rewritten as SQL text over these temp 
> tables and submitted to Spark.
> 
> - Haisheng
> 
> --
> 发件人:Rui Wang
> 日 期:2019年12月11日 06:24:45
> 收件人:
> 主 题:Re: Quicksql
> 
> The co-routine model sounds like a good fit for streaming cases.
> 
> I was thinking about how the Enumerable interface should work with streaming
> cases, but now I should also check the Interpreter.
> 
> 
> -Rui
> 
> On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde  wrote:
> 
> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
> 
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates java byte code there is a certain latency to
> getting a query prepared and ready to run.
> 
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
> 
The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
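The queue-based co-routine style described above can be illustrated with a toy operator. This is illustrative only, not the Interpreter's real API: a filter operator drains its input queue, writes matching rows downstream, and simply returns ("yields") when there is no work left, so a single-threaded driver can interleave many such operators.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy queue-based operator: keeps only even rows. A driver would call pump()
// whenever the input queue has data; an empty queue means "yield".
public class QueueFilterOperator {
  public static void pump(Queue<Integer> in, Queue<Integer> out) {
    Integer row;
    while ((row = in.poll()) != null) {  // yield when the input queue is empty
      if (row % 2 == 0) {
        out.add(row);                    // pass matching rows downstream
      }
    }
  }
}
```

Unlike the pull-based `next()` model, nothing here requires a single call stack, so branching data-flow graphs (DAGs) and multi-core scheduling become possible.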
> 
> I would love to add another built-in convention that uses Arrow as
> data format and generates co-routines for each operator. Those
> co-routines could be deployed in a parallel and/or distributed data
> engine.
> 
> Julian
> 
> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
>  wrote:
> 
> What is the ultimate goal of the Calcite Interpreter?
> 
> To provide some context, I have been playing around with calcite + REST
> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest <
> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for
> detail of my experiments)
> 
> 
> —Z
> 
> On Dec 9, 2019, at 9:05 PM, Julian Hyde  wrote:
> 
> Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
> 
> See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite)
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> <
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-fra

Re: Re: Quicksql

2019-12-10 Thread Haisheng Yuan
As far as I know, users still need to register tables from other data sources 
before querying them. QuickSQL uses Calcite for parsing queries and optimizing 
logical expressions with several transformation rules. The subqueries on 
different data sources are then registered as temp Spark tables (with filters or 
joins pushed in); the whole query is rewritten as SQL text over these temp tables 
and submitted to Spark.

- Haisheng

--
发件人:Rui Wang
日 期:2019年12月11日 06:24:45
收件人:
主 题:Re: Quicksql

The co-routine model sounds like a good fit for streaming cases.

I was thinking about how the Enumerable interface should work with streaming
cases, but now I should also check the Interpreter.


-Rui

On Tue, Dec 10, 2019 at 1:33 PM Julian Hyde  wrote:

> The goal (or rather my goal) for the interpreter is to replace
> Enumerable as the quick, easy default convention.
>
> Enumerable is efficient but not that efficient (compared to engines
> that work on off-heap data representing batches of records). And
> because it generates java byte code there is a certain latency to
> getting a query prepared and ready to run.
>
> It basically implements the old Volcano query evaluation model. It is
> single-threaded (because all work happens as a result of a call to
> 'next()' on the root node) and cannot handle branching data-flow
> graphs (DAGs).
>
> The Interpreter uses a co-routine model (reading from queues,
> writing to queues, and yielding when there is no work to be done) and
> therefore could be more efficient than enumerable in a single-node
> multi-core system. Also, there is little start-up time, which is
> important for small queries.
>
> I would love to add another built-in convention that uses Arrow as
> data format and generates co-routines for each operator. Those
> co-routines could be deployed in a parallel and/or distributed data
> engine.
>
> Julian
>
> On Tue, Dec 10, 2019 at 3:47 AM Zoltan Farkas
>  wrote:
> >
> > What is the ultimate goal of the Calcite Interpreter?
> >
> > To provide some context, I have been playing around with calcite + REST
> (see https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest <
> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest> for
> detail of my experiments)
> >
> >
> > —Z
> >
> > > On Dec 9, 2019, at 9:05 PM, Julian Hyde  wrote:
> > >
> > > Yes, virtualization is one of Calcite’s goals. In fact, when I created
> Calcite I was thinking about virtualization + in-memory materialized views.
> Not only the Spark convention but any of the “engine” conventions (Drill,
> Flink, Beam, Enumerable) could be used to create a virtual query engine.
> > >
> > > See e.g. a talk I gave in 2013 about Optiq (precursor to Calcite)
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> <
> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
> >.
> > >
> > > Julian
> > >
> > >
> > >
> > >> On Dec 9, 2019, at 2:29 PM, Muhammad Gelbana 
> wrote:
> > >>
> > >> I recently contacted one of the active contributors asking about the
> > >> purpose of the project and here's his reply:
> > >>
> > >> From my understanding, Quicksql is a data virtualization platform. It
> can
> > >>> query multiple data sources altogether and in a distributed way;
> Say, you
> > >>> can write a SQL with a MySql table join with an Elasticsearch table.
> > >>> Quicksql can recognize that, and then generate Spark code, in which
> it will
> > >>> fetch the MySQL/ES data as a temporary table separately, and then
> join them
> > >>> in Spark. The execution is in Spark so it is totally distributed.
> The user
> > >>> doesn't need to be aware of where the table is from.
> > >>>
> > >>
> > >> I understand that the Spark convention Calcite has attempts to
> achieve the
> > >> same goal, but it isn't fully implemented yet.
> > >>
> > >>
> > >> On Tue, Oct 29, 2019 at 9:43 PM Julian Hyde  wrote:
> > >>
> > >>> Anyone know anything about Quicksql? It seems to be quite a popular
> > >>> project, and they have an internal fork of Calcite.
> > >>>
> > >>> https://github.com/Qihoo360/ 
> > >>>
> > >>>
> > >>>
> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> > >>> <
> > >>>
> https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
> > 
> > >>>
> > >>> Julian
> > >>>
> > >>>
> > >
> >
>



Re: Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-09 Thread Haisheng Yuan
Hi Vladimir,

Sorry for my late reply.
WRT join planning, it is not required to put the join reordering rule into the HEP 
planner; it can also be put into the Volcano planner. Indeed, it is not ideal for 
the join ordering rule to generate a single plan. We can create another rule 
to generate multiple alternatives and put that rule into the Volcano planner. 
This way you can get what you want.

The pull-up trait is not the essence of the on-demand trait request; the main idea 
is in [1]. 

>> 4.1) If the logical cost exceeds "maxCost", we stop and return. The whole 
>> logical subspace is pruned even before exploration.
In many cases, the search space you prune is just that specific operator, 
because the child operator is a MEMO group that other parent operators might 
still need to explore, especially when the JoinReorderingRule generates only 
a single logical optimal join order.

>> 4.2) Returned physical children are already registered in proper set/subset, 
>> but are not used for any pattern-matching, and doesn't trigger more rule 
>> calls!
That is a problem with Calcite's default behaviour. Most of the rules' default 
INSTANCEs provided by Calcite match not only logical operators but also physical 
operators. I am against that. I am not sure whether you have created your own rule 
instances or not.

>> 4.3) Implementation rule checks the cost of the physical child.
During an implementation rule, it is possible that we are not able to calculate 
the cost yet. It depends on the rule match order: if it is top-down rule 
matching, the child operators are still logical; if it is bottom-up rule 
matching, the child operators are still not enforced. Say we generate a 
MergeJoin whose 2 children are not sorted yet; how do we estimate the cost?

>> If it is greater than any other already observed child with the same traits
How can we observe it inside the implementation rule?

[1] 
http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e

- Haisheng

--
发件人:Vladimir Ozerov
日 期:2019年12月06日 18:00:01
收件人:Haisheng Yuan
抄 送:dev@calcite.apache.org (dev@calcite.apache.org)
主 题:Re: Re: Re: Volcano's problem with trait propagation: current state and 
future

"all we know is their collations" -> "all we know is their traits"

пт, 6 дек. 2019 г. в 12:57, Vladimir Ozerov :

Hi Haisheng,

Thank you for your response. Let me elaborate my note on join planning first - 
what I was trying to say is not that rules on their own have some deficiencies. 
What I meant is that with current planner implementation, users tend to 
separate join planning from the core optimization process like this in the 
pseudo-code below. As a result, only one join permutation is considered during 
physical planning, even though join rule may potentially generate multiple 
plans worth exploring:

RelNode optimizedLogicalNode = doJoinPlanning(logicalNode);
RelNode physicalNode = doPhysicalPlanning(optimizedLogicalNode);

Now back to the main question. I re-read your thread about on-demand trait 
propagation [1] carefully. I'd like to admit that when I was reading it for the 
first time about a month ago, I failed to understand some details due to poor 
knowledge of different optimizer architectures. Now I understand it much 
better, and we definitely concerned with exactly the same problem. I feel that 
trait pull-up might be a step in the right direction, however, it seems to me 
that it is not the complete solution. Let me try to explain why I think so.

The efficient optimizer should try to save CPU as much as possible because it 
allows us to explore more plans in a sensible amount of time. To achieve that 
we should avoid redundant operations, and detect and prune inefficient paths 
aggressively. As far as I understand the idea of trait pull-up, we essentially 
explore the space of possible physical properties of children nodes without 
forcing their implementation. But after that, the Calcite will explore that 
nodes again, now in order to execute implementation rules. I.e. we will do two 
dives - one to enumerate the nodes (trait pull-up API), and the other one to 
implement them (implementation rules), while in Cascades one dive should be 
sufficient since exploration invokes the implementation rules as it goes. This 
is the first issue I see.

The second one is more important - how to prune inefficient plans? Currently, 
nodes are implemented independently and lack of context doesn't allow us to 
estimates children's costs when implementing the parent, hence branch-and-bound 
is not possible. Can trait pull-up API "List 
deriveTraitSets(RelNode, RelMetadataQuery)" help us with this? If the children 
nodes are not implemented before the pull-up, all we know is their collations, 
but not their costs

[jira] [Created] (CALCITE-3576) Remove Enumerable convention check in FilterIntoJoinRule

2019-12-06 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3576:
--

 Summary: Remove Enumerable convention check in FilterIntoJoinRule
 Key: CALCITE-3576
 URL: https://issues.apache.org/jira/browse/CALCITE-3576
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan
 Fix For: 1.22.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-05 Thread Haisheng Yuan
Oh, I forgot to mention that join planning/reordering is not a big problem. 
Calcite just uses the rule to generate a single alternative plan, which is not 
ideal, but we can't say Calcite is doing it wrong.

Ideally, we expect it to generate multiple (neither all, nor a single one) 
bipartite graphs. The join reordering rule would cut each part into bipartite 
graphs recursively and apply the JoinCommutativity rule to generate more 
alternatives for each RelSet. It is just a different strategy. We can modify the 
rule, or create a new join reordering rule, to generate multiple plan alternatives.

- Haisheng

--
发件人:Haisheng Yuan
日 期:2019年12月06日 09:07:43
收件人:Vladimir Ozerov; dev@calcite.apache.org 
(dev@calcite.apache.org)
主 题:Re: Re: Volcano's problem with trait propagation: current state and future

Generally I agree with what Vladimir said. I think what Calcite has is logical 
optimization or exploration; there is almost no physical optimization, as Calcite 
leaves it to third-party implementors. One of my friends in the University of 
Wisconsin–Madison database research group told me that they gave up the idea of 
using Calcite in their project for this reason.

Currently physical properties are requested in implementation rules, or even 
logical exploration rules, but each rule is independent; the pattern-matched 
expression is not aware of what the parent operator wants. Using 
AbstractConverter doesn't help and is not promising.

>> You shouldn't register all logical rules in the planner simultaneously,... 
>> as Drill does.
That is because Calcite does too much redundant or duplicate rule matching, 
like all kinds of transposes (which can't be avoided due to the current design) 
and matching physical operators.

>> decoupling the logical planning from the physical one looks a bit weird to 
>> me because it violates the idea of Cascades framework.
The Orca optimizer fully adopted the design principles of the Cascades framework, 
except that it separates the work into 3 phases: logical exploration, physical 
implementation, and optimization (property enforcing). Separating into 3 phases 
might also make it easier to enable parallel optimization. Orca does 
branch-and-bound in the optimization phase, before actual property derivation and 
enforcement, IIRC. It is highly efficient, works pretty well, and is 
battle-tested by many large financial and insurance companies.

In my last thread about on-demand trait requests, I gave a high-level general 
API for physical operators to derive and require physical properties, which is 
similar to Orca's design. But it seems the proposed API change gets no 
love.

- Haisheng

--
发件人:Vladimir Ozerov
日 期:2019年12月05日 22:22:43
收件人:dev@calcite.apache.org (dev@calcite.apache.org)
主 题:Re: Volcano's problem with trait propagation: current state and future

AbstractConverter-s are attractive because they effectively emulate
straightforward recursive top-down optimization (Volcano/Cascades). But
instead of doing it with a recursive method call, which preserves the
context, we do this in Calcite as a sequence of unrelated rule calls, thus
losing the context. So with my current understanding, it could be thought
of not as a search-space explosion, but rather as the inefficient
implementation of an otherwise straightforward parent->child->parent
navigation, since we achieve this navigation indirectly through the rule
queue, rather than through a normal method call. In any case, the net
result is wasted CPU. Perhaps this is not exponential waste, but some
multiplication of otherwise optimal planning. As I mentioned, in our
experiments with TPC-H, we observed a constant factor of 6-9x between
the number of joins and the number of join implementation rule invocations.
It doesn't grow past 9 even for complex queries, so I hope that this is
not an exponent :-)

Speaking of logical vs physical optimization, IMO it makes sense in some
cases. E.g. when doing predicate pushdown, you do not want to consider
intermediate logical tree states for implementation, until the predicate
reaches its final position. So running separate logical planning phase with
Volcano optimizer makes total sense to me, because it effectively prunes a
lot of not optimal logical plans before they reach the physical planning
stage. The real problem to me is that we are forced to remove join planning
from the physical optimization stage, because the goal of join planning is not
to generate a single optimal plan, as with predicate pushdown, but rather
to generate a set of logical plans, all of which should be implemented and
estimated. And with AbstractConverter-s this is not possible, because their
multiplicator increases the rate of search-space growth, making join
planning inapplicable even for a small number of relations. So we have to
move them to the logical planning stage and pick only one permutation for
physical planning.

Re: Re: Volcano's problem with trait propagation: current state and future

2019-12-05 Thread Haisheng Yuan
Generally I agree with what Vladimir said. I think what Calcite has is logical 
optimization or exploration; there is almost no physical optimization, as Calcite 
leaves it to third-party implementors. One of my friends in the University of 
Wisconsin–Madison database research group told me that they gave up the idea of 
using Calcite in their project for this reason.

Currently physical properties are requested in implementation rules, or even 
logical exploration rules, but each rule is independent; the pattern-matched 
expression is not aware of what the parent operator wants. Using 
AbstractConverter doesn't help and is not promising.

>> You shouldn't register all logical rules in the planner simultaneously,... 
>> as Drill does.
That is because Calcite does too much redundant or duplicate rule matching, 
like all kinds of transposes (which can't be avoided due to the current design) 
and matching physical operators.

>> decoupling the logical planning from the physical one looks a bit weird to 
>> me because it violates the idea of Cascades framework.
The Orca optimizer fully adopted the design principles of the Cascades framework, 
except that it separates the work into 3 phases: logical exploration, physical 
implementation, and optimization (property enforcing). Separating into 3 phases 
might also make it easier to enable parallel optimization. Orca does 
branch-and-bound in the optimization phase, before actual property derivation and 
enforcement, IIRC. It is highly efficient, works pretty well, and is 
battle-tested by many large financial and insurance companies.

In my last thread about on-demand trait requests, I gave a high-level general 
API for physical operators to derive and require physical properties, which is 
similar to Orca's design. But it seems the proposed API change gets no 
love.

- Haisheng

--
发件人:Vladimir Ozerov
日 期:2019年12月05日 22:22:43
收件人:dev@calcite.apache.org (dev@calcite.apache.org)
主 题:Re: Volcano's problem with trait propagation: current state and future

AbstractConverter-s are attractive because they effectively emulate
straightforward recursive top-down optimization (Volcano/Cascades). But
instead of doing it with a recursive method call, which preserves the
context, we do this in Calcite as a sequence of unrelated rule calls, thus
losing the context. So with my current understanding, it could be thought
of not as a search-space explosion, but rather as the inefficient
implementation of an otherwise straightforward parent->child->parent
navigation, since we achieve this navigation indirectly through the rule
queue, rather than through a normal method call. In any case, the net
result is wasted CPU. Perhaps this is not exponential waste, but some
multiplication of otherwise optimal planning. As I mentioned, in our
experiments with TPC-H, we observed a constant factor of 6-9x between
the number of joins and the number of join implementation rule invocations.
It doesn't grow past 9 even for complex queries, so I hope that this is
not an exponent :-)

Speaking of logical vs physical optimization, IMO it makes sense in some
cases. E.g. when doing predicate pushdown, you do not want to consider
intermediate logical tree states for implementation, until the predicate
reaches its final position. So running separate logical planning phase with
Volcano optimizer makes total sense to me, because it effectively prunes a
lot of not optimal logical plans before they reach the physical planning
stage. The real problem to me is that we are forced to remove join planning
from the physical optimization stage, because the goal of join planning is not
to generate a single optimal plan, as with predicate pushdown, but rather
to generate a set of logical plans, all of which should be implemented and
estimated. And with AbstractConverter-s this is not possible, because their
multiplicator increases the rate of search-space growth, making join
planning inapplicable even for a small number of relations. So we have to
move them to the logical planning stage and pick only one permutation for
physical planning.


чт, 5 дек. 2019 г. в 15:35, Roman Kondakov :

> Vladimir,
>
> thank you for bringing it up. We are facing the same problems in Apache
> Ignite project
> and it would be great if Apache Calcite community will propose a
> solution for this
> issue.
>
> From my point of view an approach with abstract converters looks more
> promising, but as
> you mentioned it suffers from polluting the search space. The latter can
> be mitigated by
> splitting a planning stage into the several phases: you shouldn't
> register all logical rules in the planner simultaneously - it looks like
> it is better to have several iterations of planning stage with different
> sets of rules, as Drill does.
>
> Also I'd like to mention that decoupling the logical planning from the
> physical one looks
> a bit weird to me because it violates the idea of the Cascades framework.

Re: Release managers

2019-11-29 Thread Haisheng Yuan
Hi Stamatis,

I can volunteer to be the release manager for v1.23 or v1.24.

- Haisheng

--
发件人:Stamatis Zampetakis
日 期:2019年11月30日 00:53:34
收件人:
主 题:Release managers

Hi all,

The pre-selection of release managers [1] has worked very well so far. It
would be great if we could have some volunteers for the next releases.

Release Target date Release manager
=== === ===
1.192019-03Kevin
1.202019-06Michael
1.212019-09Stamatis
=== === ===
1.222019-12Andrei
1.232020-02
1.242020-04
1.252020-06
1.262020-08

As a reminder, the RM does not have to be a PMC member (committership is
necessary).

Best,
Stamatis

[1]
https://lists.apache.org/thread.html/c6fba3f6585139ba6919baf71835d32eeea8ca187621aa5c06a26f8c@%3Cdev.calcite.apache.org%3E



Re: Question about SortJoinTransposeRule and Inner Joins

2019-11-25 Thread Haisheng Yuan
Yes, we can.

Currently the rule applies only to left/right outer joins, because the 
cardinality of such a join is always greater than or equal to the cardinality 
of the outer relation. For an inner join, the join cardinality may be much 
smaller, in which case sorting the join output might be cheaper. You can 
change the code to apply it to inner joins too.

But be aware that this rule is a hack: it may generate a useless plan when 
the inner join is implemented as a MergeJoin that breaks the ordering of 
the input.
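As an illustrative sketch of that change (the class and method names here are assumptions, not Calcite's actual rule code), the relaxed join-type precondition might look like:

```java
import org.apache.calcite.rel.core.JoinRelType;

/** Illustrative precondition for a custom SortJoinTransposeRule variant. */
final class SortPushdownCheck {
  /** Returns true when the sort may be pushed through a join of this type. */
  static boolean joinTypeAllowsPush(JoinRelType type) {
    // The stock rule bails out unless the join is LEFT or RIGHT outer;
    // adding INNER relaxes that, subject to the MergeJoin caveat above.
    return type == JoinRelType.LEFT
        || type == JoinRelType.RIGHT
        || type == JoinRelType.INNER;
  }
}
```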

- Haisheng

--
发件人:Scott Reynolds
日 期:2019年11月26日 07:28:40
收件人:
主 题:Question about SortJoinTransposeRule and Inner Joins

The performance of our queries is dependent on our ability to push the
filter and sort into the RPC layer. Today the planner's
SortJoinTransposeRule
pushes the sort through the join for LEFT OUTER and RIGHT OUTER joins. The
comment in the logic states the following:
// 1) If join is not a left or right outer, we bail out
// 2) If sort is not a trivial order-by, and if there is
// any sort column that is not part of the input where the
// sort is pushed, we bail out
// 3) If sort has an offset, and if the non-preserved side
// of the join is not count-preserving against the join
// condition, we bail out

I am wondering if we can actually push the sort down through an INNER JOIN
if all the sort columns are on one side of the join.

SELECT b.title, b.published_date, b.sales
FROM Book b
INNER JOIN Author a ON b.author = a.id
ORDER BY b.published_date, b.sales, b.title



Re: Re: [DISCUSS] New Apache Calcite Chair

2019-11-25 Thread Haisheng Yuan
Thank you for your service as a chair, Francis, I really appreciate it.

+1 for Stamatis, who has been instrumental in issue discussions, answering 
questions, and, together with other contributors, making Calcite a friendlier 
community. Looking forward to working with you more closely.

- Haisheng

--
发件人:Rui Wang
日 期:2019年11月26日 04:02:25
收件人:
主 题:Re: [DISCUSS] New Apache Calcite Chair

+1!

Thanks to Francis for his contribution as PMC chair in the past year. Looking
forward to Stamatis being PMC chair in the next year!


On Sun, Nov 24, 2019 at 6:57 PM Chunwei Lei  wrote:

> Francis, thank you for your great job. I really respect it.
>
> I am +1 for Stamatis being the PMC chair. He is such a humble,
> energetic person.
> I believe he can bring more excellent ideas about our project.
>
>
>
>
>
> Best,
> Chunwei
>
>
> On Mon, Nov 25, 2019 at 9:26 AM Danny Chan  wrote:
>
> > Thanks for the great work, Francis; you are such a nice person with a mild
> > personality. Generally you have kept the project in pretty good shape ~
> >
> > I’m +1 for Stamatis being the PMC chair; he is always knowledgeable, and
> > he also wrote some bylaws for our project.
> >
> > Best,
> > Danny Chan
> > 在 2019年11月25日 +0800 AM9:13,Julian Hyde ,写道:
> > > First of all, thank you, Francis, for your year of service as chair.
> The
> > project is busier than ever, and also more welcoming than ever to new
> > members. Your personality has set the tone for the project, and your
> energy
> > has driven some much-needed modernization.
> > >
> > > I believe Stamatis would be an excellent choice, and that he should be
> > the candidate on the ballot.
> > >
> > > I want to call out Danny Chan and Haisheng Yuan. They have been very
> > active and helpful in the project over the past few months, have become
> PMC
> > members, and maybe we should consider them as chairs in future years.
> > >
> > > Julian
> > >
> > >
> > > > On Nov 21, 2019, at 1:48 PM, Francis Chuang <
> francischu...@apache.org>
> > wrote:
> > > >
> > > > Hey everyone,
> > > >
> > > > It's Calcite's tradition to rotate the chair every year. When we had
> > the State of the Project discussion [1] a month ago, there was some
> > consensus towards nominating Stamatis Zampetakis as our next chair.
> > > >
> > > > Stamatis has been a prolific contributor to the project and I believe
> > he would be a very good choice to chair the project in the coming year.
> > > >
> > > > Please let us know your thoughts!
> > > >
> > > > Francis
> > > >
> > > > [1]
> >
> https://lists.apache.org/thread.html/e85dbc88546687571c5dd9fd999a19d0eae52004c3b3e5b92c67d9bb@%3Cdev.calcite.apache.org%3E
> > >
> >
>



Re: Re: Re: CALCITE-2905: Maven -> Gradle: any thoughts

2019-11-21 Thread Haisheng Yuan
Thanks, Danny. Will try with newer version.

- Haisheng

--
发件人:Danny Chan
日 期:2019年11月21日 10:50:12
收件人:
主 题:Re: Re: CALCITE-2905: Maven -> Gradle: any thoughts

Haisheng, upgrade to IDEA 2019.3 EAP solves the problem for me.

Best,
Danny Chan
在 2019年11月21日 +0800 AM9:24,Haisheng Yuan ,写道:
> Hi Vladimir,
>
> I had the same error:
> Unable to load class 'org.gradle.api.internal.plugins.DefaultConvention'.
>
> IntelliJ IDEA 2018.3.4 (Community Edition)
> Build #IC-183.5429.30, built on January 28, 2019
> JRE: 1.8.0_152-release-1343-b26 x86_64
> JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
> macOS 10.14.4
>
> - Haisheng
>
> --
> 发件人:Danny Chan
> 日 期:2019年11月20日 17:31:18
> 收件人:
> 主 题:Re: CALCITE-2905: Maven -> Gradle: any thoughts
>
> When I use the IDEA 2019.9 version, it works ~
>
> But the Travis build is failing on the slow test; here is my log:
> https://travis-ci.org/apache/calcite/jobs/614406384?utm_medium=notification&utm_source=github_status
>
> Best,
> Danny Chan
> 在 2019年11月20日 +0800 PM2:28,Vladimir Sitnikov ,写道:
> > > Here is the stacktrace:
> > java.lang.NoClassDefFoundError: org/gradle/api/internal/plugin
> > s/DefaultConvention
> > at org.jetbrains.plugins.gradle.tooling.builder.ProjectExtensio
> > nsDataBuilderImpl.buildAll(ProjectExtensionsDataBuilderImpl.groovy:50)
> >
> > What is IDEA version?
> >
> > Vladimir
>



Re: Re: CALCITE-2905: Maven -> Gradle: any thoughts

2019-11-20 Thread Haisheng Yuan
Hi Vladimir,

I had the same error:
Unable to load class 'org.gradle.api.internal.plugins.DefaultConvention'.

IntelliJ IDEA 2018.3.4 (Community Edition)
Build #IC-183.5429.30, built on January 28, 2019
JRE: 1.8.0_152-release-1343-b26 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
macOS 10.14.4

- Haisheng

--
发件人:Danny Chan
日 期:2019年11月20日 17:31:18
收件人:
主 题:Re: CALCITE-2905: Maven -> Gradle: any thoughts

When I use the IDEA 2019.9 version, it works ~

But the Travis build is failing on the slow test; here is my log:
https://travis-ci.org/apache/calcite/jobs/614406384?utm_medium=notification&utm_source=github_status

Best,
Danny Chan
在 2019年11月20日 +0800 PM2:28,Vladimir Sitnikov ,写道:
> > Here is the stacktrace:
> java.lang.NoClassDefFoundError: org/gradle/api/internal/plugin
> s/DefaultConvention
> at org.jetbrains.plugins.gradle.tooling.builder.ProjectExtensio
> nsDataBuilderImpl.buildAll(ProjectExtensionsDataBuilderImpl.groovy:50)
>
> What is IDEA version?
>
> Vladimir



[jira] [Created] (CALCITE-3521) CalciteSystemProperty can't load config file

2019-11-19 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3521:
--

 Summary: CalciteSystemProperty can't load config file
 Key: CALCITE-3521
 URL: https://issues.apache.org/jira/browse/CALCITE-3521
 Project: Calcite
  Issue Type: Bug
Reporter: Haisheng Yuan


I am using Calcite as a standalone library, and want to specify the default 
charset and collation through the global config file {{saffron.properties}}.
But it fails to load the property file. I think it should use 
{{Thread.currentThread().getContextClassLoader()}} to get the class loader.
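A minimal sketch of the proposed loading strategy (the class and method names here are illustrative, not Calcite's actual code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Illustrative loader: try the context class loader first, then fall back. */
public final class ConfigLoader {
  public static Properties load(String resourceName) {
    Properties props = new Properties();
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    if (cl == null) {
      cl = ConfigLoader.class.getClassLoader();  // fallback class loader
    }
    try (InputStream in = cl.getResourceAsStream(resourceName)) {
      if (in != null) {
        props.load(in);
      }
    } catch (IOException e) {
      // A missing or unreadable config file is not fatal; use defaults.
    }
    return props;
  }
}
```

With this approach an application that sets a custom context class loader can still make {{saffron.properties}} visible to the library.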



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: [DISCUSS] Disable Travis and Appveyor for Avatica and Avatica-Go

2019-11-18 Thread Haisheng Yuan
Agreed, AppVeyor spends more time in the queue.
We can remove AppVeyor and keep Travis for a while.

- Haisheng

--
发件人:Vladimir Sitnikov
日 期:2019年11月19日 05:42:54
收件人:Apache Calcite dev list
主 题:Re: [DISCUSS] Disable Travis and Appveyor for Avatica and Avatica-Go

Francis, thanks for raising this up.

+1 for disabling AppVeyor

I think we should keep Travis around for a while to evaluate how
GitHub Actions works.
Frankly speaking, I find Travis logs more responsive and a bit easier to
read at the moment (e.g. they are colored).

>integration in Avatica fails on every commit because the appveyor.yml
>was removed.

I tried to keep Appveyor for Calcite, and in practice the queueing time
there is extremely high.
It can easily be Travis: ok, Actions: ok, AppVeyor: queued :-/

Vladimir



Re: Top-down pass of an optimized plan

2019-11-12 Thread Haisheng Yuan
Hi Makis,

You can use HepPlanner with your customized transformation rule to rewrite the 
optimized plan. The traversal order (top-down or bottom-up) can also be 
specified. Many systems have this kind of post-processing phase.
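A minimal sketch of such a post-processing pass, where the operator-placement rule is a placeholder for your own rewrite rule:

```java
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.hep.HepMatchOrder;
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;

public class PostProcess {
  // operatorPlacementRule is a placeholder for your custom rewrite rule.
  static RelNode rewrite(RelNode optimizedRoot, RelOptRule operatorPlacementRule) {
    HepProgram program = new HepProgramBuilder()
        .addMatchOrder(HepMatchOrder.TOP_DOWN)   // or HepMatchOrder.BOTTOM_UP
        .addRuleInstance(operatorPlacementRule)
        .build();
    HepPlanner planner = new HepPlanner(program);
    planner.setRoot(optimizedRoot);     // feed in the Volcano-optimized plan
    return planner.findBestExp();       // rewritten plan
  }
}
```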

- Haisheng

--
发件人:Serafeim (Makis) Papadias
日 期:2019年11月13日 05:37:37
收件人:
主 题:Top-down pass of an optimized plan

Dear all,

I first perform a Volcano optimisation pass and receive the root of the 
optimised plan from the standard Program in [1]. Afterwards, I want to perform 
a top-down pass over the optimised plan and change it a bit by placing certain 
operators at specific points.

Do you have any ideas on how to do this properly in Calcite?

Thanks in advance.

Best,
Makis


[1] — https://calcite.apache.org/apidocs/org/apache/calcite/tools/Programs.html 



Re: Re: [ANNOUNCE] Haisheng Yuan joins Calcite PMC

2019-11-11 Thread Haisheng Yuan
Thanks, all!

It's my great honour to be a member of the Apache Calcite project. Let's keep 
collaborating on it and make it better.

Thanks again.
- Haisheng

--
发件人:XING JIN
日 期:2019年11月12日 10:20:16
收件人:
主 题:Re: [ANNOUNCE] Haisheng Yuan joins Calcite PMC

Congratulations Haisheng ~
Well deserved!

Kevin Risden  于2019年11月12日周二 上午3:13写道:

> Congrats and welcome!
>
> Kevin Risden
>
>
> On Mon, Nov 11, 2019 at 2:10 PM Chunhui Shi  wrote:
>
> > Congratulations!
> >
> > On Mon, Nov 11, 2019 at 10:09 AM Jinfeng Ni  wrote:
> >
> > > Congratulations!
> > >
> > >
> > > On Tue, Nov 12, 2019 at 1:23 AM Rui Wang  wrote:
> > > >
> > > > Congrats HaiSheng!
> > > >
> > > >
> > > > -Rui
> > > >
> > > > On Mon, Nov 11, 2019 at 8:05 AM Stamatis Zampetakis <
> zabe...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Congrats Haisheng!
> > > > >
> > > > > Reviews, code contributions, design discussions, helping users, and
> > > many
> > > > > more things for improving the project.
> > > > >
> > > > > Personally, I also learn a lot from our interactions.
> > > > >
> > > > > All these are much appreciated; keep it up!!
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > On Mon, Nov 11, 2019, 4:17 PM Michael Mior 
> wrote:
> > > > >
> > > > > > Welcome and congratulations HaiSheng!
> > > > > > --
> > > > > > Michael Mior
> > > > > > mm...@apache.org
> > > > > >
> > > > > > Le dim. 10 nov. 2019 à 22:45, Francis Chuang
> > > > > >  a écrit :
> > > > > > >
> > > > > > > I'm pleased to announce that Haisheng has accepted an
> invitation
> > to
> > > > > > > join the Calcite PMC. Haisheng has been a consistent and
> helpful
> > > > > > > figure in the Calcite community for which we are very grateful.
> > We
> > > > > > > look forward to the continued contributions and support.
> > > > > > >
> > > > > > > Please join me in congratulating Haisheng!
> > > > > > >
> > > > > > > - Francis (on behalf of the Calcite PMC)
> > > > > >
> > > > >
> > >
> >
>



[jira] [Created] (CALCITE-3492) Exception thrown when terms has 1 RexNode in RexUtil.simplifyOrs()

2019-11-11 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3492:
--

 Summary: Exception thrown when terms has 1 RexNode in 
RexUtil.simplifyOrs()
 Key: CALCITE-3492
 URL: https://issues.apache.org/jira/browse/CALCITE-3492
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


This is a regression caused by the refactoring in 1.20.0.

When {{terms}} has only 1 element, it will throw an exception. We discovered 
the issue when trying to upgrade Calcite.
{code:java}
  public static RexNode simplifyOrs(RexBuilder rexBuilder,
      List<RexNode> terms) {
    return new RexSimplify(rexBuilder, RelOptPredicateList.EMPTY, EXECUTOR)
        .simplifyUnknownAs(rexBuilder.makeCall(SqlStdOperatorTable.OR, terms),
            RexUnknownAs.UNKNOWN);
  }
{code}
We can't assume that {{terms}} always has more than 1 element. Although this 
method is deprecated, I think we need to correct the regression. Maybe we don't 
need a test case, given that it is a deprecated method.
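To illustrate the guard the fix needs, here is a sketch using plain strings as stand-ins for RexNode (the actual fix would operate on RexNode via RexBuilder; this class is hypothetical):

```java
import java.util.List;

/** String-based stand-in illustrating the missing single-term guard. */
public final class OrSimplifier {
  public static String simplifyOrs(List<String> terms) {
    if (terms.isEmpty()) {
      throw new IllegalArgumentException("need at least one term");
    }
    if (terms.size() == 1) {
      // The missing case: an OR over one term is just that term,
      // so no OR call should be constructed at all.
      return terms.get(0);
    }
    return "OR(" + String.join(", ", terms) + ")";
  }
}
```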




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: Optimizer: All the inputs have relevant nodes, however the cost is still infinite.

2019-11-08 Thread Haisheng Yuan
Yes, looks like EnumerableTableFunctionScan doesn't override computeSelfCost.

- Haisheng

--
发件人:Haisheng Yuan
日 期:2019年11月09日 04:01:19
收件人:Apache Calcite dev list
主 题:Re: Re: Optimizer: All the inputs have relevant nodes, however the cost is 
still infinite.

It is not surprising to get an infinite cost, since the operators in the plan 
are logical operators, which need to be converted to physical operators to be 
costed. Did you try adding some implementation rules to the rule set, e.g. 
EnumerableProjectRule, EnumerableTableFunctionScanRule, etc.?
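As a sketch, registering a few of the Enumerable implementation rules with the planner might look like this (choose the rules that match the operators in your plan):

```java
import org.apache.calcite.adapter.enumerable.EnumerableRules;
import org.apache.calcite.plan.RelOptPlanner;

public class RegisterPhysicalRules {
  static void register(RelOptPlanner planner) {
    // Without implementation rules the logical nodes never get physical
    // counterparts, so every cumulative cost stays infinite.
    planner.addRule(EnumerableRules.ENUMERABLE_PROJECT_RULE);
    planner.addRule(EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
    planner.addRule(EnumerableRules.ENUMERABLE_TABLE_FUNCTION_SCAN_RULE);
  }
}
```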

- Haisheng

--
发件人:Xiening Dai
日 期:2019年11月09日 03:35:03
收件人:
主 题:Re: Optimizer: All the inputs have relevant nodes, however the cost is 
still infinite.

Are you able to provide a test case to repro on current code base?

From the error message, the given subset’s best RelNode is not available, i.e. 
its cost is infinite. There could be multiple causes leading to it.


> On Nov 8, 2019, at 11:28 AM, Rui Wang  wrote:
> 
> My current guess is it could because of `computeSelfCost` of
> `EnumerableTableFunctionScan`.
> 
> 
> -Rui
> 
> On Fri, Nov 8, 2019 at 10:04 AM Rui Wang  wrote:
> 
>> Hello community:
>> 
>> I am stuck on an optimizer error while working on CALCITE-3272. I am less
>> familiar with Calcite's optimizer and struggled to debug the following
>> error message. Are there any hints on what direction I should go? I can
>> also provide more information if that is helpful.
>> 
>> 
>> 
>>> java.sql.SQLException: Error while executing SQL "select * FROM
>> TABLE(TUMBLE(TABLE ORDERS, 'ROWTIME', INTERVAL '1' MINUTE))": There are not
>> enough rules to produce a node with desired properties:
>> convention=ENUMERABLE, sort=[]. All the inputs have relevant nodes, however
>> the cost is still infinite.
>>> Root: rel#26:Subset#2.ENUMERABLE.[]
>>> Original rel:
>>> LogicalProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3],
>> wstart=[$4], wend=[$5]): rowcount = 100.0, cumulative cost =
>> {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu,
>> 1.7976931348623157E308 io}, id = 12
>>>  LogicalTableFunctionScan(invocation=[TUMBLE($3, 'ROWTIME',
>> 6:INTERVAL MINUTE)], rowType=[RecordType(TIMESTAMP(0) ROWTIME, INTEGER
>> ID, VARCHAR(10) PRODUCT, INTEGER UNITS, TIMESTAMP(0) wstart, TIMESTAMP(0)
>> wend)]): rowcount = 100.0, cumulative cost = {1.7976931348623157E308 rows,
>> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}, id = 10
>>>LogicalProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3]):
>> rowcount = 100.0, cumulative cost = {200.0 rows, 501.0 cpu, 0.0 io}, id = 8
>>>  LogicalTableScan(table=[[ORINOCO, ORDERS]]): rowcount = 100.0,
>> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 2
>> 
>> 
>> 
>> -Rui
>> 



Re: Re: Optimizer: All the inputs have relevant nodes, however the cost is still infinite.

2019-11-08 Thread Haisheng Yuan
It is not surprising to get an infinite cost, since the operators in the plan 
are logical operators, which need to be converted to physical operators to be 
costed. Did you try adding some implementation rules to the rule set, e.g. 
EnumerableProjectRule, EnumerableTableFunctionScanRule, etc.?

- Haisheng

--
发件人:Xiening Dai
日 期:2019年11月09日 03:35:03
收件人:
主 题:Re: Optimizer: All the inputs have relevant nodes, however the cost is 
still infinite.

Are you able to provide a test case to repro on current code base?

From the error message, the given subset’s best RelNode is not available, i.e. 
its cost is infinite. There could be multiple causes leading to it.


> On Nov 8, 2019, at 11:28 AM, Rui Wang  wrote:
> 
> My current guess is it could because of `computeSelfCost` of
> `EnumerableTableFunctionScan`.
> 
> 
> -Rui
> 
> On Fri, Nov 8, 2019 at 10:04 AM Rui Wang  wrote:
> 
>> Hello community:
>> 
>> I am stuck on an optimizer error while working on CALCITE-3272. I am less
>> familiar with Calcite's optimizer and struggled to debug the following
>> error message. Are there any hints on what direction I should go? I can
>> also provide more information if that is helpful.
>> 
>> 
>> 
>>> java.sql.SQLException: Error while executing SQL "select * FROM
>> TABLE(TUMBLE(TABLE ORDERS, 'ROWTIME', INTERVAL '1' MINUTE))": There are not
>> enough rules to produce a node with desired properties:
>> convention=ENUMERABLE, sort=[]. All the inputs have relevant nodes, however
>> the cost is still infinite.
>>> Root: rel#26:Subset#2.ENUMERABLE.[]
>>> Original rel:
>>> LogicalProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3],
>> wstart=[$4], wend=[$5]): rowcount = 100.0, cumulative cost =
>> {1.7976931348623157E308 rows, 1.7976931348623157E308 cpu,
>> 1.7976931348623157E308 io}, id = 12
>>>  LogicalTableFunctionScan(invocation=[TUMBLE($3, 'ROWTIME',
>> 6:INTERVAL MINUTE)], rowType=[RecordType(TIMESTAMP(0) ROWTIME, INTEGER
>> ID, VARCHAR(10) PRODUCT, INTEGER UNITS, TIMESTAMP(0) wstart, TIMESTAMP(0)
>> wend)]): rowcount = 100.0, cumulative cost = {1.7976931348623157E308 rows,
>> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}, id = 10
>>>LogicalProject(ROWTIME=[$0], ID=[$1], PRODUCT=[$2], UNITS=[$3]):
>> rowcount = 100.0, cumulative cost = {200.0 rows, 501.0 cpu, 0.0 io}, id = 8
>>>  LogicalTableScan(table=[[ORINOCO, ORDERS]]): rowcount = 100.0,
>> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 2
>> 
>> 
>> 
>> -Rui
>> 


Re: Re: Question about Interpreter and Corelations

2019-11-08 Thread Haisheng Yuan
I am afraid this query can't be easily decorrelated.

- Haisheng

--
发件人:Zoltan Farkas
日 期:2019年11月08日 22:46:53
收件人:
主 题:Re: Question about Interpreter and Corelations

Thanks Julian,

Any idea how I could decorrelate the query?
RelDecorrelator.decorrelateQuery does not seem to alter the plan.

Will log a jira for the interpreter issue.

thank you

—Z



> On Nov 7, 2019, at 5:08 PM, Julian Hyde  wrote:
> 
> I don’t recall what the interpreter is currently capable of, but you should 
> log a bug.
> 
> I wonder whether you could get the query to work in the interpreter if you 
> decorrelated the query first. 
> 
> Julian
> 
>> On Nov 7, 2019, at 11:16, Zoltan Farkas  wrote:
>> 
>> for a test query with the following plan:
>> 
>> LogicalProject(name=[$1], friends=[$4])
>> LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
>> requiredColumns=[{0}])
>>   LogicalTableScan(table=[[characters]])
>>   Collect(field=[EXPR$0])
>> LogicalProject(name=[$2])
>>   LogicalJoin(condition=[=($0, $1)], joinType=[inner])
>> LogicalProject(characterId2=[$1])
>>   LogicalFilter(condition=[=($0, $cor0.characterId)])
>>LogicalTableScan(table=[[friendships]])
>> LogicalProject(characterId=[$0], name=[$1])
>>   LogicalTableScan(table=[[characters]])
>> 
>> I get :
>> 
>> java.lang.UnsupportedOperationException
>>   at 
>> o.a.c.i.JaninoRexCompiler.lambda$compile$0(JaninoRexCompiler.java:94)[calcite-core-1.21.0.jar:1.21.0]
>>   at o.a.c.a.e.RexToLixTranslator.translate0(RexToLixTranslator.java:714)[^]
>>   at ^.translate(^:199)[^]
>>   at ^.translate0(^:684)[^]
>>   at ^.translate(^:199)[^]
>>   at ^.translate(^:194)[^]
>>   ...
>> 
>> when trying to use the Interpreter.
>> 
>> Is this a current limitation?  
>> 
>> the query is :
>> 
>> select name,
>> ARRAY(select c2.name from friendships f, characters c2
>>where f.characterId1 = c.characterId and 
>> f.characterId2 = c2.characterId) as friends
>> from characters c
>> 
>> let me know
>> 
>> thank you
>> 
>> —Z

