[VOTE] Release Apache Calcite 1.38.0 (release candidate 1)

2024-10-09 Thread Julian Hyde
Hi all,

I have created a build for Apache Calcite 1.38.0, release
candidate 1.

Thanks to everyone who has contributed to this release.

You can read the release notes here:
https://github.com/apache/calcite/blob/calcite-1.38.0-rc1/site/_docs/history.md

The commit to be voted upon:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=e5e7faeff5985bc1b2342144b2bd31ca8ea84d3a

Its hash is e5e7faeff5985bc1b2342144b2bd31ca8ea84d3a

Tag:
https://github.com/apache/calcite/tree/calcite-1.38.0-rc1

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.38.0-rc1
(revision 72241)

The hashes of the artifacts are as follows:
3751b8426b772f1eb20e75c9768cafbecc663cccfd3df016f83d4b70acb34fdb77d60f3f5f990b699d466775e4749f799f8885b502b88e85a313c8b25825
*apache-calcite-1.38.0-src.tar.gz

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachecalcite-1235/org/apache/calcite/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/jhyde.asc
https://www.apache.org/dist/calcite/KEYS

To create the jars and test Apache Calcite: "gradle build"
(Requires an appropriate Gradle/JDK installation.)

Please vote on releasing this package as Apache Calcite 1.38.0.
Everyone is welcome to vote. Your vote should contain the line
'code name (role)', where
  * 'code' is +1, 0 or -1 as below;
  * 'name' is your ASF id if you have one, else your real name;
  * 'role' is 'community', 'committer' or 'PMC member'.

[ ] +1 Release this package as Apache Calcite 1.38.0
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

The vote is open for the next 72 hours (closing at
2024-10-13 03:00 UTC) and passes if a majority of
at least three +1 votes by PMC members are cast.

Here is my vote:

  +1 jhyde (PMC member)

Julian Hyde (release manager)


Re: [DISCUSS] Towards Calcite 1.38.0

2024-10-09 Thread Julian Hyde
For RC2 we need fixes for 6616, 6617 and 6620. I know 6616 is good.

Mihai, are the PRs 6617 and 6620 ready to merge?

If so, I'll merge them, adjust the release notes and launch RC2.

Julian

On Thu, Sep 26, 2024 at 11:00 AM Julian Hyde  wrote:
>
> Where are we on 1.38? Down to 2: 4512 and 5156. Please review and merge today.
>
> I have draft release notes. Please review them in the PR:
> https://github.com/apache/calcite/pull/3979.
>
> Julian
>
> Case status:
>
> 1. https://issues.apache.org/jira/browse/CALCITE-6161 equalsDeep
>
> No longer in 1.38
>
> 2. https://issues.apache.org/jira/browse/CALCITE-5156 implicit number type 
> cast
>
> No longer in 1.38
>
> 3. https://issues.apache.org/jira/browse/CALCITE-6560 (supportsNegativeScale)
>
> I don't think this one is going to make it. I am seeing a classloader
> deadlock when I run the tests, and that makes me very nervous. Also, I
> would like to fix the related bug
> https://issues.apache.org/jira/browse/CALCITE-6598 at the same time.
>
> 4. https://issues.apache.org/jira/browse/CALCITE-6300 Function
> MAP_VALUES/MAP_KEYS gives exception
>
> No longer in 1.38 because the proposed solution was hacky
>
> 5. https://issues.apache.org/jira/browse/CALCITE-4512 group by expression
>
> https://github.com/apache/calcite/pull/3929 needs review
>
> 6. https://issues.apache.org/jira/browse/CALCITE-5156 Support implicit
> number type cast for IN Sub-query
>
> On Wed, Sep 18, 2024 at 1:21 AM Ruben Q L  wrote:
> >
> > I'll try to look at a few PRs in the next days.
> >
> > I've also updated our Jira dashboard [1] (version: 1.38) to have a clear
> > overview about the status of the issues for the next release.
> >
> > [1]
> > https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> >
> >
> > On Tue, Sep 17, 2024 at 8:16 PM Mihai Budiu  wrote:
> >
> > > I am now at Current, the Confluent conference, but I hope to find some
> > > time on Friday to do a few.
> > > 
> > > From: Julian Hyde 
> > > Sent: Tuesday, September 17, 2024 12:08:00 PM
> > > To: dev@calcite.apache.org 
> > > Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> > >
> > > I merged fixes for 4 cases yesterday. Can I get some help reviewing
> > > and merging PRs (especially those in category 1 and 2)?
> > >
> > > There are about 6 regular committers who merge a few PRs before each
> > > release. (You know who you are -- and I am profoundly grateful! Your
> > > work keeps the project going.) If you could each do 2 or 3 PRs by the
> > > end of the weekend, we can get this release done.
> > >
> > > Julian
> > >
> > >
> > > On Mon, Sep 16, 2024 at 10:56 AM Julian Hyde 
> > > wrote:
> > > >
> > > > Yeah, sorry, I don’t often use the GitHub ‘approve’ button. My verbal
> > > approvals are often conditional. The ‘approve’ button doesn’t allow 
> > > nuance.
> > > >
> > > > > On Sep 15, 2024, at 3:26 PM, Mihai Budiu  wrote:
> > > > >
> > > > > You have not listed CALCITE-4918,
> > > https://issues.apache.org/jira/browse/CALCITE-4918 for which I have a PR,
> > > which I think you have approved using comments
> > > https://github.com/apache/calcite/pull/3947, but not using the official
> > > "approve" button.
> > > > >
> > > > > I was hoping other people would weigh in on this design and
> > > implementation.
> > > > >
> > > > > Mihai
> > > > >
> > > > > 
> > > > > From: Julian Hyde 
> > > > > Sent: Sunday, September 15, 2024 3:12 PM
> > > > > To: dev@calcite.apache.org 
> > > > > Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> > > > >
> > > > > I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
> > > > > track the release.
> > > > >
> > > > > There are 26 open bugs [1] marked to fix for 1.38. I have organized
> > > > > them into 5 categories. Can people please review and merge those in
> > > > > category 1? To get a release vote next week, let's get busy
> > > > > reviewing and merging bugs this week.
> > > > >
> > > > > Category 1. The following have a PR available that seems close to
> > > > > ready. If these are good, let's get these merged!
> >

Re: [CANCEL] [VOTE] Release Apache Calcite 1.38.0 (release candidate 0)

2024-10-07 Thread Julian Hyde
Mihai,

Thanks for jumping on this.

If you need more time then we could consider backing out the change that caused 
the problem. It’s in no one’s interests to hurry a fix. If necessary, we can 
make a release this week and then make another in two or three weeks.

Julian


> On Oct 7, 2024, at 2:02 PM, Mihai Budiu  wrote:
> 
> I am working on 6617, but I only have a reproduction for one of the two 
> reported problems. I was hoping it would be quick, but I am discovering some 
> complications; I have discussed some of them in the JIRA case.
> 
> Mihai
> ____
> From: Julian Hyde 
> Sent: Monday, October 7, 2024 1:55 PM
> To: dev@calcite.apache.org 
> Subject: [CANCEL] [VOTE] Release Apache Calcite 1.38.0 (release candidate 0)
> 
> Thank you to Stamatis, Ruben and Mihai for discovering and logging cases
> 6616 [1] and 6617 [2].
> 
> I agree that both are blockers, so I hereby cancel the vote.
> 
> I am the cause of 6616 (I switched from JDK 8 to JDK 21 for the release
> build) so I have assigned it to myself to fix before RC1.
> 
> Can someone volunteer to fix 6617, and give an ETA for a fix? As soon as we
> have a fix, I'll create RC1 and start a vote.
> 
> The main branch remains CLOSED. Let’s not introduce instability by fixing
> things that are not broken.
> 
> Thank you to everyone else who inspected the RC and voted. If you were
> planning to check the release and vote, please continue to do so. It’s
> possible that you will discover another issue, and the sooner we do that,
> the sooner we shall have a good release.
> 
> Julian
> 
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-6616
> [2] https://issues.apache.org/jira/browse/CALCITE-6617
> 
> 
> On Oct 7, 2024, at 3:45 AM, Ruben Q L  wrote:
> 
> Thanks Julian for preparing this release.
> 
> - Release notes: ok (minor detail: IIANM this release comes 5 months after
> 1.37, not 6)
> - Checksum: ok
> - Signature: ok
> - Diff source release and git repository: ok
> - Build + tests: ok
> - Calcite-based application test suite: KO
> It seems there is a regression (kudos to Claude Brisson for discovering
> it), the issue has been logged here:
> https://issues.apache.org/jira/browse/CALCITE-6617
> 
> -1 rubenql (PMC member)
> 
> Best,
> Ruben
> 
> 
> 
> On Mon, Oct 7, 2024 at 10:05 AM Stamatis Zampetakis 
> wrote:
> 
> -1 zabetak (PMC member)
> 
> I am changing my vote cause it seems that maven released artifacts are
> unusable by projects using JDK < 21. I logged CALCITE-6616 [1] for
> tracking this further.
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-6616
> 
> On Mon, Oct 7, 2024 at 9:28 AM Stamatis Zampetakis 
> wrote:
> 
> 
> Ubuntu 20.04.6 LTS, jdk1.8.0_261, Gradle wrapper, Gradle 8.7
> 
> * Checked signatures and checksums OK
> * Checked diff between repo and artifacts (diff -qr calcite-src
> calcite-git) OK (LICENSE file between the release sources and git tag
> contains small differences)
> * Went over release note OK
> * Checked README has correct version and build instructions OK
> * Checked NOTICE OK
> * Checked LICENSE OK (Changes from CALCITE-6098 are not present)
> * All source files have ASF headers (grep -RiL "Licensed to the
> Apache Software Foundation") OK
> * No unexpected binary files OK (find . -type f -exec file {} \; |
> grep -v text)
> * Built from source artifacts and run tests (gradle build) OK
> * Built from git tag and run tests (./gradlew build) OK
> * Checked LICENSE, NOTICE, for calcite-core-1.38.0.jar in nexus OK
> 
> +1 zabetak (PMC member)
> 
> Best,
> Stamatis
> 
> 
> 
> 
> On Sat, Oct 5, 2024 at 10:49 PM Julian Hyde  wrote:
> 
> 
> Thanks for that research, Francis. After the release I'll fix
> release/build.gradle.kts (which generates the vote email) to generate
> the correct link. I'll also update the 'Your vote should contain the
> line' part of the email.
> 
> On Sat, Oct 5, 2024 at 2:00 AM Francis Chuang <francischu...@apache.org> wrote:
> 
> 
> I think the tag was already pushed previously, because I am pretty sure
> I saw the tag on GitHub when I voted earlier in the day.
> 
> The problem is that there's a bug in GitBox that redirects the tag to
> 
> https://github.com/apache/calcite/commit/refs/tags/calcite-1.38.0-rc0
> 
> when the correct link is
> https://github.com/apache/calcite/tree/calcite-1.38.0-rc0
> 
> On 5/10/2024 4:52 pm, Julian Hyde wrote:
> 
> The Tag's web link is invalid.
> 
> 
> Sorry, I forgot to push the tag. It should work now.
> 
> (Please continue to vote. My mistake does not invalidate the vote.)

[CANCEL] [VOTE] Release Apache Calcite 1.38.0 (release candidate 0)

2024-10-07 Thread Julian Hyde
Thank you to Stamatis, Ruben and Mihai for discovering and logging cases
6616 [1] and 6617 [2].

I agree that both are blockers, so I hereby cancel the vote.

I am the cause of 6616 (I switched from JDK 8 to JDK 21 for the release
build) so I have assigned it to myself to fix before RC1.

Can someone volunteer to fix 6617, and give an ETA for a fix? As soon as we
have a fix, I'll create RC1 and start a vote.

The main branch remains CLOSED. Let’s not introduce instability by fixing
things that are not broken.

Thank you to everyone else who inspected the RC and voted. If you were
planning to check the release and vote, please continue to do so. It’s
possible that you will discover another issue, and the sooner we do that,
the sooner we shall have a good release.

Julian


[1] https://issues.apache.org/jira/browse/CALCITE-6616
[2] https://issues.apache.org/jira/browse/CALCITE-6617


On Oct 7, 2024, at 3:45 AM, Ruben Q L  wrote:

Thanks Julian for preparing this release.

- Release notes: ok (minor detail: IIANM this release comes 5 months after
1.37, not 6)
- Checksum: ok
- Signature: ok
- Diff source release and git repository: ok
- Build + tests: ok
- Calcite-based application test suite: KO
It seems there is a regression (kudos to Claude Brisson for discovering
it), the issue has been logged here:
https://issues.apache.org/jira/browse/CALCITE-6617

-1 rubenql (PMC member)

Best,
Ruben



On Mon, Oct 7, 2024 at 10:05 AM Stamatis Zampetakis 
wrote:

-1 zabetak (PMC member)

I am changing my vote cause it seems that maven released artifacts are
unusable by projects using JDK < 21. I logged CALCITE-6616 [1] for
tracking this further.

[1] https://issues.apache.org/jira/browse/CALCITE-6616

On Mon, Oct 7, 2024 at 9:28 AM Stamatis Zampetakis 
wrote:


Ubuntu 20.04.6 LTS, jdk1.8.0_261, Gradle wrapper, Gradle 8.7

* Checked signatures and checksums OK
* Checked diff between repo and artifacts (diff -qr calcite-src
calcite-git) OK (LICENSE file between the release sources and git tag
contains small differences)
* Went over release note OK
* Checked README has correct version and build instructions OK
* Checked NOTICE OK
* Checked LICENSE OK (Changes from CALCITE-6098 are not present)
* All source files have ASF headers (grep -RiL "Licensed to the
Apache Software Foundation") OK
* No unexpected binary files OK (find . -type f -exec file {} \; |
grep -v text)
* Built from source artifacts and run tests (gradle build) OK
* Built from git tag and run tests (./gradlew build) OK
* Checked LICENSE, NOTICE, for calcite-core-1.38.0.jar in nexus OK

+1 zabetak (PMC member)

Best,
Stamatis




On Sat, Oct 5, 2024 at 10:49 PM Julian Hyde  wrote:


Thanks for that research, Francis. After the release I'll fix
release/build.gradle.kts (which generates the vote email) to generate
the correct link. I'll also update the 'Your vote should contain the
line' part of the email.

On Sat, Oct 5, 2024 at 2:00 AM Francis Chuang <francischu...@apache.org> wrote:


I think the tag was already pushed previously, because I am pretty sure
I saw the tag on GitHub when I voted earlier in the day.

The problem is that there's a bug in GitBox that redirects the tag to

https://github.com/apache/calcite/commit/refs/tags/calcite-1.38.0-rc0

when the correct link is
https://github.com/apache/calcite/tree/calcite-1.38.0-rc0

On 5/10/2024 4:52 pm, Julian Hyde wrote:

The Tag's web link is invalid.


Sorry, I forgot to push the tag. It should work now.

(Please continue to vote. My mistake does not invalidate the vote.)

On Fri, Oct 4, 2024 at 9:26 PM Xiong Duan  wrote:


+1 Xiong Duan(Committer)

The Tag's web link is invalid.

Julian Hyde wrote on Sat, Oct 5, 2024 at 11:36:


Hi all,

I have created a build for Apache Calcite 1.38.0, release
candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:

https://github.com/apache/calcite/blob/calcite-1.38.0-rc0/site/_docs/history.md


The commit to be voted upon:

https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9


Its hash is d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9

Tag:

https://gitbox.apache.org/repos/asf?p=calcite.git;a=tag;h=refs/tags/calcite-1.38.0-rc0


The artifacts to be voted on are located here:

https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.38.0-rc0

(revision 72135)

The hashes of the artifacts are as follows:

bba61cfd946898250c001b514902bcb84b6be8357c2be491f14fb3dc3cd3702d329976fb82d536e673c63da00bcc4d2a4bdbd9db3c31c7d41fb95211c0ef84bb

*apache-calcite-1.38.0-src.tar.gz

A staged Maven repository is available for review at:

https://repository.apache.org/content/repositories/orgapachecalcite-1233/org/apache/calcite/


Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/jhyde.asc
https://www.apache.org/dist/calcite/KEYS

To create the jars and test Apache Calcite: "gradle build"

Re: Checked arithmetic

2024-10-07 Thread Julian Hyde
The operator table doesn’t seem to me like the right place. There’s only one 
PLUS operator even though it applies to several numeric types. We would want 
different behaviors for different types; if we did that via the operator table, 
we would need several PLUS operators and a complicated resolution process to 
choose among those operators.

> On Oct 7, 2024, at 12:01 PM, James Starr  wrote:
> 
> Having it at the operator table level would make it easier to match each
> SQL dialect's behavior, as that is currently how operators are selected for
> different dialects.  The API between operator tables and the
> SqlValidator/SqlToRel supports selecting different operators based on
> types.  Though, the API for extending operator tables might not.
> 
> On Fri, Oct 4, 2024 at 3:51 PM Stamatis Zampetakis 
> wrote:
> 
>> For the record here is what the SQL Standard says regarding operations
>> on numbers:
>> 
>> "Operations on numbers are performed according to the normal rules of
>> arithmetic, within implementation-defined limits, except as provided
>> for in Subclause 6.27, “”."
>> 
>> Section 6.27 defines some errors with respect to division but not much
>> more than that.
>> 
>> Best,
>> Stamatis
>> 
>> On Fri, Oct 4, 2024 at 10:01 PM Mihai Budiu  wrote:
>>> 
>>> * I think "safe" means "no exception", so in case of overflow safe
>>>   operators return NULL. But here we are talking about real exceptions. (At
>>>   least the existing "safe" operations in Calcite work this way.)
>>> * Having an operator table that uses a different implementation for
>>>   arithmetic sounds to me like a recipe for confusion. Moreover, this would
>>>   not allow mixing checked and non-checked arithmetic in the same program
>>>   (not sure whether there are use cases for this, though).
>>> * The visitor proposal would use the same operator table, but switch
>>>   operations to checked/unchecked depending on the type system. Moreover,
>>>   using this visitor is optional.
>>> * For aggregation we already have two kinds of issues:
>>>   * The result type of aggregation is unspecified by the standard; Calcite
>>>     uses the same type as the input data type
>>>     (https://issues.apache.org/jira/browse/CALCITE-6324), and
>>>   * The type of intermediate results in aggregates is unspecified
>>>     (https://issues.apache.org/jira/browse/CALCITE-6516,
>>>     https://issues.apache.org/jira/browse/CALCITE-6427).
>>> * Moreover, I think the standard says that the order of aggregation is
>>>   unspecified; this already makes a difference in the absence of overflow
>>>   for floating point, where addition is not associative.
>>> 
>>> I think the aggregates can be solved in general by allowing them to
>>> compute on a wider data type and then casting the final result.  This would
>>> work even if using checked arithmetic. Maybe the problem is that Calcite
>>> already made a choice for both these types (intermediate results and final
>>> results), and it's not the same choice that other databases make. But that's
>>> a separate problem.
>>> 
>>> From: James Starr 
>>> Sent: Friday, October 4, 2024 12:09 PM
>>> To: dev@calcite.apache.org 
>>> Subject: Re: Checked arithmetic
>>> 
>>> I am reasonably sure checked arithmetic is left to the implementation to
>>> define due to the mixed concerns of different DBs.  If you define checked
>>> arithmetic, it limits certain optimizations that may be made, such as
>>> pushing an agg below a join.  For instance, if a rewrite could change the
>>> order of operations such that an overflow does or does not happen, what is
>>> the correct answer in this case?  The error, the garbage result from an
>>> overflow, or the result that did not hit the overflow?
>>> 
>>> I believe some systems have CURRENCY and BIG DECIMAL types, and this might
>>> be the reason.  Maybe the CURRENCY types disallow rewrites to at least make
>>> the query repeatable and consistent.
>>> 
>>> An alternative to a visitor would be to use a different operator table
>>> with safe operators.
>>> 
>>> Considering Calcite as a planner, I think the operator/type needs to expose
>>> whether overflow is defined or not, as opposed to checked or unchecked,
>>> which is an implementation issue.
>>> 
>>> James
>> 



[DISCUSS] Tidying up Calcite's branches

2024-10-05 Thread Julian Hyde
The Calcite git repository seems to be cluttered with unused branches.
I am especially tired of 'maint-0.4.12', which is eleven years old but
comes up every time I type 'git log origin/mai'.

I analyzed our branches. There are 37 branches in origin (see
https://github.com/apache/calcite/branches/all).

4 branches have a specific purpose:
 * HEAD (points to main)
 * main
 * master (deprecated, redirects to main)
 * site

29 release branches, prefixed either maint- or branch-:
 * branch-1.0
 * branch-1.1
 * ...
 * branch-1.24
 * branch-avatica-1.8
 * branch-avatica-1.9
 * maint-0.4.12
 * maint-0.4.15

3 seem to be orphans, pushed by mistake:
 * review
 * master
 * issue6607

1 is related to dependabot PR
 * dependabot/bundler/site/nokogiri-1.15.6

Of the 29 release branches, 26 might be called 'trivial'. They are
direct ancestors of current HEAD and are exactly one commit after a
tag, because that is how the Maven release process worked. For example
branch-1.12 is one commit after the tag calcite-1.12.0:

$ git log --abbrev-commit --pretty=format:'%h -%d %s (%cr) <%an>' -n3
origin/branch-1.12
0f92251082 - (origin/branch-1.12) [maven-release-plugin] prepare for
next development iteration (8 years ago) 
ea7ace18cd - (tag: calcite-1.12.0) [maven-release-plugin] prepare
release calcite-1.12.0 (8 years ago) 
baa13d696f - [CALCITE-1657] Release Calcite 1.12.0 (8 years ago) 

The 3 other release branches might be called 'maintenance branches',
because they are several commits away from the main branch. They are:
 * maint-0.4.12 (5,459 commits behind and 20 ahead of main)
 * maint-0.4.15 (5,431 commits behind and 4 ahead of main)
 * branch-1.22 (1,869 commits behind and 1 ahead of main)

I propose to delete the 26 trivial release branches and rename the
maintenance branches:
 * maint-0.4.12 becomes release/0.4.12
 * maint-0.4.15 to release/0.4.15
 * branch-1.22 to release/1.22.

I will not delete any tags. And I will not do anything until release
1.38 is complete.

Any thoughts on this?

Julian


Re: [Discussion] Measures that span multiple tables

2024-10-05 Thread Julian Hyde
There are a lot of people who think that the semantic layer should
just have a simple query language - pick a few dimensions and
measures, choose a sort order, and you're done - but I think the
semantic layer should have a real query language, equivalent in
strength to SQL, and powerful enough to build models.

I don't want to introduce a proprietary language (like DAX or LOD) so
we're left with SQL or SPARQL. Some syntactic sugar on top of SQL
could find paths through the graph and convert them to joins. That's
probably good enough.

On Sat, Oct 5, 2024 at 5:23 AM Barak Alon  wrote:
>
> For the record - I agree that the semantic layer is probably a better place
> for this. But I would've said something similar about measures before I
> read your paper haha. So I'm pushing on this mostly out of curiosity.
>
> Agreed - this is essentially a virtual cube. And MDX (or SPARQL) might be a
> better language for this type of query - but alas, SQL is king.
>
> > There are so many columns it’s impossible to give them meaningful names.
>
> Is that right? Maybe I'm thinking of it slightly differently.
>
> Yes, something has to flatten the graph. I think that's the job of the
> semantic layer. And it's got to choose meaningful names - hard, but not
> impossible. At least if thinking from the perspective of a particular
> business domain.
>
> The semantic layer first has to flatten each cube and conform the
> dimensions:
>
> ```
> create view orders_cube as
> select
>   SUM(o.revenue) as measure sum_revenue,
>   l.country as origin_country
> from orders o
> left join customers c
>   on o.customerId = c.id
> left join locations l
>   on c.locationId = l.id
>
>
> create view clicks_cube as
> select
>   COUNT(DISTINCT c.cookieId) as measure unique_clickers,
>   l.country as origin_country
> from clicks c
> left join sessions s
>   on c.sessionId = s.id
> left join locations l
>   on s.locationId = l.id
>
>
> create view __all as
> select
>   coalesce(o.origin_country, c.origin_country) as origin_country,
>   o.sum_revenue,
>   c.unique_clickers
> from orders_cube o
> full outer join clicks_cube c
>   on o.origin_country = c.origin_country
> ```
>
> Yes, this is done at DDL time. And yes, this certainly can't answer all
> types of queries. But I think it works for simple slicing and dicing on a
> virtual cube?
>
> On Fri, Oct 4, 2024 at 6:52 PM Julian Hyde  wrote:
>
> > Let’s review where we are.
> >  * We have several fact tables (each representing a business process -
> > taking orders, shipping orders, canceling orders, returning products,
> > storing products in a warehouse) and we have surrounded each fact table
> > with dimension tables and defined some measures in a ’star join’ view or
> > ‘cube’.
> >  * We can join several cubes (or regular tables) in a single query,
> > joining on their common dimensions. If the query has no GROUP BY the number
> > of rows will be astronomical, but if there is a GROUP BY the number of rows
> > will be reasonable, and because measures manage their own grain their
> > values will be correct.
> >
> > This is, in essence, what you can achieve with virtual cubes in MDX [
> > https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms709837(v=vs.85)
> > ], and furthermore you don’t have to do it at DDL time, you can do it at
> > query time.
> >
> > This looks like a graph: cubes connected, via its dimensions, to entities
> > (date, customer, product, warehouse, campaign). If two cubes have at least
> > one entity in common you can join them. A query would seem to be just a
> > list of attributes (of entities) and a list of measures, provided that
> > there is a path joining them.
> >
> > But we have a problem. The problem is that as the graph gets larger, it
> > begins to have (undirected) cycles: there is more than one path between
> > some pairs of entities. For example, you are interested in customers and
> > related products. It is ambiguous whether you mean customers that have
> > purchased a product, or customers who have been targeted in a marketing
> > campaign for a product.
> >
> > That’s the problem with Barak’s ‘all’ table which represents a
> > flattened-graph. It’s not the large number of columns (the union of columns
> > from all the cubes and entities). Nor is it the colossal number of rows
> > (effectively the cartesian product of all entities). It’s that in order to
> > flatten the graph you need to repeat each entity several times, once for
> > each path. For a given product there are not just customers; there are
>

Re: [VOTE] Release Apache Calcite 1.38.0 (release candidate 0)

2024-10-05 Thread Julian Hyde
Thanks for that research, Francis. After the release I'll fix
release/build.gradle.kts (which generates the vote email) to generate
the correct link. I'll also update the 'Your vote should contain the
line' part of the email.

On Sat, Oct 5, 2024 at 2:00 AM Francis Chuang  wrote:
>
> I think the tag was already pushed previously, because I am pretty sure
> I saw the tag on GitHub when I voted earlier in the day.
>
> The problem is that there's a bug in GitBox that redirects the tag to
> https://github.com/apache/calcite/commit/refs/tags/calcite-1.38.0-rc0
> when the correct link is
> https://github.com/apache/calcite/tree/calcite-1.38.0-rc0
>
> On 5/10/2024 4:52 pm, Julian Hyde wrote:
> >> The Tag's web link is invalid.
> >
> > Sorry, I forgot to push the tag. It should work now.
> >
> > (Please continue to vote. My mistake does not invalidate the vote.)
> >
> > On Fri, Oct 4, 2024 at 9:26 PM Xiong Duan  wrote:
> >>
> >> +1 Xiong Duan(Committer)
> >>
> >> The Tag's web link is invalid.
> >>
> >> Julian Hyde wrote on Sat, Oct 5, 2024 at 11:36:
> >>>
> >>> Hi all,
> >>>
> >>> I have created a build for Apache Calcite 1.38.0, release
> >>> candidate 0.
> >>>
> >>> Thanks to everyone who has contributed to this release.
> >>>
> >>> You can read the release notes here:
> >>> https://github.com/apache/calcite/blob/calcite-1.38.0-rc0/site/_docs/history.md
> >>>
> >>> The commit to be voted upon:
> >>> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9
> >>>
> >>> Its hash is d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9
> >>>
> >>> Tag:
> >>> https://gitbox.apache.org/repos/asf?p=calcite.git;a=tag;h=refs/tags/calcite-1.38.0-rc0
> >>>
> >>> The artifacts to be voted on are located here:
> >>> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.38.0-rc0
> >>> (revision 72135)
> >>>
> >>> The hashes of the artifacts are as follows:
> >>> bba61cfd946898250c001b514902bcb84b6be8357c2be491f14fb3dc3cd3702d329976fb82d536e673c63da00bcc4d2a4bdbd9db3c31c7d41fb95211c0ef84bb
> >>> *apache-calcite-1.38.0-src.tar.gz
> >>>
> >>> A staged Maven repository is available for review at:
> >>> https://repository.apache.org/content/repositories/orgapachecalcite-1233/org/apache/calcite/
> >>>
> >>> Release artifacts are signed with the following key:
> >>> https://people.apache.org/keys/committer/jhyde.asc
> >>> https://www.apache.org/dist/calcite/KEYS
> >>>
> >>> To create the jars and test Apache Calcite: "gradle build"
> >>> (requires an appropriate Gradle/JDK installation)
> >>>
> >>> Please vote on releasing this package as Apache Calcite 1.38.0.
> >>> Everyone is welcome to vote. Your vote should contain the line
> >>> 'code name (role)', where
> >>>   * 'code' is +1, 0 or -1 as below;
> >>>   * 'name' is your ASF id if you have one, else your real name;
> >>>   * 'role' is 'community', 'committer' or 'PMC member'.
> >>>
> >>> [ ] +1 Release this package as Apache Calcite 1.38.0
> >>> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> >>> [ ] -1 Do not release this package because...
> >>>
> >>> The vote is open for the next 96 hours (closing at 0400 UTC on Wed Oct
> >>> 9th) and passes if a majority of at least three +1 votes by PMC
> >>> members are cast.
> >>>
> >>> Here is my vote:
> >>>
> >>>   +1 jhyde (PMC member)
> >>>
> >>> Julian Hyde (release manager)
>


Re: [VOTE] Release Apache Calcite 1.38.0 (release candidate 0)

2024-10-04 Thread Julian Hyde
> The Tag's web link is invalid.

Sorry, I forgot to push the tag. It should work now.

(Please continue to vote. My mistake does not invalidate the vote.)

On Fri, Oct 4, 2024 at 9:26 PM Xiong Duan  wrote:
>
> +1 Xiong Duan(Committer)
>
> The Tag's web link is invalid.
>
> > Julian Hyde wrote on Sat, Oct 5, 2024 at 11:36:
> >
> > Hi all,
> >
> > I have created a build for Apache Calcite 1.38.0, release
> > candidate 0.
> >
> > Thanks to everyone who has contributed to this release.
> >
> > You can read the release notes here:
> > https://github.com/apache/calcite/blob/calcite-1.38.0-rc0/site/_docs/history.md
> >
> > The commit to be voted upon:
> > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9
> >
> > Its hash is d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9
> >
> > Tag:
> > https://gitbox.apache.org/repos/asf?p=calcite.git;a=tag;h=refs/tags/calcite-1.38.0-rc0
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.38.0-rc0
> > (revision 72135)
> >
> > The hashes of the artifacts are as follows:
> > bba61cfd946898250c001b514902bcb84b6be8357c2be491f14fb3dc3cd3702d329976fb82d536e673c63da00bcc4d2a4bdbd9db3c31c7d41fb95211c0ef84bb
> > *apache-calcite-1.38.0-src.tar.gz
> >
> > A staged Maven repository is available for review at:
> > https://repository.apache.org/content/repositories/orgapachecalcite-1233/org/apache/calcite/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/jhyde.asc
> > https://www.apache.org/dist/calcite/KEYS
> >
> > To create the jars and test Apache Calcite: "gradle build"
> > (requires an appropriate Gradle/JDK installation)
> >
> > Please vote on releasing this package as Apache Calcite 1.38.0.
> > Everyone is welcome to vote. Your vote should contain the line
> > 'code name (role)', where
> >  * 'code' is +1, 0 or -1 as below;
> >  * 'name' is your ASF id if you have one, else your real name;
> >  * 'role' is 'community', 'committer' or 'PMC member'.
> >
> > [ ] +1 Release this package as Apache Calcite 1.38.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > The vote is open for the next 96 hours (closing at 0400 UTC on Wed Oct
> > 9th) and passes if a majority of at least three +1 votes by PMC
> > members are cast.
> >
> > Here is my vote:
> >
> >  +1 jhyde (PMC member)
> >
> > Julian Hyde (release manager)


[VOTE] Release Apache Calcite 1.38.0 (release candidate 0)

2024-10-04 Thread Julian Hyde
Hi all,

I have created a build for Apache Calcite 1.38.0, release
candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:
https://github.com/apache/calcite/blob/calcite-1.38.0-rc0/site/_docs/history.md

The commit to be voted upon:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9

Its hash is d3d309f5f05b1d28e59440d3f4d69d12e6d6efb9

Tag:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=tag;h=refs/tags/calcite-1.38.0-rc0

The artifacts to be voted on are located here:
https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.38.0-rc0
(revision 72135)

The hashes of the artifacts are as follows:
bba61cfd946898250c001b514902bcb84b6be8357c2be491f14fb3dc3cd3702d329976fb82d536e673c63da00bcc4d2a4bdbd9db3c31c7d41fb95211c0ef84bb
*apache-calcite-1.38.0-src.tar.gz

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachecalcite-1233/org/apache/calcite/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/jhyde.asc
https://www.apache.org/dist/calcite/KEYS

To create the jars and test Apache Calcite: "gradle build"
(requires an appropriate Gradle/JDK installation)

Please vote on releasing this package as Apache Calcite 1.38.0.
Everyone is welcome to vote. Your vote should contain the line
'code name (role)', where
 * 'code' is +1, 0 or -1 as below;
 * 'name' is your ASF id if you have one, else your real name;
 * 'role' is 'community', 'committer' or 'PMC member'.

[ ] +1 Release this package as Apache Calcite 1.38.0
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

The vote is open for the next 96 hours (closing at 0400 UTC on Wed Oct
9th) and passes if a majority of at least three +1 votes by PMC
members are cast.

Here is my vote:

 +1 jhyde (PMC member)

Julian Hyde (release manager)


Re: [Discussion] Measures that span multiple tables

2024-10-04 Thread Julian Hyde
> "I don't think it would be a good idea to build a view
> on this many-to-many join (too many columns, too much complexity)."
> 
> I want to dig into what such a view would look like. I'm imagining:
> 
> ```
> create view __all as
> select
>  coalesce(o.productId, v.productId) as productId,
>  coalesce(o.color, v.color) as color,
>  -- ... many more
> 
>  o.sum_revenue,
>  v.total_on_hand,
>  o.sum_revenue / v.total_on_hand as measure revenue_per_total,  -- a
> measure that spans multiple tables
>  -- ... many more
> 
> from o
> full outer join v
>  on o.productId = v.productId
>  and o.color = v.color
>  -- ... many more
> 
> -- .. many more
> ```
> 
> A lot of challenges here:
> Efficiently dealing with so many columns?
> Is it possible for an optimizer rule to prune the unused join conditions?
> Is there sleeker syntax that would make defining such a view more tenable?
> 
> Does such complexity belong as a fundamental concept like Measures or in
> some semantic layer with custom rewrite rules? I don't know. But I do think
> there is value in such an abstraction, and it would be cool if it was
> standardized.
> 
> On Tue, Oct 1, 2024 at 6:36 PM Julian Hyde  wrote:
> 
>> Good question - thanks for asking.
>> 
>> The short answer is that measures that span tables work, but they aren’t
>> quite as convenient as measures in a single table.
>> 
>> When designing Measures in SQL I knew that measures sometimes span tables
>> - star schemas are a case in point - but decided to make measures belong to
>> one table for simplicity. This allowed me to put them into the same
>> namespace as columns, so I can write ’select e.avg_sal from emp as e’.
>> (Measures are not columns, but SQL semantic analyzers already know how to
>> look up a column within the scope of a table.)
>> 
>> My goal was to allow dimensional expressions, such as revenue-this-year
>> minus revenue-last-year, so we need to know what are the dimensions that
>> affect the value of a measure. By associating a measure with a table we can
>> say that the (non-measure) columns of that table are its dimensions.
>> 
>> One case of 'measures spanning tables’ is joining a table with a measure
>> to a regular table. For example, suppose the orders table has an
>> avg_shipping_cost measure, and we write this query:
>> 
>>  select o.zipcode, o.avg_shipping_cost, count(distinct i.color)
>>  from orders as o
>>    join orderItems as i on o.id = i.orderId
>>  group by o.zipcode
>> 
>> If I’d written avg(o.shipping_cost), orders with many items would be
>> weighted more heavily in the total than orders with few items. But measures
>> are ‘locked’ to the grain of their table.
>> 
>> This is a crucial property of measures. It allows me to create a wide join
>> view over a star schema - say of orders, order-items, customers and
>> products - and the measures in that view will be well-behaved. Users get
>> the benefits of the join without having to remember the right join
>> conditions.
>> 
>> By the way, if I really want a weighted average shipping cost, I can
>> convert the shipping-cost measure into a value, join it to the order-items,
>> and then re-package it as a measure again:
>> 
>>  select o.zipcode, avg(o.avg_shipping_cost) as measure
>> weighted_avg_shipping_cost,
>>  count(distinct i.color)
>>  from orders as o
>>    join orderItems as i on o.id = i.orderId
>>  group by o.zipcode
>> 
>> Now, another case of ‘measures spanning tables’ is common dimensions (also
>> known as conformed dimensions). Let’s suppose I have an orders table and an
>> inventory table, both with measures, and both with a date column (orderDate
>> and inventoryDate) and a reference to a product. I can write a query that
>> joins these tables:
>> 
>>  select o.sum_revenue, v.total_on_hand
>>  from orders as o
>>join inventory as v
>>on o.orderDate = v.inventoryDate
>>and  o.productId = v.productId
>>  where o.color = ‘Red'
>> 
>> This is a many-to-many join — like joining two cubes in an OLAP system —
>> but the measures’ grain-locking ensures prevents double-counting.
>> 
>> Notice that I have had to manually put the common dimensions into a join
>> condition. There is nothing in the definition of the orders.sum_revenue and
>> inventory.total_on_hand measures that connects their date dimensions. I
>> have to remember to make that connection.
>> 
>> Where 

Re: Checked arithmetic

2024-10-04 Thread Julian Hyde
> This will probably require implementing some new Expression methods in 
> linq4j, since I don't think many of the checked arithmetic functions exist

I would be surprised if new Expression methods are required; I think new 
methods in SqlFunctions (invoked via Expressions.call) will be sufficient. 
While unchecked PLUS will translate to Java ‘+’, checked PLUS should translate 
to a new method SqlFunctions.plusChecked(int, int).
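
A minimal sketch of what such a runtime method could look like (plain Java;
the name plusChecked follows the suggestion above and does not exist in
Calcite today):

```java
/** Sketch of a checked integer addition for the runtime, as proposed above. */
public final class PlusCheckedSketch {
  private PlusCheckedSketch() {}

  /** What a hypothetical SqlFunctions.plusChecked(int, int) could delegate to. */
  public static int plusChecked(int a, int b) {
    return Math.addExact(a, b); // throws ArithmeticException("integer overflow")
  }

  public static void main(String[] args) {
    System.out.println(plusChecked(1, 2));                 // 3
    System.out.println(plusChecked(Integer.MAX_VALUE, 1)); // throws at runtime
  }
}
```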

> On Oct 4, 2024, at 10:38 AM, Mihai Budiu  wrote:
> 
> The visitor approach seems to be the most unintrusive.
> 
> My plan is as follows:
> 
> 
> * Prototype a RelNode visitor that replaces unchecked arithmetic with checked
>   arithmetic. The visitor will be parameterized by configuration indicating
>   whether DECIMAL or INTEGER arithmetic operations should be checked.
> * This will probably require implementing some new Expression methods in
>   linq4j, since I don't think many of the checked arithmetic functions exist.
> * Test the visitor in our compiler, which requires checked arithmetic. We have
>   tests imported from Postgres which fail because of this.
> * If this approach works, contribute the visitor to Calcite.
> * Change the Calcite type system to include configuration options for checked
>   arithmetic and insert the visitor in the compilation flow after validation.
> 
> Regarding the way checked arithmetic is implemented for DECIMAL and INTEGER 
> types, I believe that these require different solutions. For INTEGER all 
> operations are supposed to be checked, whereas for DECIMAL only CAST 
> operations to DECIMAL results are supposed to be checked.
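
A minimal sketch of the DECIMAL side (plain Java, not Calcite code; the class
and method names are illustrative only): a checked cast rounds to the target
scale and then verifies the result still fits the target precision.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative sketch of a checked CAST to DECIMAL(precision, scale). */
public final class CheckedDecimalCastSketch {
  private CheckedDecimalCastSketch() {}

  public static BigDecimal castToDecimal(BigDecimal v, int precision, int scale) {
    BigDecimal rounded = v.setScale(scale, RoundingMode.HALF_UP);
    // After rounding, the unscaled value must fit in 'precision' digits.
    if (rounded.precision() > precision) {
      throw new ArithmeticException(
          "value " + v + " out of range for DECIMAL(" + precision + ", " + scale + ")");
    }
    return rounded;
  }

  public static void main(String[] args) {
    System.out.println(castToDecimal(new BigDecimal("123.456"), 5, 2)); // 123.46
    castToDecimal(new BigDecimal("12345.6"), 5, 2); // throws: needs 7 digits
  }
}
```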
> 
> Evgeny, indeed, a checked result is supposed to work the way you describe. 
> And I am pretty sure Calcite today does not do that.
> 
> The semantics of arithmetic is important if one uses the 
> PROJECT_REDUCE_EXPRESSIONS rule, but there may be cases where other program 
> simplifications rely on the semantics of arithmetic.
> 
> This approach would not work if there are already arithmetic expression 
> evaluations performed during the SqlToRel conversion.
> 
> I will copy some of this discussion in the JIRA issue.
> 
> Mihai
> 
> 
> From: stanilovsky evgeny 
> Sent: Thursday, October 3, 2024 10:34 PM
> To: dev@calcite.apache.org 
> Subject: Re: Checked arithmetic
> 
> Mihai, thanks for this discussion!
> My opinion: select Integer.MAX_VALUE + 1, i.e. select 2147483647 + 1, needs
> to raise an overflow exception,
> and select 2147483647::bigint + 1 needs to return the correct result.
> 
>> I don't know what the SQL standard says. I suspect that most DBMS use
>> checked arithmetic. Because, as you say, people use a DBMS for
>> transactions such as managing bank accounts.
>> 
>> But some people want to use SQL as a high-performance distributed
>> data-centric computation language, and these people will probably want
>> unchecked arithmetic.
>> 
>> So, I agree with you. Whether to use checked arithmetic should be a
>> property of the type system - probably a property of each type. So
>> someone could have, say, checked DECIMAL and unchecked INT32. If we
>> introduce checked types, they would remain unchecked in the default
>> type system, thus ensuring backwards compatibility.
>> 
>> Implementing checked arithmetic would be hard. One way to do it would
>> be to have a checked version of each operator - similar to the
>> SAFE_xxx operators [
>> https://issues.apache.org/jira/browse/CALCITE-5591 ] - and have a
>> visitor that switches built-in operators to the checked versions.
>> 
>> Generating correct SQL for other dialects will be even harder. To
>> safely omit bounds-checking, we would need to know that the bounds on
>> the Calcite type are identical to the bounds of the underlying type.
>> 
>> Julian
>> 
>> 
>> On Thu, Oct 3, 2024 at 2:23 PM Mihai Budiu  wrote:
>>> 
>>> Hello all,
>>> 
>>> What is the semantics of arithmetic overflow in SQL?
>>> I assumed that SQL is supposed to use checked arithmetic, but it seems
>>> like this behavior is configurable for some database systems. Having
>>> checked arithmetic seems to be in the spirit of SQL to provide
>>> exact results. You don't want to use wrap-around arithmetic when you
>>> manage your bank account.
>>> 
>>> For example, if you look at an operator like Multiply in
>>> RexImplTable.java:
>>> 
>>>  defineBinary(MULTIPLY, Multiply, NullPolicy.STRICT, "multiply");
>>> 
>>> dispatches to:
>>> 
>>>  /**
>>>   * A multiplication operation, such as (a * b), without
>>>   * overflow checking, for numeric operands.
>>>   */
>>>  Multiply(" * ", false, 3, false),
>>> 
>>> This suggests that Calcite adopts unchecked arithmetic. And indeed,
>>> today Calcite ignores arithmetic overflows when evaluating expressions.
>>> Moreover, since it uses Java as a runtime, and Java has no arithmetic
>>> on Short or Byte, all computations are done using integer or long.
>>> Which means that lots of potential overflows are completely ignored. I
>>> have fixed this problem recently for Decimals, but the problem persists
>>> for all integer types.

Re: Serialisation of optimised query plans

2024-10-03 Thread Julian Hyde
No, that would not be acceptable. EnumerableInterpretable#getBindable
is a package-protected method with no javadoc or tests. If you "relax"
the visibility modifier it becomes a public API and Calcite committers
are responsible for maintaining its behavior in perpetuity.

You need to come up with a better proposal that is a net benefit to Calcite.


On Thu, Oct 3, 2024 at 5:54 AM Austin Richardson
 wrote:
>
> Hi Stamatis,
>
> Thank you for the reply.
>
> For some additional context: we're experimenting with this
> (de)serialisation because we're interested in storing optimised query plans
> in a distributed cache. Another idea we're exploring is caching the
> generated source code for those query plans, and then compiling it
> as-needed later. If possible, we'd like to re-use the Calcite code
> responsible for this, though we'd need to relax the visibility modifier of
> EnumerableInterpretable#getBindable
> 
> from
> package-private to public so it'd be invokable from our code.
>
> Do you have any thoughts on whether that'd be acceptable?
>
> Thank you,
> Austin
>
> On Tue, 1 Oct 2024 at 09:10, Stamatis Zampetakis  wrote:
>
> > Hey Austin,
> >
> > Indeed the most popular serialization/deserialization method for query
> > plans is by using the RelJsonReader and RelJsonWriter. However, we are
> > lacking a bit of test coverage around this feature so when new
> > RelNodes are added it's easy to miss the required changes. Moreover,
> > it's quite common for other projects to plug-in their own custom
> > RelNode's so serialization happens outside the classes present in
> > calcite core. We would happily accept contributions in this area so
> > feel free to create a JIRA ticket and submit a PR.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Sep 30, 2024 at 11:55 AM Austin Richardson
> >  wrote:
> > >
> > > Hi Calcite devs,
> > >
> > > Our team is currently experimenting with ideas for serialising and
> > > deserialising optimised query plans using RelJsonWriter and
> > RelJsonReader.
> > > We noticed this is possible for pre-optimisation logical plans, but ran
> > > into issues when trying to do so for post-optimisation execution plans.
> > For
> > > example, LogicalJoin implements the required constructor for
> > serialisation
> > > here
> > > <
> > https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/logical/LogicalJoin.html#%3Cinit%3E(org.apache.calcite.rel.RelInput)
> > >,
> > > but the specific JOIN implementations (e.g.
> > EnumerableBatchNestedLoopJoin)
> > > do not.
> > >
> > > Considering this, is there a preferred way for serialising and
> > > deserialising optimised query plans? We wanted to ensure we aren’t
> > missing
> > > something here.
> > >
> > > Thank you,
> > >
> > > Austin
> >
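
For reference, a minimal sketch of the serialization half discussed in this
thread, using RelJsonWriter on an already-optimised RelNode (the class name
and the caching use case are assumptions, not code from the thread):

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.externalize.RelJsonWriter;

/** Sketch: dump a plan to JSON, e.g. to store it in a distributed cache. */
public final class PlanSerializationSketch {
  private PlanSerializationSketch() {}

  public static String toJson(RelNode rel) {
    RelJsonWriter writer = new RelJsonWriter();
    rel.explain(writer);      // walks the tree, emitting one JSON object per RelNode
    return writer.asString(); // the JSON document to cache
  }
  // Reading back goes through RelJsonReader, which needs a cluster and schema to
  // reconstruct RelNodes; as noted above, physical nodes such as
  // EnumerableBatchNestedLoopJoin may lack the RelInput constructor it requires.
}
```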


Re: Checked arithmetic

2024-10-03 Thread Julian Hyde
I don't know what the SQL standard says. I suspect that most DBMS use
checked arithmetic. Because, as you say, people use a DBMS for
transactions such as managing bank accounts.

But some people want to use SQL as a high-performance distributed
data-centric computation language, and these people will probably want
unchecked arithmetic.

So, I agree with you. Whether to use checked arithmetic should be a
property of the type system - probably a property of each type. So
someone could have, say, checked DECIMAL and unchecked INT32. If we
introduce checked types, they would remain unchecked in the default
type system, thus ensuring backwards compatibility.
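
A sketch of the "per-type property" idea, under the explicit assumption that no
such hook exists in Calcite today (the isCheckedArithmetic method below is
hypothetical, illustrating the kind of configuration proposed above):

```java
import org.apache.calcite.rel.type.RelDataTypeSystemImpl;
import org.apache.calcite.sql.type.SqlTypeName;

/** Hypothetical type system: checked DECIMAL, unchecked INTEGER. */
public class CheckedDecimalTypeSystem extends RelDataTypeSystemImpl {
  /** Hypothetical hook; RelDataTypeSystem has no such method today. */
  public boolean isCheckedArithmetic(SqlTypeName typeName) {
    return typeName == SqlTypeName.DECIMAL;
  }
}
```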

Implementing checked arithmetic would be hard. One way to do it would
be to have a checked version of each operator - similar to the
SAFE_xxx operators [
https://issues.apache.org/jira/browse/CALCITE-5591 ] - and have a
visitor that switches built-in operators to the checked versions.
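
A minimal sketch of such a visitor, assuming a hypothetical CHECKED_PLUS
operator has been registered somewhere (only PLUS is handled; a real
implementation would cover the other arithmetic operators and consult the
type system):

```java
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexShuttle;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

/** Sketch: rewrites PLUS calls to a (hypothetical) checked variant. */
class CheckedArithmeticShuttle extends RexShuttle {
  private final RexBuilder rexBuilder;
  private final SqlOperator checkedPlus; // hypothetical CHECKED_PLUS operator

  CheckedArithmeticShuttle(RexBuilder rexBuilder, SqlOperator checkedPlus) {
    this.rexBuilder = rexBuilder;
    this.checkedPlus = checkedPlus;
  }

  @Override public RexNode visitCall(RexCall call) {
    // Rewrite operands first, then swap the operator if it is unchecked PLUS.
    RexCall visited = (RexCall) super.visitCall(call);
    if (visited.getOperator() == SqlStdOperatorTable.PLUS) {
      return rexBuilder.makeCall(visited.getType(), checkedPlus, visited.getOperands());
    }
    return visited;
  }
}
```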

Generating correct SQL for other dialects will be even harder. To
safely omit bounds-checking, we would need to know that the bounds on
the Calcite type are identical to the bounds of the underlying type.

Julian


On Thu, Oct 3, 2024 at 2:23 PM Mihai Budiu  wrote:
>
> Hello all,
>
> What is the semantics of arithmetic overflow in SQL?
> I assumed that SQL is supposed to use checked arithmetic, but it seems like 
> this behavior is configurable for some database systems. Having checked 
> arithmetic seems to be in the spirit of SQL to provide exact results. You 
> don't want to use wrap-around arithmetic when you manage your bank account.
>
> For example, if you look at an operator like Multiply in RexImplTable.java:
>
>   defineBinary(MULTIPLY, Multiply, NullPolicy.STRICT, "multiply");
>
> dispatches to:
>
>   /**
>* A multiplication operation, such as (a * b), without
>* overflow checking, for numeric operands.
>*/
>   Multiply(" * ", false, 3, false),
>
> This suggests that Calcite adopts unchecked arithmetic. And indeed, today 
> Calcite ignores arithmetic overflows when evaluating expressions. Moreover, 
> since it uses Java as a runtime, and Java has no arithmetic on Short or Byte, 
> all computations are done using integer or long. Which means that lots of 
> potential overflows are completely ignored. I have fixed this problem 
> recently for Decimals, but the problem persists for all integer types.
>
> Ideally whether arithmetic is checked or not should be a property of the type 
> system.
> However, this will make the implementation quite complex, since there are 
> lots of places where Calcite generates arithmetic expressions.
>
> I think this is a long-standing bug in Calcite, which I'd like to fix. But 
> what is the right solution?
> I have filed a related issue: 
> https://issues.apache.org/jira/browse/CALCITE-6379
>
> Mihai
>
>


Re: vulnerabilities: dependency scan

2024-10-02 Thread Julian Hyde
Can someone remind me — is there a Gradle task to update dependencies? 
(Analogous to 'mvn versions:update-properties’.)

If so, we should do this every release.

Julian



> On Oct 2, 2024, at 9:06 AM, Xiong Duan  wrote:
> 
> Hi, Hugh Pearse. Thanks for checking the dependencies' vulnerabilities
> in Calcite. It is very valuable. We can create an issue in JIRA.
> 
> Hugh Pearse  于2024年10月2日周三 15:56写道:
>> 
>> Our security team found these issues:
>> 
>> 
>>   - Scan of https://github.com/apache/calcite.git on Sep 27, 2024
>>     Version Scanned: latest
>> 
>> Vulnerabilities
>> Severity | PkgName  | Installed Version | Fixed Version        | Vulnerability ID    | Reference
>> HIGH     | webrick  | 1.7.0             | >= 1.8.2             | CVE-2024-47220      | https://avd.aquasec.com/nvd/cve-2024-47220
>> MEDIUM   | nokogiri | 1.14.3            | 1.15.6, 1.16.2       | GHSA-vcc3-rw6f-jv97 | https://github.com/advisories/GHSA-vcc3-rw6f-jv97
>> MEDIUM   | nokogiri | 1.14.3            | ~> 1.15.6, >= 1.16.2 | GHSA-xc9x-jj77-9p9j | https://github.com/advisories/GHSA-xc9x-jj77-9p9j
>> MEDIUM   | rexml    | 3.2.5             | >= 3.2.7             | CVE-2024-35176      | https://avd.aquasec.com/nvd/cve-2024-35176
>> MEDIUM   | rexml    | 3.2.5             | >= 3.3.2             | CVE-2024-39908      | https://avd.aquasec.com/nvd/cve-2024-39908
>> MEDIUM   | rexml    | 3.2.5             | >= 3.3.3             | CVE-2024-41123      | https://avd.aquasec.com/nvd/cve-2024-41123
>> MEDIUM   | rexml    | 3.2.5             | >= 3.3.3             | CVE-2024-41946      | https://avd.aquasec.com/nvd/cve-2024-41946
>> MEDIUM   | rexml    | 3.2.5             | >= 3.3.6             | CVE-2024-43398      | https://avd.aquasec.com/nvd/cve-2024-43398
>> 
>> From,
>> Hugh Pearse



Re: [Discussion] Measures that span multiple tables

2024-10-01 Thread Julian Hyde
Good question - thanks for asking.

The short answer is that measures that span tables work, but they aren’t quite 
as convenient as measures in a single table.

When designing Measures in SQL I knew that measures sometimes span tables - 
star schemas are a case in point - but decided to make measures belong to one 
table for simplicity. This allowed me to put them into the same namespace as 
columns, so I can write ’select e.avg_sal from emp as e’. (Measures are not 
columns, but SQL semantic analyzers already know how to look up a column within 
the scope of a table.)

My goal was to allow dimensional expressions, such as revenue-this-year minus 
revenue-last-year, so we need to know what are the dimensions that affect the 
value of a measure. By associating a measure with a table we can say that the 
(non-measure) columns of that table are its dimensions.

One case of 'measures spanning tables’ is joining a table with a measure to a 
regular table. For example, suppose the orders table has an avg_shipping_cost 
measure, and we write this query:

  select o.zipcode, o.avg_shipping_cost, count(distinct i.color)
  from orders as o
join orderItems as i on o.id  = i.orderId
  group by o.zipcode

If I’d written avg(o.shipping_cost), orders with many items would be weighted 
more heavily in the total than orders with few items. But measures are ‘locked’ 
to the grain of their table.

This is a crucial property of measures. It allows me to create a wide join view 
over a star schema - say of orders, order-items, customers and products - and 
the measures in that view will be well-behaved. Users get the benefits of the 
join without having to remember the right join conditions.

By the way, if I really want a weighted average shipping cost, I can convert 
the shipping-cost measure into a value, join it to the order-items, and then 
re-package it as a measure again:

  select o.zipcode, avg(o.avg_shipping_cost) as measure 
weighted_avg_shipping_cost,
  count(distinct i.color)
  from orders as o
join orderItems as i on o.id  = i.orderId
  group by o.zipcode

Now, another case of ‘measures spanning tables’ is common dimensions (also 
known as conformed dimensions). Let’s suppose I have an orders table and an 
inventory table, both with measures, and both with a date column (orderDate and 
inventoryDate) and a reference to a product. I can write a query that joins 
these tables:

  select o.sum_revenue, v.total_on_hand
  from orders as o
join inventory as v
on o.orderDate = v.inventoryDate
and  o.productId = v.productId
  where o.color = ‘Red'

This is a many-to-many join — like joining two cubes in an OLAP system — but 
the measures’ grain-locking prevents double-counting.

Notice that I have had to manually put the common dimensions into a join 
condition. There is nothing in the definition of the orders.sum_revenue and 
inventory.total_on_hand measures that connects their date dimensions. I have to 
remember to make that connection.

Where to store those definitions? I don’t know. I don’t think it would be a 
good idea to build a view on this many-to-many join (too many columns, too much 
complexity).

One idea is to store these common dimensions in the semantic layer. Perhaps a 
'semantic layer' should have entities like ‘product’, ‘date’, ‘customer’ and 
record which columns on the ‘cube’ tables connect to each entity. And then the 
semantic layer can help people to write SQL. In my opinion, measures are not 
really the semantic layer — they are already there, as part of the tables, but 
their existence means that the semantic layer doesn’t have to worry about grain 
and double-counting, and so can generate simpler SQL.

Julian



> On Sep 30, 2024, at 3:00 PM, Barak Alon  wrote:
> 
> Hey there -
> 
> I've found my way here via Measures in SQL. I'm a long time
> admirer of Calcite, but I'm not very familiar with internals - apologies if
> I stumble.
> 
> I work on Airbnb's Minerva project,
> an internal semantic layer for star schemas. Measures in SQL is a
> wonderfully elegant idea, and I've started exploring if we can introduce
> them to our system.
> 
> However, our users often write queries that span multiple tables/stars,
> requiring a drill across operation. Our system handles this by exposing a
> virtual table that pretends as if our entire semantic layer is a single,
> wide fact table. Incompatible measure/dimension combinations result in
> runtime errors.
> 
> This need feels pretty common and a logical extension to some of Measures'
> strengths - making SQL more powerful and expressive for "top down" queries.
> 
> I'm curious - are there plans to support measures that span multiple tables
> in Calcite?
> 
> - Barak



[jira] [Created] (CALCITE-6602) Deadlock while loading class SqlTypeFixture

2024-09-27 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6602:


 Summary: Deadlock while loading class SqlTypeFixture
 Key: CALCITE-6602
 URL: https://issues.apache.org/jira/browse/CALCITE-6602
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.37.0
Reporter: Julian Hyde


I am seeing deadlocks during initialization of {{interface RelDataTypeSystem}} 
and {{class RelDataTypeSystemImpl}}. The command {{./gradlew :core:test --tests 
RelDataTypeSystemTest}} hangs about 90% of the time for me. Here's part of the 
thread dump running under JDK 21.
 
{noformat}
"ForkJoinPool-1-worker-7" #30 [1161671] daemon prio=5 os_prio=0 cpu=8.26ms 
elapsed=19.26s tid=0x7d08bff0 nid=1161671 waiting on condition  
[0x7d098b3fd000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.calcite.sql.type.SqlTypeFixture.(SqlTypeFixture.java:28)
        - waiting on the Class initialization monitor for 
org.apache.calcite.rel.type.RelDataTypeSystem
        at 
org.apache.calcite.sql.type.RelDataTypeSystemTest$Fixture.(RelDataTypeSystemTest.java:143)
        at 
org.apache.calcite.sql.type.RelDataTypeSystemTest.testCustomDecimalModReturnTypeInference(RelDataTypeSystemTest.java:310)
        at 
java.lang.invoke.LambdaForm$DMH/0x7d0934160400.invokeVirtual(java.base@21/LambdaForm$DMH)
...
        at 
java.util.concurrent.ForkJoinWorkerThread.run(java.base@21/ForkJoinWorkerThread.java:188)
"ForkJoinPool-1-worker-8" #31 [1161672] daemon prio=5 os_prio=0 cpu=7.33ms 
elapsed=19.26s tid=0x7d08a8000ff0 nid=1161672 waiting on condition  
[0x7d098b2fc000]
   java.lang.Thread.State: RUNNABLE
        at 
org.apache.calcite.rel.type.RelDataTypeSystem.(RelDataTypeSystem.java:38)
        - waiting on the Class initialization monitor for 
org.apache.calcite.rel.type.RelDataTypeSystemImpl
        at 
org.apache.calcite.sql.type.SqlTypeFixture.(SqlTypeFixture.java:28)
        at 
org.apache.calcite.sql.type.RelDataTypeSystemTest$Fixture.(RelDataTypeSystemTest.java:143)
        at 
org.apache.calcite.sql.type.RelDataTypeSystemTest.testDecimalModReturnTypeInference(RelDataTypeSystemTest.java:229)
... {noformat}
 
Occurs in current main (commit 
[a8802c7|https://github.com/apache/calcite/commit/a8802c721a2805159d166e45044c00364d1cb6c3]).

Not a high priority. Just documenting that this bug exists. Hopefully it only 
exists when running tests.
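
For reference, the general shape of this kind of hang is easy to reproduce 
outside Calcite. The following self-contained sketch (not Calcite code) has two 
classes whose static initializers read each other's fields; when two threads 
trigger their initialization concurrently, each usually ends up waiting on the 
other's class-initialization monitor, much like the dump above.
{noformat}
// Minimal illustration of a class-initialization deadlock (not Calcite code).
public class InitDeadlockDemo {
  static class A {
    static final int VALUE;
    static {
      pause();             // widen the race window
      VALUE = B.VALUE + 1; // reading B.VALUE requires B to finish initializing
    }
  }
  static class B {
    static final int VALUE;
    static {
      pause();
      VALUE = A.VALUE + 1; // reading A.VALUE requires A to finish initializing
    }
  }
  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException e) { }
  }
  public static void main(String[] args) {
    // Thread 1 holds A's initialization lock and waits for B;
    // thread 2 holds B's initialization lock and waits for A.
    new Thread(() -> System.out.println(A.VALUE)).start();
    new Thread(() -> System.out.println(B.VALUE)).start();
  }
}
{noformat}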



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-26 Thread Julian Hyde
Where are we on 1.38? Down to 2: 4512 and 5156. Please review and merge today.

I have draft release notes. Please review them in the PR:
https://github.com/apache/calcite/pull/3979.

Julian

Case status:

1. https://issues.apache.org/jira/browse/CALCITE-6161 equalsDeep

No longer in 1.38

2. https://issues.apache.org/jira/browse/CALCITE-5156 implicit number type cast

No longer in 1.38

3. https://issues.apache.org/jira/browse/CALCITE-6560 (supportsNegativeScale)

I don't think this one is going to make it. I am seeing a classloader
deadlock when I run the tests, and that makes me very nervous. Also, I
would like to fix the related bug
https://issues.apache.org/jira/browse/CALCITE-6598 at the same time.

4. https://issues.apache.org/jira/browse/CALCITE-6300 Function
MAP_VALUES/MAP_KEYS gives exception

No longer in 1.38 because the proposed solution was hacky

5. https://issues.apache.org/jira/browse/CALCITE-4512 group by expression

https://github.com/apache/calcite/pull/3929 needs review

6. https://issues.apache.org/jira/browse/CALCITE-5156 Support implicit
number type cast for IN Sub-query

On Wed, Sep 18, 2024 at 1:21 AM Ruben Q L  wrote:
>
> I'll try to look at a few PRs in the next days.
>
> I've also updated our Jira dashboard [1] (version: 1.38) to have a clear
> overview about the status of the issues for the next release.
>
> [1]
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>
>
> On Tue, Sep 17, 2024 at 8:16 PM Mihai Budiu  wrote:
>
> > I am now at Current, the Confluent conference, but I hope to find some
> > time on Friday to do a few.
> > 
> > From: Julian Hyde 
> > Sent: Tuesday, September 17, 2024 12:08:00 PM
> > To: dev@calcite.apache.org 
> > Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> >
> > I merged fixes for 4 cases yesterday. Can I get some help reviewing
> > and merging PRs (especially those in category 1 and 2)?
> >
> > There are about 6 regular committers who merge a few PRs before each
> > release. (You know who you are -- and I am profoundly grateful! Your
> > work keeps the project going.) If you could each do 2 or 3 PRs by the
> > end of the weekend, we can get this release done.
> >
> > Julian
> >
> >
> > On Mon, Sep 16, 2024 at 10:56 AM Julian Hyde 
> > wrote:
> > >
> > > Yeah, sorry, I don’t often use the GitHub ‘approve’ button. My verbal
> > approvals are often conditional. The ‘approve’ button doesn’t allow nuance.
> > >
> > > > On Sep 15, 2024, at 3:26 PM, Mihai Budiu  wrote:
> > > >
> > > > You have not listed CALCITE-4918,
> > https://issues.apache.org/jira/browse/CALCITE-4918 for which I have a PR,
> > which I think you have approved using comments
> > https://github.com/apache/calcite/pull/3947, but not using the official
> > "approve" button.
> > > >
> > > > I was hoping other people would weigh on this design and
> > implementation.
> > > >
> > > > Mihai
> > > >
> > > > 
> > > > From: Julian Hyde 
> > > > Sent: Sunday, September 15, 2024 3:12 PM
> > > > To: dev@calcite.apache.org 
> > > > Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> > > >
> > > > I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
> > > > track the release.
> > > >
> > > > There are 26 open bugs [1] marked to fix for 1.38. I have organized
> > > > them into 5 categories. Can people please review and merge those in
> > > > category 1? To get a release vote next week, so let's get busy
> > > > reviewing and merging bugs this week.
> > > >
> > > > Category 1. The following have a PR available that seems close to
> > > > ready. If these are good, let's get these merged!
> > > > * CALCITE-6572 Add more tests for NULL arguments to TO_CHAR functions
> > > > * CALCITE-6554 nested correlated sub-query in aggregation does not
> > > > have inner correlation variable bound to inner projection
> > > > * CALCITE-6550 Improve SQL function overloading
> > > > * CALCITE-6343 AS alias operator strips MEASUREness from measures
> > > > * CALCITE-6161 The equalsDeep of sqlCall should compare sqlOperator's
> > sqlKind
> > > > * CALCITE-6226 Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE
> > > > * CALCITE-2067 RexLiteral cannot represent accurately floating point
> > > > values, including NaN, Infinity
> > > > * CALCITE-5156 

Re: [DISCUSS] Vote format

2024-09-26 Thread Julian Hyde
The relationship between real names, github names, email addresses,
jira ids, ASF names is complicated, but one useful tool is git's
.mailmap file. 'git log' uses it to convert email addresses to real
names, and we release managers use it when we generate the list of
unique contributors for each release.

Julian

[1] https://github.com/julianhyde/calcite/commits/6582-release-1.38/.mailmap

On Tue, Sep 24, 2024 at 2:14 AM Alessandro Solimando
 wrote:
>
> Hello everyone,
> sorry for the late reply, my vote is "+0 asolimando
> (committer/non-binding)".
>
> Making a distinction only between PMC votes and the rest: I think it's
> important to boost confidence that everyone's voice will be heard, and I
> don't think it would be beneficial to state committership or anything other
> than the "binding / non-binding" we already use.
>
> For what concerns jira id vs github username vs "real" names, I have had
> some troubles in the past, and the community (luckily!) keeps growing, so
> it would be nice to have a better way to connect the three easily.
>
> Best regards,
> Alessandro
>
> On Mon, 16 Sept 2024 at 20:13, Julian Hyde  wrote:
>
> > Enrico,
> >
> > I agree that votes from committers and community members should be taken
> > seriously (especially when they highlight flaws in the release).
> >
> > I also agree that encouraging non-committers to join the community is
> > important. I would like to see them reviewing PRs and voting on releases. I
> > think that the vote format is neutral on this. Hopefully non-committers
> > will see other non-committers voting and be encouraged to participate more.
> > I intend to include language in the vote email that makes clear that votes
> > are welcome from everyone.
> >
> > Julian
> >
> >
> > > On Sep 14, 2024, at 2:01 AM, Enrico Olivelli 
> > wrote:
> > >
> > > I think that adding '(binding)' is enough to count the votes cast by the
> > > PMC.
> > >
> > > I am suggesting this because we should encourage people not being
> > > committer/PMC member to validate and cast their vote.
> > >
> > > IMHO Adding too many labels next to the name and the +1 may bias the
> > > testing and the validation. Like thinking that the validation from someone
> > > is 'better' than the others because it has that label.
> > > The same is for -1. Any concerns about the goodness of the release should
> > > have the same weight, even if it is coming from a new user.
> > >
> > > It seems to me that this is a healthy community, and we know each other
> > > (no need to check the Apache id) and there is no need to show the labels
> > > too much.
> > >
> > > My 2 cents, from the peanut gallery
> > >
> > > Enrico
> > >
> > >
> > > Il Sab 14 Set 2024, 08:57 Francis Chuang  ha
> > > scritto:
> > >
> > >> +1, this is good for consistency.
> > >>
> > >> On 14/09/2024 7:22 am, Stamatis Zampetakis wrote:
> > >>> It's easier to understand who is who without checking the index [1]
> > >>> that I do occasionally.
> > >>>
> > >>> +1 zabetak (PMC)
> > >>>
> > >>> [1] https://people.apache.org/committer-index.html
> > >>>
> > >>> On Fri, Sep 13, 2024 at 10:16 PM Julian Hyde  wrote:
> > >>>>
> > >>>> There is a discussion in the Incubator list [1] that proposes a more
> > >>>> structured format for the line that people write in their email
> > >>>> message when they vote.
> > >>>>
> > >>>> The problem is that some people use multiple names, email accounts,
> > >>>> github accounts, and that tallying votes requires some effort.
> > >>>>
> > >>>> The proposed solution is that people would use a structured line when
> > >>>> they vote, consisting of their vote (-1, 0, +1), their ASF id if they
> > >>>> have one, and their role (committer, PMC member, community member).
> > >>>> For example, my vote would be
> > >>>>
> > >>>>   +1 jhyde (PMC)
> > >>>>
> > >>>> In my opinion, Calcite should adopt this. The email starting a vote
> > >>>> would recommend that people use that format. Votes would still be
> > >>>> counted if people use a different format.
> > >>>>
> > >>>> What do you think?
> > >>>>
> > >>>> Julian
> > >>>>
> > >>>> [1] https://lists.apache.org/thread/xkfrnt8f733hhokmh0lvyhsn50x4qfw0
> > >>
> > >>
> >
> >


[jira] [Created] (CALCITE-6598) In RelDataTypeSystem, deprecate methods getMaxNumericScale and getMaxNumericPrecision

2024-09-25 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6598:


 Summary: In RelDataTypeSystem, deprecate methods 
getMaxNumericScale and getMaxNumericPrecision
 Key: CALCITE-6598
 URL: https://issues.apache.org/jira/browse/CALCITE-6598
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


In {{interface RelDataTypeSystem}}, deprecate {{getMaxNumericScale()}} in 
favor of {{getMaxScale(SqlTypeName)}} and {{getMaxNumericPrecision()}} in favor 
of {{getMaxPrecision(SqlTypeName)}}.

This will make them consistent with {{getMinScale(SqlTypeName)}} and 
{{getMinPrecision(SqlTypeName)}} (both added in CALCITE-6560).

For each pair of methods (one with {{SqlTypeName}} and one without), if we 
allow people to choose which to override, things will become too complicated. 
So, in release 1.39 we will make {{getMaxNumericScale()}} and 
{{getMaxNumericPrecision()}} methods final. This will be a *breaking change* - 
people who have overridden these methods will be broken.
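
One possible shape for the transition period (a sketch only; the eventual 
change may differ, and mapping the numeric methods to DECIMAL is an assumption) 
is to give the old methods default implementations that delegate to the new 
ones:
{noformat}
// Sketch only; simplified, not the actual RelDataTypeSystem change.
public interface TypeSystemSketch {
  int getMaxPrecision(SqlTypeName typeName);
  int getMaxScale(SqlTypeName typeName);

  /** @deprecated Use {@link #getMaxPrecision} with DECIMAL (assumption). */
  @Deprecated default int getMaxNumericPrecision() {
    return getMaxPrecision(SqlTypeName.DECIMAL);
  }

  /** @deprecated Use {@link #getMaxScale} with DECIMAL (assumption). */
  @Deprecated default int getMaxNumericScale() {
    return getMaxScale(SqlTypeName.DECIMAL);
  }

  /** Stand-in for the real org.apache.calcite.sql.type.SqlTypeName enum. */
  enum SqlTypeName { DECIMAL, INTEGER, VARCHAR }
}
{noformat}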



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6596) Enable function-level cache by default

2024-09-25 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6596:


 Summary: Enable function-level cache by default
 Key: CALCITE-6596
 URL: https://issues.apache.org/jira/browse/CALCITE-6596
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


CALCITE-5914 added the function-level cache (to allow functions such as 
REGEXP_CONTAINS to avoid computing expensive state every call) with the 
intention of having a default cache size of 1,000, but the code actually set 
the maximum size to 0, effectively disabling the cache.

This change sets the default to 1,000.
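
As a standalone illustration of why a maximum size of 0 disables the cache (a 
Guava sketch, not the Calcite code itself): with size 0 the loader runs on 
every call, so the expensive state is rebuilt each time; with 1,000 it is 
computed once per key and reused.
{noformat}
// Standalone Guava sketch (not Calcite code).
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

public class FunctionCacheDemo {
  public static void main(String[] args) throws Exception {
    AtomicInteger compilations = new AtomicInteger();
    LoadingCache<String, Pattern> cache = CacheBuilder.newBuilder()
        .maximumSize(0)           // what the code actually did
        // .maximumSize(1_000)    // the intended default
        .build(CacheLoader.from(regex -> {
          compilations.incrementAndGet();
          return Pattern.compile(regex);   // the "expensive" per-call state
        }));
    for (int i = 0; i < 5; i++) {
      cache.get("a+b*").matcher("aab").matches();
    }
    // Typically prints 5 with maximumSize(0); prints 1 with maximumSize(1_000).
    System.out.println("pattern compiled " + compilations.get() + " times");
  }
}
{noformat}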



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Discussion: Calcite's standard for function argument conversion

2024-09-23 Thread Julian Hyde
There was a discussion a while ago where we decided that type resolution should 
decide which overload of the method was being called - and therefore the type 
that each argument needs to be converted to. And it should decide how to 
convert the argument to the parameter type. (I use the words ‘argument’ and 
‘parameter’ in their technical sense [1].)

One kind of conversion is widening, e.g. converting a SMALLINT argument to 
INTEGER parameter type. This is widening because every valid SMALLINT value 
maps to one INTEGER value.

Another kind of conversion is coercion, e.g. converting a CHAR or VARCHAR 
argument to a DATE parameter type. This is coercion because some strings (e.g. 
‘2024-09-23’ and ‘2024-9-23’) correspond to valid dates, and others (e.g. 
‘Hello’) do not.
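
A plain-Java illustration of the distinction (not how Calcite implements it): 
widening cannot fail for any argument value, whereas coercion can fail at 
runtime for particular values.

// Plain-Java illustration (not Calcite code) of widening vs coercion.
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class WideningVsCoercion {
  public static void main(String[] args) {
    // Widening: every valid SMALLINT value maps to exactly one INTEGER value,
    // so the conversion can never fail.
    short smallintArg = 123;
    int integerParam = smallintArg;   // implicit, always safe
    System.out.println(integerParam);

    // Coercion: only some strings correspond to valid dates, so the
    // conversion can fail for particular argument values.
    for (String s : new String[] {"2024-09-23", "Hello"}) {
      try {
        LocalDate dateParam = LocalDate.parse(s);
        System.out.println(s + " -> " + dateParam);
      } catch (DateTimeParseException e) {
        System.out.println(s + " -> not a valid DATE");
      }
    }
  }
}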

I do not remember where that discussion ended up. I thought we’d end up 
implementing it by storing the ‘how to convert’ information in the SqlValidator 
along with the inferred parameter type.

Julian

PS Please use ‘[DISCUSS]’ rather than ‘Discussion:’ in the message subject. 
Standards are useful.

[1] https://www.educative.io/answers/parameter-vs-argument

> On Sep 21, 2024, at 9:35 AM, Cancai Cai  wrote:
> 
> Hello, everyone in the calcite community.
> 
> I encountered a problem while dealing with
> https://issues.apache.org/jira/browse/CALCITE-6300. That is, when should
> the type conversions related to map and array be performed?
> 
> At present, I can think of two solutions. One is to convert the final
> result when returning the result, such as:
> https://issues.apache.org/jira/browse/CALCITE-5948,
> e.g:
> @LibraryOperator(libraries = {SPARK})
> public static final SqlFunction ARRAY_COMPACT =
> SqlBasicFunction.create(SqlKind.ARRAY_COMPACT,
> SqlLibraryOperators::arrayCompactReturnType,
> OperandTypes.ARRAY);
> 
> The second is to convert after parameter type verification, for example,
> https://issues.apache.org/jira/browse/CALCITE-6300
> @LibraryOperator(libraries = {SPARK})
> public static final SqlFunction MAP_VALUES =
> SqlBasicFunction.create(SqlKind.MAP_VALUES,
> ReturnTypes.TO_MAP_VALUES_NULLABLE,
> OperandTypes.MAP)
> .withOperandTypeInference(InferTypes.MAP_SPARK_FUNCTION_TYPE);
> 
> Perhaps these two methods are not suitable, because they are not universal
> solutions. I don’t know how the database defines this type conversion
> standard, but Calcite does not seem to have a clear standard for this
> conversion. But I think it should be solved because many functions in Spark
> need this conversion. (p.s. I have implemented the
> adjustTypeForMapFunctionConstructor api, but I don't know where to put it.)
> 
> More discussions can be viewed in jira:
> https://issues.apache.org/jira/browse/CALCITE-6300
> 
> If there are any mistakes in my description process, please correct me. For
> solutions, if others have better suggestions or other project code
> examples, please put them forward. I am happy to learn.
> 
> Sorry for asking this question so late
> 
> Best wishes,
> Cancai Cai



Re: [DISCUSS] Towards Avatica 1.26.0

2024-09-23 Thread Julian Hyde
Istvan,

Let’s have this discussion in 
https://issues.apache.org/jira/browse/CALCITE-6590. (Even better, assign it to 
yourself!)

Julian


> On Sep 23, 2024, at 1:16 AM, Istvan Toth  wrote:
> 
> The JVM option is -Djava.security.manager, as already suggested by Julian
> in CALCITE-6590.
> 
> I think that currently that's the best option.
> Looking at the code, getSubject()  only seems to be called when Kerberos
> authentication is explicitly configured,
> which typically means Hadoop-related environments, where this option will
> be needed anyway by the other libraries.
> 
> Istvan
> 
> On Mon, Sep 23, 2024 at 9:24 AM Istvan Toth  wrote:
> 
>> Let's keep it simple: use the same, correct, security system in all
>>> environments.
>> 
>> 
>> IIUC the problem is that there is no common subset of the necessary APIs
>> between the Java 8 and Java 23 APIs.
>> The APIs that can be used in Java 23 were not present in Java 8 yet.
>> 
>> So the only options I see are:
>> - Not supporting Java 23
>> - Not supporting Java 8
>> - Reflection based shenanigans
>> - Multi-versioned JAR or some custom classloading shenanigan to the same
>> effect
>> - Adding and requiring a compatibility JVM option to run on Java 23.
>> 
>> Adding and requiring the compatibility JVM option is the least work, and
>> it's likely necessary anyway for any Hadoop-related use case.
>> 
>> Istvan
>> 
>> 
>> 
>> On Sat, Sep 21, 2024 at 1:56 AM Julian Hyde  wrote:
>> 
>>>> Apache Pig hasn't released a new version for a long time.
>>> 
>>> We should consider removing the Pig adapter at some point.
>>> 
>>> Also upgrade the Spark adapter to a version that uses Hadoop 3.x
>>> rather than Hadoop 2.x.
>>> 
>> 
>> 
>> --
>> *István Tóth* | Sr. Staff Software Engineer
>> *Email*: st...@cloudera.com
>> cloudera.com <https://www.cloudera.com>
>> --
>> --
>> 
> 
> 
> -- 
> *István Tóth* | Sr. Staff Software Engineer
> *Email*: st...@cloudera.com
> cloudera.com <https://www.cloudera.com>
> --
> --



Re: Question regarding calcite logical algebra's ordinal place holder

2024-09-23 Thread Julian Hyde
Are you interested in changing (1) how field references are represented in 
RelNodes, (2) how field references are printed in plans, (3) how references to 
fields are created when you are building a tree of RelNodes.

(1) would be a huge disruptive change (and require us to decide what to do when 
field names are not unique); (2) is straightforward - you can write your own 
printer; (3) is already solved by RelBuilder.
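
For (3), a hedged RelBuilder sketch (the "student" and "section" tables are 
assumed to be registered in the FrameworkConfig; the names are only for 
illustration) shows that you build expressions by field name, even though the 
resulting plan still prints ordinals such as $1, because that is how field 
references are represented internally (point 1):

// Sketch only: assumes a FrameworkConfig whose default schema contains
// "student" and "section" tables, and that "name" is unique after the join.
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

public class FieldsByName {
  static RelNode buildPlan(FrameworkConfig config) {
    RelBuilder builder = RelBuilder.create(config);
    return builder
        .scan("student")
        .scan("section")
        // field(inputCount, inputOrdinal, fieldName): refer to the two join
        // inputs by field name rather than by ordinal.
        .join(JoinRelType.INNER,
            builder.call(SqlStdOperatorTable.EQUALS,
                builder.field(2, 0, "id"),    // student.id
                builder.field(2, 1, "id")))   // section.id
        .project(builder.field("name"))       // student.name
        .build();
  }
}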

Julian

PS Your message went through moderation. You should subscribe to the dev@ list 
so that you receive timely replies, and we are saved the effort of 
moderation.

> On Sep 23, 2024, at 10:02 AM, He, Guanlin  wrote:
> 
> Hi developers of calcite.
> 
> I am wondering if there is a way to change the ordinal in RelNode's 
> relational algebra into the field name.
> 
> For example if I have logical planning like:
> LogicalProject(name=[$1])
>  LogicalJoin(condition=[=($0, $6)], joinType=[inner])
> 
> How could I change it back into the following, and only output the logical algebra string:
> 
> LogicalProject(name=[student.name])
>  LogicalJoin(condition=[=(student.id, section.id)], joinType=[inner])
> 
> I am currently iterating through the RelNode and filling exps by their rowType,
> but this method is really messy and needs a lot of preconditions (for example,
> the join case and the aggregate case). I am wondering if there is an easier way
> that I don't know of.
> 
> Best,
> Guanlin He
> 
> 
> 



Re: [DISCUSS] Towards Avatica 1.26.0

2024-09-20 Thread Julian Hyde
> Apache Pig hasn't released a new version for a long time.

We should consider removing the Pig adapter at some point.

Also upgrade the Spark adapter to a version that uses Hadoop 3.x
rather than Hadoop 2.x.


Re: [DISCUSS] Towards Avatica 1.26.0

2024-09-20 Thread Julian Hyde
> David Radley wrote:
>
> Is it worth fixing and releasing Avatica first then release the new
> Calcite version that depends on it?

In an ideal world, yes. But we are volunteer-driven, and a release
takes time and effort. To make a release of Avatica, a few bugs will
need to be fixed, and then several days for a release vote. It will
inevitably delay the Calcite release.

I'd rather finish the original goal, an RC of Calcite next week,
hopefully followed by a release by the end of the month.

> Looking around getSubject, is there alternative security calls we
> should be making in its absence? If the getSubject() is resulting
> in extra security (even if it is deprecated), should we remove it
> only for java 23?

I'm not an expert on Java security. But generally things get more
secure over time, and when something is deprecated there's usually a
(better) replacement that has been around for some time. I suspect
that will be the case here.

Having different code for different runtime environments is very
tricky. (We have done it before, in a few cases, using reflection.)
But I don't want to do something "tricky" when it relates to security.
(Even though it's very tempting, because I just want to get the tests
to pass, and the tests are not even about security.)

Let's keep it simple: use the same, correct, security system in all
environments.

Julian

On Fri, Sep 20, 2024 at 5:56 AM David Radley  wrote:
>
> Hi,
> Some thoughts on this.
>
> Is it worth fixing and releasing Avatica first then release the new Calcite 
> version that depends on it?
> Looking around getSubject, is there alternative security calls we should be 
> making in its absence?
> If the getSubject() is resulting in extra security (even if it is 
> deprecated), should we remove it only for java 23?
>  Kind regards, David
>
> From: Ruben Q L 
> Date: Thursday, 19 September 2024 at 19:46
> To: dev@calcite.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] Towards Avatica 1.26.0
> +1, I think the proposed plan seems reasonable.
>
> Ruben
>
>
> On Thu, Sep 19, 2024 at 5:31 PM Julian Hyde  wrote:
>
> > While working towards releasing Calcite 1.38.0, I noticed that JDK 23
> > has been released, and decided to try to upgrade Calcite to support
> > it[1]. And I found myself blocked by an Avatica bug, namely that
> > Avatica uses a getSubject method that has been deprecated for some
> > time and finally disabled in JDK 23. (Incidentally, Hadoop has the
> > same problem [3]; people don't expect them to run on the current JDK,
> > but we will need to disable our Spark and Pig adapter tests under JDK
> > 23 until they fix it.)
> >
> > There's currently just one bug, to upgrade Avatica to JDK 23. This
> > includes re-enabling the mode where we fail to compile if any
> > deprecated methods are used, and removing uses of getSubject [2].
> >
> > I think we can release Calcite 1.38.0 with limited JDK 23 support
> > (i.e. a few tests disabled), but we're going to need a new Avatica
> > release in the next month or two, and it will need to fix the
> > getSubject issue.
> >
> > What do you think?
> >
> > Julian
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-6587
> > [2] https://issues.apache.org/jira/browse/CALCITE-6588
> > [3] https://issues.apache.org/jira/browse/HADOOP-19212
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] Towards Avatica 1.26.0

2024-09-20 Thread Julian Hyde
I have logged https://issues.apache.org/jira/browse/CALCITE-6590 for
the security manager issue. Can we have a volunteer to take this? I'm
not an expert on security managers.

On Fri, Sep 20, 2024 at 5:56 AM David Radley  wrote:
>
> Hi,
> Some thoughts on this.
>
> Is it worth fixing and releasing Avatica first then release the new Calcite 
> version that depends on it?
> Looking around getSubject, is there alternative security calls we should be 
> making in its absence?
> If the getSubject() is resulting in extra security (even if it is 
> deprecated), should we remove it only for java 23?
>  Kind regards, David
>
> From: Ruben Q L 
> Date: Thursday, 19 September 2024 at 19:46
> To: dev@calcite.apache.org 
> Subject: [EXTERNAL] Re: [DISCUSS] Towards Avatica 1.26.0
> +1, I think the proposed plan seems reasonable.
>
> Ruben
>
>
> On Thu, Sep 19, 2024 at 5:31 PM Julian Hyde  wrote:
>
> > While working towards releasing Calcite 1.38.0, I noticed that JDK 23
> > has been released, and decided to try to upgrade Calcite to support
> > it[1]. And I found myself blocked by an Avatica bug, namely that
> > Avatica uses a getSubject method that has been deprecated for some
> > time and finally disabled in JDK 23. (Incidentally, Hadoop has the
> > same problem [3]; people don't expect them to run on the current JDK,
> > but we will need to disable our Spark and Pig adapter tests under JDK
> > 23 until they fix it.)
> >
> > There's currently just one bug, to upgrade Avatica to JDK 23. This
> > includes re-enabling the mode where we fail to compile if any
> > deprecated methods are used, and removing uses of getSubject [2].
> >
> > I think we can release Calcite 1.38.0 with limited JDK 23 support
> > (i.e. a few tests disabled), but we're going to need a new Avatica
> > release in the next month or two, and it will need to fix the
> > getSubject issue.
> >
> > What do you think?
> >
> > Julian
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-6587
> > [2] https://issues.apache.org/jira/browse/CALCITE-6588
> > [3] https://issues.apache.org/jira/browse/HADOOP-19212
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (CALCITE-6590) Remove use of Java SecurityManager in Avatica

2024-09-20 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6590:


 Summary: Remove use of Java SecurityManager in Avatica
 Key: CALCITE-6590
 URL: https://issues.apache.org/jira/browse/CALCITE-6590
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
 Fix For: 1.26.0


Remove use of Java SecurityManager in Avatica.

Running Avatica on JDK 23 (or JRE 23) we get the following runtime errors:
{noformat}
Caused by: java.lang.UnsupportedOperationException: getSubject is supported 
only if a security manager is allowed at 
java.base/javax.security.auth.Subject.getSubject(Subject.java:347) at 
org.apache.calcite.avatica.server.SubjectPreservingPrivilegedThreadFactory.newThread(SubjectPreservingPrivilegedThreadFactory.java:43)
 {noformat}
We were warned - methods {{getSubject}} have been deprecated since JDK 18.

The message "getSubject is supported only if a security manager is allowed" 
implies that another solution would be to enable a security manager (including 
during tests, and when Avatica is used in Calcite's tests). Should we consider 
that?
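
For reference, the JDK 18+ replacements look roughly like the sketch below 
(assuming the affected code path can require a newer JDK; the eventual Avatica 
fix may well look different, especially given the JDK 8 lower bound):
{noformat}
// Sketch only, not the actual Avatica change.
import java.util.concurrent.Callable;
import javax.security.auth.Subject;

public class SubjectMigrationSketch {
  // Deprecated pattern (throws UnsupportedOperationException on JDK 23 unless
  // a security manager is allowed):
  //   Subject s = Subject.getSubject(AccessController.getContext());
  //   Subject.doAs(s, action);

  static <T> T runAsCurrentSubject(Callable<T> action) {
    Subject subject = Subject.current();     // JDK 18+ replacement for getSubject
    return Subject.callAs(subject, action);  // JDK 18+ replacement for doAs
  }
}
{noformat}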



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Towards Avatica 1.26.0

2024-09-19 Thread Julian Hyde
While working towards releasing Calcite 1.38.0, I noticed that JDK 23
has been released, and decided to try to upgrade Calcite to support
it[1]. And I found myself blocked by an Avatica bug, namely that
Avatica uses a getSubject method that has been deprecated for some
time and finally disabled in JDK 23. (Incidentally, Hadoop has the
same problem [3]; people don't expect them to run on the current JDK,
but we will need to disable our Spark and Pig adapter tests under JDK
23 until they fix it.)

There's currently just one bug, to upgrade Avatica to JDK 23. This
includes re-enabling the mode where we fail to compile if any
deprecated methods are used, and removing uses of getSubject [2].

I think we can release Calcite 1.38.0 with limited JDK 23 support
(i.e. a few tests disabled), but we're going to need a new Avatica
release in the next month or two, and it will need to fix the
getSubject issue.

What do you think?

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-6587
[2] https://issues.apache.org/jira/browse/CALCITE-6588
[3] https://issues.apache.org/jira/browse/HADOOP-19212


Re: Questions Regarding Implementing LIMIT Clause Pushdown in Apache Calcite JDBC Driver

2024-09-17 Thread Julian Hyde
By the way, your messages keep on going to moderation. To receive replies, and 
to remove the effort/delay of moderation, please subscribe to the dev list. 

Julian

> On Sep 17, 2024, at 23:44, Julian Hyde  wrote:
> 
> FilterableTable and ProjectableFilterableTable were designed to make simple things
> simple but not go any further.
> 
> It’s time to learn how to write planner rules.
> 
> Julian
> 
>> On Sep 17, 2024, at 22:42, Xiong Duan  wrote:
>> 
>> As far as I know, you may need to customize the implementation of
>> SortFilterableTable to support pushing limits, like FilterableTable;
>> then you need to implement a SortFilterTableScanRule to make sure it can
>> generate the SortFilterableTable.
>> 
>> Mihai Budiu  于2024年9月17日周二 16:37写道:
>>> 
>>> Isn't limit stored in an order by clause, with potentially an empty list of 
>>> fields to sort on?
>>> 
>>> From: Vishal Kore 
>>> Sent: Tuesday, September 17, 2024 1:17:15 AM
>>> To: dev@calcite.apache.org 
>>> Cc: Harshit Dwivedi ; Dipesh 
>>> Sachdev ; Thomas Keller 
>>> 
>>> Subject: Questions Regarding Implementing LIMIT Clause Pushdown in Apache 
>>> Calcite JDBC Driver
>>> 
>>> Hi Devs,
>>> 
>>> I am currently working on implementing (or extending) a JDBC driver for my 
>>> backend storage system using Apache Calcite. I've implemented the 
>>> FilterableTable interface, which allows pushing down WHERE clause filters 
>>> to the backend using the scan(DataContext ctx, List filters) 
>>> method.
>>> However, I'm struggling to push down the LIMIT clause to ensure that only 
>>> the required number of records are retrieved from the backend. I cannot 
>>> figure out how to access the LIMIT clause within the scan method.
>>> Could you please guide me on how to retrieve or implement the LIMIT clause 
>>> in this context?
>>> Thanks for your assistance!
>>> Best regards,
>>> Vishal Kore


Re: Questions Regarding Implementing LIMIT Clause Pushdown in Apache Calcite JDBC Driver

2024-09-17 Thread Julian Hyde
FilterableTable and ProjectableFilterableTable were designed to make simple
things simple but not go any further.

It’s time to learn how to write planner rules.
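
To get you started, here is a hedged sketch of one possible rule, written 
against the older RelOptRule API for brevity (newer code usually extends 
RelRule with a Config). LimitableTableScan is a hypothetical class that you 
would write for your storage system; it is not part of Calcite.

// Sketch only. LimitableTableScan is hypothetical: a TableScan subclass you
// would implement so that the limit reaches your backend.
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.core.TableScan;
import org.apache.calcite.rex.RexLiteral;

public class LimitPushDownRule extends RelOptRule {
  public static final LimitPushDownRule INSTANCE = new LimitPushDownRule();

  private LimitPushDownRule() {
    super(operand(Sort.class, operand(TableScan.class, none())),
        "LimitPushDownRule");
  }

  @Override public boolean matches(RelOptRuleCall call) {
    final Sort sort = call.rel(0);
    // Only a pure LIMIT: no sort keys, no OFFSET, and a literal fetch.
    return sort.getCollation().getFieldCollations().isEmpty()
        && sort.offset == null
        && sort.fetch instanceof RexLiteral;
  }

  @Override public void onMatch(RelOptRuleCall call) {
    final Sort sort = call.rel(0);
    final TableScan scan = call.rel(1);
    final int limit = RexLiteral.intValue(sort.fetch);
    // The hypothetical scan carries the limit down to the storage engine.
    call.transformTo(
        LimitableTableScan.create(scan.getCluster(), scan.getTable(), limit));
  }
}

You would register the rule with the planner and have LimitableTableScan fetch 
at most that many rows from the backend, rather than relying on the scan 
method alone.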

Julian

> On Sep 17, 2024, at 22:42, Xiong Duan  wrote:
> 
> As far as I know, you may need to customize the implementation of
> SortFilterableTable to support pushing limits, like FilterableTable;
> then you need to implement a SortFilterTableScanRule to make sure it can
> generate the SortFilterableTable.
> 
> Mihai Budiu  于2024年9月17日周二 16:37写道:
>> 
>> Isn't limit stored in an order by clause, with potentially an empty list of 
>> fields to sort on?
>> 
>> From: Vishal Kore 
>> Sent: Tuesday, September 17, 2024 1:17:15 AM
>> To: dev@calcite.apache.org 
>> Cc: Harshit Dwivedi ; Dipesh Sachdev 
>> ; Thomas Keller 
>> 
>> Subject: Questions Regarding Implementing LIMIT Clause Pushdown in Apache 
>> Calcite JDBC Driver
>> 
>> Hi Devs,
>> 
>> I am currently working on implementing (or extending) a JDBC driver for my 
>> backend storage system using Apache Calcite. I've implemented the 
>> FilterableTable interface, which allows pushing down WHERE clause filters to 
>> the backend using the scan(DataContext ctx, List filters) method.
>> However, I'm struggling to push down the LIMIT clause to ensure that only 
>> the required number of records are retrieved from the backend. I cannot 
>> figure out how to access the LIMIT clause within the scan method.
>> Could you please guide me on how to retrieve or implement the LIMIT clause 
>> in this context?
>> Thanks for your assistance!
>> Best regards,
>> Vishal Kore


[jira] [Created] (CALCITE-6588) Support JDK 23 and Guava 33.3.0 in Avatica

2024-09-17 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6588:


 Summary: Support JDK 23 and Guava 33.3.0 in Avatica
 Key: CALCITE-6588
 URL: https://issues.apache.org/jira/browse/CALCITE-6588
 Project: Calcite
  Issue Type: Improvement
  Components: avatica
Reporter: Julian Hyde
 Fix For: avatica-1.26.0


Support JDK 23 and Guava 33.3.0-jre, which are the latest available versions.

The lower bound has not changed: Avatica supports JDK down to 8, and Guava down 
to 19.0.

We will need to remove use of the {{getSubject}} method, which since JDK 23 
throws:
{noformat}
Caused by: java.lang.UnsupportedOperationException: getSubject is supported 
only if a security manager is allowed
at java.base/javax.security.auth.Subject.getSubject(Subject.java:347)
at 
org.apache.calcite.avatica.server.SubjectPreservingPrivilegedThreadFactory.newThread(SubjectPreservingPrivilegedThreadFactory.java:43)
{noformat}

The method has been deprecated for a while, but the deprecation did not cause a 
build failure, due to CALCITE-5136.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6587) Support JDK 23 and Guava 33.3.0

2024-09-17 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6587:


 Summary: Support JDK 23 and Guava 33.3.0
 Key: CALCITE-6587
 URL: https://issues.apache.org/jira/browse/CALCITE-6587
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Support JDK 23 and Guava 33.3.0-jre, which are the latest available versions.

The lower bound has not changed: we support JDK down to 8, and Guava down to 
21.0 (since CALCITE-5763).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-17 Thread Julian Hyde
I merged fixes for 4 cases yesterday. Can I get some help reviewing
and merging PRs (especially those in category 1 and 2)?

There are about 6 regular committers who merge a few PRs before each
release. (You know who you are -- and I am profoundly grateful! Your
work keeps the project going.) If you could each do 2 or 3 PRs by the
end of the weekend, we can get this release done.

Julian


On Mon, Sep 16, 2024 at 10:56 AM Julian Hyde  wrote:
>
> Yeah, sorry, I don’t often use the GitHub ‘approve’ button. My verbal 
> approvals are often conditional. The ‘approve’ button doesn’t allow nuance.
>
> > On Sep 15, 2024, at 3:26 PM, Mihai Budiu  wrote:
> >
> > You have not listed CALCITE-4918, 
> > https://issues.apache.org/jira/browse/CALCITE-4918 for which I have a PR, 
> > which I think you have approved using comments 
> > https://github.com/apache/calcite/pull/3947, but not using the official 
> > "approve" button.
> >
> > I was hoping other people would weigh on this design and implementation.
> >
> > Mihai
> >
> > 
> > From: Julian Hyde 
> > Sent: Sunday, September 15, 2024 3:12 PM
> > To: dev@calcite.apache.org 
> > Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> >
> > I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
> > track the release.
> >
> > There are 26 open bugs [1] marked to fix for 1.38. I have organized
> > them into 5 categories. Can people please review and merge those in
> category 1? To get a release vote next week, let's get busy
> > reviewing and merging bugs this week.
> >
> > Category 1. The following have a PR available that seems close to
> > ready. If these are good, let's get these merged!
> > * CALCITE-6572 Add more tests for NULL arguments to TO_CHAR functions
> > * CALCITE-6554 nested correlated sub-query in aggregation does not
> > have inner correlation variable bound to inner projection
> > * CALCITE-6550 Improve SQL function overloading
> > * CALCITE-6343 AS alias operator strips MEASUREness from measures
> > * CALCITE-6161 The equalsDeep of sqlCall should compare sqlOperator's 
> > sqlKind
> > * CALCITE-6226 Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE
> > * CALCITE-2067 RexLiteral cannot represent accurately floating point
> > values, including NaN, Infinity
> > * CALCITE-5156 Support implicit number type cast for IN Sub-query
> > * CALCITE-6071 RexCall should carry source position information for
> > runtime error reporting
> > * CALCITE-6498 Elasticsearch multi-field mappings do not work
> > * CALCITE-6522 MAP_KEYS and MAP_VALUES function should throw if a key
> > value is null
> > * CALCITE-6020 SqlToRelConverter should not replace windowed SUM with
> > equivalent expression using SUM0
> > * CALCITE-4512 GROUP BY expression with argument name same with SELECT
> > field and alias causes validation error
> >
> > Category 2. The following cases have a draft PR. Is it ready for 1.38?
> > * CALCITE-6431 Implement the SINGLE_VALUE aggregation in
> > HiveSqlDialect And SparkSQLDialect
> > * CALCITE-6300 Function MAP_VALUES/MAP_KEYS gives exception when
> > mapVauleType and mapKeyType not equals map Biggest mapKeytype or
> > mapValueType
> >
> > Category 3. Need to decide whether to accept or close:
> > * CALCITE-6207 Improve connection resource in JDBCUtils
> > * CALCITE-6411 Support Collect in ToLogicalConverter
> > * CALCITE-6352 The map_contains_key function may return true when the
> > key and mapkeytype types are different.
> >
> > Category 4. The following seem to have no PR. Are there any worth
> > holding the train for?
> > * CALCITE-6277 About the processing of +infinity parameters and
> > -infinity parameters in math function
> > * CALCITE-6203 Avg operator test in SparkAdapterTest
> > * CALCITE-6326 Add LOG10 and LN function (enabled in Mysql, Spark library)
> > * CALCITE-6259 The implementation of the Log library operator does not
> > match the actual dialect behavior.
> > * CALCITE-5541 Upgrade JavaCC
> > * CALCITE-6469 Join on condition generates wrong plan when the
> > condition is IN sub-query
> > * CALCITE-6463 alias generation issue about correlated subquery handling
> >
> > Category 5. Are there any more cases with PR ready (i.e. potential category 
> > 1)?
> >
> > Julian
> >
> > [1] 
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.38.0%20and%20resolution%20%3D%20Unresolved%20
> >
> > On Fri, Sep 13, 2024 at 6

[jira] [Created] (CALCITE-6585) In the Postgres TO_CHAR function, improve caching

2024-09-16 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6585:


 Summary: In the Postgres TO_CHAR function, improve caching
 Key: CALCITE-6585
 URL: https://issues.apache.org/jira/browse/CALCITE-6585
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


In the Postgres TO_CHAR function (and similar functions added in CALCITE-6358), 
cache the formatter using the same technique used in CALCITE-5914.

Also pass the {{DataContext}} into the constructor of {{class 
DateFormatFunction}}, so that it does not need to be passed to each call to 
{{toCharPg}}.

In {{DateFormatFunction}} obsolete static field {{LOCAL_ZONE}} and instead 
populate a non-static field {{zoneId}} from the {{DataContext}} constructor 
parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Vote format

2024-09-16 Thread Julian Hyde
Enrico,

I agree that votes from committers and community members should be taken 
seriously (especially when they highlight flaws in the release).

I also agree that encouraging non-committers to join the community is 
important. I would like to see them reviewing PRs and voting on releases. I 
think that the vote format is neutral on this. Hopefully non-committers will 
see other non-committers voting and be encouraged to participate more. I intend 
to include language in the vote email that makes clear that votes are welcome 
from everyone.

Julian


> On Sep 14, 2024, at 2:01 AM, Enrico Olivelli  wrote:
> 
> I think that adding '(binding)' is enough to count the votes cast by the
> PMC.
> 
> I am suggesting this because we should encourage people not being
> committer/PMC member to validate and cast their vote.
> 
> IMHO Adding too many labels next to the name and the +1 may bias the
> testing and the validation. Like thinking that the validation from someone
> is 'better' than the others because it has that label.
> The same is for -1. Any concerns about the goodness of the release should
> have the same weight, even if it is coming from a new user.
> 
> It seems to me that this is a healthy community, and we know each other
> (no need to check the Apache id) and there is no need to show the labels
> too much.
> 
> My 2 cents, from the peanut gallery
> 
> Enrico
> 
> 
> Il Sab 14 Set 2024, 08:57 Francis Chuang  ha
> scritto:
> 
>> +1, this is good for consistency.
>> 
>> On 14/09/2024 7:22 am, Stamatis Zampetakis wrote:
>>> It's easier to understand who is who without checking the index [1]
>>> that I do occasionally.
>>> 
>>> +1 zabetak (PMC)
>>> 
>>> [1] https://people.apache.org/committer-index.html
>>> 
>>> On Fri, Sep 13, 2024 at 10:16 PM Julian Hyde  wrote:
>>>> 
>>>> There is a discussion in the Incubator list [1] that proposes a more
>>>> structured format for the line that people write in their email
>>>> message when they vote.
>>>> 
>>>> The problem is that some people use multiple names, email accounts,
>>>> github accounts, and that tallying votes requires some effort.
>>>> 
>>>> The proposed solution is that people would use a structured line when
>>>> they vote, consisting of their vote (-1, 0, +1), their ASF id if they
>>>> have one, and their role (committer, PMC member, community member).
>>>> For example, my vote would be
>>>> 
>>>>   +1 jhyde (PMC)
>>>> 
>>>> In my opinion, Calcite should adopt this. The email starting a vote
>>>> would recommend that people use that format. Votes would still be
>>>> counted if people use a different format.
>>>> 
>>>> What do you think?
>>>> 
>>>> Julian
>>>> 
>>>> [1] https://lists.apache.org/thread/xkfrnt8f733hhokmh0lvyhsn50x4qfw0
>> 
>> 



Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-16 Thread Julian Hyde
Yeah, sorry, I don’t often use the GitHub ‘approve’ button. My verbal approvals 
are often conditional. The ‘approve’ button doesn’t allow nuance.

> On Sep 15, 2024, at 3:26 PM, Mihai Budiu  wrote:
> 
> You have not listed CALCITE-4918, 
> https://issues.apache.org/jira/browse/CALCITE-4918 for which I have a PR, 
> which I think you have approved using comments 
> https://github.com/apache/calcite/pull/3947, but not using the official 
> "approve" button.
> 
> I was hoping other people would weigh on this design and implementation.
> 
> Mihai
> 
> ____
> From: Julian Hyde 
> Sent: Sunday, September 15, 2024 3:12 PM
> To: dev@calcite.apache.org 
> Subject: Re: [DISCUSS] Towards Calcite 1.38.0
> 
> I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
> track the release.
> 
> There are 26 open bugs [1] marked to fix for 1.38. I have organized
> them into 5 categories. Can people please review and merge those in
> category 1? To get a release vote next week, let's get busy
> reviewing and merging bugs this week.
> 
> Category 1. The following have a PR available that seems close to
> ready. If these are good, let's get these merged!
> * CALCITE-6572 Add more tests for NULL arguments to TO_CHAR functions
> * CALCITE-6554 nested correlated sub-query in aggregation does not
> have inner correlation variable bound to inner projection
> * CALCITE-6550 Improve SQL function overloading
> * CALCITE-6343 AS alias operator strips MEASUREness from measures
> * CALCITE-6161 The equalsDeep of sqlCall should compare sqlOperator's sqlKind
> * CALCITE-6226 Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE
> * CALCITE-2067 RexLiteral cannot represent accurately floating point
> values, including NaN, Infinity
> * CALCITE-5156 Support implicit number type cast for IN Sub-query
> * CALCITE-6071 RexCall should carry source position information for
> runtime error reporting
> * CALCITE-6498 Elasticsearch multi-field mappings do not work
> * CALCITE-6522 MAP_KEYS and MAP_VALUES function should throw if a key
> value is null
> * CALCITE-6020 SqlToRelConverter should not replace windowed SUM with
> equivalent expression using SUM0
> * CALCITE-4512 GROUP BY expression with argument name same with SELECT
> field and alias causes validation error
> 
> Category 2. The following cases have a draft PR. Is it ready for 1.38?
> * CALCITE-6431 Implement the SINGLE_VALUE aggregation in
> HiveSqlDialect And SparkSQLDialect
> * CALCITE-6300 Function MAP_VALUES/MAP_KEYS gives exception when
> mapVauleType and mapKeyType not equals map Biggest mapKeytype or
> mapValueType
> 
> Category 3. Need to decide whether to accept or close:
> * CALCITE-6207 Improve connection resource in JDBCUtils
> * CALCITE-6411 Support Collect in ToLogicalConverter
> * CALCITE-6352 The map_contains_key function may return true when the
> key and mapkeytype types are different.
> 
> Category 4. The following seem to have no PR. Are there any worth
> holding the train for?
> * CALCITE-6277 About the processing of +infinity parameters and
> -infinity parameters in math function
> * CALCITE-6203 Avg operator test in SparkAdapterTest
> * CALCITE-6326 Add LOG10 and LN function (enabled in Mysql, Spark library)
> * CALCITE-6259 The implementation of the Log library operator does not
> match the actual dialect behavior.
> * CALCITE-5541 Upgrade JavaCC
> * CALCITE-6469 Join on condition generates wrong plan when the
> condition is IN sub-query
> * CALCITE-6463 alias generation issue about correlated subquery handling
> 
> Category 5. Are there any more cases with PR ready (i.e. potential category 
> 1)?
> 
> Julian
> 
> [1] 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.38.0%20and%20resolution%20%3D%20Unresolved%20
> 
> On Fri, Sep 13, 2024 at 6:20 PM Xiong Duan  wrote:
>> 
>> +1 We start voting in about ten days. We will merge the almost-ready PR.
>> 
>> Cancai Cai  于2024年9月14日周六 08:55写道:
>>> 
>>> +1, There are still many valuable PRs in Calcite. We can try to merge them 
>>> before 1.38.
>>> 
>>> [1] https://github.com/apache/calcite/pulls
>>> 
>>> Best wishes,
>>> Cancai Cai
>>> 
>>>> 2024年9月14日 04:04,Julian Hyde  写道:
>>>> 
>>>> +1 we’re ready for a release, and I’m happy to be RM.
>>>> 
>>>> I think we could have a first RC and vote in about ten days. What do 
>>>> others feel about timing?
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Sep 13, 2024, at 4:58 AM, Benchao Li

Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-16 Thread Julian Hyde
Pavel, I’ve added fixVersion = 1.38 to CALCITE-6570.

Niels, Someone has already assigned fixVersion = 1.38 to CALCITE-6576.

So these will get reviewed before RC 1.

> On Sep 16, 2024, at 1:40 AM, Pavel Pereslegin  wrote:
> 
> Hello,
> 
> could you also include CALCITE-6570
> https://issues.apache.org/jira/browse/CALCITE-6570 ?
> 
> пн, 16 сент. 2024 г. в 07:21, Cancai Cai :
>> 
>> If https://issues.apache.org/jira/browse/CALCITE-6544 is not a bug, then 
>> https://issues.apache.org/jira/browse/CALCITE-6527 is ready.
>> 
>> I noticed that some of the jira cases above are my responsibility. I will 
>> follow up during this period and try to merge them into version 1.38.0.
>> 
>> Best wishes,
>> Cancai Cai
>> 
>>> 2024年9月16日 06:26,Mihai Budiu  写道:
>>> 
>>> You have not listed CALCITE-4918, 
>>> https://issues.apache.org/jira/browse/CALCITE-4918 for which I have a PR, 
>>> which I think you have approved using comments 
>>> https://github.com/apache/calcite/pull/3947, but not using the official 
>>> "approve" button.
>>> 
>>> I was hoping other people would weigh on this design and implementation.
>>> 
>>> Mihai
>>> 
>>> 
>>> From: Julian Hyde 
>>> Sent: Sunday, September 15, 2024 3:12 PM
>>> To: dev@calcite.apache.org 
>>> Subject: Re: [DISCUSS] Towards Calcite 1.38.0
>>> 
>>> I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
>>> track the release.
>>> 
>>> There are 26 open bugs [1] marked to fix for 1.38. I have organized
>>> them into 5 categories. Can people please review and merge those in
>>> category 1? To get a release vote next week, let's get busy
>>> reviewing and merging bugs this week.
>>> 
>>> Category 1. The following have a PR available that seems close to
>>> ready. If these are good, let's get these merged!
>>> * CALCITE-6572 Add more tests for NULL arguments to TO_CHAR functions
>>> * CALCITE-6554 nested correlated sub-query in aggregation does not
>>> have inner correlation variable bound to inner projection
>>> * CALCITE-6550 Improve SQL function overloading
>>> * CALCITE-6343 AS alias operator strips MEASUREness from measures
>>> * CALCITE-6161 The equalsDeep of sqlCall should compare sqlOperator's 
>>> sqlKind
>>> * CALCITE-6226 Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE
>>> * CALCITE-2067 RexLiteral cannot represent accurately floating point
>>> values, including NaN, Infinity
>>> * CALCITE-5156 Support implicit number type cast for IN Sub-query
>>> * CALCITE-6071 RexCall should carry source position information for
>>> runtime error reporting
>>> * CALCITE-6498 Elasticsearch multi-field mappings do not work
>>> * CALCITE-6522 MAP_KEYS and MAP_VALUES function should throw if a key
>>> value is null
>>> * CALCITE-6020 SqlToRelConverter should not replace windowed SUM with
>>> equivalent expression using SUM0
>>> * CALCITE-4512 GROUP BY expression with argument name same with SELECT
>>> field and alias causes validation error
>>> 
>>> Category 2. The following cases have a draft PR. Is it ready for 1.38?
>>> * CALCITE-6431 Implement the SINGLE_VALUE aggregation in
>>> HiveSqlDialect And SparkSQLDialect
>>> * CALCITE-6300 Function MAP_VALUES/MAP_KEYS gives exception when
>>> mapVauleType and mapKeyType not equals map Biggest mapKeytype or
>>> mapValueType
>>> 
>>> Category 3. Need to decide whether to accept or close:
>>> * CALCITE-6207 Improve connection resource in JDBCUtils
>>> * CALCITE-6411 Support Collect in ToLogicalConverter
>>> * CALCITE-6352 The map_contains_key function may return true when the
>>> key and mapkeytype types are different.
>>> 
>>> Category 4. The following seem to have no PR. Are there any worth
>>> holding the train for?
>>> * CALCITE-6277 About the processing of +infinity parameters and
>>> -infinity parameters in math function
>>> * CALCITE-6203 Avg operator test in SparkAdapterTest
>>> * CALCITE-6326 Add LOG10 and LN function (enabled in Mysql, Spark library)
>>> * CALCITE-6259 The implementation of the Log library operator does not
>>> match the actual dialect behavior.
>>> * CALCITE-5541 Upgrade JavaCC
>>> * CALCITE-6469 Join on condition generates wrong plan when the
>>> condition is IN sub-query
>>> * CALCITE-6463 alias gene

Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-15 Thread Julian Hyde
I've logged https://issues.apache.org/jira/browse/CALCITE-6582 to
track the release.

There are 26 open bugs [1] marked to fix for 1.38. I have organized
them into 5 categories. Can people please review and merge those in
category 1? To get a release vote next week, let's get busy
reviewing and merging bugs this week.

Category 1. The following have a PR available that seems close to
ready. If these are good, let's get these merged!
* CALCITE-6572 Add more tests for NULL arguments to TO_CHAR functions
* CALCITE-6554 nested correlated sub-query in aggregation does not
have inner correlation variable bound to inner projection
* CALCITE-6550 Improve SQL function overloading
* CALCITE-6343 AS alias operator strips MEASUREness from measures
* CALCITE-6161 The equalsDeep of sqlCall should compare sqlOperator's sqlKind
* CALCITE-6226 Wrong ISOWEEK and no ISOYEAR on BigQuery FORMAT_DATE
* CALCITE-2067 RexLiteral cannot represent accurately floating point
values, including NaN, Infinity
* CALCITE-5156 Support implicit number type cast for IN Sub-query
* CALCITE-6071 RexCall should carry source position information for
runtime error reporting
* CALCITE-6498 Elasticsearch multi-field mappings do not work
* CALCITE-6522 MAP_KEYS and MAP_VALUES function should throw if a key
value is null
* CALCITE-6020 SqlToRelConverter should not replace windowed SUM with
equivalent expression using SUM0
* CALCITE-4512 GROUP BY expression with argument name same with SELECT
field and alias causes validation error

Category 2. The following cases have a draft PR. Is it ready for 1.38?
* CALCITE-6431 Implement the SINGLE_VALUE aggregation in
HiveSqlDialect And SparkSQLDialect
* CALCITE-6300 Function MAP_VALUES/MAP_KEYS gives exception when
mapVauleType and mapKeyType not equals map Biggest mapKeytype or
mapValueType

Category 3. Need to decide whether to accept or close:
* CALCITE-6207 Improve connection resource in JDBCUtils
* CALCITE-6411 Support Collect in ToLogicalConverter
* CALCITE-6352 The map_contains_key function may return true when the
key and mapkeytype types are different.

Category 4. The following seem to have no PR. Are there any worth
holding the train for?
* CALCITE-6277 About the processing of +infinity parameters and
-infinity parameters in math function
* CALCITE-6203 Avg operator test in SparkAdapterTest
* CALCITE-6326 Add LOG10 and LN function (enabled in Mysql, Spark library)
* CALCITE-6259 The implementation of the Log library operator does not
match the actual dialect behavior.
* CALCITE-5541 Upgrade JavaCC
* CALCITE-6469 Join on condition generates wrong plan when the
condition is IN sub-query
* CALCITE-6463 alias generation issue about correlated subquery handling

Category 5. Are there any more cases with PR ready (i.e. potential category 1)?

Julian

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.38.0%20and%20resolution%20%3D%20Unresolved%20

On Fri, Sep 13, 2024 at 6:20 PM Xiong Duan  wrote:
>
> +1 We start voting in about ten days. We will merge the almost-ready PR.
>
> Cancai Cai  于2024年9月14日周六 08:55写道:
> >
> > +1, There are still many valuable PRs in Calcite. We can try to merge them 
> > before 1.38.
> >
> > [1] https://github.com/apache/calcite/pulls
> >
> > Best wishes,
> > Cancai Cai
> >
> > > 2024年9月14日 04:04,Julian Hyde  写道:
> > >
> > > +1 we’re ready for a release, and I’m happy to be RM.
> > >
> > > I think we could have a first RC and vote in about ten days. What do 
> > > others feel about timing?
> > >
> > > Julian
> > >
> > >
> > >> On Sep 13, 2024, at 4:58 AM, Benchao Li  wrote:
> > >>
> > >> Last release date is 2024-05-06, sorry for the typo.
> > >>
> > >> Benchao Li  于2024年9月13日周五 19:52写道:
> > >>>
> > >>> It's been a bit more than 4 months since our last release (2023-05-06)
> > >>> [1] and there are currently almost 150 new commits in main branch. Per 
> > >>> our
> > >>> tradition of producing one release every 2-3 months, I think it's time
> > >>> to consider the next release now.
> > >>>
> > >>> According to [2], the following release managers would be:
> > >>> - 1.38.0 Julian Hyde
> > >>> - 1.39.0 Stamatis Zampetakis (or maybe 1.40.0 depending on the timing)
> > >>> - 1.40.0 Hongyu Guo
> > >>> - 1.41.0 Jing Zhang (she said 1.39.0 in the ML, but according to the
> > >>> order of the email, the next one should be 1.41.0)
> > >>>
> > >>> [1] https://calcite.apache.org/news/2024/05/06/release-1.37.0/
> > >>> [2] https://lists.apache.org/thread/kx56c4ow50g50hx2y8ybp631jmtnwnj6
> > >>>
> > >>> --
> > >>>
> > >>> Best,
> > >>> Benchao Li
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Best,
> > >> Benchao Li
> > >
> >


[jira] [Created] (CALCITE-6582) Release Calcite 1.38.0

2024-09-15 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6582:


 Summary: Release Calcite 1.38.0
 Key: CALCITE-6582
 URL: https://issues.apache.org/jira/browse/CALCITE-6582
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde
 Fix For: 1.38.0


Release Calcite 1.38.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Vote format

2024-09-13 Thread Julian Hyde
There is a discussion in the Incubator list [1] that proposes a more
structured format for the line that people write in their email
message when they vote.

The problem is that some people use multiple names, email accounts,
github accounts, and that tallying votes requires some effort.

The proposed solution is that people would use a structured line when
they vote, consisting of their vote (-1, 0, +1), their ASF id if they
have one, and their role (committer, PMC member, community member).
For example, my vote would be

  +1 jhyde (PMC)

In my opinion, Calcite should adopt this. The email starting a vote
would recommend that people use that format. Votes would still be
counted if people use a different format.

What do you think?

Julian

[1] https://lists.apache.org/thread/xkfrnt8f733hhokmh0lvyhsn50x4qfw0


Re: [DISCUSS] Towards Calcite 1.38.0

2024-09-13 Thread Julian Hyde
+1 we’re ready for a release, and I’m happy to be RM.

I think we could have a first RC and vote in about ten days. What do others 
feel about timing?

Julian


> On Sep 13, 2024, at 4:58 AM, Benchao Li  wrote:
> 
> Last release date is 2024-05-06, sorry for the typo.
> 
> On Fri, Sep 13, 2024 at 19:52, Benchao Li  wrote:
>> 
>> It's been a bit more than 4 months since our last release (2023-05-06)
>> [1] and there are currently almost 150 new commits in main branch. Per our
>> tradition of producing one release every 2-3 months, I think it's time
>> to consider for next release now.
>> 
>> According to [2], the following release managers would be:
>> - 1.38.0 Julian Hyde
>> - 1.39.0 Stamatis Zampetakis (or maybe 1.40.0 depending on the timing)
>> - 1.40.0 Hongyu Guo
>> - 1.41.0 Jing Zhang (she said 1.39.0 in the ML, but according to the
>> order of the email, the next one should be 1.41.0)
>> 
>> [1] https://calcite.apache.org/news/2024/05/06/release-1.37.0/
>> [2] https://lists.apache.org/thread/kx56c4ow50g50hx2y8ybp631jmtnwnj6
>> 
>> --
>> 
>> Best,
>> Benchao Li
> 
> 
> 
> -- 
> 
> Best,
> Benchao Li



[jira] [Created] (CALCITE-6580) Remove Locale.setDefault

2024-09-12 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6580:


 Summary: Remove Locale.setDefault
 Key: CALCITE-6580
 URL: https://issues.apache.org/jira/browse/CALCITE-6580
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Remove all calls to {{Locale.setDefault}} in tests and production code.

That function does not operate on the current thread; it affects all threads in 
the JVM. As such, it may affect other tests running at the same time, and 
affect other statements running at the same time.

I hope, and believe, that the production code does not depend on 
{{Locale.getDefault}}. But let's make sure by removing {{setDefault}} from all 
tests.

Add {{Locale.setDefault}} to 
[forbidden-apis|https://github.com/apache/calcite/blob/main/src/main/config/forbidden-apis/signatures.txt].
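A sketch of what the forbidden-apis entry might look like (the exact message wording is
open; this only illustrates the signatures.txt syntax):
{code}
@defaultMessage Locale.setDefault affects every thread in the JVM; pass a Locale explicitly instead
java.util.Locale#setDefault(java.util.Locale)
{code}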



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Some doubts about date functions

2024-09-12 Thread Julian Hyde
Yes and no. We do use integer arithmetic, but only at a particular
level of abstraction.

If DATE and TIMESTAMP functions are implemented in Java then we
represent them as Java int and long values and use integer operations
such as + and -.

But there are also higher level rewrites that stay within the SQL type
system. As far as I know, when we stay in the SQL type system we never
equate DATE and TIMESTAMP to SQL INTEGER and BIGINT types. So, we
would not use SqlStdOperatorTable.PLUS on DATE and TIMESTAMP values.
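To illustrate that lower level of abstraction, here is a toy sketch (not Calcite code; it
only assumes the usual representation of DATE as days since epoch and TIMESTAMP as
milliseconds since epoch):

```java
public class DateArithmeticSketch {
  public static void main(String[] args) {
    // A DATE value is an int: days since 1970-01-01.
    int date = 19000;
    int datePlusTenDays = date + 10;        // DATE + INTERVAL '10' DAY, as int arithmetic

    // A TIMESTAMP value is a long: milliseconds since the epoch.
    long ts = 1_700_000_000_000L;
    long tsPlusOneHour = ts + 3_600_000L;   // TIMESTAMP + INTERVAL '1' HOUR, as long arithmetic

    System.out.println(datePlusTenDays + ", " + tsPlusOneHour);
  }
}
```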


On Thu, Sep 12, 2024 at 8:28 AM Cancai Cai  wrote:
>
> Thank you Julian for your reply.
>
> I have been thinking about this seriously recently and I have another 
> question. Does Calcite's current date function calculation convert the 
> parameters, such as DATE or TIMESTAMP, to int or long types, and then 
> use the logic of operators such as SqlStdOperatorTable.PLUS to perform the 
> calculation?
>
> Is my guess correct?
>
> Best wishes,
> Cancai Cai
>
> > On Aug 27, 2024, at 05:11, Norman Jordan  wrote:
> >
> > Looking into this further, there is this convertlet that will convert a 
> > TIMESTAMP_ADD to a DATETIME_PLUS.
> > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L2132
> >
> > As part of it, it will convert the amount to add to a unit for 
> > DATETIME_PLUS. For example, hours are converted to milliseconds.
> >
> > From there it looks like it will convert this to a "+" operation.
> > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L3718
> >
> > I didn't see where it adds the two numbers together though.
> > 
> > From: Cancai Cai 
> > Sent: Monday, August 26, 2024 8:37 AM
> > To: dev@calcite.apache.org 
> > Subject: Re: Some doubts about date functions
> >
> > Thanks for your reply.
> >
> > Yes, I also located here, and I finally debugged to 
> > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L3802
> >  but I don’t know how to continue.
> >
> > I find it hard to believe that 
> > https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
> >  handles the date_add function, because its name is add_months, 
> > not date_add.
> >
> >> On Aug 26, 2024, at 23:19, Norman Jordan  wrote:
> >>
> >> this leads to here:
> >> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
> >
>


Re: Nullable objects with non nullable fields not being processed as expected.

2024-09-11 Thread Julian Hyde
Well done finding that thread. 

First step, before we jump into a discussion, please log a jira case. (The 
original thread made great progress analyzing the problem, and Ruben asked the 
OP to log a case, but it never happened.)


> On Sep 11, 2024, at 5:03 AM, David Radley  wrote:
> 
> I notice https://lists.apache.org/thread/s4nd9rk0fzckoctokl7kjdbtfbvxncy7 
> that talks about this  and poses some questions on how we can get round this. 
> It sounds like this is a current limitation of Calcite, is there appetite in 
> the community for this to be changed?
> 
>> On 2024/09/11 08:27:14 David Radley wrote:
>> Hi,
>> good question. Without this method, I guess the children would always be 
>> mandatory. So for a nullable parent it makes all the children optional. 
>> For our case we want to be able to honour the nullable specified in the SQL 
>> for each child, so the existing approach is not sufficient,
>>     kind regards, David.
>> 
>> 
>>> On 2024/09/10 15:15:14 Julian Hyde wrote:
>>> I haven’t read the code in a while. What is fixupNullabilty trying to 
>>> achieve?
>>> 
>>>> On Sep 10, 2024, at 3:40 AM, David Radley  wrote:
>>>> 
>>>> 
>>>> Hello,
>>>> I am using flink and have a table definition like this.
>>>> CREATE TABLE source_1
>>>> (
>>>>`order_id` STRING NOT NULL,
>>>>`order_time` STRING NOT NULL,
>>>>`buyer` ROW<
>>>>`first_name`   STRING,
>>>>`last_name`  STRING NOT NULL,
>>>>`title`   STRING NOT NULL
>>>>> 
>>>> )
>>>> 
>>>> WITH (
>>>> 'connector' = 'kafka',
>>>>   'topic' = 'mytopic',
>>>>   'properties.bootstrap.servers' = 'localhost:9092',
>>>>   'value.format' = 'avro',
>>>>   'value.fields-include' = 'ALL',
>>>>  'scan.startup.mode' = 'earliest-offset'
>>>> );
>>>> 
>>>> This errors because code in Calcite does not create the correct schema. It 
>>>> causes `last_name` STRING NOT NULL and `title` STRING NOT NULL to both be 
>>>> treated as nullable.
>>>> 
>>>> The cause of this is in the table planner when we convert SqlDataTypeSpec 
>>>> to the RelDataType. We push the nullable of buyer onto all of its fields, 
>>>> losing the children’s nullable status.
>>>> 
>>>> In the debugger I see that it is taking the nullable true (from buyer) and 
>>>> putting this on all the children in fixupNullability.  I can see that the 
>>>> SqlDataTypeSpec has this information but it is not used.
>>>> 
>>>> In terms of fixing this, I wonder if the type could be created with the 
>>>> correct nullable so it doesn’t need to be fixed up or whether we should 
>>>> pass down the SqlDataTypeSpec so it has the right information to set the 
>>>> nullable status. I am new to the code base so would appreciate your 
>>>> thoughts around this – I am happy to implement the change if we get 
>>>> consensus. WDYT?
>>>> 
>>> 
>> 


Re: Breaking up SqlValidatorImpl

2024-09-10 Thread Julian Hyde
If the validator is 11,000 lines of code in one file and you make the
inner classes top-level classes, you still have 11,000 lines of code.
Moving code around doesn't inherently make it simpler. Not saying it
shouldn't be done...

I agree with your observation that the tree of namespaces and scopes
is complex and can be separated from the other validation logic. That
tree is built as the first step of validation (before the real
validation starts), and is immutable.

If I recall correctly, some of those methods to create namespaces and
scopes have 11 arguments because that allows us to make those classes
immutable. An immutable tree is extremely valuable. We must keep that.

Regarding the handling of correlated variables. It's legacy code
(decorrelation better handled, in many or all cases, by the
decorrelation rules), the map is a mess (see immutability, above). But
you should not touch any of that logic while refactoring the
validator. You will break stuff. I would much prefer that we fully
obsolete the old decorrelation logic.

Regarding how correlating variables are represented in RelNodes. It
seems OK to me. I am prepared to be persuaded. But it has absolutely
nothing to do with the validator, and should be a separate
conversation thread.

Julian


On Fri, Aug 23, 2024 at 5:42 PM Mihai Budiu  wrote:
>
> One thing I find unpleasant about the current validator implementation is 
> that it leaves many implicit casts implicit. It would be much simpler for 
> many subsequent passes if the validator would insert explicit casts where 
> casts are needed (e.g., when adding a varchar and an int), and it would catch 
> many bugs earlier. Subsequent rewrite passes also have to make assumptions 
> about the implicit casts that are missing, and they may make different 
> assumptions from the validator. I have this problem all the time in the code 
> generator.
>
> But this is somewhat independent of your current proposal. I hope that 
> someday this could be fixed too.
>
> Mihai
> 
> From: James Starr 
> Sent: Friday, August 23, 2024 2:40 PM
> To: dev@calcite.apache.org 
> Subject: Re: Breaking up SqlValidatorImpl
>
> I would break up the validator in 3 PRs with no functionality change:
> extract SqlQueryScopes and NamespaceBuilder, extract the
> SqlQueryScopePopulator, and a final PR for extracting SqlNodeValidator.  Then
> I would apply functional changes on top.
>
> On Fri, Aug 23, 2024 at 2:02 PM Mihai Budiu  wrote:
>
> > So do you want to refactor the code in two steps, where the first step
> > would break the validator into multiple classes and keep the exact same
> > behavior, and in a second step rework the handling of correlation variables?
> >
> > That sounds great. If you want to do both at the same time, it may be
> > harder to review.
> >
> > Mihai
> >
> > 
> > From: James Starr 
> > Sent: Friday, August 23, 2024 12:26 PM
> > To: dev@calcite.apache.org 
> > Subject: Breaking up SqlValidatorImpl
> >
> > I am working on rebasing my pull request (
> > https://github.com/apache/calcite/pull/3042), for CALCITE-5418(
> > https://issues.apache.org/jira/browse/CALCITE-5418) which stores the
> > correlated variable ids subquery instead of the RelNodes(join, project,
> > filter).  This change was briefly discussed here(
> > https://lists.apache.org/thread/fvf35njbddcck9hwcqnnxowl49n4nkth).
> >
> > Storing the correlated IDs directly on the subquery provides several
> > benefits:
> >
> >- The RelNode interface would no longer need to expose correlated IDs.
> >- There are several existing bugs with how correlated IDs are populated
> >in SqlToRelConverter, particularly for Project and Join operations.
> >- Rules and other rewrite logic for Project, Filter, and Join would no
> >longer need to propagate correlated IDs.
> >- Various edge cases could be removed from SqlToRelConverter.
> >- The code handling correlated variables would be localized to
> >subqueries and lateral joins, instead of being spread across the entire
> >codebase wherever expressions can occur.
> >
> >
> > However, since I started this work, there have been several changes to
> > SqlValidatorImpl. Although my initial changes to SqlValidatorImpl were not
> > overly complex, they were time-consuming due to the unwieldy nature of
> > SqlValidatorImpl. Specifically, I’ve been working on populating a map of
> > lateral joins to linked lists of lateral scopes.
> >
> > SqlValidatorImpl currently performs three main functions:
> > 1.  Building a set of data structures for resolving scopes.
> > 2.  Validates the query and resolving types.  This requires scope data
> > structure generated in 1.
> > 3.  Exposes the scoped and type data structure to SqlToRelConverter.
> >
> > Currently, SqlValidatorImpl is 7,921 lines long. The first step consists of
> > 11 methods, each with up to 9 arguments, that recursively walk the tree.
> > These methods alone acco

Re: QUESTION: SUBSTRING implementation

2024-09-06 Thread Julian Hyde
We intentionally separate validation from sql-to-rel. That has many benefits 
(such as discouraging us from mutating the AST while validating, or thinking 
about RelNodes and RexNodes), but one problem is that we end up doing some 
things twice.

OperandTypeChecker (or something like it) needs to be able to say ‘it’s ok to 
pass a decimal(5, 2) value to an integer argument, because we can do an 
implicit conversion’. Validation will succeed, and at sql-to-rel time we should 
recover (or regenerate) the verdict that an implicit conversion is needed.

Do we need to make a note of which conversion? I don’t know. Maybe it’s enough 
to know that the argument and parameter type are different, because maybe for a 
given source-target type pair only one conversion is possible.

I have noticed a profusion of overloaded methods in SqlFunctions.java (e.g. log 
has 4 overloads - one for each  combination of double and BigDecimal, and the 
BigDecimal variants ultimately just convert the BigDecimal to a double) and 
maybe we can start to reduce that profusion.
 
Julian


> On Sep 6, 2024, at 1:11 PM, Mihai Budiu  wrote:
> 
> The problem is that I don't know how to fix this. Would be glad to do the 
> work if someone helped design a way to do it.
> 
> Turns out that the OperandTypeChecker can mutate the RexNodes it receives as 
> arguments, but I am not sure that this is "safe", because it looks like the 
> API has been designed with immutability in mind. For example, these 
> combinators that can combine multiple type-checkers seem to assume 
> immutability.
> 
> Mihai
> 
> From: Julian Hyde 
> Sent: Friday, September 6, 2024 12:59 PM
> To: dev@calcite.apache.org 
> Subject: Re: QUESTION: SUBSTRING implementation
> 
> I totally agree.
> 
> Julian
> 
>> On Sep 6, 2024, at 12:57, Mihai Budiu  wrote:
>> 
>> A problem with the way Calcite handles such functions is that the 
>> type-checker and the runtime are separate and do not communicate with each 
>> other. In the type checking code that people usually write for a function 
>> they write an OperandTypeChecker, which makes some assumptions about the 
>> range of implicit casts that the function allows. For example, for SUBSTRING 
>> such a type checker would allow any numeric types, or even string types, 
>> with the assumption that they would be converted to integer. As a result of 
>> the type-checking, the function is accepted, but the implicit casts that 
>> were inferred are not inserted as explicit casts to the arguments of the 
>> RexCall.
>> 
>> So when the runtime has to implement the function it has no idea about the 
>> kinds of arguments that may be expected, and what it's supposed to do with 
>> them. The runtime has to handle many more cases, e.g., INTEGER, TINYINT, 
>> DECIMAL, STRING. This makes it difficult to keep the runtime in sync with 
>> the type-checker as people implement new versions of the many functions from 
>> different dialects. Moreover, the runtime may incorrectly convert such 
>> values. For example, if you call decimal.intValue() you are not checking for 
>> overflow and not rounding in the same way a CAST(decimal AS INTEGER) may do.
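(As a small aside, the rounding/overflow difference described above is easy to demonstrate
with plain Java; this sketch only uses java.math.BigDecimal and makes no claim about what
Calcite's CAST actually does:)

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalToIntSketch {
  public static void main(String[] args) {
    // intValue() silently drops the fraction ...
    System.out.println(new BigDecimal("2.9").intValue());          // 2
    // ... and silently wraps around on overflow.
    System.out.println(new BigDecimal("3000000000").intValue());   // a negative int

    // A CAST-style conversion would instead round deterministically and
    // reject out-of-range values, for example:
    System.out.println(
        new BigDecimal("2.9").setScale(0, RoundingMode.HALF_UP).intValueExact()); // 3
  }
}
```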
>> 
>> Ideally the type-checker for functions would be allowed to insert implicit 
>> casts, this would simplify a lot the task of both generating code in the 
>> runtime, and of implementing the runtime itself.
>> 
>> For substring the type-checker could insert a cast from DECIMAL to INTEGER. 
>> The runtime now only has to worry about INTEGER values.
>> 
>> Mihai
>> 
>> From: Julian Hyde 
>> Sent: Friday, September 6, 2024 12:44 PM
>> To: dev@calcite.apache.org 
>> Subject: Re: QUESTION: SUBSTRING implementation
>> 
>> The trouble is, we are going beyond the standard. The standard says
>> 
>> The declared type of a , ,
>> , or 
>> shall be exact numeric with scale 0 (zero).
>> 
>> People often misunderstand the “shall” in standards. It is talking about the 
>> SQL query, not the implementation. A standards-compliant query will only 
>> have integer values of <start position>, and a compliant implementation (such 
>> as Calcite) must handle such queries.
>> 
>> The standard doesn’t say anything about what should happen if the query is 
>> not standards-compliant, viz <start position> is not an integer. In 
>> particular, it doesn’t say we should fail when we receive such queries. But 
>> we can do anything, like return -77 or rounding the integer up, or rounding 
>> it down, and still be compliant.
>> 
>> What we have decided to do in Calcite is to apply our usual rul

Re: QUESTION: SUBSTRING implementation

2024-09-06 Thread Julian Hyde
I totally agree. 

Julian

> On Sep 6, 2024, at 12:57, Mihai Budiu  wrote:
> 
> A problem with the way Calcite handles such functions is that the 
> type-checker and the runtime are separate and do not communicate with each 
> other. In the type checking code that people usually write for a function 
> they write an OperandTypeChecker, which makes some assumptions about the 
> range of implicit casts that the function allows. For example, for SUBSTRING 
> such a type checker would allow any numeric types, or even string types, with 
> the assumption that they would be converted to integer. As a result of the 
> type-checking, the function is accepted, but the implicit casts that were 
> inferred are not inserted as explicit casts to the arguments of the RexCall.
> 
> So when the runtime has to implement the function it has no idea about the 
> kinds of arguments that may be expected, and what it's supposed to do with 
> them. The runtime has to handle many more cases, e.g., INTEGER, TINYINT, 
> DECIMAL, STRING. This makes it difficult to keep the runtime in sync with the 
> type-checker as people implement new versions of the many functions from 
> different dialects. Moreover, the runtime may incorrectly convert such 
> values. For example, if you call decimal.intValue() you are not checking for 
> overflow and not rounding in the same way a CAST(decimal AS INTEGER) may do.
> 
> Ideally the type-checker for functions would be allowed to insert implicit 
> casts, this would simplify a lot the task of both generating code in the 
> runtime, and of implementing the runtime itself.
> 
> For substring the type-checker could insert a cast from DECIMAL to INTEGER. 
> The runtime now only has to worry about INTEGER values.
> 
> Mihai
> 
> From: Julian Hyde 
> Sent: Friday, September 6, 2024 12:44 PM
> To: dev@calcite.apache.org 
> Subject: Re: QUESTION: SUBSTRING implementation
> 
> The trouble is, we are going beyond the standard. The standard says
> 
>  The declared type of a , ,
>  , or 
>  shall be exact numeric with scale 0 (zero).
> 
> People often misunderstand the “shall” in standards. It is talking about the 
> SQL query, not the implementation. A standards-compliant query will only have 
> integer values of <start position>, and a compliant implementation (such as 
> Calcite) must handle such queries.
> 
> The standard doesn’t say anything about what should happen if the query is 
> not standards-compliant, viz <start position> is not an integer. In 
> particular, it doesn’t say we should fail when we receive such queries. But 
> we can do anything, like return -77 or rounding the integer up, or rounding 
> it down, and still be compliant.
> 
> What we have decided to do in Calcite is to apply our usual rules for 
> converting decimal arguments to integers. I am saying let’s define those 
> ‘usual rules’, and apply it to all functions, whether they are in the SQL 
> standard, in a library (such as Postgres or BigQuery) or user-defined 
> functions.
> 
> 
> Julian
> 
> 
> 
>> On Sep 6, 2024, at 1:02 AM, stanilovsky evgeny  
>> wrote:
>> 
>> I understand what you are talking about, but ... LEFT is just an extension beyond 
>> the standard functions collection and DB implementors are free here: they can 
>> write restrictions in documentation [1] or treat fractional types as integers 
>> implicitly. I don't think we need to change LEFT, RIGHT and the other string 
>> functions - they are already restricted in documentation - while the standard ones 
>> (substring, overlay) need to be improved.
>> 
>> [1] 
>> https://learn.microsoft.com/en-us/sql/t-sql/functions/left-transact-sql?view=sql-server-ver16
>> 
>>> On Thu, 05 Sep 2024 21:07:27 +0300, Julian Hyde  wrote:
>>> 
>>> Thanks for taking the time to look up the SQL standard. But
>>> unfortunately the SQL standard isn't much use here, because it doesn't
>>> say much about implicit conversions - it leaves that to the
>>> implementor, such as Calcite or Postgres.
>>> 
>>> Calcite has its own rules for implicit conversions of arguments. It
>>> also has lots of functions that take numeric and integer arguments.
>>> Calcite's users expect that the conversion rules are consistent across
>>> all functions. Picking one at random, LEFT(string, length) is enabled
>>> in several libraries including Postgres. You should implement the
>>> conversion for SUBSTRING in such a way that LEFT gets it for free.
>>> 
>>> On Thu, Sep 5, 2024 at 7:20 AM stanilovsky evgeny
>>>  wrote:
>>>> 
>>>> Thanks all for the reply. In section 6.18 
>>>> I see only

Re: QUESTION: SUBSTRING implementation

2024-09-05 Thread Julian Hyde
Thanks for taking the time to look up the SQL standard. But
unfortunately the SQL standard isn't much use here, because it doesn't
say much about implicit conversions - it leaves that to the
implementor, such as Calcite or Postgres.

Calcite has its own rules for implicit conversions of arguments. It
also has lots of functions that take numeric and integer arguments.
Calcite's users expect that the conversion rules are consistent across
all functions. Picking one at random, LEFT(string, length) is enabled
in several libraries including Postgres. You should implement the
conversion for SUBSTRING in such a way that LEFT gets it for free.

On Thu, Sep 5, 2024 at 7:20 AM stanilovsky evgeny
 wrote:
>
> Thanks all for the reply. In section 6.18 
> I see only the OVERLAY function, which does not correspond here either:
>
>  ::=
> OVERLAY   PLACING  expression>
>  FROM  [ FOR  ]
> [ USING  ] 
>
>  ::= 
>  ::= 
>
> Or, Julian, are you talking not only about this kind of function?
> In such a case it consumes a lot of time for such a check.
>
> Why can't we move here sequentially? Just fix the operand types? What
> generic approach are you talking about?
> SqlSingleOperandTypeChecker STRING_INTEGER is used only in : SqlFunction
> REPEAT
> SqlSingleOperandTypeChecker STRING_INTEGER_INTEGER only in
> SqlSubstringFunction
> STRING_STRING_INTEGER, STRING_STRING_INTEGER_INTEGER only in
> SqlOverlayFunction
>
>
> > Is the desired behavior specific to the SUBSTRING function? Or should it
> > be generic, for all functions that have an argument of type INTEGER?
> >
> > If it’s generic, can you give some other functions as examples where we
> > would want this behavior.
> >
> > Also, if it’s generic, the code should probably not be part of the
> > SUBSTRING function.
> >
> >
> >> On Sep 4, 2024, at 8:31 AM, Norman Jordan
> >>  wrote:
> >>
> >> This makes sense. Running a quick test with MySQL, I can see that
> >> decimal values do not give an error. It appears that MySQL will round a
> >> decimal value to the nearest integer value.
> >> 
> >> From: stanilovsky evgeny 
> >> Sent: Wednesday, September 4, 2024 5:17 AM
> >> To: dev@calcite.apache.org 
> >> Subject: QUESTION: SUBSTRING implementation
> >>
> >> Hi all, i want to discuss current SUBSTRING func implementation.
> >>
> >> Lets take a standard and found:
> >>  ::=
> >> SUBSTRING   FROM  >> position>
> >> [ FOR  ] 
> >>
> >> and further :  ::= 
> >>
> >> thus it does not restrict <start position> to only integer types
> >>
> >> Calcite documentation says:
> >> SUBSTRING(string FROM integer FOR integer) (we see restrictions here)
> >>
> >> Lets dig deeper:
> >> The Calcite implementation's operand checkers do not restrict the operands either:
> >> 1. OperandTypes.STRING_NUMERIC - (1 param: substring ('asd', 2)) (not
> >> restricted params)
> >> 2. OperandTypes.STRING_INTEGER_INTEGER - (2 params: substring ('asd', 2,
> >> 3)) (only integer)
> >>
> >> So if I call "SELECT SUBSTRING('asd', 1.2)", a runtime exception will
> >> occur:
> >> java.lang.RuntimeException: while resolving method 'substring[class
> >> java.lang.String, class java.math.BigDecimal]' in class class
> >> org.apache.calcite.runtime.SqlFunctions
> >>>  at
> >>> org.apache.calcite.adapter.enumerable.EnumUtils.call(EnumUtils.java:770)
> >>>  at
> >>> org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.call(RexImpTable.java:2866)
> >>>  at
> >>> org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.implementSafe(RexImpTable.java:2847)
> >>
> >> So I propose to align the operand checkers for the 1- and 2-parameter forms (so
> >> for 2 parameters it needs STRING_NUMERIC_NUMERIC) and to add an appropriate
> >> implementation (which will cut off the fractional part) to
> >> SqlFunctions.
> >>
> >> WDYT? If there are no objections I will file an issue.
> >>
> >> thanks !
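For illustration, the overload Evgeny proposes adding to SqlFunctions might have roughly
this shape (a self-contained sketch with simplified SUBSTRING semantics, not the actual
Calcite code):

```java
import java.math.BigDecimal;

public class SubstringSketch {
  // Simplified stand-in for the existing integer-based implementation
  // (ignores SQL's edge cases for non-positive positions).
  public static String substring(String s, int from, int len) {
    int begin = Math.min(Math.max(from - 1, 0), s.length());
    int end = Math.min(begin + Math.max(len, 0), s.length());
    return s.substring(begin, end);
  }

  // The proposed numeric overload: cut off the fractional part and delegate.
  public static String substring(String s, BigDecimal from, BigDecimal len) {
    return substring(s, from.intValue(), len.intValue());
  }

  public static void main(String[] args) {
    System.out.println(substring("asd", new BigDecimal("1.2"), new BigDecimal("2"))); // "as"
  }
}
```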


[jira] [Created] (CALCITE-6564) Support queries with measures on top of a VALUES relation

2024-09-05 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6564:


 Summary: Support queries with measures on top of a VALUES relation
 Key: CALCITE-6564
 URL: https://issues.apache.org/jira/browse/CALCITE-6564
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


A query with measures, where the underlying data set is a VALUES relational 
literal rather than a table, cannot be planned. Same test case as CALCITE-6559:
{noformat}
!set outputformat mysql
!use scott

WITH tbl_with_null_dim AS (
SELECT e.deptno, e.grade, AVG(e.grade) AS MEASURE avg_grade
FROM (VALUES (1, 70),
 (1, 50),
 (NULL, 50),
 ( 3, 82)) AS e (deptno, grade))
SELECT deptno, avg_grade FROM tbl_with_null_dim
GROUP BY deptno;
+--------+-----------+
| DEPTNO | AVG_GRADE |
+--------+-----------+
|      1 |        60 |
|      3 |        82 |
|        |        50 |
+--------+-----------+
(3 rows)

!ok
{noformat}
but different error message now that case has been fixed:
{noformat}
java.sql.SQLException: Error while executing SQL "WITH tbl_with_null_dim ...": 
There are not enough rules to produce a node with desired properties: 
convention=ENUMERABLE, sort=[].
Missing conversion is LogicalAggregate[convention: NONE -> ENUMERABLE]
There is 1 empty subset: rel#2488:RelSubset#2.ENUMERABLE.[], the relevant part 
of the original plan is as follows
2456:LogicalAggregate(group=[{0}], agg#0=[AGG_M2M($1)])
  2454:LogicalProject(subset=[rel#2455:RelSubset#1.NONE.[]], DEPTNO=[$0], 
AVG_GRADE=[V2M(CAST(/(SUM($1), COUNT($1))):INTEGER NOT NULL)])
2348:LogicalValues(subset=[rel#2453:RelSubset#0.NONE.[]], tuples=[[{ 1, 70 
}, { 1, 50 }, { null, 50 }, { 3, 82 }]])
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Question] Improving Performance of SqlToRelConverter for Large Queries?

2024-09-04 Thread Julian Hyde
Like many performance issues, you are probably running into this because you 
have set N to a higher value than anyone has previously done. Quite 
often there is a straightforward solution to these problems. Create a test that 
varies on N, set N to the smallest value where 
performance becomes noticeably bad, identify the hot-spot (often something 
simple like a cartesian loop or hash-key collision), fix the hot-spot, check in 
the fix and test with a moderately high value of N.

In your case, N seems to be the number of CTEs. You should log a 
jira case with a simple, repetitive query that has N CTEs and noticeably bad 
performance.
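For example, a throwaway generator along these lines (names purely illustrative) is
enough to produce such a query for the Jira case:

```java
public class CteQueryGenerator {
  /** Builds a simple, repetitive query with {@code n} chained CTEs. */
  static String queryWithCtes(int n) {
    StringBuilder b = new StringBuilder("WITH cte0 AS (SELECT 1 AS x)");
    for (int i = 1; i < n; i++) {
      b.append(", cte").append(i)
          .append(" AS (SELECT x + 1 AS x FROM cte").append(i - 1).append(")");
    }
    return b.append(" SELECT x FROM cte").append(n - 1).toString();
  }

  public static void main(String[] args) {
    System.out.println(queryWithCtes(100));
  }
}
```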

> On Sep 1, 2024, at 11:22 PM, JinxTheKid  wrote:
> 
> Hi community,
> 
> I have a unique use case where I have very large queries that have lots of
> interconnected CTEs (90KB+, 100+ CTEs). When I run some of these queries
> through Calcite, I end up with very long compile times, anywhere from 8s to
> 40s. I cannot share the queries, but for context manually optimizing the
> query is not really an option at the moment even though it would likely
> solve the problems I'm encountering.
> 
> I've narrowed down the culprit of the slow compile times to
> SqlToRelConverter's method convertQueryRecursive. For large
> interconnected CTEs, this class ends up executing convertQueryRecursive many
> times over the course of converting a query. This scenario sounded like a
> great use case for memoization to improve the conversion performance, but I
> found that adapting this class was challenging. There is some internal
> state of SqlToRelConverter that does not allow me to simply memoize
> convertQueryRecursive unfortunately. It appears correlated variables are
> (one of) the issue but I'm not certain. Has anyone else in the community
> run into similar issues, and if so what did you do to address this? Is this
> an area the community has looked into?
> 
> Thanks,
> Logan



Re: Some doubts about date functions

2024-08-31 Thread Julian Hyde
If the functions you implement work within the supported range of Calcite's DATE 
type, that is sufficient. 

Changing the range of the DATE type would be an orthogonal change. I’m not sure 
if anyone is asking for that. 

> On Aug 28, 2024, at 8:09 AM, Cancai Cai  wrote:
> 
> I tried many ways to write it. As Julian said, maybe BigQuery's DATE data 
> type has a different range than Calcite's. I am not sure if I can fix this 
> problem.
> 
> If I adapt some other date functions of Spark in the future, do I need to fix 
> this problem first? For example, 
> https://issues.apache.org/jira/browse/CALCITE-6527
> 
> Best wishes,
> Cancai Cai
> 
>> On Aug 27, 2024, at 05:11, Norman Jordan  wrote:
>> 
>> Re https://issues.apache.org/jira/browse/CALCITE-6544. Is it possible that 
>> BigQuery’s DATE_ADD can return BC dates because BigQuery’s DATE data type 
>> has a different range of allowable values than Calcite’s? In other words, 
>> it’s not a difference with the DATE_ADD function.
> 


[jira] [Created] (CALCITE-6555) RelBuilder.aggregateRex thinks aggregate functions of "GROUP BY ()" queries are NOT NULL

2024-08-29 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6555:


 Summary: RelBuilder.aggregateRex thinks aggregate functions of 
"GROUP BY ()" queries are NOT NULL
 Key: CALCITE-6555
 URL: https://issues.apache.org/jira/browse/CALCITE-6555
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


In {{RelBuilder}}, the {{aggregateRex}} method (added in CALCITE-5802) thinks 
that aggregate functions in a {{GROUP BY ()}} queries are NOT NULL. Consider 
the query

{code}
SELECT SUM(empno) AS s, COUNT(empno) AS c
FROM emp
GROUP BY ()
{code}

{{SUM(empno)}} is nullable, even though {{empno}} has type {{SMALLINT NOT 
NULL}}, because {{GROUP BY ()}} will return one row even if {{emp}} has no 
rows, and therefore {{SUM}} will be evaluated over the empty set. A test 
({{RelBuilderTest.testAggregateRex4}}) that attempts to build an equivalent 
query gets the following error stack:

{noformat}
java.lang.AssertionError: type mismatch:
ref:
SMALLINT NOT NULL
input:
SMALLINT
at org.apache.calcite.util.Litmus.lambda$static$0(Litmus.java:31)
at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2217)
at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:130)
at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:62)
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:125)
at org.apache.calcite.rel.core.Project.isValid(Project.java:262)
at org.apache.calcite.rel.core.Project.(Project.java:107)
at 
org.apache.calcite.rel.logical.LogicalProject.(LogicalProject.java:75)
at 
org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:168)
at 
org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:144)
at 
org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:208)
at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:2135)
at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1910)
at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1893)
at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1865)
at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1854)
at 
org.apache.calcite.tools.RelBuilder.aggregateRex(RelBuilder.java:2639)
at 
org.apache.calcite.tools.RelBuilder.aggregateRex(RelBuilder.java:2626)
at 
org.apache.calcite.test.RelBuilderTest.lambda$testAggregateRex4$96(RelBuilderTest.java:3409)
at 
org.apache.calcite.test.RelBuilderTest.testAggregateRex4(RelBuilderTest.java:3419)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)

[jira] [Created] (CALCITE-6553) In server, several CREATE and DROP commands are incorrect if schema does not exist

2024-08-28 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6553:


 Summary: In server, several CREATE and DROP commands are incorrect 
if schema does not exist
 Key: CALCITE-6553
 URL: https://issues.apache.org/jira/browse/CALCITE-6553
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


In server, several CREATE and DROP commands are incorrect if schema does not 
exist. 

CALCITE-4535 started this work, but missed some edge cases. In CALCITE-6552 I 
made the {{schema}} method return a nullable {{CalciteSchema}} (null if the 
schema of the object to be created or dropped does not exist) and saw several 
places that were not correctly handling a null value. I added a TODO in 
{{ServerDdlExecutor.java}}, and a fix for this case would remove those TODOs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6552) Enable checkerframework in 'server' module

2024-08-28 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6552:


 Summary: Enable checkerframework in 'server' module
 Key: CALCITE-6552
 URL: https://issues.apache.org/jira/browse/CALCITE-6552
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


In CALCITE-4199 we enabled CheckerFramework (via annotations and CI jobs) in 
the 'core' and 'linq4j' modules; this change further enables CheckerFramework 
in the 'server' module, and fixes all violations.

There are also a large number of 'cosmetic' modifications to improve code 
quality without changing behavior, including:
 * Replace '{{{}this.x = x; assert x != null;{}}}' by '{{{}this.x = 
requireNonNull(x);{}}}' in constructors (see the sketch after this list)
 * Replace {{assert}} in other code locations where it implements an invariant. 
We don't use {{requireNonNull}} because it throws {{{}NullPointerException{}}}; 
we would prefer to throw {{AssertionError}} or {{IllegalStateException}}
 * Replace {{x.equals("")}} and {{x.size() == 0}} with {{x.isEmpty()}}
 * Make fields {{final}} where possible
 * Make private methods and inner classes {{static}} where possible
 * In {{class Pair}} make the type variables {{K}} and {{V}} no longer 
{{@Nullable}} by default (you can make each of them nullable if you need)
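
A before/after sketch of the constructor pattern from the first bullet (class and field 
names are illustrative):
{code}
import static java.util.Objects.requireNonNull;

class Bar {}

class Foo {
  private final Bar bar;

  // Before: 'this.bar = bar; assert bar != null;' - the check disappears when
  // assertions are disabled.
  // After: fail fast with a clear NullPointerException, assertions or not.
  Foo(Bar bar) {
    this.bar = requireNonNull(bar, "bar");
  }
}
{code}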



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Some doubts about date functions

2024-08-26 Thread Julian Hyde
Re https://issues.apache.org/jira/browse/CALCITE-6544. Is it possible that 
BigQuery’s DATE_ADD can return BC dates because BigQuery’s DATE data type has a 
different range of allowable values than Calcite’s? In other words, it’s not a 
difference with the DATE_ADD function.

Julian


> On Aug 26, 2024, at 8:37 AM, Cancai Cai  wrote:
> 
> Thanks for your reply.
> 
> Yes, I also located here, and I finally debugged to 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L3802
>  but I don’t know how to continue.
> 
> I find it hard to believe that 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
>  handles the date_add function, because its name is add_months, 
> not date_add.
> 
> >> On Aug 26, 2024, at 23:19, Norman Jordan  wrote:
>> 
>> this leads to here:
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/util/BuiltInMethod.java#L517
> 



Re: [REVIEW REQUEST]: Review Request for CALCITE-4512

2024-08-22 Thread Julian Hyde
Thanks for reviving this issue/PR. I added comments to the Jira case.  I would 
like to see more tests (especially Quidem tests) but I will push to get it into 
1.38.


> On Aug 22, 2024, at 7:41 AM, Yanjing Wang  wrote:
> 
> Hi community,
> 
> I request a review on CALCITE-4512(
> https://github.com/apache/calcite/pull/3929), which stops identifiers in a
> SQL call from being treated as aliases and expanded into select expressions.



Re: [DISCUSSION] Automatically closing stale pull requests

2024-08-22 Thread Julian Hyde
+0

I haven’t thought about the details, but it might improve our situation 
regarding pull requests. It’s a small reversible step, so I would support 
trying it. If it doesn’t help, we can change policy back again.

Julian


> On Aug 22, 2024, at 10:59 AM, Ruben Q L  wrote:
> 
> Thanks for opening the discussion, Michael.
> +1 on the idea.
> 
> 
> 
> On Thu, Aug 22, 2024 at 6:43 PM Michael Mior  wrote:
> 
>> Hi all,
>> 
>> I know the better solution here is to have more people reviewing and
>> merging PRs to keep momentum going. However, even when someone is engaged
>> in trying to help merge a PR, sometimes the original author will disappear
>> or changes become irrelevant over time. I think having a smaller number of
>> open PRs can help keep things more manageable. The goal is that regardless
>> of when the PR was opened, it should be kept open if there is still
>> interest. But PRs which have been abandoned should be closed.
>> 
>> I'm suggesting implementing (via GitHub Actions, e.g.
>> https://github.com/actions/stale) a process that will automatically close
>> PRs after some period of inactivity. This doesn't mean we lose any of the
>> work. We can also have PRs automatically be reopened if there are any
>> future comments. The idea would be that after X number of days, a comment
>> is automatically posted and a label of "stale" is applied. Then after Y
>> more days, the PR would be automatically closed. Any activity (more commits
>> on the branch or comments) will remove the stale label and reset the clock.
>> 
>> I'd propose implementing this with X=30 and Y=90. This gives four months
>> for any activity to keep a PR alive. Again, if it is closed, no work is
>> lost. But I think four months of no activity is a strong indicator that
>> nothing is likely to move forward in the near future. I will note that if
>> this policy were already in place, it would mean ~85% of our current open
>> PRs would have been closed (if there was no intervention after the initial
>> ping).
>> 
>> Here's some configuration data from a few projects which have implemented
>> this
>> 
>> Apache Age, X=60, Y=14
>> Apache Airflow, X=45, Y=5
>> Apache Beam, X=60, Y=7
>> Apache ECharts, X=730,Y=7
>> Apache Iceberg, X=30, Y=7
>> Apache Kafka, X=90, Y=-1 (never automatically close)
>> Apache Solr, X=60, Y=-1
>> Apache Spark, X=100,Y=0
>> Apache Superset, X=60, Y=7
>> 
>> --
>> Michael Mior
>> mm...@apache.org
>> 



Re: Question about overriding validation messages

2024-08-19 Thread Julian Hyde
I put the messages into .properties files so that they could be managed as 
resource bundles. In theory, you can provide a CalciteResource_fr_FR.properties 
or a CalciteResource_en_US.properties or CalciteResource_en_UK.properties and 
override particular messages for particular locales.

I’ve never tried it, but it ought to work.

Julian



> On Aug 19, 2024, at 4:25 AM, Konstantin Orlov  wrote:
> 
> Hi all,
> 
> I'm wondering if it's possible to override default validation exception
> messages in calcite.
> 
> The exhaustive enumeration of available exceptions is defined by the
> `CacliteResource` interface. Each method in this interface has an
> `@BaseMessage` annotation that defines a default message template for a
> particular exception. I seemed that all the downstream project need to do
> is to provide `CalciteResource.properties` with required overrides, but
> `calcite-core.jar` already includes properties file with given name.
> 
> Is there a way to provide overrides for default validation exception
> messages?
> 
> -- 
> Regards,
> Konstantin Orlov



[jira] [Created] (CALCITE-6538) OVER (ROWS CURRENT ROW) should return a window with one row, not all rows

2024-08-19 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6538:


 Summary: OVER (ROWS CURRENT ROW) should return a window with one 
row, not all rows
 Key: CALCITE-6538
 URL: https://issues.apache.org/jira/browse/CALCITE-6538
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


{{OVER (ROWS CURRENT ROW)}} should return a window with one row, not all rows.

For example, in the following query (in {{{}winagg.iq{}}}) each {{sum_sal}} 
value should equal the {{sal}} value (and does on Postgres), but currently 
Calcite gives {{{}8275.00{}}}:
{noformat}
# Current row
select ename, sal, sum(sal) over (rows current row) as sum_sal
from emp
where job = 'MANAGER';
+-------+---------+---------+
| ENAME | SAL     | SUM_SAL |
+-------+---------+---------+
| BLAKE | 2850.00 | 8275.00 |
| CLARK | 2450.00 | 8275.00 |
| JONES | 2975.00 | 8275.00 |
+-------+---------+---------+
(3 rows)

!ok {noformat}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6537) Add syntax to allow non-aggregated rows to be used in GROUPING SETS

2024-08-19 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6537:


 Summary: Add syntax to allow non-aggregated rows to be used in 
GROUPING SETS
 Key: CALCITE-6537
 URL: https://issues.apache.org/jira/browse/CALCITE-6537
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


{{GROUPING SETS}} is a powerful and convenient syntax that allows a query to 
return results at varying levels of aggregation - in other words, to compute 
sub-totals and grand totals - but it does not currently allow the query to 
return unaggregated rows.

This case proposes to allow {{*}} (asterisk) as a column name in the {{GROUPING 
SETS}} sub-clause of {{GROUP BY}}. It would also allow {{*}} in {{ROLLUP}} and 
{{CUBE}} sub-clauses, which are syntactic sugar for {{GROUPING SETS}}.

(We use {{*}} by analogy with {{COUNT(*)}}, which means 'include all possible 
columns, including a hypothetical row identifier, so that each row is 
considered unique', rather than by analogy with {{SELECT *}}, which means 
'expand to all non-system columns declared in the table'.)

For example,
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY GROUPING SETS ((deptno), (*))
{code}
would return a row for each of the 14 employees, plus a total row for each of 
the 3 departments:
{noformat}
+--------+--------+----------+
| DEPTNO | ENAME  | SUM_SAL  |
+--------+--------+----------+
|     10 |        |  8750.00 |
|     10 | CLARK  |  2450.00 |
|     10 | KING   |  5000.00 |
|     10 | MILLER |  1300.00 |
|     20 |        | 10875.00 |
|     20 | ADAMS  |  1100.00 |
|     20 | FORD   |  3000.00 |
|     20 | JONES  |  2975.00 |
|     20 | SCOTT  |  3000.00 |
|     20 | SMITH  |   800.00 |
|     30 |        |  9400.00 |
|     30 | ALLEN  |  1600.00 |
|     30 | BLAKE  |  2850.00 |
|     30 | JAMES  |   950.00 |
|     30 | MARTIN |  1250.00 |
|     30 | TURNER |  1500.00 |
|     30 | WARD   |  1250.00 |
+--------+--------+----------+
17 rows selected (0.005 seconds)
{noformat}

Equivalent queries would be
{code}
SELECT deptno, ename, SUM(sal) AS sumSal
FROM emp
GROUP BY ROLLUP (deptno, *);
{code}
and
{code}
SELECT deptno, null AS ename, SUM(sal) AS sumSal
FROM emp
GROUP BY deptno
UNION ALL
SELECT deptno, ename, sal
FROM emp;
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6525) Query with one-to-many join of measure to regular table

2024-08-12 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6525:


 Summary: Query with one-to-many join of measure to regular table
 Key: CALCITE-6525
 URL: https://issues.apache.org/jira/browse/CALCITE-6525
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


The following query has one-to-many join of a table with a measure ({{deptm}}) 
to regular table ({{emp}}):
{code}
with deptm as
  (select deptno, dname, avg(char_length(dname)) as measure m
   from dept)
select aggregate(m) as m
from deptm join emp using (deptno);
{code}

The result should preserve the grain of the measure, that is, include each 
department once in the {{AVG}}, not give more weight to departments with more 
employees. To that end, the expanded query should include something like 
{{avg(char_length(d.dname)) within distinct (d.deptno)}}.
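
A sketch of such an expansion, using Calcite's experimental {{WITHIN DISTINCT}} clause 
(exact form to be decided):
{code}
SELECT AVG(CHAR_LENGTH(d.dname)) WITHIN DISTINCT (d.deptno) AS m
FROM dept AS d
JOIN emp AS e USING (deptno);
{code}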



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Support for Rewriting Count Distinct in Group By Columns for Materialized Views in Calcite

2024-08-08 Thread Julian Hyde
I certainly agree that that query should use that MV. If it does not currently, 
please log a jira case.

Since c2 is one of the GROUP BY columns in the query, I think it would be also 
be valid (and probably more efficient) to rewrite to

  select case when c2 is null then 0 else 1 end from test_mv group by c2, c3

Julian



> On Aug 6, 2024, at 11:13 PM, Da Dash  wrote:
> 
> Hello everyone, recently I have been working on materialized views using 
> Calcite, and in our use case, there are a lot of queries involving 
> CountDistinct.
> And generally, to support rewriting for Count Distinct,  we will always use 
> bitmap. However, recently, I have developed a new capability in Calcite that 
> allows rewriting of Count Distinct queries to read from the materialized view 
> table without the need for bitmap, as long as the Count Distinct is querying 
> the group by columns of the materialized view.
> 
> For example, let's assume we have the following materialized view: 
> 
> ```sql
> CREATE MATERIALIZED VIEW test_mv AS
> SELECT
>c1, c2, c3, sum(c4)
> FROM
>   t
> GROUP BY
>c1, c2, c3
> ```
> 
> After the materialized view created, the following query arrives:
> 
> ```sql
> 
> select count(distinct c2) from t group by c2, c3
> 
> ```
> 
> With the capability I've developed, the above query can be rewritten as:
> 
> ```sql
> 
> select count(distinct c2) from test_mv group by c2, c3
> 
> ```
> 
> The rewrite mentioned above, compared to calculating COUNT DISTINCT directly 
> on the original table, will significantly reduce the query time because the 
> materialized view contains a reduced amount of data.
> 
> Is anyone interested in this? I can initiate a Pull Request. :)
> 
> 
> 
> 
> 
> 
> 



[jira] [Created] (CALCITE-6519) Non-aggregate query that uses measure in ORDER BY

2024-08-06 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6519:


 Summary: Non-aggregate query that uses measure in ORDER BY
 Key: CALCITE-6519
 URL: https://issues.apache.org/jira/browse/CALCITE-6519
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


As of CALCITE-4496, a non-aggregate query can use measures in its SELECT 
clause; this change would further allow a non-aggregate query to use measures 
in its ORDER BY clause.

An example such query:
{code}
WITH empm AS
  (SELECT *, avg(sal) AS MEASURE avgSal FROM emp)
SELECT avgSal, deptno
FROM empm
ORDER BY avgSal DESC LIMIT 3;
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Misuse of the SqlKind field

2024-08-06 Thread Julian Hyde
Here’s what’s in the javadoc:

> Enumerates the possible types of SqlNode.
>
> The values are immutable, canonical constants, so you can use Kinds to find
> particular types of expressions quickly. To identity a call to a common 
> operator
> such as '=', use SqlNode.isA:
>   exp.isA(EQUALS)
>
> Only commonly-used nodes have their own type; other nodes are of type OTHER.
> Some of the values, such as SET_QUERY, represent aggregates.
>
> To quickly choose between a number of options, use a switch statement:
>
> switch (exp.getKind()) {
>  case EQUALS:
>  ...;
>  case NOT_EQUALS:
>  ...;
>  default:
>  throw new AssertionError("unexpected");
>  }
>
> Note that we do not even have to check that a SqlNode is a SqlCall.
>
> To identify a category of expressions, use SqlNode.isA with an aggregate
> SqlKind. The following expression will return true for calls to '=' and '>=‘,
> but false for the constant '5', or a call to '+’:
>
>   exp.isA(SqlKind.COMPARISON)
>
> RexNode also has a getKind method; SqlKind values are preserved
> during translation from SqlNode to RexNode, where applicable.
>
> There is no water-tight definition of "common", but that's OK. There
> will always be operators that don't have their own kind, and for
> these we use the SqlOperator. But for really the common ones, e.g.
> the many places where we are looking for AND, OR and EQUALS,
> the enum helps.
>
> (If we were using Scala, SqlOperator would be a case class, and we
> wouldn't need SqlKind. But we're not.)

I wrote most of that in 2013 and 2014 and it still holds up. As you can see, 
there is a tension in the definition of ‘common’.

On the one hand, we don’t want people to have to define a new SqlKind for every 
single operator instance. But on the other hand, rules using == or .equals on 
instances of SqlOperator seems fussy and can drag a load of dependencies into 
our library of rules. Operators have no unique identifier (remember, different 
SQL dialects can have functions with the same name that have different 
semantics).

My working definition of ‘common’ is ‘common enough to be used in a rewrite 
rule’ and that the code seems ’tidier’ if I am comparing the SqlKind rather 
than the operator instance. Subjective, but it’s got us this far. 

Julian


> On Aug 6, 2024, at 12:14 PM, Mihai Budiu  wrote:
> 
> Hello all,
> 
> I am beginning to believe that the SqlKind field is being misused in Calcite 
> by using it to denote custom function implementations (e.g. 
> SqlKind.SUBSTR_BIG_QUERY).
> 
> I have filed an issue about function name resolution when using multiple 
> libraries: https://issues.apache.org/jira/browse/CALCITE-6518. But I am 
> having difficulties solving this issue fully.
> 
> The code that does name resolution for functions (in SqlUtil.lookupRoutine) 
> expects that functions that are not standard are having the kind 
> SqlKind.OTHER_FUNCTION. But this is not true for many functions that have 
> been updated recently. If there are multiple matches for a function name, 
> SqlUtil.lookupSubjectRoutine keeps under consideration only functions with 
> the kind OTHER_FUNCTION.
> 
> If I am right, the real fix would be to remove all the newly introduced kinds.
> 
> I would appreciate a confirmation from someone who has been around long 
> enough to know what SqlKind is supposed to represent.
> 
> Alternatively, we can make the SqlKind enum have multiple fields, one of 
> which could be used to represent "OTHER_FUNCTION".
> 
> Thank you,
> Mihai



Re: S3 Build Cache

2024-08-03 Thread Julian Hyde
Let’s also examine the cost-benefit of using a build cache. Gradle is a company 
that has an interest in showcasing its enterprise features in open source 
projects. We have limited volunteer time. If the build cache is not helping us, 
we should politely say ’no’.

> On Aug 1, 2024, at 12:51 AM, Alessandro Solimando 
>  wrote:
> 
> Hi Clay,
> thanks for your analysis and for your help.
> 
> I think that the S3 bucket we used to use is currently not available
> anymore (see CALCITE-5034
> ).
> 
> If there are apache-level caches we can use, then it would be great,
> alternatively we should disable the cache as suggested in the ticket.
> 
> Best regards,
> Alessandro
> 
> On Wed, 31 Jul 2024 at 20:56, Clay Johnson  wrote:
> 
>> Hi Calcite folks, I am a Solutions Engineer at Gradle working on build
>> optimization and the rollout of Build Scans to ge.apache.org within the
>> Apache Software Foundation.
>> 
>> I have a question regarding the usage of the S3 Build Cache, defined here
>> . Is
>> this cache currently functional? I see some data points that make me think
>> it is not:
>> 
>>   - This graph
>>   <
>> https://ge.apache.org/scans/performance?performance.metric=avoidanceSavings,avoidanceSavingsRemoteBuildCache&search.query=project:calcite&search.relativeStartTime=P90D&search.timeZoneId=America%2FChicago
>>> 
>>   shows that Calcite CI builds have not saved any time from the remote
>> cache
>>   over the last 90 days:
>>   - This graph
>>   <
>> https://ge.apache.org/scans/trends?search.query=project:calcite%20gradle.buildCache.hasError&search.relativeStartTime=P90D&search.timeZoneId=America%2FChicago
>>> 
>>   shows that ~552 Calcite builds have had cache errors in the last 90
>> days:
>>   - Looking at the details, most seem to indicate that the S3 bucket was
>>   not found, like in this build
>>   <
>> https://ge.apache.org/s/chsz2emskhndk/performance/build-cache#remote-cache-failure-0-0
>>> 
>> 
>> This makes me wonder if Calcite builds are losing time by trying the S3
>> build cache, not to mention that the "Seed build cache" job is spending CI
>> compute resources attempting to seed a cache it cannot access (15.5 hours
>> over the last 90 days, according to this graph
>> <
>> https://ge.apache.org/scans/trends?search.names=CI%20workflow&search.rootProjectNames=calcite&search.timeZoneId=America%2FChicago&search.values=Seed%20build%20cache
>>> 
>> ).
>> 
>> Is this remote cache configuration something that can just be removed? Or
>> can it be repaired?
>> 
>> An alternative may be to use the cache provided by ge.apache.org. It is
>> not
>> available today, but should be available over the next few months.
>> 
>> Thanks,
>> 
>> Clay Johnson
>> Lead Solutions Engineer
>> Gradle
>> W. gradle.com
>> 
>> 



Re: [ANNOUNCE] Mihai Budiu joins Calcite PMC

2024-07-26 Thread Julian Hyde
Welcome, Mihai! You have been showing leadership in Calcite for some time - 
reviewing many PRs and starting discussions about how to make our code and 
processes better - and it is appropriate that you are given a seat in the PMC. 

Congratulations, and thank you for your work. 

Julian 

> On Jul 26, 2024, at 8:17 AM, Norman Jordan 
>  wrote:
> 
> Congratulations, Mihai! Well earned, thanks for all of the help.
> 
> From: Benchao Li 
> Sent: Thursday, July 25, 2024 7:37 PM
> To: dev@calcite.apache.org 
> Subject: [ANNOUNCE] Mihai Budiu joins Calcite PMC
> 
> I am pleased to announce that Mihai has accepted an invitation to
> join the Calcite PMC. Mihai has been a consistent and helpful figure
> in the Calcite community for which we are very grateful. We look
> forward to the continued contributions and support.
> 
> Please join me in congratulating Mihai!
> 
> Benchao (on behalf of the Calcite PMC)


Re: Question about Project.getPartialMapping

2024-07-19 Thread Julian Hyde
See https://issues.apache.org/jira/browse/CALCITE-4166. 

> On Jul 19, 2024, at 6:28 AM, Gonzalo Ortiz Jaureguizar  
> wrote:
> 
> Hi there,
> 
> While fixing an issue in Apache Pinot I've found something that may be a bug in
> Apache Calcite, although it is probably a misunderstanding on my side.
> 
> In Pinot we are not (yet) using the standard MetadataHandler (which I'm
> having trouble understanding, but that is not the topic here) to calculate
> the distribution of each node. Instead, we add our own distribution in a
> trait and process it ourselves in our own rules. That may not be the
> best in terms of DRY, but it is working.
> 
> We have found an issue when using projections that are not mappings.
> Probably a more expert eye will see in PinotRelDistributionTraitRule#L113
> that our naive way to calculate the new distribution fails in that case
> because project.getMapping() returns null.
> 
> I planned to solve that by using project.getPartialMapping() in the same
> way it is done by Calcite in RelMdDistribution#L165,
> but when I tried to do that, I found the following runtime exception:
> 
> java.lang.UnsupportedOperationException
> at
> org.apache.calcite.util.mapping.Mappings$AbstractMapping.getSourceOpt(Mappings.java:952)
> at
> org.apache.calcite.util.mapping.Mappings$InverseMapping.getTargetOpt(Mappings.java:1841)
> at
> org.apache.calcite.rel.RelDistributions$RelDistributionImpl.apply(RelDistributions.java:146)
> at
> org.apache.pinot.calcite.rel.rules.PinotRelDistributionTraitRule.deriveDistribution(PinotRelDistributionTraitRule.java:128)
> 
> Which makes me think there is a problem in the way
> Project.getPartialMapping is initializing the Mapping. My solution was to
> do something like:
> 
>   Mappings.TargetMapping mapping =
>       Project.getPartialMapping(input.getRowType().getFieldCount(),
>           project.getProjects());
>   Mapping actualMapping =
>       Mappings.create(MappingType.PARTIAL_FUNCTION,
>           input.getRowType().getFieldCount(),
>           project.getRowType().getFieldCount());
>   for (IntPair intPair : mapping) {
>     actualMapping.set(intPair.source, intPair.target);
>   }
>   return inputRelDistribution.apply(actualMapping);
> 
> But I don't think that is the most elegant solution. I think it is fair to
> assume that the Mapping returned by Project.getPartialMapping should be
> usable to call getTargetOpt.
> 
> Is there something I am doing or understanding wrong?



Re: Sortlimit and limit examples

2024-07-19 Thread Julian Hyde
Is there a rule that can push a limit (Sort with 0 sort keys) through Union? 
Hopefully there is, and it can be adapted for any subclass of Sort and Union. 

That rule should create the right plan. Then we can debate whether OP needs to 
implement a new union operator or reuse EnumerableUnion. 

Julian
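
As a pointer for anyone trying this: recent Calcite versions already expose Sort-over-Union transposition via SortUnionTransposeRule, so registering it looks roughly like the sketch below. The surrounding setup is assumed; check the CoreRules constants available in your Calcite version.

import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.rel.rules.CoreRules;

public final class LimitThroughUnion {
  private LimitThroughUnion() {
  }

  /** Registers the Sort/Union transpose rules so a limit (a Sort with zero
   * sort keys) can be pushed below a Union before adapter-specific rules run. */
  public static void register(RelOptPlanner planner) {
    planner.addRule(CoreRules.SORT_UNION_TRANSPOSE);
    planner.addRule(CoreRules.SORT_UNION_TRANSPOSE_MATCH_NULL_FETCH);
  }
}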

> On Jul 19, 2024, at 08:13, Ruben Q L  wrote:
> 
> Eric, if you want to use EnumerableUnion instead of implementing your own
> Union, then I think you'll need to implement the appropriate Converter [1]
> to transform between conventions LDAP <=> ENUMERABLE
> 
> Best,
> Ruben
> 
> [1]
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/convert/Converter.java
> 
> 
>> On Fri, Jul 19, 2024 at 3:56 PM Eric Berryman 
>> wrote:
>> 
>> I implemented an LDAPUnion and a rule to convert from LogicalUnion, and of
>> course, the error goes away. But I didn’t implement the union correctly, so
>> nothing is returned in my query. Is there a way to have my LDAPUnion
>> use EnumerableUnion,
>> or a way to not have LDAPUnion and take care of this with a rule?
>> 
>> Thank you again!
>> Eric
>> 
>> On Tue, Jul 16, 2024 at 14:24 Eric Berryman 
>> wrote:
>> 
>>> Well, I was wrong. That didn’t solve my problem with :
>>> 
>>> Missing conversion is LogicalUnion[convention: NONE -> LDAP]
>>> 
>>> Do I have to implement my own union? Is there a way I could just use the
>>> EnumerableUnion?
>>> ie. NONE -> Enumerable and skip implementing one for my ldap datastore?
>>> 
>>> Thank you!
>>> Eric
>>> 
>>> On Tue, Jul 16, 2024 at 09:00 Eric Berryman 
>>> wrote:
>>> 
 When I registered my tablescan object and added my rules, I also added a
 removeRule for EnumerableRules.ENUMERABLE_MERGE_UNION_RULE, and now
 everything works with an offset also.
 
 Although, I don’t know why this is. I noticed the EnumerableMergeUnion
 object in the plan, and thought I would try to remove it, because it was
 different.
 
 Any explanation is appreciated, thank you!!!
 Eric
 
 
 On Mon, Jul 15, 2024 at 18:02 Eric Berryman 
 wrote:
 
> Hello!
> 
> I seem to have an issue with my new limit rule, which pushes the limit down to the
> datastore. It works fine unless I add an offset to the fetch, where I
> end up with the following error:
> There are not enough rules… Missing conversion is
> LogicalUnion[convention: NONE -> LDAP]
> 
> Why would this only come up when an offset value is added?
> 
> Thank you!
> Eric
> 
> 
> On Fri, Jun 21, 2024 at 09:25 Eric Berryman 
> wrote:
> 
>> That’s perfect
>> 
>> 
>> https://github.com/apache/calcite/blob/main/cassandra/src/main/java/org/apache/calcite/adapter/cassandra/CassandraRules.java#L401
>> 
>> Thank you for such a quick response!
>> Eric
>> 
>> 
>> On Fri, Jun 21, 2024 at 09:14 Michael Mior  wrote:
>> 
>>> Eric,
>>> 
>>> Could you give a more specific example of the failure scenario you're
>>> experiencing?
>>> 
>>> For a simple example of how limits can be pushed down, this is done
>> in
>>> the
>>> Cassandra adapter with CassandraLimitRule. It matches an
>>> EnumerableLimit on
>>> top of a CassandraToEnumerableConverter and then converts that limit
>>> to a
>>> CassandraLimit which passes along the limit and offset information to
>>> CassandraToEnumerableConverter for when the query is executed. The
>>> EnumerableLimit is then replaced with a CassandraLimit effectively
>> as a
>>> placeholder to signal that the limit has been handled.
>>> 
>>> --
>>> Michael Mior
>>> mm...@apache.org
>>> 
>>> 
>>> On Fri, Jun 21, 2024 at 8:44 AM Eric Berryman <
>> eric.berry...@gmail.com
 
>>> wrote:
>>> 
 Hello!
 
 When I add limit to my relbuilder object, the planner gives up. But
>>> works
 fine without it.
 
 I wasn’t able to find any examples of using limit and pushing the
>>> limit
 values down to a data source.
 
 Could someone help with some links on this subject?
 
 Thank you!
 Eric
 
>>> 
>> 
>> 


Re: Comparing keyword handling in Babel and Core parsers

2024-07-17 Thread Julian Hyde
By the way, upgrading JavaCC is taking a while. First, JavaCC “improved” its 
messages, which means that we have a lot of tests to change. I’m trying to 
massage the new messages so they look more like the old ones. 

Second, the parser generated for Babel is now very different from the other 
parsers. Its error messages are missing the “Was expecting …” section because 
it doesn’t have the information about tokens that can follow a rule. I fear 
that the Babel parser has a degree of ambiguity that newer JavaCC cannot 
handle. I might be wrong, and I hope I’m wrong. 

Julian

> On Jul 17, 2024, at 7:01 AM, Stamatis Zampetakis  wrote:
> 
> I agree that this is not something that should prevent us from
> upgrading to a newer JavaCC. It's interesting that the generated
> parser code is different but this is something that we could
> investigate in a follow-up. Even if Babel becomes slower after the
> upgrade we cannot stay with the 2006 release forever. For sure there
> are many more benefits in upgrading to the latest JavaCC version and
> it would be great to have this in the next Calcite release.
> 
> Best,
> Stamatis
> 
>> On Wed, Jul 10, 2024 at 8:03 PM Julian Hyde  wrote:
>> 
>> I am working on https://issues.apache.org/jira/browse/CALCITE-5541
>> (upgrading JavaCC) and the Babel parser is having problems deducing
>> whether a keyword is reserved. Investigating this, I took a look at
>> the generated code, and found something interesting.
>> 
>> Here are the NonReservedKeyWord and NonReservedKeyWord0of3 methods in
>> Babel 
>> (babel/build/javacc/javaCCMain/org/apache/calcite/sql/parser/babel/SqlBabelParserImpl.java):
>> 
>>  final public String NonReservedKeyWord() throws ParseException {
>>if (jj_2_1116(2)) {
>>  NonReservedKeyWord0of3();
>>} else if (jj_2_1117(2)) {
>>  NonReservedKeyWord1of3();
>>} else if (jj_2_1118(2)) {
>>  NonReservedKeyWord2of3();
>>} else {
>>  jj_consume_token(-1);
>>  throw new ParseException();
>>}
>>  {if ("" != null) return unquotedIdentifier();}
>>throw new Error("Missing return statement in function");
>>  }
>> 
>>  /** @see #NonReservedKeyWord */
>>  final public void NonReservedKeyWord0of3() throws ParseException {
>>if (jj_2_1119(2)) {
>>  jj_consume_token(A);
>>} else if (jj_2_1120(2)) {
>>  jj_consume_token(ACTION);
>>} else if (jj_2_1121(2)) {
>>  jj_consume_token(ADMIN);
>>...
>> 
>> And here are the same methods in Core
>> (core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java):
>> 
>>  final public String NonReservedKeyWord() throws ParseException {
>>switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
>>case A:
>>case ACTION:
>>case ADMIN:
>>case APPLY:
>>...
>>case YEARS:{
>>  NonReservedKeyWord0of3();
>>  break;
>>  }
>>case ABSENT:
>>   ...
>>case ZONE:{
>>  NonReservedKeyWord1of3();
>>  break;
>>  }
>>...
>>default:
>>  jj_la1[436] = jj_gen;
>>  jj_consume_token(-1);
>>  throw new ParseException();
>>}
>>  {if ("" != null) return unquotedIdentifier();}
>>throw new Error("Missing return statement in function");
>>  }
>> 
>>  /** @see #NonReservedKeyWord */
>>  final public void NonReservedKeyWord0of3() throws ParseException {
>>switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
>>case A:{
>>  jj_consume_token(A);
>>  break;
>>  }
>>case ACTION:{
>>  jj_consume_token(ACTION);
>>  break;
>>  }
>>case ADMIN:{
>>  jj_consume_token(ADMIN);
>>  break;
>>  }
>>...
>> 
>> Both of the above are generated using JavaCC 7.0.13. Other parsers,
>> such as Server, look similar to Core. Under JavaCC 4.0, all parsers
>> generate a 'switch'.
>> 
>> In all parsers we split the reserved keywords into 3 rules (0of3,
>> 1of3, 2of3) due to the size restrictions noted in
>> https://issues.apache.org/jira/browse/CALCITE-2405.
>> 
>> I was puzzled why one is generating a 'switch' and the other is
>> generating chained 'if'...'else-if's. At first I thought it was that
>> Babel had more keywords, but some experiments eliminated that
>> possibility. I also disproved the hypothesis that it is because Babel
>> allows extra characters in identifiers (see
>> https://issues.apache.org/jira/browse/CALCITE-5668). My current
>> hypothesis is that Babel needs to use lookahead in order to determine
>> whether a non-reserved keyword can be converted to an identifier.
>> 
>> But whatever the reason, something seems to be very different about
>> the Babel grammar. Given how frequently identifiers occur when parsing
>> SQL, I would not be surprised if the Babel parser is significantly
>> slower than the regular parser under JavaCC 7.0.13.
>> 
>> In my opinion, that is not a bug that should prevent us from upgrading
>> JavaCC. Especially given that JavaCC 4.0 has a performance bug that is
>> affecting all of our parser variants.
>> 
>> Julian


Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

2024-07-11 Thread Julian Hyde
> PS By coincidence, Aleksey is a colleague at Google, working on BigQuery 
> performance.

Oops. I was thinking of a different Alexey. I see that Aleksey
Shipilëv is at AWS.

Julian

On Thu, Jul 11, 2024 at 7:43 AM Julian Hyde  wrote:
>
> Thanks for the article, Stamatis. I learned a lot.
>
> I believe that upgrading to JavaCC 7.0.5 (or higher) will solve the first 
> problem in Aleksey‘s article (stack trace construction) but not the second 
> (stack trace unwinding).
>
> JavaCC uses exceptions extensively in its lookahead logic. (There are hundreds of 
> ‘catch (LookaheadSuccess)’ blocks in our code, and I would guess that we 
> invoke lookahead once per line in a typical query, but the number of frames 
> to be unwound on each lookahead is very small.) I have to hope that the JavaCC 
> team have evaluated the performance implications.
>
> Julian
>
> PS By coincidence, Aleksey is a colleague at Google, working on BigQuery 
> performance.
>
> > On Jul 11, 2024, at 05:48, Stamatis Zampetakis  wrote:
> >
> > This reminds me of a very nice benchmark done by Aleksey Shipilёv
> > around the performance implications of Exception [1]. Among other
> > things the benchmark analyses the impact of stack depth in the
> > performance of exceptions and is very similar to the parser benchmark
> > contributed by Cyril.
> >
> > The closing meme says everything: "If you didn't want exceptions to
> > ruin your performance you shouldn't have used them for the regular
> > control flow"
> >
> > Best,
> > Stamatis
> >
> > [1] https://shipilev.net/blog/2014/exceptional-performance/
> >
> >> On Thu, Jul 11, 2024 at 10:12 AM Cyril DE CATHEU
> >>  wrote:
> >>
> >> https://github.com/apache/calcite/pull/3853
> >>
> >>> On Wed, Jul 10, 2024 at 5:29 PM Julian Hyde  
> >>> wrote:
> >>>
> >>> PS Your method of increasing stack size by making recursive calls seems
> >>> valid. Your experiments show an improvement before and after the fix.
> >>> That’s good enough for me.
> >>>
> >>> I plan to use your benchmark to check that upgrading JavaCC doesn’t
> >>> introduce other performance problems.
> >>>
> >>> Julian
> >>>
> >>>> On Jul 10, 2024, at 08:18, Julian Hyde  wrote:
> >>>>
> >>>> A PR with the benchmark would be great. If it is based on jmh I will
> >>> merge it to main, so others can run the benchmark in future, and maybe do
> >>> more performance optimizations on the parser.
> >>>>
> >>>> By the way, it can be on whichever parser variant (core, babel, server)
> >>> is most convenient. I’m pretty sure they have the same performance issues.
> >>>>
> >>>> I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a
> >>> PR ready soon.
> >>>>
> >>>> Julian
> >>>>
> >>>>> On Jul 10, 2024, at 03:54, Cyril DE CATHEU 
> >>> wrote:
> >>>>>
> >>>>> Hey Mihai and Julian,
> >>>>>
> >>>>> Thanks for looking at this.
> >>>>> The performance hit depends on the size of the stack.
> >>>>> Here are the numbers I have for the moment: (*not run in a good
> >>> benchmark
> >>>>> setup* but gives an idea)
> >>>>>
> >>>>> Benchmark  Mode
> >>>>> Cnt   Score   Error  Units
> >>>>> BabelParserBenchmark.instantiateBabelParseravgt
> >>>>> 7  14.128 ± 0.041  us/op
> >>>>> BabelParserBenchmark.instantiateBabelParserErrorFixed  avgt
> >>>>> 7  12.054 ± 0.039  us/op
> >>>>>
> >>>>> BabelParserBenchmark.instantiateBabelParserBigCallStackavgt
> >>>>> 7  20.576 ± 0.069  us/op
> >>>>> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
> >>>>> 7  12.479 ± 0.068  us/op
> >>>>>
> >>>>> For the smallest error stack possible with JMH, the current version is
> >>> ~15%
> >>>>> slower than with the fix.
> >>>>> With a bigger stack - instantiation that happens on depth ~100 - the
> >>>>> current version gets even slower. With the fix the time is stable.
> >>>>>
> >>>>> No

Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

2024-07-11 Thread Julian Hyde
Thanks for the article, Stamatis. I learned a lot. 

I believe that upgrading to JavaCC 7.0.5 (or higher) will solve the first 
problem in Aleksey‘s article (stack trace construction) but not the second 
(stack trace unwinding).

JavaCC uses exceptions extensively in its lookahead logic. (There are hundreds of 
‘catch (LookaheadSuccess)’ blocks in our code, and I would guess that we invoke 
lookahead once per line in a typical query, but the number of frames to be 
unwound on each lookahead is very small.) I have to hope that the JavaCC team have 
evaluated the performance implications. 

Julian

PS By coincidence, Aleksey is a colleague at Google, working on BigQuery 
performance. 

> On Jul 11, 2024, at 05:48, Stamatis Zampetakis  wrote:
> 
> This reminds me of a very nice benchmark done by Aleksey Shipilёv
> around the performance implications of Exception [1]. Among other
> things the benchmark analyses the impact of stack depth in the
> performance of exceptions and is very similar to the parser benchmark
> contributed by Cyril.
> 
> The closing meme says everything: "If you didn't want exceptions to
> ruin your performance you shouldn't have used them for the regular
> control flow"
> 
> Best,
> Stamatis
> 
> [1] https://shipilev.net/blog/2014/exceptional-performance/
> 
>> On Thu, Jul 11, 2024 at 10:12 AM Cyril DE CATHEU
>>  wrote:
>> 
>> https://github.com/apache/calcite/pull/3853
>> 
>>> On Wed, Jul 10, 2024 at 5:29 PM Julian Hyde  wrote:
>>> 
>>> PS Your method of increasing stack size by making recursive calls seems
>>> valid. Your experiments show an improvement before and after the fix.
>>> That’s good enough for me.
>>> 
>>> I plan to use your benchmark to check that upgrading JavaCC doesn’t
>>> introduce other performance problems.
>>> 
>>> Julian
>>> 
>>>> On Jul 10, 2024, at 08:18, Julian Hyde  wrote:
>>>> 
>>>> A PR with the benchmark would be great. If it is based on jmh I will
>>> merge it to main, so others can run the benchmark in future, and maybe do
>>> more performance optimizations on the parser.
>>>> 
>>>> By the way, it can be on whichever parser variant (core, babel, server)
>>> is most convenient. I’m pretty sure they have the same performance issues.
>>>> 
>>>> I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a
>>> PR ready soon.
>>>> 
>>>> Julian
>>>> 
>>>>> On Jul 10, 2024, at 03:54, Cyril DE CATHEU 
>>> wrote:
>>>>> 
>>>>> Hey Mihai and Julian,
>>>>> 
>>>>> Thanks for looking at this.
>>>>> The performance hit depends on the size of the stack.
>>>>> Here are the numbers I have for the moment: (*not run in a good
>>> benchmark
>>>>> setup* but gives an idea)
>>>>> 
>>>>> Benchmark  Mode
>>>>> Cnt   Score   Error  Units
>>>>> BabelParserBenchmark.instantiateBabelParseravgt
>>>>> 7  14.128 ± 0.041  us/op
>>>>> BabelParserBenchmark.instantiateBabelParserErrorFixed  avgt
>>>>> 7  12.054 ± 0.039  us/op
>>>>> 
>>>>> BabelParserBenchmark.instantiateBabelParserBigCallStackavgt
>>>>> 7  20.576 ± 0.069  us/op
>>>>> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
>>>>> 7  12.479 ± 0.068  us/op
>>>>> 
>>>>> For the smallest error stack possible with JMH, the current version is
>>> ~15%
>>>>> slower than with the fix.
>>>>> With a bigger stack - instantiation that happens on depth ~100 - the
>>>>> current version gets even slower. With the fix the time is stable.
>>>>> 
>>>>> Note I'm not sure my method to make the stack bigger is realistic. I
>>> just
>>>>> do some recursive calls.
>>>>> Also to fix the issue I just copied the SqlBabelParserImpl and fixed the
>>>>> stack trace call, so the results above may not match exactly the change
>>>>> that will be observed by upgrading to 7.0.13.
>>>>> 
>>>>> I can create a PR that adds the benchmark (without the hotfix
>>> *ErrorFixed
>>>>> one) if it's okay for you.
>>>>> 
>>>>>>> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde  wrote:
>>>>>> 

Comparing keyword handling in Babel and Core parsers

2024-07-10 Thread Julian Hyde
I am working on https://issues.apache.org/jira/browse/CALCITE-5541
(upgrading JavaCC) and the Babel parser is having problems deducing
whether a keyword is reserved. Investigating this, I took a look at
the generated code, and found something interesting.

Here are the NonReservedKeyWord and NonReservedKeyWord0of3 methods in
Babel 
(babel/build/javacc/javaCCMain/org/apache/calcite/sql/parser/babel/SqlBabelParserImpl.java):

  final public String NonReservedKeyWord() throws ParseException {
if (jj_2_1116(2)) {
  NonReservedKeyWord0of3();
} else if (jj_2_1117(2)) {
  NonReservedKeyWord1of3();
} else if (jj_2_1118(2)) {
  NonReservedKeyWord2of3();
} else {
  jj_consume_token(-1);
  throw new ParseException();
}
  {if ("" != null) return unquotedIdentifier();}
throw new Error("Missing return statement in function");
  }

  /** @see #NonReservedKeyWord */
  final public void NonReservedKeyWord0of3() throws ParseException {
if (jj_2_1119(2)) {
  jj_consume_token(A);
} else if (jj_2_1120(2)) {
  jj_consume_token(ACTION);
} else if (jj_2_1121(2)) {
  jj_consume_token(ADMIN);
...

And here are the same methods in Core
(core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java):

  final public String NonReservedKeyWord() throws ParseException {
switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
case A:
case ACTION:
case ADMIN:
case APPLY:
...
case YEARS:{
  NonReservedKeyWord0of3();
  break;
  }
case ABSENT:
   ...
case ZONE:{
  NonReservedKeyWord1of3();
  break;
  }
...
default:
  jj_la1[436] = jj_gen;
  jj_consume_token(-1);
  throw new ParseException();
}
  {if ("" != null) return unquotedIdentifier();}
throw new Error("Missing return statement in function");
  }

  /** @see #NonReservedKeyWord */
  final public void NonReservedKeyWord0of3() throws ParseException {
switch ((jj_ntk==-1)?jj_ntk_f():jj_ntk) {
case A:{
  jj_consume_token(A);
  break;
  }
case ACTION:{
  jj_consume_token(ACTION);
  break;
  }
case ADMIN:{
  jj_consume_token(ADMIN);
  break;
  }
...

Both of the above are generated using JavaCC 7.0.13. Other parsers,
such as Server, look similar to Core. Under JavaCC 4.0, all parsers
generate a 'switch'.

In all parsers we split the reserved keywords into 3 rules (0of3,
1of3, 2of3) due to the size restrictions noted in
https://issues.apache.org/jira/browse/CALCITE-2405.

I was puzzled why one is generating a 'switch' and the other is
generating chained 'if'...'else-if's. At first I thought it was that
Babel had more keywords, but some experiments eliminated that
possibility. I also disproved the hypothesis that it is because Babel
allows extra characters in identifiers (see
https://issues.apache.org/jira/browse/CALCITE-5668). My current
hypothesis is that Babel needs to use lookahead in order to determine
whether a non-reserved keyword can be converted to an identifier.

But whatever the reason, something seems to be very different about
the Babel grammar. Given how frequently identifiers occur when parsing
SQL, I would not be surprised if the Babel parser is significantly
slower than the regular parser under JavaCC 7.0.13.

In my opinion, that is not a bug that should prevent us from upgrading
JavaCC. Especially given that JavaCC 4.0 has a performance bug that is
affecting all of our parser variants.

Julian


Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

2024-07-10 Thread Julian Hyde
PS Your method of increasing stack size by making recursive calls seems valid. 
Your experiments show an improvement before and after the fix. That’s good 
enough for me. 

I plan to use your benchmark to check that upgrading JavaCC doesn’t introduce 
other performance problems. 
 
Julian
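
For reference, a rough sketch of what such a jmh benchmark could look like. The class and method name mirror the numbers quoted below, but the body is an assumption and the actual PR may differ.

import java.util.concurrent.TimeUnit;

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class BabelParserBenchmark {
  private static final SqlParser.Config CONFIG =
      SqlParser.config().withParserFactory(SqlBabelParserImpl.FACTORY);

  /** Creates a fresh SqlBabelParserImpl per call (which is where the Error
   * instantiation cost shows up) and parses a trivial expression. */
  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  public SqlNode instantiateBabelParser() throws SqlParseException {
    return SqlParser.create("1 + 1", CONFIG).parseExpression();
  }
}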

> On Jul 10, 2024, at 08:18, Julian Hyde  wrote:
> 
> A PR with the benchmark would be great. If it is based on jmh I will merge 
> it to main, so others can run the benchmark in future, and maybe do more 
> performance optimizations on the parser.
> 
> By the way, it can be on whichever parser variant (core, babel, server) is 
> most convenient. I’m pretty sure they have the same performance issues.
> 
> I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a PR 
> ready soon.
> 
> Julian
> 
>> On Jul 10, 2024, at 03:54, Cyril DE CATHEU  wrote:
>> 
>> Hey Mihai and Julian,
>> 
>> Thanks for looking at this.
>> The performance hit depends on the size of the stack.
>> Here are the numbers I have for the moment: (*not run in a good benchmark
>> setup* but gives an idea)
>> 
>> Benchmark  Mode
>> Cnt   Score   Error  Units
>> BabelParserBenchmark.instantiateBabelParseravgt
>> 7  14.128 ± 0.041  us/op
>> BabelParserBenchmark.instantiateBabelParserErrorFixed  avgt
>> 7  12.054 ± 0.039  us/op
>> 
>> BabelParserBenchmark.instantiateBabelParserBigCallStackavgt
>> 7  20.576 ± 0.069  us/op
>> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
>> 7  12.479 ± 0.068  us/op
>> 
>> For the smallest error stack possible with JMH, the current version is ~15%
>> slower than with the fix.
>> With a bigger stack - instantiation that happens on depth ~100 - the
>> current version gets even slower. With the fix the time is stable.
>> 
>> Note I'm not sure my method to make the stack bigger is realistic. I just
>> do some recursive calls.
>> Also to fix the issue I just copied the SqlBabelParserImpl and fixed the
>> stack trace call, so the results above may not match exactly the change
>> that will be observed by upgrading to 7.0.13.
>> 
>> I can create a PR that adds the benchmark (without the hotfix *ErrorFixed
>> one) if it's okay for you.
>> 
>>>> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde  wrote:
>>> 
>>> I've started work on
>>> https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to
>>> 7.0.13. There are still a few errors, but the inefficiency in the
>>> subclass of Error seems to have been solved.
>>> 
>>> Can someone work on the microbenchmark? I would like to see numbers
>>> before and after this fix.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu  wrote:
>>>> 
>>>> There is an issue to upgrade JavaCC:
>>> https://issues.apache.org/jira/browse/CALCITE-5541
>>>> 
>>>> It may be a worthwhile effort to do it.
>>>> 
>>>> Mihai
>>>> 
>>>> From: Cyril DE CATHEU 
>>>> Sent: Friday, July 5, 2024 8:55 AM
>>>> To: dev@calcite.apache.org 
>>>> Subject: Re: SqlBabelParserImpl instantiation is slow because it
>>> instantiates a java.lang.Error
>>>> 
>>>> Me again,
>>>> 
>>>> So this LookaheadSuccess comes from JavaCC.
>>>> It seems this issue was solved in JavaCC some time ago, from JavaCC
>>> 7.0.5:
>>>> https://github.com/javacc/javacc/pull/99
>>>> 
>>> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU >> <mailto:cy...@startree.ai>> wrote:
>>>> Had a deeper look, it seems the LookaheadSuccess instance is exploited
>>> for conditional branching
>>>> 
>>>> Eg:
>>>> 
>>>> try { return !jj_3_2(); }
>>>> catch(LookaheadSuccess ls) { return true; }
>>>> finally { jj_save(1, xla); }
>>>> 
>>>> So I guess lazy instantiation will not help because the LookaheadSuccess
>>> will be instantiated anyway.
>>>> If we know this error is always caught internally in the class, could we
>>> override the constructor to not do a call to Throwable.fillInStackTrace();
>>> 

Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

2024-07-10 Thread Julian Hyde
A PR with the benchmark would be great. If it is based on jmh I will merge it 
to main, so others can run the benchmark in future, and maybe do more 
performance optimizations on the parser. 

By the way, it can be on whichever parser variant (core, babel, server) is most 
convenient. I’m pretty sure they have the same performance issues. 

I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a PR 
ready soon. 

Julian

> On Jul 10, 2024, at 03:54, Cyril DE CATHEU  wrote:
> 
> Hey Mihai and Julian,
> 
> Thanks for looking at this.
> The performance hit depends on the size of the stack.
> Here are the numbers I have for the moment: (*not run in a good benchmark
> setup* but gives an idea)
> 
> Benchmark  Mode
> Cnt   Score   Error  Units
> BabelParserBenchmark.instantiateBabelParseravgt
> 7  14.128 ± 0.041  us/op
> BabelParserBenchmark.instantiateBabelParserErrorFixed  avgt
> 7  12.054 ± 0.039  us/op
> 
> BabelParserBenchmark.instantiateBabelParserBigCallStackavgt
> 7  20.576 ± 0.069  us/op
> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
> 7  12.479 ± 0.068  us/op
> 
> For the smallest error stack possible with JMH, the current version is ~15%
> slower than with the fix.
> With a bigger stack - instantiation that happens on depth ~100 - the
> current version gets even slower. With the fix the time is stable.
> 
> Note I'm not sure my method to make the stack bigger is realistic. I just
> do some recursive calls.
> Also to fix the issue I just copied the SqlBabelParserImpl and fixed the
> stack trace call, so the results above may not match exactly the change
> that will be observed by upgrading to 7.0.13.
> 
> I can create a PR that adds the benchmark (without the hotfix *ErrorFixed
> one) if it's okay for you.
> 
>> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde  wrote:
>> 
>> I've started work on
>> https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to
>> 7.0.13. There are still a few errors, but the inefficiency in the
>> subclass of Error seems to have been solved.
>> 
>> Can someone work on the microbenchmark? I would like to see numbers
>> before and after this fix.
>> 
>> Julian
>> 
>> 
>>> On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu  wrote:
>>> 
>>> There is an issue to upgrade JavaCC:
>> https://issues.apache.org/jira/browse/CALCITE-5541
>>> 
>>> It may be a worthwhile effort to do it.
>>> 
>>> Mihai
>>> 
>>> From: Cyril DE CATHEU 
>>> Sent: Friday, July 5, 2024 8:55 AM
>>> To: dev@calcite.apache.org 
>>> Subject: Re: SqlBabelParserImpl instantiation is slow because it
>> instantiates a java.lang.Error
>>> 
>>> Me again,
>>> 
>>> So this LookaheadSuccess comes from JavaCC.
>>> It seems this issue was solved in JavaCC some time ago, from JavaCC
>> 7.0.5:
>>> https://github.com/javacc/javacc/pull/99
>>> 
>> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163
>>> 
>>> 
>>> 
>>> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU > <mailto:cy...@startree.ai>> wrote:
>>> Had a deeper look, it seems the LookaheadSuccess instance is exploited
>> for conditional branching
>>> 
>>> Eg:
>>> 
>>> try { return !jj_3_2(); }
>>> catch(LookaheadSuccess ls) { return true; }
>>> finally { jj_save(1, xla); }
>>> 
>>> So I guess lazy instantiation will not help because the LookaheadSuccess
>> will be instantiated anyway.
>>> If we know this error is always caught internally in the class, could we
>> override the constructor to not call Throwable.fillInStackTrace(), which is
>> what makes this slow?
>>> 
>>> On Fri, Jul 5, 2024 at 10:53 AM Cyril DE CATHEU > <mailto:cy...@startree.ai>> wrote:
>>> Hey,
>>> 
>>> I'm using Calcite to parse snippets of SQL.
>>> My (simplified) function looks like this:
>>> 
>>> public static SqlNode expressionToNode(final String sqlExpression,
>>>final SqlParser.Config config) {
>>>  SqlParser sqlParser = SqlParser.create(sqlExpression, config);
>>>  try {
>>>return sqlParser.parseExpression();
>>>  } catch (SqlParseException e) {
>>> //do something
>>>  }
>>> }
>>> 
> and it is called many times.

Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

2024-07-08 Thread Julian Hyde
I've started work on
https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to
7.0.13. There are still a few errors, but the inefficiency in the
subclass of Error seems to have been solved.

Can someone work on the microbenchmark? I would like to see numbers
before and after this fix.

Julian


On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu  wrote:
>
> There is an issue to upgrade JavaCC: 
> https://issues.apache.org/jira/browse/CALCITE-5541
>
> It may be a worthwhile effort to do it.
>
> Mihai
> 
> From: Cyril DE CATHEU 
> Sent: Friday, July 5, 2024 8:55 AM
> To: dev@calcite.apache.org 
> Subject: Re: SqlBabelParserImpl instantiation is slow because it instantiates 
> a java.lang.Error
>
> Me again,
>
> So this LookaheadSuccess comes from JavaCC.
> It seems this issue was solved in JavaCC some time ago, from JavaCC 7.0.5:
> https://github.com/javacc/javacc/pull/99
> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163
>
>
>
> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU 
> mailto:cy...@startree.ai>> wrote:
> Had a deeper look, it seems the LookaheadSuccess instance is exploited for 
> conditional branching
>
> Eg:
>
> try { return !jj_3_2(); }
> catch(LookaheadSuccess ls) { return true; }
> finally { jj_save(1, xla); }
>
> So I guess lazy instantiation will not help because the LookaheadSuccess will 
> be instantiated anyway.
> If we know this error is always caught internally in the class, could we 
> override the constructor to not call Throwable.fillInStackTrace(), which is 
> what makes this slow?
>
> On Fri, Jul 5, 2024 at 10:53 AM Cyril DE CATHEU 
> mailto:cy...@startree.ai>> wrote:
> Hey,
>
> I'm using Calcite to parse snippets of SQL.
> My (simplified) function looks like this:
>
> public static SqlNode expressionToNode(final String sqlExpression,
> final SqlParser.Config config) {
>   SqlParser sqlParser = SqlParser.create(sqlExpression, config);
>   try {
> return sqlParser.parseExpression();
>   } catch (SqlParseException e) {
>  //do something
>   }
> }
>
> and it is called many times.
> The parser set in the config is Babel, so the parser instantiated is 
> SqlBabelParserImpl.
> A SqlBabelParserImpl is instantiated every time this function is called. 
> From what I understand this parser cannot be re-used easily to parse 
> different expressions but let me know if I'm incorrect.
>
> The issue I'm encountering is that the instantiation of this class is pretty slow 
> because one of the instance fields, jj_ls, extends java.lang.Error and is 
> instantiated when the SqlBabelParserImpl is instantiated.
>
> [Screenshot 2024-07-05 at 10.37.17.png]
>
> here is the extract of the generated code in SqlBabelParserImpl:
>
> static private final class LookaheadSuccess extends java.lang.Error { }
>
> final private LookaheadSuccess jj_ls = new LookaheadSuccess();
>
> final private boolean jj_scan_token(int kind) {
>   if (jj_scanpos == jj_lastpos) {
> jj_la--;
> if (jj_scanpos.next == null) {
>   jj_lastpos = jj_scanpos = jj_scanpos.next = token_source.getNextToken();
> } else {
>   jj_lastpos = jj_scanpos = jj_scanpos.next;
> }
>   } else {
> jj_scanpos = jj_scanpos.next;
>   }
>   if (jj_rescan) {
> int i = 0; Token tok = token;
> while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
> if (tok != null) jj_add_error_token(kind, i);
>   }
>   if (jj_scanpos.kind != kind) return true;
>   if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
>   return false;
> }
>
>
> jj_ls is only used in an error case. (before last line of jj_scan_token)
> I'm assuming the re-use of a single error instance is done on purpose,
> but could we consider instantiating jj_ls lazily?
>
> I can try to do this but I'm a bit lost in the code generation codebase so I 
> would need some pointers.
>
> Have a nice day.
> --
> Cyril de Catheu
> Software Engineer
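
A minimal sketch of one way to get the effect discussed in this thread (whether newer JavaCC does exactly this internally is not something I have checked): a pre-allocated control-flow Error whose stack trace is never captured, so creating it, and the parser instance that holds it, stays cheap.

final class CheapLookaheadSuccess extends Error {
  static final CheapLookaheadSuccess INSTANCE = new CheapLookaheadSuccess();

  private CheapLookaheadSuccess() {
    // message, cause, enableSuppression, writableStackTrace;
    // writableStackTrace = false skips Throwable.fillInStackTrace().
    super(null, null, false, false);
  }
}

The generated lookahead methods would then throw and catch INSTANCE purely as a branch signal, as in the catch (LookaheadSuccess ls) blocks quoted above.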


[jira] [Created] (CALCITE-6459) Measures inside structured types (ROW, ARRAY, MAP)

2024-07-05 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6459:


 Summary: Measures inside structured types (ROW, ARRAY, MAP)
 Key: CALCITE-6459
 URL: https://issues.apache.org/jira/browse/CALCITE-6459
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Decide whether it is valid for measures to occur inside structured types (ROW, 
ARRAY, MAP). If invalid, throw a validation error; if valid, change the planner 
to make such queries work.

A measure can occur within a ROW (struct) type because measures are part of a 
table's row type. It is more difficult to see measures in arrays or maps, 
because they would have to be treated as values (similar to lambdas) and could 
not be expanded at compile time.

Note that {{SqlTypeUtil.fromMeasure}} only handles struct types currently.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Draft: board report for 2024 Q2

2024-07-03 Thread Julian Hyde
PS Thank you to Mihai for taking the lion’s share of reviewing activity. The 
rest of us (myself included) need to step up and help him out. 

> On Jul 3, 2024, at 6:30 AM, Julian Hyde  wrote:
> 
> +1.
> 
> Thank you for mentioning my paper.
> 
> Julian
> 
>> On Jul 3, 2024, at 5:19 AM, Francis Chuang  wrote:
>> 
>> +1, great work, Benchao!
>> 
>>>> On 3/07/2024 10:09 pm, Benchao Li wrote:
>>> Hello,
>>> Below you can find a draft of this quarter's board report. I plan to
>>> submit the final version next Tuesday (Jul 9, 2024).
>>> Please let me know if you have any additions or corrections.
>>> ## Description:
>>> Apache Calcite is a highly customizable framework for parsing and planning
>>> queries on data in a wide variety of formats. It allows database-like
>>> access,
>>> and in particular a SQL interface and advanced query optimization, for data
>>> not residing in a traditional database.
>>> Avatica is a sub-project within Calcite and provides a framework for
>>> building
>>> local and remote JDBC and ODBC database drivers. Avatica has an independent
>>> release schedule and its own repository.
>>> ## Project Status:
>>> There are no issues requiring board attention.
>>> ## Membership Data:
>>> Apache Calcite was founded 2015-10-22 (9 years ago)
>>> There are currently 74 committers and 28 PMC members in this project.
>>> The Committer-to-PMC ratio is roughly 5:2.
>>> Community changes, past quarter:
>>> - No new PMC members. Last addition was Sergey Nuyanzin on 2024-03-05.
>>> - No new committers. Last addition was Hongyu Guo on 2023-11-03.
>>> ## Project Activity:
>>> Apache Calcite Avatica 1.25.0 was released on 2024-04-05. It is a routine
>>> release featuring support for JDK 21, Gradle 8.5 and several bug fixes.
>>> Apache Calcite 1.37.0 was released on 2024-05-04. It contains contributions
>>> from 46 contributors, and resolves 138 issues. It introduces Apache Arrow
>>> adapter and StarRocks dialect, adds support for lambda expressions in SQL 
>>> and
>>> ‘Must-filter’ columns. For table function calls it is now possible to use 
>>> them
>>> without TABLE() wrapper in FROM. Furthermore, there is support for optional
>>> FORMAT of CAST operator from SQL:2016 and more than 15 new SQL functions in
>>> various libraries such as BigQuery, PostgreSQL and Spark.
>>> Besides releases, it's worth mentioning that Julian Hyde presented "Measures
>>> in SQL" (which has been implemented in Calcite) both at the SF Distributed
>>> Systems Meetup in downtown SF on Wednesday May 22nd and at the SIGMOD 2024
>>> conference in Santiago, Chile.
>>> ## Community Health:
>>> The community maintains a healthy status; previously it was super healthy. The
>>> reason is that we did not invite new committers for more than 6 months. There
>>> are a few new faces in the community, so hopefully we'll invite new committers
>>> in the near future.
>>> Most of the statistics slightly decreased compared to last quarter (dev@
>>> decreased by 39%, issues@ decreased by 22%, commits decreased by 31%, code
>>> contributors decreased by 32%, active reviewers decreased by 36.8%). The
>>> reason I can see is that there has been less activity recently in supporting
>>> new SQL functions in various libraries, an area that was much more active in
>>> previous quarters.
>>> The number of non-committer (contributor) commits per month:
>>> +--+---+-+
>>> | year | month | contributor_commits |
>>> +--+---+-+
>>> | 2024 | 4 |  13 |
>>> | 2024 | 5 |  11 |
>>> | 2024 | 6 |  15 |
>>> +--+---+-+
>>> The number of active reviewers per month:
>>> +--+---+--+
>>> | year | month | active_reviewers |
>>> +--+---+--+
>>> | 2024 | 4 |5 |
>>> | 2024 | 5 |4 |
>>> | 2024 | 6 |3 |
>>> +--+---+--+
>>> Top reviewers in the last 3 months:
>>> +--+-+
>>> | committer| reviews |
>>> +--+-+
>>> | Mihai Budiu  |  19 |
>>> | NobiGo  |   9 |
>>> | Julian Hyde|   3 |
>>> +--+-+
>>> Best,
>>> Benchao Li


Re: Draft: board report for 2024 Q2

2024-07-03 Thread Julian Hyde
+1. 

Thank you for mentioning my paper. 

Julian

> On Jul 3, 2024, at 5:19 AM, Francis Chuang  wrote:
> 
> +1, great work, Benchao!
> 
>> On 3/07/2024 10:09 pm, Benchao Li wrote:
>> Hello,
>> Below you can find a draft of this quarter's board report. I plan to
>> submit the final version next Tuesday (Jul 9, 2024).
>> Please let me know if you have any additions or corrections.
>> ## Description:
>> Apache Calcite is a highly customizable framework for parsing and planning
>> queries on data in a wide variety of formats. It allows database-like
>> access,
>> and in particular a SQL interface and advanced query optimization, for data
>> not residing in a traditional database.
>> Avatica is a sub-project within Calcite and provides a framework for
>> building
>> local and remote JDBC and ODBC database drivers. Avatica has an independent
>> release schedule and its own repository.
>> ## Project Status:
>> There are no issues requiring board attention.
>> ## Membership Data:
>> Apache Calcite was founded 2015-10-22 (9 years ago)
>> There are currently 74 committers and 28 PMC members in this project.
>> The Committer-to-PMC ratio is roughly 5:2.
>> Community changes, past quarter:
>> - No new PMC members. Last addition was Sergey Nuyanzin on 2024-03-05.
>> - No new committers. Last addition was Hongyu Guo on 2023-11-03.
>> ## Project Activity:
>> Apache Calcite Avatica 1.25.0 was released on 2024-04-05. It is a routine
>> release featuring support for JDK 21, Gradle 8.5 and several bug fixes.
>> Apache Calcite 1.37.0 was released on 2024-05-04. It contains contributions
>> from 46 contributors, and resolves 138 issues. It introduces Apache Arrow
>> adapter and StarRocks dialect, adds support for lambda expressions in SQL and
>> ‘Must-filter’ columns. For table function calls it is now possible to use 
>> them
>> without TABLE() wrapper in FROM. Furthermore, there is support for optional
>> FORMAT of CAST operator from SQL:2016 and more than 15 new SQL functions in
>> various libraries such as BigQuery, PostgreSQL and Spark.
>> Besides releases, it's worth mentioning that Julian Hyde presented "Measures
>> in SQL" (which has been implemented in Calcite) both at the SF Distributed
>> Systems Meetup in downtown SF on Wednesday May 22nd and at the SIGMOD 2024
>> conference in Santiago, Chile.
>> ## Community Health:
>> The community maintains a healthy status; previously it was super healthy. The
>> reason is that we did not invite new committers for more than 6 months. There
>> are a few new faces in the community, so hopefully we'll invite new committers
>> in the near future.
>> Most of the statistics slightly decreased compared to last quarter (dev@
>> decreased by 39%, issues@ decreased by 22%, commits decreased by 31%, code
>> contributors decreased by 32%, active reviewers decreased by 36.8%). The
>> reason I can see is that there has been less activity recently in supporting
>> new SQL functions in various libraries, an area that was much more active in
>> previous quarters.
>> The number of non-committer (contributor) commits per month:
>> +--+---+-+
>> | year | month | contributor_commits |
>> +--+---+-+
>> | 2024 | 4 |  13 |
>> | 2024 | 5 |  11 |
>> | 2024 | 6 |  15 |
>> +--+---+-+
>> The number of active reviewers per month:
>> +--+---+--+
>> | year | month | active_reviewers |
>> +--+---+------+
>> | 2024 | 4 |5 |
>> | 2024 | 5 |4 |
>> | 2024 | 6 |3 |
>> +--+---+--+
>> Top reviewers in the last 3 months:
>> +--+-+
>> | committer| reviews |
>> +--+-+
>> | Mihai Budiu  |  19 |
>> | NobiGo  |   9 |
>> | Julian Hyde|   3 |
>> +--+-+
>> Best,
>> Benchao Li


[jira] [Created] (CALCITE-6452) Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result

2024-07-01 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6452:


 Summary: Scalar sub-query that uses IS NOT DISTINCT FROM returns 
incorrect result
 Key: CALCITE-6452
 URL: https://issues.apache.org/jira/browse/CALCITE-6452
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Scalar sub-query that uses an {{IS NOT DISTINCT FROM}} condition returns 
incorrect result. For example,
{code:java}
select e.ename,
  (select count(*)
from emp as f
where f.comm is not distinct from e.comm) as c
from emp as e{code}
returns
{noformat}
+++
| ENAME  | C  |
+++
| ADAMS  |  0 |
| ALLEN  |  1 |
| BLAKE  |  0 |
| CLARK  |  0 |
| FORD   |  0 |
| JAMES  |  0 |
| JONES  |  0 |
| KING   |  0 |
| MARTIN |  1 |
| MILLER |  0 |
| SCOTT  |  0 |
| SMITH  |  0 |
| TURNER |  1 |
| WARD   |  1 |
+++
(14 rows)
{noformat}
but should return
{noformat}
+++
| ENAME  | C  |
+++
| ADAMS  |  0 |
| ALLEN  | 10 |
| BLAKE  |  0 |
| CLARK  |  0 |
| FORD   |  0 |
| JAMES  |  0 |
| JONES  |  0 |
| KING   |  0 |
| MARTIN | 10 |
| MILLER |  0 |
| SCOTT  |  0 |
| SMITH  |  0 |
| TURNER | 10 |
| WARD   | 10 |
+++
(14 rows)
{noformat}
Also, and perhaps related, if we add a {{WHERE}} to the above query, like this:
{code:java}
select e.ename,
  (select count(*)
from emp as f
where f.comm is not distinct from e.comm) as c
from emp as e
where e.deptno = 10{code}
Calcite throws:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds for length 5
  at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:77)
  at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:1037)
  at org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$4(RelDecorrelator.java:2679)
{noformat}

I ran into this error while trying to rewrite queries that had measures and 
used {{ROLLUP}}. (Since {{ROLLUP}} will generate group keys whose value is 
NULL, we cannot use regular {{=}} when doing a self-join.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


AS MEASURE

2024-06-30 Thread Julian Hyde
I've added a pull request that supports the AS MEASURE construct so
that you can define measures in a subquery or view (and then use them
in an enclosing query).

Can people please review https://github.com/apache/calcite/pull/3837.

Not everything works yet (I describe future changes in
https://issues.apache.org/jira/browse/CALCITE-4496) but a lot of
things do. I would be especially interested in a discussion of how to
implement the rewrites for measures - by means of planner rules,
metadata (RelMdMeasure), or some combination.

You can find the theory in our paper,
https://dl.acm.org/doi/pdf/10.1145/3626246.3653374.

Julian


Re: SqlParser.config applied to model lattices but not views

2024-06-25 Thread Julian Hyde
If you’ve encountered a bug (as seems likely) can you please log it in Jira.

> On Jun 24, 2024, at 3:07 PM, Hugh Pearse  wrote:
> 
> That makes sense.
> Calcite calls the parse function twice, first time for model parsing during
> startup, second time for ad-hoc query parsing. Calcite should use a
> normalized version of the schema/table/column names to match up the query
> to the schema.
> 
> My example is case-insensitive for both model and query parsing, but that
> is a separate issue (maybe I made a mistake somewhere).
> 
> Regardless, the main issue here is that the "star" table has a different
> case than the "v_sales_fact_1997" view in the same model schema, even
> though they both should have gone through an identical parsing process when
> the model was loaded, with all the same rules applied.
> 
> From,
> Hugh Pearse



Re: Preserving alias/Frames in RelBuilder when extending RelBuilder?

2024-06-24 Thread Julian Hyde
I don’t want to encourage people to subclass RelBuilder. It is not designed to 
be extended by subclassing.

Having said that, it seems reasonable to provide a way to find the table 
aliases.

In https://issues.apache.org/jira/browse/CALCITE-4946 I added the method 
RelBuilder.size(); I was surprised that it was necessary, but it made a few 
things easier. A method that returns an immutable list of the N table aliases 
of the N stack entries would seem to complement size().

Julian
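
For concreteness, a hypothetical sketch of what such an accessor could look like; RelBuilder has no aliases() method today, and Frame here is a simplified stand-in for its private class.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

import com.google.common.collect.ImmutableList;

class AliasStackSketch {
  /** Simplified stand-in for RelBuilder's private Frame. */
  static final class Frame {
    final String alias;   // null if no alias has been assigned
    Frame(String alias) {
      this.alias = alias;
    }
  }

  private final Deque<Frame> stack = new ArrayDeque<>();

  /** Returns the table aliases of the stack entries, bottom entry first;
   * an entry is "" where no alias has been assigned. */
  ImmutableList<String> aliases() {
    ImmutableList.Builder<String> builder = ImmutableList.builder();
    for (Iterator<Frame> it = stack.descendingIterator(); it.hasNext();) {
      Frame frame = it.next();
      builder.add(frame.alias == null ? "" : frame.alias);
    }
    return builder.build();
  }
}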


> On Jun 20, 2024, at 3:40 PM, Ian Bertolacci 
>  wrote:
> 
> Hello,
> We have a class which extends RelBuilder to add some extra functionality, but 
> we are having issues with aliases being erased when popping the builder 
> stack, and there isn’t much recourse since the Frame stack is all private.
> 
> For example, this is valid:
> ```
> relBuilder.scan("T101").empty().field(1, "T101", "C0_7009")
> ```
> Because the `RelBuilder.empty` implementation copies the alias in the Frame 
> from the top of the Frame stack before the `empty` call, and uses it as the 
> name in the frame created *by* the `empty` call.
> 
> But people extending RelBuilder (us) cannot access or modify those private 
> members.
> So we have no way to replace the node at the top of the stack without  
> eliminating the alias, and making referencing that alias illegal.
> 
> For example, let's say we overrode the `empty()` method to effectively replace 
> the node that would be created by the base `empty()` implementation
> ```
> @Override
> public RelBuilder empty() {
>   super.empty();
>   // pop the result from the stack to maintain proper stack arrangement
>   RelNode oldEmptyNode = this.build();
>   // and make a new node
>   RelNode newEmptyNode = createNewEmptyNode(oldEmptyNode);
>   // push the new node on the stack and return the builder
>   return this.push(newEmptyNode);
> }
> ```
> By popping the stack, we erase the alias information, so calling `field` with 
> an alias after `empty` throws an exception, since after the empty call 
> (specifically the `build` call) the alias in the Frame no longer exists.
> 
> The only solution to this would be to call `.as` immediately after the `push` 
> call to restore the alias:
> ```
> @Override
> public RelBuilder empty() {
>   super.empty();
>   // get the current alias -- there is currently no API for this
>   String currentAlias = ???;
>   // pop the result from the stack to maintain proper stack arrangement
>   RelNode oldEmptyNode = this.build();
>   // and make a new node
>   RelNode newEmptyNode = createNewEmptyNode(oldEmptyNode);
>   // push the new node on the stack
>   this.push(newEmptyNode);
>   // re-use the alias that existed before popping the node
>   return this.as(currentAlias);
> }
> ```
> But there doesn’t seem to be an existing way to ask what the *current* alias 
> is, so that it can be used in the `.as` call.
> As it stands, it appears that it is not possible to add extension to 
> RelBuilder which either (a) cannot rely on an existing method to set-up the 
> Frame or (b) would cause the frame to be popped.
> 
> Is that correct? Or is there actually a way to see what the top Frame’s alias 
> is without modifying the core RelBuilder code?
> Thanks!
> -Ian Bertolacci



Re: About CommunityOverCode Asia 2024 Share on calcite

2024-06-23 Thread Julian Hyde
Do you know the story of the blind men and the elephant? [1] Each man touches a 
different part of the elephant, so they assume they are touching a different 
animal.

Well, Calcite is like that elephant. We all experience different parts of it. 
(I don’t experience Calcite in production very much, either.)

I’m very pleased that you are going to CommunityOverCode Asia. I think you 
should talk about what *you* know of Calcite, and what *you* find interesting 
in Calcite, and other people at the conference — even if they are experienced 
Calcite users — will learn from you.

Julian

[1] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant

> On Jun 18, 2024, at 9:50 AM, Cancai Cai  wrote:
> 
> Hello, everyone in the calcite community.
> 
> CommunityOverCode Asia 2024 will be held in Hangzhou, China at the end of
> July, and I am glad that I can go there to share some of my open source
> experience.
> 
> I plan to share what I have learned from streampark and calcite.
> 
> Unfortunately, I rarely use Calcite at work, so I can't share Calcite
> production practices, and I may be a novice in terms of Calcite code
> familiarity, so I am struggling to decide which interesting features of
> Calcite I should share, such as the Arrow adapter.
> 
> If there are any interesting new features that Calcite is currently working
> on, please tell me; I am very willing to learn, and then share and discuss them
> at CommunityOverCode Asia 2024.
> 
> Best wishes,
> Cancai Cai



Function library for Amazon Redshift

2024-06-21 Thread Julian Hyde
Case https://issues.apache.org/jira/browse/CALCITE-6444 proposes a
library for Redshift. This would be welcome, but I see a potential
problem.

Redshift and Postgres are so similar that almost every function in
Postgres is also in Redshift. I'd rather not have to log bugs every
time someone adds a function in Postgres and forgets to add it to
Redshift. Would it be better if, when a function is in the Postgres
library, it is included in the Redshift library *unless explicitly
excluded* (using the 'exceptLibraries' property I propose in the
case)?

Julian


Re: SqlParser.config applied to model lattices but not views

2024-06-21 Thread Julian Hyde
Oops. I should have read your message more thoroughly.

It looks as if views are correctly using the ‘internal connection’ and lattices 
are not but should be.

> On Jun 21, 2024, at 3:52 PM, Julian Hyde  wrote:
> 
> It’s possible that the behavior you see is correct. A model should be correct 
> (or incorrect) for all users of the model. Therefore, the model should not be 
> parsed using the connection parameters (e.g. case-sensitivity of identifiers) 
> of the particular connection that uses the model, but rather some constant 
> connection parameters.
> 
> I don’t recall where the parameters of the ‘internal connection’ are stored.
> 
> This applies to views as well as lattices, by the way. 
> 
> Julian
> 
> 
>> On Jun 21, 2024, at 12:02 PM, Hugh Pearse  wrote:
>> 
>> The issue can be traced to here:
>> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L200-L203
>> 
>> Which is caused by using:
>> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L375-L378
>> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L396-L399
>> 
>> 
>> 
>> From,
>> Hugh Pearse
> 



Re: SqlParser.config applied to model lattices but not views

2024-06-21 Thread Julian Hyde
It’s possible that the behavior you see is correct. A model should be correct 
(or incorrect) for all users of the model. Therefore, the model should not be 
parsed using the connection parameters (e.g. case-sensitivity of identifiers) 
of the particular connection that uses the model, but rather some constant 
connection parameters.

I don’t recall where the parameters of the ‘internal connection’ are stored.

This applies to views as well as lattices, by the way. 

Julian


> On Jun 21, 2024, at 12:02 PM, Hugh Pearse  wrote:
> 
> The issue can be traced to here:
> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L200-L203
> 
> Which is caused by using:
> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L375-L378
> https://github.com/apache/calcite/blob/calcite-1.36.0/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L396-L399
> 
> 
> 
> From,
> Hugh Pearse



[jira] [Created] (CALCITE-6443) Create view based on LookML model

2024-06-20 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6443:


 Summary: Create view based on LookML model
 Key: CALCITE-6443
 URL: https://issues.apache.org/jira/browse/CALCITE-6443
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Create a view based on a LookML model (more specifically, based on an explore in 
a LookML model).

Note: I have no plans to implement this in open source, but it's helpful to set 
out the requirements.

For example, the following statement using the LOOKML_VIEW table function
{code}
CREATE VIEW OrdersCube AS
SELECT *
FROM TABLE LOOKML_VIEW('orders', '
model: model_name {
  view: orders {
sql: SELECT * FROM orders ;;
dimension: id {
  primary_key: yes
  type: number
}
dimension_group: created {
  type: time
  timeframes: [time, date, week, month]
  sql: created_at;;
}
dimension: amount {
  type: number
  value_format: “0.00”
}
measure: count
measure: total_amount {
  type: sum
  sql: amount ;;
}
  }
  view: customers {
sql_table_name: customers
label: “Customer”
dimension: id {
  primary_key: yes
}
dimension: zipcode
dimension: state
  }
  explore: orders {
from: orders
join: customers {
  sql_on: customers.id = orders.customer_id ;;
}
  }
}');
{code}
is equivalent to the following:
{code}
CREATE VIEW OrdersCube AS
SELECT *
FROM (
  SELECT `orders.id`, // PK
  `orders.customer_id`,
  TIME(created_at) AS `orders.created_time`,
  DATE(created_at) AS `orders.created_date`,
  WEEK(created_at) AS `orders.created_week`,
  MONTH(created_at) AS `orders.created_month`,
  amount AS `orders.amount`, // value_format: “0.00”
  COUNT(*) AS MEASURE `orders.count`,
  SUM(amount) AS MEASURE `orders.total_amount`
  FROM orders) AS orders
JOIN (
  SELECT id AS `customers.id`, // PK
  zip_code AS `customers.zip_code`,
  state AS `customers.state`
  FROM customers) AS customers // label: “Customer”
ON `customers.id` = `orders.customer_id`;
{code}






[jira] [Created] (CALCITE-6441) Add BOOLAGG_AND, BOOLAGG_OR aggregate functions (enabled in Snowflake library)

2024-06-17 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6441:


 Summary: Add BOOLAGG_AND, BOOLAGG_OR aggregate functions (enabled 
in Snowflake library)
 Key: CALCITE-6441
 URL: https://issues.apache.org/jira/browse/CALCITE-6441
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Add BOOLAGG_AND, BOOLAGG_OR aggregate functions. These functions are the 
Snowflake equivalent to the standard EVERY and SOME aggregate functions.





SIGMOD

2024-06-12 Thread Julian Hyde
I'm at the SIGMOD 2024 conference this week in Santiago, Chile,
presenting a paper "Measures in SQL", which I co-authored with John
Fremlin from Google [1][2][3]. It has received a lot of interest &
discussion.

If you're in the Calcite community (contributor or just interested
bystander) and are in Santiago please let me know -- it would be great
to meet in person.

Julian

[1] Poster: 
https://github.com/julianhyde/share/blob/main/slides/measures-sigmod-2024-poster.pdf
[2] Case: https://issues.apache.org/jira/browse/CALCITE-4496
[3] Paper: https://doi.org/10.1145/3626246.3653374


Re: [Question] Long Runtime for SubqueryRemoveRule, vs Expanding Subqueries

2024-06-12 Thread Julian Hyde
It’s possible that you’re running into a performance bug - such as a loop that
is O(n^2) in the number of SqlNodes - that has never been noticed before.

Can you do some investigation? Say, double the size or depth of the query and
see if the running time more than doubles. Or run the conversion repeatedly,
send SIGQUIT (kill -3) to your JVM to generate thread dumps, and see whether it
is often spending its time inside the same piece of code.

If those approaches yield results, log a bug. We would rather fix the bug and
have people use SubqueryRemoveRule than have them turn on isExpand.

Julian

> On Jun 11, 2024, at 7:28 PM, JinxTheKid  wrote:
> 
> Hi Calcite community,
> 
> I'm running into some challenges using SubqueryRemoveRule and was hoping
> the community could help me understand any gaps in my understanding.
> 
> I am attempting to remove subqueries from row expressions in a relational
> plan. My motivation for doing this is that I want to call metadata queries
> such as getColumnOrigins on my relational plan, but these tools don't seem to
> work unless subqueries are removed. To remove subqueries, I am...
> - Using SqlToRelConverter to convert from SqlNode to RelRoot.
> - After obtaining the RelRoot, using a HepProgram with subquery remove
> rules to remove the subqueries.
> 
> For small programs, this process works fine, however when testing against
> larger queries, ~2MB, the process of removing subqueries takes about as
> much time as converting a SqlNode to a RelRoot, effectively doubling the
> time to analyze a query. I've tried using a deprecated setting,
> SqlToRelConverter.Config.isExpand, which seems to provide results similar
> to what I want with minimal performance impact.
> 
> Since SqlToRelConverter.Config.isExpand is deprecated, I assume that the
> non-deprecated way to expand subqueries would be as performant or better,
> but I am experiencing worse performance. This leads me to believe that
> I'm using the APIs incorrectly, but I'm not sure what I am missing.
> 
> Q: Is there a more efficient way in Calcite to remove subqueries than what
> I've described above? Is there risk to using
> SqlToRelConverter.Config.isExpand even if it's deprecated?
> 
> Q: Are there any other resources besides Java docs and GitHub projects I
> could refer to to learn more about my problem?
> 
> Thanks for any help,
> Logan
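
For reference, here is a minimal sketch of the HepProgram wiring described above. It illustrates the non-deprecated path (not a fix for the performance problem), assumes `root` is the RelNode obtained from SqlToRelConverter, and uses the current CoreRules constants:

{code:java}
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

public final class SubQueryRemoval {
  private SubQueryRemoval() {}

  /** Rewrites RexSubQuery expressions into joins/correlates so that metadata
   * calls such as getColumnOrigins can work on the plan. */
  public static RelNode removeSubQueries(RelNode root) {
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(CoreRules.FILTER_SUB_QUERY_TO_CORRELATE)
        .addRuleInstance(CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE)
        .addRuleInstance(CoreRules.JOIN_SUB_QUERY_TO_CORRELATE)
        .build();
    HepPlanner planner = new HepPlanner(program);
    planner.setRoot(root);
    return planner.findBestExp();
  }
}
{code}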


Re: Questions about numeric type conversion

2024-06-09 Thread Julian Hyde
I believe that there are many overlapping problems here, chief among them that
each DBMS has its own semantics.

In order to emulate a particular DBMS, we need to understand its semantics,
document those semantics by writing tests, and then follow them (if Calcite is
in a mode that should follow that DBMS).

In my opinion, some DBMSs overuse binary floating point types (which are often
lossy when converting integer or decimal literals). Ideally Calcite would never
convert to a binary floating point type unless specifically asked to do so. But
if we are emulating another DBMS, maybe we would need to do that.

By default, Calcite’s numeric literals are arbitrary-precision decimal values.
But some recent threads/issues have convinced me that we might need
arbitrary-precision binary numeric literals in some cases (e.g. after applying
constant reduction).



> On Jun 9, 2024, at 12:21 PM, Cancai Cai  wrote:
> 
> Hello calcite community,
> I would like to ask a question. Actually, I am not sure whether it is
> appropriate to ask this question in the calcite community. It is mainly
> about the database's handling of numeric types.
> 
> [10:53:45]TiDB root:test> explain select id = 8145308033243873280 from test;
> +---------------------+---------+-----------+--------------------------+--------------------------------------------------------------------------+
> | id                  | estRows | task      | access object            | operator info                                                            |
> +---------------------+---------+-----------+--------------------------+--------------------------------------------------------------------------+
> | Projection_3        | 1.00    | root      |                          | eq(cast(test.test.id, double BINARY), 8.145308033243873e+18)->Column#3  |
> | └─IndexReader_7     | 1.00    | root      |                          | index:IndexFullScan_6                                                    |
> |   └─IndexFullScan_6 | 1.00    | cop[tikv] | table:test, index:id(id) | keep order:false, stats:pseudo                                           |
> +---------------------+---------+-----------+--------------------------+--------------------------------------------------------------------------+
> 3 rows in set
> Time: 0.004s
> [10:54:08]TiDB root:test> select cast('8145308033243873281' as double);
> +----------------------------------------+
> | cast('8145308033243873281' as double)  |
> +----------------------------------------+
> | 8.145308033243873e+18                  |
> +----------------------------------------+
> 1 row in set
> Time: 0.008s
> [10:54:15]TiDB root:test> select cast('8145308033243873280' as double);
> +----------------------------------------+
> | cast('8145308033243873280' as double)  |
> +----------------------------------------+
> | 8.145308033243873e+18                  |
> +----------------------------------------+
> 1 row in set
> Time: 0.003s
> 
> As shown above, some numeric values are converted to the double type by
> default in the underlying database, even though that is not compatible with
> bigdecimal and decimal types. Why is this?
> 
> These behaviors exist in both MySQL (I put the above example into MySQL and
> it reproduces there too) and Spark. For example, Spark's map_contains_key
> function also converts numeric types to double for comparison.
> 
> I am not a database developer, so I am curious where the difficulty lies
> here. If someone can tell me, I will be very grateful and happy to learn
> something new.
> 
> Best wishes,
> Cancai Cai



[jira] [Created] (CALCITE-6425) Attributes

2024-06-02 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-6425:


 Summary: Attributes
 Key: CALCITE-6425
 URL: https://issues.apache.org/jira/browse/CALCITE-6425
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


Add operators and metadata so that columns can have attributes.

For example, we wish to associate a FORMATTED_VALUE attribute with the SALARY 
column. The following example defines attributes in a view and then uses them 
in a query.
{code:java}
CREATE VIEW EnhancedEmp AS
  SELECT deptno, ename,
  DEFINE_ATTRIBUTE(
DEFINE_ATTRIBUTE(revenue, 'FORMATTED_VALUE',
CAST(revenue AS TO_CHAR(revenue, 'L99D99')),
'SORT_KEY', revenue)
  FROM Emp);

SELECT ename,
sal,
GET_ATTRIBUTE(sal, 'FORMATTED_VALUE') AS formatted_sal,
DESCRIBE_ATTRIBUTES(sal) AS attributes
FROM EnhancedEmp
WHERE ename = 'SCOTT';

ENAME   SAL  FORMATTED_SAL  ATTRIBUTES
=====  ====  =============  ==========
SCOTT  1200  $1200.00       FORMATTED_VALUE: VARCHAR, SORT_KEY: NUMBER

SELECT deptno,
  SUM(sal) AS sum_sal,
  GET_ATTRIBUTE(SUM(sal), 'FORMATTED_VALUE') AS formatted_sum_sal
FROM EnhancedEmp
GROUP BY deptno;

DEPTNO  SUM_SAL  FORMATTED_SUM_SAL
======  =======  =================
    10     5000  $5,000.00
    20     3750  $3,750.00
{code}
 

Here are the functions:
 * {{DEFINE_ATTRIBUTE(targetExpression, attributeName, expression)}} defines an 
attribute on targetExpression; the return value is the same but has an extra 
attribute
 * {{GET_ATTRIBUTE(targetExpression, attributeName)}} evaluates an attribute of 
an expression
 * {{DESCRIBE_ATTRIBUTES(targetExpression)}} returns a string describing the 
names and types of available attributes

What is the goal of this facility? To be able to attach "semantic" metadata, 
such as format string, formatted value, sort key, to values in a way that can 
easily be transmitted over JDBC.

An alternative approach would have been to convert values into records. But the 
consuming expression and JDBC would have to deal with those records. In the 
proposed approach, you can find what attributes are available as part of an 
'extended type' using {{DESCRIBE_ATTRIBUTES}}.

Also required is a set of rules for how particular attributes propagate through 
queries and expressions. For example, the {{FORMATTED_VALUE}} attribute 
propagates through subquery ({{SELECT}}), filter ({{WHERE}}), aggregate 
({{GROUP BY}}), and through the {{SUM}} aggregate function.





Re: SqlParser Lexical State

2024-06-02 Thread Julian Hyde
It is not supposed to work for Quoting.SINGLE. 

I don’t know any SQL dialect where identifiers (table and column names and 
such) are enclosed in single quotes (apostrophes) so why should we support 
that? 

Julian 

> On Jun 2, 2024, at 3:59 PM, Yogendra Sharma  wrote:
> 
> Hello,
> 
> I am using Calcite 1.36; and I stumbled upon an issue and seeking some 
> explanation on the code below in Calcite:
> 
> While instantiating SqlParser, the constructor has following line:
> 
> parser.switchTo(SqlAbstractParserImpl.LexicalState.forConfig(config));
> 
> Essentially it tries to acquire an initial lexical state. However, the code 
> written inside LexicalState does not handle all the possible values of 
> Quoting. For instance, it does not handle Quoting.SINGLE_QUOTE, which results 
> in an assertion error from the code below.
> 
> 
>   /** Returns the corresponding parser state with the given configuration
>    * (in particular, quoting style). */
>   public static LexicalState forConfig(SqlParser.Config config) {
>     switch (config.quoting()) {
>     case BRACKET:
>       return DEFAULT;
>     case DOUBLE_QUOTE:
>       return DQID;
>     case BACK_TICK_BACKSLASH:
>       return BQID;
>     case BACK_TICK:
>       if (config.conformance().allowHyphenInUnquotedTableName()
>           && config.charLiteralStyles().equals(
>               EnumSet.of(CharLiteralStyle.BQ_SINGLE,
>                   CharLiteralStyle.BQ_DOUBLE))) {
>         return BQID;
>       }
>       if (!config.conformance().allowHyphenInUnquotedTableName()
>           && config.charLiteralStyles().equals(
>               EnumSet.of(CharLiteralStyle.STANDARD))) {
>         return BTID;
>       }
>       // fall through
>     default:
>       throw new AssertionError(config);
>     }
>   }
> }
> 
> So how is it supposed to work for Quoting values other than those specified
> in the above switch statement?
> 
> I am sure I am missing something basic here. Please help.
> 
> Thanks,
> Yogi
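
For completeness, a small sketch of configuring the parser with one of the quoting styles that forConfig does handle (standard SqlParser API; the SQL text is just an example):

{code:java}
import org.apache.calcite.avatica.util.Quoting;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class ParserQuotingDemo {
  public static void main(String[] args) throws SqlParseException {
    // Identifiers are quoted with double quotes; string literals still use
    // single quotes (apostrophes).
    SqlParser.Config config = SqlParser.config()
        .withQuoting(Quoting.DOUBLE_QUOTE);
    SqlParser parser = SqlParser.create(
        "SELECT \"empName\" FROM \"emps\" WHERE \"deptno\" = 10", config);
    SqlNode ast = parser.parseQuery();
    System.out.println(ast);
  }
}
{code}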


Re: [Question] Left Semi Join and SqlToRelConverter

2024-05-31 Thread Julian Hyde
The Babel parser contains the stuff that some people want to be able
to parse (because it's in some other engine) but Calcite doesn't want
to implement.

On Wed, May 22, 2024 at 8:16 PM JinxTheKid  wrote:
>
> Hi all,
>
> I'm exploring extensions to Calcite's parser and saw that the Babel parser
> impl has some nice things I want to work with, namely support for
> Spark-style LEFT SEMI JOIN syntax. Using this parser works for me, but I
> noticed that when using SqlToRelConverter, converting LEFT_SEMI_JOIN is not
> implemented. This seems strange to me, since there is both a
> JoinType.LEFT_SEMI_JOIN and a JoinRelType.SEMI. I'm aware that I could
> always write the query using a different syntax (EXISTS or IN), but was
> curious whether there is a "Calcite way" of achieving my desired behavior
> before writing my own logic. Is there a way to convert queries with
> LEFT SEMI JOINs into a RelNode, and is this gap intentional?
>
> Thanks,
> Logan
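
One possible workaround, if you can build the plan programmatically rather than parsing the LEFT SEMI JOIN syntax, is to create the semi-join directly with RelBuilder. A sketch, assuming a FrameworkConfig whose default schema contains the usual EMP and DEPT test tables:

{code:java}
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

public class SemiJoinDemo {
  /** Builds the equivalent of:
   *  SELECT * FROM emp LEFT SEMI JOIN dept ON emp.deptno = dept.deptno */
  public static RelNode buildSemiJoin(FrameworkConfig config) {
    RelBuilder builder = RelBuilder.create(config);
    return builder
        .scan("EMP")
        .scan("DEPT")
        .semiJoin(
            builder.equals(
                builder.field(2, 0, "DEPTNO"),
                builder.field(2, 1, "DEPTNO")))
        .build();
  }
}
{code}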


Re: EnumerableTableScan array/multiset handling

2024-05-30 Thread Julian Hyde
Thanks for doing these experiments, Christian, and documenting what you found.

I think you’re running into the limitations of ReflectiveSchema. It works with
POJOs (Java classes with public fields) but hasn’t been tested for richer
variations, which therefore often just don’t work. In many cases this can be
fixed (see https://issues.apache.org/jira/browse/CALCITE-6244 for example).

I’m uneasy about extending RelDataType to return JavaRowFormat. That seems to 
be introducing a physical property into a logical data type; also, it is 
surfacing the details of one particular adapter (enumerable). Maybe there is a 
way to surface this information via annotations or the java.sql.Wrapper 
interface without extending RelDataType.

Julian


> On May 24, 2024, at 11:33 AM, Christian Beikov  
> wrote:
> 
> Hello,
> 
> in my recent experiments I ran into some issues when trying to unnest an 
> array of struct.
> 
> The query is roughly this: select t.id, e.value1 from MyTable t, 
> unnest(t.structArray) e
> 
> EnumerableTableScan#fieldExpression will then try to generate code that 
> converts the value of the "structArray" column to a List, which is 
> where the problems start to arise. This code does not seem to be tested at 
> all, because it currently generates a compile error due to a missing cast to 
> "Iterable". It also assumes the data is already available in the 
> JavaRowFormat.CUSTOM representation, but AFAIU it could be in any format.
> 
> When using RelRecordType for structs, regular struct columns seem to expect 
> JavaRowFormat.ARRAY, but struct arrays don't seem to behave the same way.
> 
> What is the expected data format that an enumerator should return for struct 
> arrays that are typed as RelRecordType?
> 
> To support formats per type it might be nice to allow specifying the 
> JavaRowFormat on RelDataType. Wdyt?
> 
> Also, is there a way to allow working with custom Java types for table/struct 
> rows? From looking at AbstractCursor#createAccessor, it seems the Avatica 
> code currently only works with classes that expose public fields.
> 
> Regards,
> 
> Christian
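
For contrast, here is a minimal ReflectiveSchema setup of the kind that does work today (public-field POJOs, flat column types); the class and table names are illustrative:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

public class ReflectiveSchemaDemo {
  /** Row type: public fields only, no nested collections of structs. */
  public static class Employee {
    public final int id;
    public final String name;
    public Employee(int id, String name) { this.id = id; this.name = name; }
  }

  /** Schema object: each public array field becomes a table. */
  public static class Hr {
    public final Employee[] emps = {
        new Employee(100, "Bill"), new Employee(200, "Eric")};
  }

  public static void main(String[] args) throws Exception {
    Connection connection = DriverManager.getConnection("jdbc:calcite:");
    CalciteConnection calcite = connection.unwrap(CalciteConnection.class);
    SchemaPlus root = calcite.getRootSchema();
    root.add("hr", new ReflectiveSchema(new Hr()));
    try (Statement statement = calcite.createStatement();
         ResultSet resultSet = statement.executeQuery(
             "select \"name\" from \"hr\".\"emps\"")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getString(1));
      }
    }
  }
}
{code}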



Re: GROUP BY ordinals with expand star

2024-05-29 Thread Julian Hyde
OK, please log a jira case. Please describe cases where it should give an 
error. Describe any pertinent flags.

> On May 29, 2024, at 8:58 AM, Itiel Sadeh  
> wrote:
> 
> Thanks for the response Julian.
> 
> I agree that it is not a very useful use case, but it is valid SQL as far
> as I can tell. PostgreSQL does support it.
> Maybe Calcite shouldn't support it, but I think that the error message is a
> bit unintuitive in this case.
> 
> Thanks again,
> Itiel
> 
> On Tue, May 28, 2024 at 10:02 PM Julian Hyde  wrote:
> 
>> It doesn’t seem particularly useful to use “*” in a “GROUP BY” query. The
>> query will be invalid if you don’t include all the columns in the GROUP BY
>> clause.
>> 
>> (In a few databases, one of which is MySQL, the query “SELECT empno,
>> deptno, sal FROM emp GROUP BY empno” would be valid if “empno” is a primary
>> key and the DBMS recognizes functionally dependent columns. But Calcite
>> does not do this.)
>> 
>> Do any other DBMSs support “*” in “GROUP BY” queries?
>> 
>>> Is the issue above a bug?
>> 
>> No, it is not a bug. It may be a missing feature. This discussion will
>> decide whether it is a desirable feature.
>> 
>>> If so, can we do the star expansion before the GROUP BY validation?
>>> And when can a select item depend on the GROUP BY list?
>> 
>> 
>> Probably not. If you change the order of expansion, a lot of things that
>> used to work will stop working. I think that this fix will be tricky to
>> accomplish.
>> 
>> Julian
>> 
>> 
>>> On May 28, 2024, at 12:22 AM, Itiel Sadeh wrote:
>>> 
>>> Hello calcite team,
>>> 
>>> Consider the following:
>>> Let's say I have a table "t" with two columns:
>>> "CREATE TABLE t(x int, y int);"
>>> 
>>> Now, the following query will result in validation error:
>>> "SELECT * FROM t GROUP BY 1,2",
>>> But if I'm not using "*" it will work:
>>> "SELECT x, y FROM t GROUP BY 1,2"
>>> 
>>> The issue is that the group by validation happens before we do star
>>> expansion.
>>> Note that ORDER BY doesn't have the same issue, as the ORDER BY validation
>>> happens after star expansion.
>>> 
>>> The star expansion is done inside `ValidateSelectList`, and there is a
>>> comment above stating that:
>>> 
>>>>   // Validate the SELECT clause late, because a select item might
>>>>   // depend on the GROUP BY list, or the window function might reference
>>>>   // window name in the WINDOW clause etc.
>>>> 
>>> 
>>> Given that I have 3 questions: Is the issue above a bug?
>>> If so, can we do the star expansion before the GROUP BY validation?
>>> And when can a select item depend on the GROUP BY list?
>>> 
>>> Thanks,
>>> Itiel
>> 
>> 



Re: GROUP BY ordinals with expand star

2024-05-28 Thread Julian Hyde
It doesn’t seem particularly useful to use “*” in a “GROUP BY” query. The query 
will be invalid if you don’t include all the columns in the GROUP BY clause.

(In a few databases, one of which is MySQL, the query “SELECT empno, deptno, 
sal FROM emp GROUP BY empno” would be valid if “empno” is a primary key and the 
DBMS recognizes functionally dependent columns. But Calcite does not do this.)

Do any other DBMSs support “*” in “GROUP BY” queries?

>  Is the issue above a bug?

No, it is not a bug. It may be a missing feature. This discussion will decide 
whether it is a desirable feature.

> If so, can we do the star expansion before the GROUP BY validation?
> And when can a select item depend on the GROUP BY list?


Probably not. If you change the order of expansion, a lot of things that used 
to work will stop working. I think that this fix will be tricky to accomplish.

Julian


> On May 28, 2024, at 12:22 AM, Itiel Sadeh  
> wrote:
> 
> Hello calcite team,
> 
> Consider the following:
> Let's say I have a table "t" with two columns:
> "CREATE TABLE t(x int, y int);"
> 
> Now, the following query will result in validation error:
> "SELECT * FROM t GROUP BY 1,2",
> But if I'm not using "*" it will work:
> "SELECT x, y FROM t GROUP BY 1,2"
> 
> The issue is that the group by validation happens before we do star
> expansion.
> Note that ORDER BY doesn't have the same issue, as the ORDER BY validation
> happens after star expansion.
> 
> The star expansion is done inside `ValidateSelectList`, and there is a
> comment above stating that:
> 
>>// Validate the SELECT clause late, because a select item might
>>// depend on the GROUP BY list, or the window function might reference
>>// window name in the WINDOW clause etc.
>> 
> 
> Given that I have 3 questions: Is the issue above a bug?
> If so, can we do the star expansion before the GROUP BY validation?
> And when can a select item depend on the GROUP BY list?
> 
> Thanks,
> Itiel



Re: CoreFilters.FILTER_REDUCE_EXPRESSIONS issue

2024-05-21 Thread Julian Hyde
You say that

SELECT code, CASE WHEN code = 'test_val' THEN ABS(val) ELSE NULL END AS absval
 FROM TEST
   WHERE CASE WHEN code = 'test_val' THEN ABS(val) > 0 ELSE FALSE END

isn't valid SQL. What's wrong with it? It looks valid and correct to
me. (I acknowledge that it's neither pretty nor optimal.)

To make the query more pretty and optimal, it would be nice if Calcite
did a couple of simplifications:

1. Simplify CASE to AND.  "CASE WHEN code = 'test_val' THEN ABS(val) >
0 ELSE FALSE END" --> "code = 'test_val' AND ABS(val) > 0". This
rewrite is valid only if the expressions are pure, can't throw, and
NULL values don't matter, all true in this case.

2. Reduce constants. "CASE WHEN code = 'test_val' THEN ABS(val) ELSE
NULL END AS absval" becomes "ABS(val) AS absval" because the WHERE
clause has ensured that "code = 'test_val'" is always true.

I believe there's a rule for 2 but not for 1.

Julian

On Tue, May 21, 2024 at 11:18 AM  wrote:
>
> I'm having an issue with CoreFilters.FILTER_REDUCE_EXPRESSIONS.  I'm 
> wondering if this is a problem with the rule, or with how I've set up my 
> logical tree.
>
> Conceptually I am trying to use the equivalent of a computed column in a 
> WHERE.  Since this isn't legal:
>
>SELECT code, CASE WHEN code = 'test_val' THEN ABS(val) ELSE NULL END AS 
> absval
> FROM TEST
>   WHERE absval > 0
>
> I need to move that same CASE logic into the WHERE clause, so:
>
>SELECT code, CASE WHEN code = 'test_val' THEN ABS(val) ELSE NULL END AS 
> absval
> FROM TEST
>  WHERE CASE WHEN code = 'test_val' THEN ABS(val) ELSE NULL END > 0
>
> I'm able to set up my logical RelNode tree and generate exactly this SQL 
> directly from the logical tree.  The builder logic has:
>
>    RelNode relNode = builder
>        .projectPlus(builder.alias(caseNode, "ABSVAL"))
>        .filter(builder.greaterThan(caseNode, builder.literal(0)))
>        .build();
>
> However, when I try to generate a physical tree from this, I can see the 
> FILTER_REDUCE_EXPRESSIONS rule coerces the CASE statement in the filter into 
> a BOOLEAN expression, and it produces:
>
>SELECT code, CASE WHEN code = 'test_val' THEN ABS(val) ELSE NULL END AS 
> absval
>  FROM TEST
>WHERE CASE WHEN code = 'test_val' THEN ABS(val) > 0 ELSE FALSE END
>
> Which isn't even valid SQL.   If I remove that rule from the planner 
> altogether:
>
>planner.removeRule(FILTER_REDUCE_EXPRESSIONS)
>
> The SQL looks correct (effectively the same as that translated directly from 
> the logical tree above.)
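
For what it's worth, here is a sketch of building the filter directly in the AND form of rewrite 1 with RelBuilder, sidestepping the CASE entirely. Only the filter is shown; the table and column names mirror the example above and may need adjusting to your schema's casing:

{code:java}
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.RelBuilder;

public class CaseToAndSketch {
  /** Equivalent of: WHERE code = 'test_val' AND ABS(val) > 0
   * (valid because both conjuncts are pure and cannot throw). */
  public static RelNode build(RelBuilder builder) {
    builder.scan("TEST");
    RexNode condition = builder.and(
        builder.equals(builder.field("CODE"), builder.literal("test_val")),
        builder.greaterThan(
            builder.call(SqlStdOperatorTable.ABS, builder.field("VAL")),
            builder.literal(0)));
    return builder.filter(condition).build();
  }
}
{code}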


Re: Apache at Visa Summit

2024-05-14 Thread Julian Hyde
I shall also be speaking at the SF Distributed Systems Meetup in
downtown SF on Wednesday May 22nd [3]. This talk will be on a
different topic, SQL with Measures. (A preview of the talk that I
shall give at SIGMOD in Santiago, Chile, in June.)

SF Distributed Systems is a new meetup that I have helped to set up.
We plan to have talks at the intersection of distributed systems and
data, focusing on nerdy approaches like declarative languages, and
with speakers from both industry and academia. Please help spread the
word.

Julian

[3] https://lu.ma/t6r4mi4v

On Tue, May 14, 2024 at 1:22 PM Julian Hyde  wrote:
>
> Visa is a major sponsor of the ASF and is hosting a one-day conference
> in Foster City, CA, on Thursday (May 16th) to showcase Apache projects
> [1]. I am giving a talk "Open Source Software Innovation in Advanced
> Databases and Analytics" [2].
>
> I believe registration is free for ASF committers, and there are still
> spaces available. If you are in the Bay Area, please consider joining.
> It should be a good chance to meet other ASF people.
>
> Julian
>
> [1] https://web.cvent.com/event/d6a7ec3c-7363-40c7-ad7e-765f95d32acb/summary
> [2] 
> https://web.cvent.com/event/d6a7ec3c-7363-40c7-ad7e-765f95d32acb/websitePage:645d57e4-75eb-4769-b2c0-f201a0bfc6ce

